Domain fusion analysis by applying relational algebra to protein sequence and domain databases
Truong, Kevin; Ikura, Mitsuhiko
2003-01-01
Background Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages these efforts will become increasingly powerful. Results This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypotheses. Results can be viewed at . Conclusion As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time. PMID:12734020
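The core relational step of domain fusion analysis can be sketched as a self-join over a table of domain hits: a "fusion" protein carries domains A and B, one partner carries A but not B, and another partner carries B but not A. This is a minimal sketch on an in-memory SQLite database, not the authors' published query; the table name `hit`, its columns, and the exact exclusion rules are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: one row per (protein, domain) hit, e.g. a Pfam assignment.
FUSION_QUERY = """
SELECT DISTINCT fa.protein, fa.domain, fb.domain, x.protein, y.protein
FROM hit AS fa
JOIN hit AS fb ON fb.protein = fa.protein AND fa.domain < fb.domain  -- fusion carries A and B
JOIN hit AS x  ON x.domain = fa.domain AND x.protein <> fa.protein   -- candidate partner with A
JOIN hit AS y  ON y.domain = fb.domain AND y.protein <> fa.protein   -- candidate partner with B
WHERE x.protein <> y.protein
  AND NOT EXISTS (SELECT 1 FROM hit h WHERE h.protein = x.protein AND h.domain = fb.domain)
  AND NOT EXISTS (SELECT 1 FROM hit h WHERE h.protein = y.protein AND h.domain = fa.domain)
ORDER BY 1, 2, 3, 4, 5
"""

def find_domain_fusions(hits):
    """Return (fusion, domain_a, domain_b, partner_a, partner_b) tuples."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE hit (protein TEXT NOT NULL, domain TEXT NOT NULL)")
        con.executemany("INSERT INTO hit VALUES (?, ?)", hits)
        return con.execute(FUSION_QUERY).fetchall()
    finally:
        con.close()
```

For example, with hits `FUS1:PF_A`, `FUS1:PF_B`, `P1:PF_A` and `P2:PF_B`, the query pairs `P1` and `P2` as putative functional partners linked by `FUS1`. Because only joins and `NOT EXISTS` are used, the same query should run on any SQL engine, which is what makes re-running it after each database update cheap.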
Upadhyay, Atul Kumar; Sowdhamini, Ramanathan
2016-01-01
3D-domain swapping is one of the mechanisms of protein oligomerization, and proteins exhibiting this phenomenon have many biological functions. Proteins that undergo domain swapping have attracted much attention owing to their involvement in human diseases such as conformational diseases, amyloidosis, serpinopathies and proteinopathies. Early identification of proteins in the whole human genome that tend to undergo domain swapping will enable many aspects of disease control and management. Predictive models were developed using machine learning approaches, with an average accuracy of 78% (85.6% sensitivity, 87.5% specificity and an MCC value of 0.72), to predict putative domain swapping in protein sequences. These models were applied to many complete genomes with special emphasis on the human genome. Nearly 44% of the protein sequences in the human genome were predicted positive for domain swapping. Enrichment analysis was performed on the positively predicted sequences from the human genome for their domain distribution, disease association and functional importance based on Gene Ontology (GO). Enrichment analysis was also performed to better understand the functional importance of these sequences. Finally, we developed hinge region prediction for a given putative domain-swapped sequence, using important physicochemical properties of amino acids.
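The performance figures quoted above (accuracy, sensitivity, specificity, MCC) all derive from a binary confusion matrix. The sketch below shows the standard formulas; the counts used to exercise it are hypothetical and are not the authors' test-set results.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity and Matthews correlation coefficient."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if a margin is empty
    return accuracy, sensitivity, specificity, mcc
```

For hypothetical counts TP = 40, FP = 10, TN = 30, FN = 20 this gives accuracy 0.70, sensitivity about 0.67, specificity 0.75 and MCC about 0.41. Because accuracy equals sensitivity and specificity weighted by the fractions of positive and negative examples, it always lies between those two values.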
Domain fusion analysis by applying relational algebra to protein sequence and domain databases.
Truong, Kevin; Ikura, Mitsuhiko
2003-05-06
Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages these efforts will become increasingly powerful. This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypotheses. Results can be viewed at http://calcium.uhnres.utoronto.ca/pi. As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time.
Giudicelli, Véronique; Duroux, Patrice; Kossida, Sofia; Lefranc, Marie-Paule
2017-06-26
IMGT®, the international ImMunoGeneTics information system® (http://www.imgt.org), was created in 1989 in Montpellier, France (CNRS and Montpellier University) to manage the huge and complex diversity of the antigen receptors, and is at the origin of immunoinformatics, a science at the interface between immunogenetics and bioinformatics. Immunoglobulins (IG) or antibodies and T cell receptors (TR) are managed and described in the IMGT® databases and tools at the level of receptor, chain and domain. The analysis of the IG and TR variable (V) domain rearranged nucleotide sequences is performed by IMGT/V-QUEST (online since 1997, 50 sequences per batch) and, for next generation sequencing (NGS), by IMGT/HighV-QUEST, the high throughput version of IMGT/V-QUEST (portal begun in 2010, 500,000 sequences per batch). In vitro combinatorial libraries of engineered antibody single chain Fragment variable (scFv), which mimic the in vivo natural diversity of the immune adaptive responses, are extensively screened for the discovery of novel antigen binding specificities. However, the analysis of NGS full-length scFv (~850 bp) represents a challenge, as they contain two V domains connected by a linker and there is no tool for the analysis of two V domains in a single chain. The functionality "Analysis of single chain Fragment variable (scFv)" has been implemented in IMGT/V-QUEST and, for NGS, in IMGT/HighV-QUEST for the analysis of the two V domains of IG and TR scFv. It proceeds in five steps: search for a first closest V-REGION, full characterization of the first V-(D)-J-REGION, then search for a second V-REGION and full characterization of the second V-(D)-J-REGION, and finally linker delimitation. For each sequence or NGS read, positions of the 5'V-DOMAIN, linker and 3'V-DOMAIN in the scFv are provided in the 'V-orientated' sense. Each V-DOMAIN is fully characterized (gene identification, sequence description, junction analysis, characterization of mutations and amino acid changes). 
The functionality is generic and can analyse any IG or TR single chain nucleotide sequence containing two V domains, provided that the corresponding species IMGT reference directory is available. The "Analysis of single chain Fragment variable (scFv)" implemented in IMGT/V-QUEST and, for NGS, in IMGT/HighV-QUEST provides the identification and full characterization of the two V domains of full-length scFv (~850 bp) nucleotide sequences from combinatorial libraries. The analysis can also be performed on concatenated paired chains of expressed antigen receptor IG or TR repertoires.
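The last of the five steps, linker delimitation, can be illustrated in a much simplified form once the coordinates of the two characterized V-(D)-J-REGIONs are known. This is a hypothetical sketch, not IMGT/V-QUEST code: the function and field names, the 1-based inclusive coordinate convention, and the assumption that both regions are already placed on the V-orientated sequence are all illustrative.

```python
from typing import NamedTuple, Tuple

Span = Tuple[int, int]  # 1-based, inclusive (start, end) on the V-orientated sequence

class ScFvLayout(NamedTuple):
    v5_domain: Span
    linker: Span
    v3_domain: Span

def delimit_linker(first_vdj: Span, second_vdj: Span) -> ScFvLayout:
    """Place the linker between two already-characterized V-(D)-J regions."""
    (s1, e1), (s2, e2) = sorted([first_vdj, second_vdj])
    if s2 <= e1 + 1:
        raise ValueError("V domains overlap or abut; no linker can be delimited")
    return ScFvLayout((s1, e1), (e1 + 1, s2 - 1), (s2, e2))
```

With hypothetical regions at 1-360 and 406-750, the sketch reports a 5' domain at 1-360, a linker at 361-405 and a 3' domain at 406-750, regardless of the order in which the two regions were found.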
Predicting PDZ domain mediated protein interactions from structure
2013-01-01
Background PDZ domains are structural protein domains that recognize simple linear amino acid motifs, often at protein C-termini, and mediate protein-protein interactions (PPIs) in important biological processes, such as ion channel regulation, cell polarity and neural development. PDZ domain-peptide interaction predictors have been developed based on domain and peptide sequence information. Since domain structure is known to influence binding specificity, we hypothesized that structural information could be used to predict interactions not found by sequence-based predictors. Results We developed a novel computational predictor of PDZ domain and C-terminal peptide interactions using a support vector machine trained with PDZ domain structure and peptide sequence information. Performance was estimated using extensive cross validation testing. We used the structure-based predictor to scan the human proteome for ligands of 218 PDZ domains and show that the predictions correspond to known PDZ domain-peptide interactions and PPIs in curated databases. The structure-based predictor is complementary to the sequence-based predictor, finding unique known and novel PPIs, and is less dependent on training–testing domain sequence similarity. We used a functional enrichment analysis of our hits to create a predicted map of PDZ domain biology. This map highlights PDZ domain involvement in diverse biological processes, some only found by the structure-based predictor. Based on this analysis, we predict novel PDZ domain involvement in xenobiotic metabolism and suggest new interactions for other processes including wound healing and Wnt signalling. Conclusions We built a structure-based predictor of PDZ domain-peptide interactions, which can be used to scan C-terminal proteomes for PDZ interactions. 
We also show that the structure-based predictor finds many known PDZ mediated PPIs in human that were not found by our previous sequence-based predictor and is less dependent on training–testing domain sequence similarity. Using both predictors, we defined a functional map of human PDZ domain biology and predict novel PDZ domain function. Users may access our structure-based and previous sequence-based predictors at http://webservice.baderlab.org/domains/POW. PMID:23336252
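A support vector machine needs each domain-peptide pair as a fixed-length numeric vector. The abstract does not specify the feature encoding, so the sketch below assumes one common choice: a one-hot encoding of the last five peptide residues, numbered from the C-terminus (P0, P-1, ...), concatenated with a vector of structure-derived domain descriptors. The window length and function names are hypothetical; training the SVM itself is left to a machine learning library.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode_c_terminus(peptide, length=5):
    """One-hot encode the last `length` residues, C-terminal residue (P0) first."""
    tail = peptide.upper()[-length:]
    if len(tail) < length:
        raise ValueError("peptide shorter than the encoded window")
    features = []
    for residue in reversed(tail):  # P0, P-1, P-2, ...
        features.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return features

def pair_features(domain_features, peptide):
    """Concatenate domain descriptors with the peptide encoding for one training example."""
    return list(domain_features) + encode_c_terminus(peptide)
```

A five-residue window yields 100 peptide features, so two hypothetical domain descriptors plus a peptide give a 102-dimensional example.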
Using ProMED-Mail and MedWorm blogs for cross-domain pattern analysis in epidemic intelligence.
Stewart, Avaré; Denecke, Kerstin
2010-01-01
In this work we motivate the use of user-generated content from medical blogs for gathering facts about disease reporting events to support biosurveillance investigations. Given the characteristics of blogs, the extraction of such events is made more difficult by noise and data abundance. We address the problem of automatically inferring extraction patterns for disease reporting events in this noisier setting. The sublanguage used in outbreak reports is exploited by aligning it with the sequences of disease reporting sentences in blogs. Using our Cross Domain Pattern Analysis Framework, experimental results show that Phrase-Level sequences tend to produce more overlap across the domains than Word-Level sequences. The cross-domain alignment process is effective at filtering noisy sequences from blogs and extracting good candidate sequence patterns from an abundance of text.
Analysis of sequence repeats of proteins in the PDB.
Mary Rajathei, David; Selvaraj, Samuel
2013-12-01
Internal repeats in protein sequences play a significant role in the evolution of protein structure and function. Applications of different bioinformatics tools help in the identification and characterization of these repeats. In the present study, we analyzed sequence repeats in a non-redundant set of proteins available in the Protein Data Bank (PDB). We used RADAR for detecting internal repeats in a protein, PDBeFOLD for assessing structural similarity, PDBsum for finding functional involvement and Pfam for domain assignment of the repeats in a protein. Through the analysis of sequence repeats, we found that the identity between sequence repeats falls in the range of 20-40%, and that the superimposed structures of most sequence repeats maintain a similar overall fold. Analysis of sequence repeats at the functional level reveals that most sequence repeats are involved in the function of the protein through functionally involved residues in the repeat regions. We also found that sequence repeats in single- and two-domain proteins often contain conserved sequence motifs for the function of the domain. Copyright © 2013 Elsevier Ltd. All rights reserved.
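The 20-40% identity range refers to pairs of aligned repeat copies. Percent identity has several conventions; the sketch below assumes one of them, counting identical residues over alignment columns where neither copy has a gap. RADAR's own reported values may use a different denominator.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)
```

For the aligned fragments `AC-DE` and `ACQDF`, four columns are gap-free and three are identical, giving 75%.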
A domain-centric solution to functional genomics via dcGO Predictor
2013-01-01
Background Computational and manual annotation of protein function is one of the first routes to making sense of a newly sequenced genome. Protein domain predictions form an essential part of this annotation process, owing to the natural modularity of proteins, with domains acting as structural, evolutionary and functional units. Sometimes two, three, or more adjacent domains (called supra-domains) are the operational unit responsible for a function, e.g. via a binding site at the interface. These supra-domains have contributed to functional diversification in higher organisms. Traditionally, functional ontologies have been applied to individual proteins rather than to families of related domains and supra-domains. We expect, however, that functional signals can to some extent be carried by protein domains and supra-domains, and consequently used in function prediction and functional genomics. Results Here we present a domain-centric Gene Ontology (dcGO) perspective. We generalize a framework for automatically inferring ontological terms associated with domains and supra-domains from full-length sequence annotations. This general framework has been applied specifically to primary protein-level annotations from UniProtKB-GOA, generating GO term associations with SCOP domains and supra-domains. The resulting 'dcGO Predictor' can be used to provide functional annotation to protein sequences. The functional annotation of sequences in the Critical Assessment of Function Annotation (CAFA) has been used as a valuable opportunity to validate our method and to have it assessed by the community. The functional annotation of all completely sequenced genomes has demonstrated the potential for domain-centric GO enrichment analysis to yield functional insights into newly sequenced or yet-to-be-annotated genomes. 
The generalized framework presented here has also been applied to other domain classifications such as InterPro and Pfam, and to other ontologies such as the mammalian phenotype and disease ontologies. The dcGO and its predictor are available at http://supfam.org/SUPERFAMILY/dcGO including an enrichment analysis tool. Conclusions As functional units, domains offer a unique perspective on function prediction regardless of whether proteins are multi-domain or single-domain. The 'dcGO Predictor' holds great promise for contributing to a domain-centric functional understanding of genomes in the next generation sequencing era. PMID:23514627
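The exact statistic dcGO uses is not stated in this abstract. A common way to test whether a GO term is over-represented among proteins carrying a given domain is the one-sided hypergeometric test, sketched below under that assumption; the argument names are illustrative.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: annotated proteins in total, K: proteins carrying the GO term,
    n: proteins containing the domain, k: proteins with both.
    """
    upper = min(n, K)
    if k > upper:
        return 0.0
    # Exact integer numerator; math.comb returns 0 when the lower index exceeds the upper.
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)
```

With N = 10, K = 5 and n = 5, observing all five domain-containing proteins annotated with the term has probability 1/252 under random assignment, a small p-value suggesting the term is associated with the domain.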
Proteins with an Euonymus lectin-like domain are ubiquitous in Embryophyta
2009-01-01
Background Cloning of the Euonymus lectin led to the discovery of a novel domain that also occurs in some stress-induced plant proteins. The distribution and diversity of proteins with a Euonymus lectin (EUL) domain were investigated through detailed analysis of sequences in publicly accessible genome and transcriptome databases. Results Comprehensive in silico analyses indicate that the recently identified Euonymus europaeus lectin domain represents a conserved structural unit of a novel family of putative carbohydrate-binding proteins, hereafter referred to as the Euonymus lectin (EUL) family. The EUL domain is widespread among plants. Analysis of the retrieved sequences revealed that some sequences consist of a single EUL domain linked to an unrelated N-terminal domain, whereas others comprise two tandemly arrayed EUL domains. A new classification system for these lectins is proposed based on the overall domain architecture. Evolutionary relationships among the sequences with EUL domains are discussed. Conclusion The identification of the EUL family provides the first evidence for the occurrence in terrestrial plants of a highly conserved, plant-specific domain. The widespread distribution of the EUL domain contrasts strikingly with the more limited, or even narrow, distribution of most other lectin domains found in plants. The apparent omnipresence of the EUL domain is indicative of a universal role for this lectin domain in plants. Although there is unambiguous evidence that several EUL domains possess carbohydrate-binding activity, further research is required to corroborate the carbohydrate-binding properties of different members of the EUL family. PMID:19930663
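Classifying sequences by overall domain architecture amounts to mapping each protein's ordered list of domains to a category. The sketch below illustrates that idea for the two architectures described above; the class labels are hypothetical placeholders and do not reproduce the nomenclature the authors proposed.

```python
def classify_eul_architecture(domains):
    """Assign a coarse architecture class from domain labels in N- to C-terminal order.

    "EUL" marks the lectin domain; any other label is treated as an unrelated domain.
    Class names here are hypothetical, not the published classification.
    """
    n_eul = sum(1 for d in domains if d == "EUL")
    others = [d for d in domains if d != "EUL"]
    if n_eul == 0:
        return "no EUL domain"
    if n_eul == 1:
        return "single EUL" if not others else "single EUL + other domain(s)"
    if n_eul == 2 and not others:
        return "tandem EUL"
    return "other EUL architecture"
```

For instance, a protein annotated as an unrelated N-terminal domain followed by one EUL domain falls into the "single EUL + other domain(s)" group, while a protein with two EUL domains and nothing else is "tandem EUL".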
Phylogenetic analysis of the envelope protein (domain III) of dengue 4 viruses
Mota, Javier; Ramos-Castañeda, José; Rico-Hesse, Rebeca; Ramos, Celso
2011-01-01
Objective To evaluate the genetic variability of domain III of the envelope (E) protein and to estimate the phylogenetic relationships of dengue 4 (Den-4) viruses isolated in Mexico and in other endemic areas of the world. Material and Methods A phylogenetic study of domain III of the E protein of Den-4 viruses was conducted in 1998 using virus strains from Mexico and other parts of the world, isolated in different years. Specific primers were used to amplify domain III by RT-PCR and to obtain its nucleotide sequence. Based on the nucleotide and deduced amino acid sequences, genetic variability was estimated and a phylogenetic tree was generated. To simplify genetic analysis of the domain III region, a Restriction Fragment Length Polymorphism (RFLP) assay was performed using six restriction enzymes. Results The results of nucleotide and amino acid sequence analysis of domain III are similar to those reported for the complete E protein gene. Based on RFLP analysis of domain III using the restriction enzymes Nla III, Dde I and Cfo I, the Den-4 viruses included in this study were clustered into the previously reported genotypes 1 and 2. Conclusions The results suggest that domain III may be used as a genetic marker for phylogenetic and molecular epidemiology studies of dengue viruses. The English version of this paper is also available at: http://www.insp.mx/salud/index.html PMID:12132320
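An RFLP assay can be previewed in silico by locating each enzyme's recognition site in a sequence and computing the resulting fragment lengths. The sketch below uses the standard sites and top-strand cut positions of NlaIII (CATG^), DdeI (C^TNAG) and CfoI (GCG^C); it assumes a linear sequence, reports top-strand fragments only, and is not the authors' laboratory protocol.

```python
import re

# Recognition site and number of bases before the top-strand cut.
ENZYMES = {
    "NlaIII": ("CATG", 4),   # CATG^
    "DdeI": ("CTNAG", 1),    # C^TNAG (N = any base)
    "CfoI": ("GCGC", 3),     # GCG^C
}

def digest(seq, enzyme):
    """Return fragment lengths (5' to 3') from cutting a linear DNA sequence."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    # Zero-width lookahead so overlapping recognition sites are all found.
    pattern = re.compile("(?=" + site.replace("N", "[ACGT]") + ")")
    cuts = sorted({m.start() + offset for m in pattern.finditer(seq)})
    bounds = [0] + [c for c in cuts if 0 < c < len(seq)] + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]
```

Comparing the fragment-length patterns produced for each isolate is what allows strains to be grouped into genotypes without full sequencing.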
Lang, Tiange; Yin, Kangquan; Liu, Jinyu; Cao, Kunfang; Cannon, Charles H; Du, Fang K
2014-01-01
Predicting protein domains is essential for understanding a protein's function at the molecular level. However, until now there has been no direct and straightforward method for predicting protein domains in species without a reference genome sequence. In this study, we developed a functionality, consisting of a set of programs, that can predict protein domains directly from genomic sequence data without a reference genome. Using whole genome sequence data, the programming functionality mainly comprised DNA assembly (combining next-generation sequencing (NGS) assembly methods with traditional methods), peptide prediction and protein domain prediction. The proposed functionality avoids problems in de novo assembly caused by micro reads and small single repeats. Furthermore, we applied our functionality to predict leucine-rich repeat (LRR) domains in four species of Ficus with no reference genome, based on NGS genomic data. We found that the LRRNT_2 and LRR_8 domains are related to plant transpiration efficiency, as indicated by the stomata index, in the four species of Ficus. The programming functionality established in this study provides new insights for protein domain prediction, which is particularly timely in the current age of NGS data expansion.
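The peptide prediction step sits between assembly and domain prediction. The programs the authors used are not named here, so the sketch below assumes a simple approach: translate all six reading frames with the standard genetic code and keep the longest methionine-initiated stretch before a stop codon (or the end of the contig).

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def translate(dna):
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def longest_peptide(dna):
    """Longest Met-initiated peptide over all six reading frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    best = ""
    for strand in (dna, reverse):
        for frame in range(3):
            for chunk in translate(strand[frame:]).split("*"):
                start = chunk.find("M")
                if start != -1 and len(chunk) - start > len(best):
                    best = chunk[start:]
    return best
```

The predicted peptides would then be passed to a domain search (for example against Pfam models) to find LRR domains.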
Sequence Alignment to Predict Across Species Susceptibility ...
Conservation of a molecular target across species can be used as a line-of-evidence to predict the likelihood of chemical susceptibility. The web-based Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool was developed to simplify, streamline, and quantitatively assess protein sequence/structural similarity across taxonomic groups as a means to predict relative intrinsic susceptibility. The intent of the tool is to allow for evaluation of any potential protein target, so it is amenable to variable degrees of protein characterization, depending on available information about the chemical/protein interaction and the molecular target itself. To allow for flexibility in the analysis, a layered strategy was adopted for the tool. The first level of the SeqAPASS analysis compares primary amino acid sequences to a query sequence, calculating a metric for sequence similarity (including detection of candidate orthologs), the second level evaluates sequence similarity within selected domains (e.g., ligand-binding domain, DNA binding domain), and the third level of analysis compares individual amino acid residue positions identified as being of importance for protein conformation and/or ligand binding upon chemical perturbation. Each level of the SeqAPASS analysis provides increasing evidence to apply toward rapid, screening-level assessments of probable cross species susceptibility. Such analyses can support prioritization of chemicals for further ev
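The third level of the analysis compares individual residues at positions deemed important for conformation or ligand binding. Given a pairwise alignment, this reduces to mapping each critical query position onto its aligned column. The sketch below is a hypothetical illustration of that idea, not SeqAPASS code; the function name and 1-based position convention are assumptions.

```python
def compare_critical_residues(query_aln, target_aln, positions):
    """Compare residues at critical query positions using a pairwise alignment.

    `positions` are 1-based residue numbers in the ungapped query sequence.
    Returns {position: (query_residue, target_residue, identical)}.
    """
    wanted = set(positions)
    result = {}
    query_pos = 0
    for q, t in zip(query_aln, target_aln):
        if q == "-":
            continue  # target insertion relative to the query
        query_pos += 1
        if query_pos in wanted:
            result[query_pos] = (q, t, q == t)
    return result
```

For a query aligned as `MK-TAY` against `MKLSAY`, positions 2 and 5 are conserved while position 3 differs (T versus S), the kind of residue-level difference this level of analysis is meant to expose.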
DOE R&D Accomplishments Database
Chandonia, John-Marc; Hon, Gary; Walker, Nigel S.; Lo Conte, Loredana; Koehl, Patrice; Levitt, Michael; Brenner, Steven E.
2003-09-15
The ASTRAL compendium provides several databases and tools to aid in the analysis of protein structures, particularly through the use of their sequences. Partially derived from the SCOP database of protein structure domains, it includes sequences for each domain and other resources useful for studying these sequences and domain structures. The current release of ASTRAL contains 54,745 domains, more than three times as many as the initial release four years ago. ASTRAL has undergone major transformations in the past two years. In addition to several complete updates each year, ASTRAL is now updated on a weekly basis with preliminary classifications of domains from newly released PDB structures. These classifications are available as a stand-alone database and are also integrated into other ASTRAL databases such as representative subsets. To enhance the utility of ASTRAL to structural biologists, all SCOP domains are now made available as PDB-style coordinate files as well as sequences. In addition to sequences and representative subsets based on SCOP domains, sequences and subsets based on PDB chains are newly included in ASTRAL. Several search tools have been added to ASTRAL to facilitate retrieval of data by individual users and automated methods.
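The abstract does not describe how ASTRAL's representative subsets are built. As a general illustration of the concept, the sketch below assumes a simple greedy redundancy filter: walk through domains in order of preference and keep each one only if it is below an identity threshold to every representative already kept. The function and its parameters are hypothetical.

```python
def greedy_representatives(domains, identity, threshold):
    """Select representatives so no two kept domains reach `threshold` percent identity.

    `domains` should be pre-sorted by preference (e.g. best-quality structure first);
    `identity(a, b)` returns the percent identity between two domains.
    """
    chosen = []
    for d in domains:
        if all(identity(d, rep) < threshold for rep in chosen):
            chosen.append(d)
    return chosen
```

Because selection is order-dependent, the pre-sorting step decides which member of each redundant group survives.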
Fanali, Gabriella; Ascenzi, Paolo; Bernardi, Giorgio; Fasano, Mauro
2012-01-01
Serum albumin (SA) is a circulating protein providing a depot and carrier for many endogenous and exogenous compounds. At least seven major binding sites have been identified by structural and functional investigations, mainly in human SA. SA is conserved in vertebrates, with at least 49 entries in protein sequence databases. The multiple sequence analysis of this set of entries leads to the definition of a cladistic tree for the molecular evolution of SA orthologs in vertebrates, thus showing the clustering of the considered species, with lamprey SAs (Lethenteron japonicum and Petromyzon marinus) in a separate outgroup. Sequence analysis aimed at searching for conserved domains revealed that most SA sequences are made up of three repeated domains (about 600 residues), as extensively characterized for human SA. By contrast, lamprey SAs are giant proteins (about 1400 residues) comprising seven repeated domains. The phylogenetic analysis of the SA family reveals a stringent correlation with the taxonomic classification of the species available in sequence databases. A focused inspection of the sequences of ligand binding sites in SA revealed that in all sites most residues involved in ligand binding are conserved, although the versatility towards different ligands could be peculiar to higher organisms. Moreover, the analysis of molecular links between the different sites suggests that allosteric modulation mechanisms could be restricted to higher vertebrates.
Joint Sequence Analysis: Association and Clustering
ERIC Educational Resources Information Center
Piccarreta, Raffaella
2017-01-01
In its standard formulation, sequence analysis aims at finding typical patterns in a set of life courses represented as sequences. Recently, some proposals have been introduced to jointly analyze sequences defined on different domains (e.g., work career, partnership, and parental histories). We introduce measures to evaluate whether a set of…
König, Caroline; Alquézar, René; Vellido, Alfredo; Giraldo, Jesús
2018-03-01
G-protein-coupled receptors (GPCRs) are a large and diverse super-family of eukaryotic cell membrane proteins that play an important physiological role as transmitters of extracellular signal. In this paper, we investigate Class C, a member of this super-family that has attracted much attention in pharmacology. The limited knowledge of the complete 3D crystal structure of Class C receptors makes it necessary to use their primary amino acid sequences for analytical purposes. Here, we provide a systematic analysis of distinct receptor sequence segments, defined by their topological location in the extracellular, transmembrane, or intracellular domains, with regard to their ability to differentiate between seven class C GPCR subtypes. We build on previous research that provided preliminary evidence of the potential use of separate domains of complete class C GPCR sequences as the basis for subtype classification. Using the extracellular N-terminus domain alone was shown to result in only a minor decrease in subtype discrimination compared with the complete sequence, despite discarding much of the sequence information. In this paper, we describe the use of Support Vector Machine-based classification models to evaluate the subtype-discriminating capacity of the specific topological sequence segments.
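Topological segments differ in length between receptors, so each must be turned into a fixed-length vector before an SVM can use it. The abstract does not state which transformation the authors applied; the sketch below assumes amino acid composition, with segment boundaries (for example, the end of the N-terminal domain before the first transmembrane helix) supplied from a separate topology annotation.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def segment_composition(sequence, start, end):
    """Amino acid composition (fractions) of residues start..end, 1-based inclusive."""
    segment = sequence.upper()[start - 1:end]
    if not segment:
        raise ValueError("empty segment")
    # Non-standard residues are not counted, so fractions may sum to less than 1.
    return [segment.count(aa) / len(segment) for aa in AMINO_ACIDS]
```

Every segment, whatever its length, maps to the same 20 features, so extracellular, transmembrane and intracellular segments can be compared with the full sequence using identical classifiers.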
Lerner, D R; Raikhel, N V
1992-06-05
Chitin-binding proteins are present in a wide range of plant species, including both monocots and dicots, even though these plants contain no chitin. To investigate the relationship between in vitro antifungal and insecticidal activities of chitin-binding proteins and their unknown endogenous functions, the stinging nettle lectin (Urtica dioica agglutinin, UDA) cDNA was cloned using a synthetic gene as the probe. The nettle lectin cDNA clone contained an open reading frame encoding 374 amino acids. Analysis of the deduced amino acid sequence revealed a 21-amino acid putative signal sequence and the 86 amino acids forming the two chitin-binding domains of nettle lectin. These domains were fused to a 19-amino acid "spacer" domain and a 244-amino acid carboxyl extension with partial identity to a chitinase catalytic domain. The authenticity of the cDNA clone was confirmed by deduced amino acid sequence identity with sequence data obtained from tryptic digests, RNA gel blot, and polymerase chain reaction analyses. RNA gel blot analysis also showed the nettle lectin message was present primarily in rhizomes and inflorescence (with immature seeds) but not in leaves or stems. Chitinase enzymatic activity was found when the chitinase-like domain alone or the chitinase-like domain with the chitin-binding domains were expressed in Escherichia coli. This is the first example of a chitin-binding protein with both a duplication of the 43-amino acid chitin-binding domain and a fusion of the chitin-binding domains to a structurally unrelated domain, the chitinase domain.
Beintema, J J; Peumans, W J
1992-03-09
The primary structure of stinging nettle (Urtica dioica) agglutinin has been determined by sequence analysis of peptides obtained from three overlapping proteolytic digests. The sequence of 80 residues consists of two hevein-like domains with the same spacing of half-cystine residues and several other conserved residues as observed earlier in other proteins with hevein-like domains. The hinge region between the two domains is four residues longer than those between the four domains in cereal lectins like wheat germ agglutinin.
2014-01-01
Background Bacteroides spp. form a significant part of our gut microbiome and are well known for optimized metabolism of diverse polysaccharides. Initial analysis of the archetypal Bacteroides thetaiotaomicron genome identified 172 glycosyl hydrolases and a large number of uncharacterized proteins associated with polysaccharide metabolism. Results BT_1012 from Bacteroides thetaiotaomicron VPI-5482 is a protein of unknown function and a member of a large protein family consisting entirely of uncharacterized proteins. Initial sequence analysis predicted that this protein has two domains, one at the N-terminus and one at the C-terminus. A PSI-BLAST search found over 150 full-length and over 90 half-size homologs consisting only of the N-terminal domain. The experimentally determined three-dimensional structure of the BT_1012 protein confirms its two-domain architecture, and structural analysis of both domains suggests their specific functions. The N-terminal domain is a putative catalytic domain with significant similarity to known glycoside hydrolases; the C-terminal domain has a beta-sandwich fold typically found in the C-terminal domains of other glycosyl hydrolases, where these domains are typically involved in substrate binding. We describe the structure of the BT_1012 protein and discuss its sequence-structure relationship and its possible functional implications. Conclusions Structural and sequence analyses of the BT_1012 protein identify it as a glycosyl hydrolase, expanding an already impressive catalog of enzymes involved in polysaccharide metabolism in Bacteroides spp. Based on this, we have renamed the Pfam families representing the two domains found in the BT_1012 protein, PF13204 and PF12904, as putative glycoside hydrolase and glycoside hydrolase-associated C-terminal domain, respectively. PMID:24742328
Identification and analysis of mutational hotspots in oncogenes and tumour suppressors.
Baeissa, Hanadi; Benstead-Hume, Graeme; Richardson, Christopher J; Pearl, Frances M G
2017-03-28
The key to interpreting the contribution of a disease-associated mutation in the development and progression of cancer is an understanding of the consequences of that mutation both on the function of the affected protein and on the pathways in which that protein is involved. Protein domains encapsulate function, and position-specific, domain-based analysis of mutations has been shown to help elucidate their phenotypes. In this paper we examine the domain biases in oncogenes and tumour suppressors, and find that their domain compositions differ substantially. Using data from over 30 different cancers from whole-exome sequencing cancer genomic projects, we mapped over one million mutations to their respective Pfam domains to identify which domains are enriched in any of three different classes of mutation: missense, indels or truncations. Next, we identified the mutational hotspots within domain families by mapping small mutations to equivalent positions in multiple sequence alignments of protein domains. We find that gain of function mutations from oncogenes and loss of function mutations from tumour suppressors are normally found in different domain families and, when observed in the same domain families, hotspot mutations are located at different positions within the multiple sequence alignment of the domain. By considering hotspots in tumour suppressors and oncogenes independently, we find that there are different specific positions within domain families that are particularly suited to accommodate either a loss or a gain of function mutation. The position also depends on the class of mutation. We find rare mutations co-located with well-known functional mutation hotspots in members of homologous domain superfamilies, and we detect novel mutation hotspots in domain families previously unconnected with cancer. The results of this analysis can be accessed through the MOKCa database (http://strubiol.icr.ac.uk/extra/MOKCa).
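Mapping mutations to "equivalent positions" means converting each residue number in a domain sequence to its column in the family's multiple sequence alignment, then counting mutations per column. The sketch below assumes residue numbers are already expressed relative to the aligned domain segment (that is, offset by the domain start); the function names are hypothetical and this is not the MOKCa pipeline.

```python
from collections import Counter

def residue_to_column(aligned_seq, gap_chars="-."):
    """Map 1-based ungapped residue numbers to 1-based alignment columns."""
    mapping, residue = {}, 0
    for column, char in enumerate(aligned_seq, start=1):
        if char not in gap_chars:
            residue += 1
            mapping[residue] = column
    return mapping

def column_hotspots(mutations, alignment):
    """Count mutations per alignment column.

    `mutations`: iterable of (sequence_id, residue_number) pairs.
    `alignment`: sequence_id -> aligned domain sequence for one family.
    """
    maps = {seq_id: residue_to_column(aln) for seq_id, aln in alignment.items()}
    counts = Counter()
    for seq_id, residue in mutations:
        column = maps.get(seq_id, {}).get(residue)
        if column is not None:
            counts[column] += 1
    return counts
```

Mutations at residue 3 of `AC-DE` and residue 4 of `ACQDE` land in the same column (4), so two mutations that look unrelated by residue number are recognized as a shared hotspot position.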
Statistical analysis of life history calendar data.
Eerola, Mervi; Helske, Satu
2016-04-01
The life history calendar is a data-collection tool for obtaining reliable retrospective data about life events. To illustrate the analysis of such data, we compare the model-based probabilistic event history analysis and the model-free data mining method, sequence analysis. In event history analysis, instead of transition hazards, we estimate the cumulative prediction probabilities of life events over the entire trajectory. In sequence analysis, we compare several dissimilarity metrics and contrast data-driven and user-defined substitution costs. As an example, we study young adults' transition to adulthood as a sequence of events in three life domains. The events define the multistate event history model and the parallel life domains in multidimensional sequence analysis. The relationship between life trajectories and excess depressive symptoms in middle age is further studied by their joint prediction in the multistate model and by regressing the symptom scores on individual-specific cluster indices. The two approaches complement each other in life course analysis; sequence analysis can effectively find typical and atypical life patterns, while event history analysis is needed for causal inquiries. © The Author(s) 2012.
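A widely used dissimilarity in sequence analysis is optimal matching: an edit distance between state sequences in which substitutions have pair-specific costs and insertions or deletions have a fixed cost. The sketch below shows the dynamic-programming recurrence; the state codes are hypothetical, and user-defined or data-driven costs (for example, derived from observed transition rates) would be supplied through `sub_cost`.

```python
def optimal_matching(seq_a, seq_b, sub_cost, indel=1.0):
    """Edit distance between two state sequences with custom substitution costs."""
    n, m = len(seq_a), len(seq_b)
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i * indel
    for j in range(1, m + 1):
        dist[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = seq_a[i - 1], seq_b[j - 1]
            substitution = 0.0 if a == b else sub_cost(a, b)
            dist[i][j] = min(dist[i - 1][j] + indel,          # deletion
                             dist[i][j - 1] + indel,          # insertion
                             dist[i - 1][j - 1] + substitution)
    return dist[n][m]
```

With hypothetical yearly states S (studying) and W (working), `SSWW` and `SWWW` differ by one substitution; at a constant substitution cost of 2 this equals one deletion plus one insertion, so the distance is 2.0 either way, while a cheaper substitution cost of 1 lowers it to 1.0. Pairwise distances like these are the input to the clustering that yields typical life patterns.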
Dong, Zheng; Zhou, Hongyu; Tao, Peng
2018-02-01
PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore the functional evolutionary relationships among proteins in the PAS domain superfamily in view of the sequence-structure-dynamics-function relationship. We collected protein sequences and crystal structure data from the RCSB Protein Data Bank for members of the PAS domain superfamily belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence-conserved residues and to build a phylogenetic tree. Three-dimensional structure alignment was also applied to obtain structure-conserved residues. The protein dynamics were analyzed using an elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The results showed that proteins with the same function could be grouped by sequence similarity, and that proteins in different functional groups displayed statistically significant differences in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally had lower fluctuations than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggests a direct connection in which protein sequences are related to various functions through structural dynamics. This is a new attempt to delineate the functional evolution of proteins using the integrated information of sequence, structure, and dynamics. © 2017 The Protein Society.
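The abstract does not say which elastic network model variant was used. The sketch below assumes the Gaussian network model (GNM), one common ENM, in which residues (typically Cα atoms) within a cutoff distance are joined by identical springs. The diagonal of the pseudo-inverse of the resulting Kirchhoff (connectivity) matrix gives each residue's relative mean-square fluctuation. The 7 Å default cutoff is an assumption, and the code requires a connected contact network because it uses the identity pinv(K) = inv(K + J/n) − J/n.

```python
import math

def kirchhoff_matrix(coords, cutoff=7.0):
    """GNM connectivity matrix: -1 for contacts within cutoff, degree on the diagonal."""
    n = len(coords)
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                k[i][j] = k[j][i] = -1.0
                k[i][i] += 1.0
                k[j][j] += 1.0
    return k

def invert(m):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0.0:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]

def gnm_fluctuations(coords, cutoff=7.0):
    """Relative mean-square fluctuations (up to the factor 3kT/gamma).
    Assumes a connected contact graph, so pinv(K) = inv(K + J/n) - J/n."""
    k = kirchhoff_matrix(coords, cutoff)
    n = len(k)
    shifted = [[k[i][j] + 1.0 / n for j in range(n)] for i in range(n)]
    inv = invert(shifted)
    return [inv[i][i] - 1.0 / n for i in range(n)]
```

Averaging these values over sequence- or structure-conserved positions and comparing them with the remaining residues is one way to frame the conserved-versus-other comparison described above.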
Identification of two allelic IgG1 C(H) coding regions (Cgamma1) of cat.
Kanai, T H; Ueda, S; Nakamura, T
2000-01-31
Two types of cDNA encoding IgG1 heavy chain (gamma1) were isolated from a single domestic short-hair cat. Sequence analysis indicated a higher level of similarity of these Cgamma1 sequences to the human Cgamma1 sequence (76.9 and 77.0%) than to the mouse sequence (70.0 and 69.7%) at the nucleotide level. The predicted primary structures of both feline Cgamma1 genes, designated Cgamma1a and Cgamma1b, were similar to that of the human Cgamma1 gene, for instance in the size of the constant domains, the presence of six conserved cysteine residues involved in formation of the domain structure, and the location of a conserved N-linked glycosylation site. Sequence comparison between the two alleles showed that 7 of the 10 nucleotide differences were within the C(H)3 domain coding region, all leading to nonsynonymous changes in amino acid residues. Partial sequence analysis of genomic clones showed three nucleotide substitutions between the two Cgamma1 alleles in the intron between the C(H)2 and C(H)3 domain coding regions. In the 12 domestic short-hair cats used in this study, the frequency of the Cgamma1a allele (62.5%) was higher than that of the Cgamma1b allele (37.5%).
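Allele frequencies in a diploid sample are counted over 2N allele copies, so 12 cats carry 24 copies of the Cgamma1 locus. If the reported percentages were computed this way, 62.5% and 37.5% correspond to 15 and 9 copies. The minimal Python sketch below illustrates the calculation; the genotype breakdown is invented solely to reproduce those totals and is not taken from the study.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Allele frequencies from diploid genotypes given as (allele1, allele2) tuples."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

# Hypothetical genotypes for 12 cats, chosen only so that the totals
# (15 "a" and 9 "b" copies out of 24) match the reported 62.5% / 37.5%.
cats = [("a", "a")] * 5 + [("a", "b")] * 5 + [("b", "b")] * 2
freqs = allele_frequencies(cats)
```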
Sequence Analysis and Domain Motifs in the Porcine Skin Decorin Glycosaminoglycan Chain*
Zhao, Xue; Yang, Bo; Solakylidirim, Kemal; Joo, Eun Ji; Toida, Toshihiko; Higashi, Kyohei; Linhardt, Robert J.; Li, Lingyun
2013-01-01
Decorin proteoglycan consists of a core protein containing a single O-linked dermatan sulfate/chondroitin sulfate glycosaminoglycan (GAG) chain. Although the sequence of the decorin core protein is determined by the gene encoding its structure, the structure of its GAG chain is determined in the Golgi. The recent application of modern MS to bikunin, a far simpler chondroitin sulfate proteoglycan, suggests that it has a single or small number of defined sequences. On this basis, a similar approach was undertaken to sequence the much larger and more structurally complex dermatan sulfate/chondroitin sulfate GAG chain of porcine skin decorin. This approach yielded information on the consistency/variability of its linkage region at the reducing end of the GAG chain, its iduronic acid-rich domain, its glucuronic acid-rich domain, and its non-reducing end. A general motif for the porcine skin decorin GAG chain was established. A single small decorin GAG chain was sequenced using MS/MS analysis. The data obtained in this study suggest that the decorin GAG chain has a small or limited number of sequences. PMID:23423381
Bhasi, Ashwini; Philip, Philge; Manikandan, Vinu; Senapathy, Periannan
2009-01-01
We have developed ExDom, a unique database for the comparative analysis of the exon–intron structures of 96 680 protein domains from seven eukaryotic organisms (Homo sapiens, Mus musculus, Bos taurus, Rattus norvegicus, Danio rerio, Gallus gallus and Arabidopsis thaliana). ExDom provides integrated access to exon-domain data through a sophisticated web interface which has the following analytical capabilities: (i) intergenomic and intragenomic comparative analysis of exon–intron structure of domains; (ii) color-coded graphical display of the domain architecture of proteins correlated with their corresponding exon–intron structures; (iii) graphical analysis of multiple sequence alignments of amino acid and coding nucleotide sequences of homologous protein domains from seven organisms; (iv) comparative graphical display of exon distributions within the tertiary structures of protein domains; and (v) visualization of exon–intron structures of alternative transcripts of a gene correlated to variations in the domain architecture of corresponding protein isoforms. These novel analytical features are highly suited for detailed investigations of the exon–intron structure of domains and make ExDom a powerful tool for exploring several key questions concerning the function, origin and evolution of genes and proteins. The ExDom database is freely accessible at: http://66.170.16.154/ExDom/. PMID:18984624
Hemalatha, G. R.; Rao, D. Satyanarayana; Guruprasad, L.
2007-01-01
We have identified four repeats and ten domains that are novel in proteins encoded by the Bacillus anthracis str. Ames proteome using automated in silico methods. A “repeat” corresponds to a region of fewer than 55 amino acid residues that occurs more than once in the protein sequence, sometimes in tandem. A “domain” corresponds to a conserved region of more than 55 amino acid residues and may be present as single or multiple copies in the protein sequence. These correspond to (1) 57-amino-acid-residue PxV domain, (2) 122-amino-acid-residue FxF domain, (3) 111-amino-acid-residue YEFF domain, (4) 109-amino-acid-residue IMxxH domain, (5) 103-amino-acid-residue VxxT domain, (6) 84-amino-acid-residue ExW domain, (7) 104-amino-acid-residue NTGFIG domain, (8) 36-amino-acid-residue NxGK repeat, (9) 95-amino-acid-residue VYV domain, (10) 75-amino-acid-residue KEWE domain, (11) 59-amino-acid-residue AFL domain, (12) 53-amino-acid-residue RIDVK repeat, (13) (a) 41-amino-acid-residue AGQF repeat and (b) 42-amino-acid-residue GSAL repeat. Each repeat or domain type is characterized by specific conserved sequence motifs. We discuss the presence of these repeats and domains in proteins from other genomes and their probable secondary structure. PMID:17538688
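The length-based definitions above can be written as a small decision rule. A minimal Python sketch, assuming region length and copy number per protein are already known and the region has already been judged conserved. The stated definitions leave a region of exactly 55 residues, or a single-copy region shorter than 55 residues, unclassified, and the sketch reports those cases as such rather than guessing.

```python
REPEAT_DOMAIN_THRESHOLD = 55  # residues, from the definitions above

def classify_region(length, copies):
    """Label a conserved region as 'repeat', 'domain' or 'unclassified'."""
    if length > REPEAT_DOMAIN_THRESHOLD:
        return "domain"  # single or multiple copies allowed
    if length < REPEAT_DOMAIN_THRESHOLD and copies > 1:
        return "repeat"  # short and occurring more than once
    return "unclassified"  # not covered by the stated definitions
```

For example, the 36-residue NxGK repeat (when present in more than one copy) and the 122-residue FxF domain fall on opposite sides of the threshold.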
Putaporntip, Chaturong; Thongaree, Siriporn; Jongwutiwes, Somchai
2013-08-01
To determine the genetic diversity and potential transmission routes of Plasmodium knowlesi, we analyzed the complete nucleotide sequence of the gene encoding merozoite surface protein-1 of this simian malaria parasite (Pkmsp-1), an asexual blood-stage vaccine candidate, from naturally infected humans and macaques in Thailand. Analysis of Pkmsp-1 sequences from humans (n=12) and monkeys (n=12) reveals five conserved and four variable domains. Most nucleotide substitutions in conserved domains were dimorphic, whereas three of the four variable domains contained complex repeats with extensive sequence and size variation. Besides purifying selection in conserved domains, evidence of intragenic recombination scattered across Pkmsp-1 was detected. The number of haplotypes, haplotype diversity, nucleotide diversity and recombination sites of human-derived sequences exceeded those of monkey-derived sequences. Phylogenetic networks based on concatenated conserved sequences of Pkmsp-1 displayed a character pattern that could have arisen from the sampling process or from the presence of two independent routes of P. knowlesi transmission, i.e. from macaques to humans and from human to human in Thailand. Copyright © 2013 Elsevier B.V. All rights reserved.
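Haplotype diversity and nucleotide diversity have standard definitions: Nei's haplotype diversity H = n/(n − 1) · (1 − Σ pᵢ²) over haplotype frequencies pᵢ, and nucleotide diversity π, the mean number of pairwise differences per site. A minimal Python sketch, assuming equal-length aligned sequences without gaps or ambiguous bases; dedicated population-genetics software handles those cases, which this sketch does not.

```python
from collections import Counter
from itertools import combinations

def haplotype_diversity(seqs):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))

def nucleotide_diversity(seqs):
    """Mean pairwise differences per site over all sequence pairs."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return diffs / (len(pairs) * length)
```

Computing both statistics separately for human-derived and monkey-derived sequences gives the kind of comparison reported above.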
Badaut, Cyril; Bertin, Gwladys; Rustico, Tatiana; Fievet, Nadine; Massougbodji, Achille; Gaye, Alioune; Deloron, Philippe
2010-01-01
Background Placental malaria is a disease linked to the sequestration of Plasmodium falciparum-infected red blood cells (IRBC) in the placenta, leading to reduced materno-fetal exchanges and to local inflammation. One of the virulence factors of P. falciparum involved in cytoadherence to chondroitin sulfate A, its placental receptor, is the adhesive protein VAR2CSA. Its localisation on the surface of IRBC makes it accessible to the immune system. VAR2CSA contains six DBL domains. The DBL6ε domain is the most variable. High variability constitutes a means for the parasite to evade the host immune response. The DBL6ε domain could constitute a very attractive basis for a vaccine candidate, but its reported variability necessitates, for antigenic characterisation, identifying and classifying commonalities across isolates. Methodology/Principal Findings Local alignment analysis of the DBL6ε domain has revealed that it is not as variable as previously described. Variability is concentrated in seven regions present on the surface of the DBL6ε domain. The main goal of our work is to classify and group variable sequences, which will simplify further research to determine dominant epitopes. Firstly, variable sequences were grouped according to their average percent pairwise identity (APPI). Groups comprising many variable sequences sharing low variability were found. Secondly, ELISA experiments following the IgG recognition of a recombinant DBL6ε domain, and of peptides mimicking its seven variable blocks, allowed us to determine an APPI cut-off and to isolate groups represented by a single consensus sequence. Conclusions/Significance A new sequence approach is used to compare variable regions in sequences that have extensive segmental gene relationships. Using this approach, the VAR2CSA DBL6 domain is shown to be composed of 7 variable blocks with limited polymorphism. Each variable block is composed of a limited number of consensus types.
Based on peptide-based ELISA, variable blocks with 85% or greater sequence identity are expected to be recognized equally well by antibody and can be considered the same consensus type. Therefore, analysis of the antibody response against the small number of classified sequences should be helpful to determine epitopes. PMID:20585655
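APPI averages percent identity over all sequence pairs, and the 85% cut-off suggests grouping variable blocks whose pairwise identity reaches that value. A minimal Python sketch, assuming the variable-block sequences are already aligned to equal length, that percent identity is computed over columns where neither sequence has a gap, and that groups are formed by single-linkage at the cut-off (implemented with union-find); the paper's exact grouping procedure may differ.

```python
from itertools import combinations

def percent_identity(a, b):
    """Identity over aligned columns where neither sequence has a gap."""
    aligned = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not aligned:
        return 0.0
    return 100.0 * sum(x == y for x, y in aligned) / len(aligned)

def appi(seqs):
    """Average percent pairwise identity over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)

def group_by_identity(seqs, cutoff=85.0):
    """Single-linkage groups: sequences joined if pairwise identity >= cutoff."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if percent_identity(seqs[i], seqs[j]) >= cutoff:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Each resulting group would then be represented by one consensus sequence. Single-linkage is the simplest choice; it can chain dissimilar sequences together, so a stricter rule (for example complete linkage) may suit antigenic grouping better.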
Panayotou, G; Bax, B; Gout, I; Federwisch, M; Wroblowski, B; Dhand, R; Fry, M J; Blundell, T L; Wollmer, A; Waterfield, M D
1992-01-01
Circular dichroism and fluorescence spectroscopy were used to investigate the structure of the p85 alpha subunit of the PI 3-kinase, a closely related p85 beta protein, and a recombinant SH2 domain-containing fragment of p85 alpha. Significant spectral changes, indicative of a conformational change, were observed on formation of a complex with a 17 residue peptide containing a phosphorylated tyrosine residue. The sequence of this peptide is identical to the sequence surrounding Tyr751 in the kinase-insert region of the platelet-derived growth factor beta-receptor (beta PDGFR). The rotational correlation times measured by fluorescence anisotropy decay indicated that phosphopeptide binding changed the shape of the SH2 domain-containing fragment. The CD and fluorescence spectroscopy data support the secondary structure prediction based on sequence analysis and provide evidence for flexible linker regions between the various domains of the p85 proteins. The significance of these results for SH2 domain-containing proteins is discussed. PMID:1330535
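For a rigid, roughly spherical rotor, anisotropy decays as r(t) = r₀ · exp(−t/θ), where θ is the rotational correlation time. By the Stokes–Einstein–Debye relation θ = ηV/(k_B T), a change in θ at fixed viscosity and temperature reflects a change in hydrodynamic volume or shape. A minimal Python sketch that recovers r₀ and θ by a log-linear least-squares fit, assuming noise-free, single-exponential decays already corrected for the instrument response; real measurements usually need multi-exponential models and reconvolution, which are omitted here.

```python
import math

def fit_rotational_correlation_time(t, r):
    """Least-squares fit of ln r(t) = ln r0 - t / theta; returns (r0, theta)."""
    pts = [(ti, math.log(ri)) for ti, ri in zip(t, r) if ri > 0]  # log needs r > 0
    n = len(pts)
    mean_t = sum(ti for ti, _ in pts) / n
    mean_y = sum(yi for _, yi in pts) / n
    sxx = sum((ti - mean_t) ** 2 for ti, _ in pts)
    sxy = sum((ti - mean_t) * (yi - mean_y) for ti, yi in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -1.0 / slope
```

Comparing θ fitted with and without bound phosphopeptide is the kind of comparison that would reveal a binding-induced change in shape.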
Hernández Torres, Jorge; Papandreou, Nikolaos; Chomilier, Jacques
2009-05-01
The co-chaperone Hop [heat shock protein (HSP) organising protein] is known to bind both Hsp70 and Hsp90. Hop comprises three repeats of a tetratricopeptide repeat (TPR) domain, each consisting of three TPR motifs. The first and last TPR domains are followed by a domain containing several dipeptide (DP) repeats, called the DP domain. These analyses suggest that the hop genes result from successive recombination events of an ancestral TPR-DP module. From a hydrophobic cluster analysis of homologous Hop protein sequences derived from gene families, we postulate that shifts in the open reading frames are at the origin of the present sequences. Moreover, these shifts can be related to the presence or absence of biological function. We propose to extend the family of Hop co-chaperones into the kingdom of bacteria, as several structurally related genes have been identified by hydrophobic cluster analysis. We also provide evidence of common structural characteristics between hop and hip genes, suggesting a shared precursor of ancestral TPR-DP domains.
Comprehensive analysis of orthologous protein domains using the HOPS database.
Storm, Christian E V; Sonnhammer, Erik L L
2003-10-01
One of the most reliable methods for protein function annotation is to transfer experimentally known functions from orthologous proteins in other organisms. Most methods for identifying orthologs operate on a subset of organisms with a completely sequenced genome, and treat proteins as single-domain units. However, it is well known that proteins are often made up of several independent domains, and there is a wealth of protein sequences from genomes that are not completely sequenced. A comprehensive set of protein domain families is found in the Pfam database. We wanted to apply orthology detection to Pfam families, but first some issues needed to be addressed. First, orthology detection becomes impractical and unreliable when too many species are included. Second, shorter domains contain less information. It is therefore important to assess the quality of the orthology assignment and avoid very short domains altogether. We present a database of orthologous protein domains in Pfam called HOPS: Hierarchical grouping of Orthologous and Paralogous Sequences. Orthology is inferred in a hierarchical system of phylogenetic subgroups using ortholog bootstrapping. To avoid the frequent errors stemming from horizontally transferred genes in bacteria, the analysis is presently limited to eukaryotic genes. The results are accessible in the graphical browser NIFAS, a Java tool originally developed for analyzing phylogenetic relations within Pfam families. The method was tested on a set of curated orthologs with experimentally verified function. In comparison to tree reconciliation with a complete species tree, our approach finds significantly more orthologs in the test set. Examples of using HOPS to investigate gene fusions and domain recombination are given.
Celik, Nermin; Webb, Chaille T.; Leyton, Denisse L.; Holt, Kathryn E.; Heinz, Eva; Gorrell, Rebecca; Kwok, Terry; Naderer, Thomas; Strugnell, Richard A.; Speed, Terence P.; Teasdale, Rohan D.; Likić, Vladimir A.; Lithgow, Trevor
2012-01-01
Autotransporters are secreted proteins that are assembled into the outer membrane of bacterial cells. The passenger domains of autotransporters are crucial for bacterial pathogenesis, with some remaining attached to the bacterial surface while others are released by proteolysis. An enigma remains as to whether autotransporters should be considered a class of secretion system, or simply a class of substrate with peculiar requirements for their secretion. We sought to establish a sensitive search protocol that could identify and characterize diverse autotransporters from bacterial genome sequence data. The new sequence analysis pipeline identified more than 1500 autotransporter sequences from diverse bacteria, including numerous species of Chlamydiales and Fusobacteria as well as all classes of Proteobacteria. Interrogation of the proteins revealed that there are numerous classes of passenger domains beyond the known proteases, adhesins and esterases. In addition, the barrel domain, a characteristic feature of autotransporters, was found to be composed of seven conserved sequence segments that can be arranged in multiple ways in the tertiary structure of the assembled autotransporter. One of these conserved motifs overlays the targeting information required for autotransporters to reach the outer membrane. Another conserved and diagnostic motif maps to the linker region between the passenger domain and the barrel domain, indicating that it is an important feature in the assembly of autotransporters. PMID:22905239
MDC9, a widely expressed cellular disintegrin containing cytoplasmic SH3 ligand domains
1996-01-01
Cellular disintegrins are a family of proteins that are related to snake venom integrin ligands and metalloproteases. We have cloned and sequenced the mouse and human homologues of a widely expressed cellular disintegrin, which we have termed MDC9 (for metalloprotease/disintegrin/cysteine-rich protein 9). The deduced mouse and human protein sequences are 82% identical. MDC9 contains several distinct protein domains: a signal sequence is followed by a prodomain and a domain with sequence similarity to snake venom metalloproteases, a disintegrin domain, a cysteine-rich region, an EGF repeat, a membrane anchor, and a cytoplasmic tail. The cytoplasmic tail of MDC9 has two proline-rich sequences which can bind the SH3 domain of Src, and may therefore function as SH3 ligand domains. Western blot analysis shows that MDC9 is an approximately 84-kD glycoprotein in all mouse tissues examined, and in NIH 3T3 fibroblast and C2C12 myoblast mouse cell lines. MDC9 can be both cell surface biotinylated and 125I-labeled in NIH 3T3 mouse fibroblasts, indicating that the protein is present on the plasma membrane. Expression of MDC9 in COS-7 cells yields an 84-kD protein, and immunofluorescence analysis of COS-7 cells expressing MDC9 shows a staining pattern that is consistent with a plasma membrane localization. The apparent molecular mass of 84 kD suggests that MDC9 contains a membrane-anchored metalloprotease and disintegrin domain. We propose that MDC9 might function as a membrane-anchored integrin ligand or metalloprotease, or that MDC9 may combine both activities in one protein. PMID:8647900
Cloning and characterization of a novel human STAR domain containing cDNA KHDRBS2.
Wang, Liu; Xu, Jian; Zeng, Li; Ye, Xin; Wu, Qihan; Dai, Jianfeng; Ji, Chaoneng; Gu, Shaohua; Zhao, Chunhua; Xie, Yi; Mao, Yumin
2002-12-01
KHDRBS2, KH domain containing, RNA binding, signal transduction associated 2, is an RNA-binding protein that is tyrosine phosphorylated by Src during mitosis. It contains a KH domain, which is embedded in a larger conserved domain called the STAR domain. This protein shares 99% sequence identity with rat SLM-1 (the Sam68-like mammalian protein 1) and 98% sequence identity with mouse SLM-1 in its STAR domain. KHDRBS2 has the characteristic Sam68 SH2 and SH3 domain binding sites. RT-PCR analysis showed that its transcript is ubiquitously expressed. The characterization of KHDRBS2 indicates that it may link tyrosine kinase signaling cascades with some aspect of RNA metabolism.
A family of cellular proteins related to snake venom disintegrins.
Weskamp, G; Blobel, C P
1994-03-29
Disintegrins are short soluble integrin ligands that were initially identified in snake venom. A previously recognized cellular protein with a disintegrin domain was the guinea pig sperm protein PH-30, a protein implicated in sperm-egg membrane binding and fusion. Here we present peptide sequences that are characteristic of several cellular disintegrin-domain proteins. These peptide sequences were deduced from cDNA sequence tags that were generated by polymerase chain reaction from various mouse tissues and a mouse muscle cell line. Northern blot analysis with four sequence tags revealed distinct mRNA expression patterns. Evidently, cellular proteins containing a disintegrin domain define a superfamily of potential integrin ligands that are likely to function in important cell-cell and cell-matrix interactions.
Improving pairwise comparison of protein sequences with domain co-occurrence
Gascuel, Olivier
2018-01-01
Comparing and aligning protein sequences is an essential task in bioinformatics. More specifically, local alignment tools like BLAST are widely used for identifying conserved protein sub-sequences, which likely correspond to protein domains or functional motifs. However, to limit the number of false positives, these tools are used with stringent sequence-similarity thresholds and hence can miss several hits, especially for species that are phylogenetically distant from reference organisms. One solution to this problem is to integrate additional contextual information into the procedure. Here, we propose to use domain co-occurrence to increase the sensitivity of pairwise sequence comparisons. Domain co-occurrence is a strong feature of proteins, since most protein domains tend to appear with a limited number of other domains on the same protein. We propose a method to take this information into account in a typical BLAST analysis and to construct new domain families on the basis of these results. We used Plasmodium falciparum as a case study to evaluate our method. The experimental findings showed a 14% increase in the number of significant BLAST hits and a 25% increase in the proteome area that can be covered with a domain. Our method identified 2240 new domains for which, in most cases, no model of the Pfam database could be linked. Moreover, our study of the quality of the new domains in terms of alignment and physicochemical properties shows that they are close to those of standard Pfam domains. Source code of the proposed approach and supplementary data are available at: https://gite.lirmm.fr/menichelli/pairwise-comparison-with-cooccurrence PMID:29293498
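The core idea of using domain co-occurrence can be illustrated by a simple filter: a weak hit to a domain family is kept if that family is known to co-occur, in some reference protein, with a domain already confidently detected on the query. A minimal Python sketch of that idea; it is a simplification assumed for illustration, not the paper's scoring procedure, and the domain labels are hypothetical.

```python
from itertools import combinations

def cooccurring_pairs(architectures):
    """Unordered domain pairs seen together on at least one reference protein."""
    pairs = set()
    for arch in architectures:
        for a, b in combinations(sorted(set(arch)), 2):
            pairs.add((a, b))
    return pairs

def certify_hits(known_domains, weak_hits, pairs):
    """Keep weak hits whose domain co-occurs with a domain already on the query."""
    kept = []
    for dom in weak_hits:
        if any(tuple(sorted((dom, k))) in pairs for k in known_domains if k != dom):
            kept.append(dom)
    return kept

# Hypothetical reference architectures (domain lists per protein)
reference = [["D1", "D2"], ["D1", "D3"], ["D4", "D3"]]
pairs = cooccurring_pairs(reference)
```

Here a weak hit to `D2` on a query already carrying `D1` would be kept, while a weak hit to `D4` would not, because `D4` has never been seen alongside `D1`.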
Oyola-Robles, Delise; Gay, Darren C; Trujillo, Uldaeliz; Sánchez-Parés, John M; Bermúdez, Mei-Ling; Rivera-Díaz, Mónica; Carballeira, Néstor M; Baerga-Ortiz, Abel
2013-07-01
Polyunsaturated fatty acids (PUFAs) are made in some strains of deep-sea bacteria by multidomain proteins that catalyze condensation, ketoreduction, dehydration, and enoyl-reduction. In this work, we have used the Udwary-Merski Algorithm sequence analysis tool to define the boundaries that enclose the dehydratase (DH) domains in a PUFA multienzyme. Sequence analysis revealed the presence of four areas of high structure in a region that was previously thought to contain only two DH domains as defined by FabA-homology. The expression of the protein fragment containing all four protein domains resulted in an active enzyme, while shorter protein fragments were not soluble. The tetradomain fragment was capable of catalyzing the conversion of crotonyl-CoA to β-hydroxybutyryl-CoA efficiently, as shown by UV absorbance change as well as by chromatographic retention of reaction products. Sequence alignments showed that the two novel domains contain as much sequence conservation as the FabA-homology domains, suggesting that they too may play a functional role in the overall reaction. Structure predictions revealed that all domains belong to the hotdog protein family: two of them contain the active site His70 residue present in FabA-like DHs, while the remaining two do not. Replacing the active site His residues in both FabA domains with Ala abolished the activity of the tetradomain fragment, indicating that the DH activity is contained within the FabA-homology regions. Taken together, these results provide a first glimpse into a rare arrangement of DH domains which constitute a defining feature of the PUFA synthases. Copyright © 2013 The Protein Society.
Oyola-Robles, Delise; Gay, Darren C; Trujillo, Uldaeliz; Sánchez-Parés, John M; Bermúdez, Mei-Ling; Rivera-Díaz, Mónica; Carballeira, Néstor M; Baerga-Ortiz, Abel
2013-01-01
Polyunsaturated fatty acids (PUFAs) are made in some strains of deep-sea bacteria by multidomain proteins that catalyze condensation, ketoreduction, dehydration, and enoyl-reduction. In this work, we have used the Udwary-Merski Algorithm sequence analysis tool to define the boundaries that enclose the dehydratase (DH) domains in a PUFA multienzyme. Sequence analysis revealed the presence of four areas of high structure in a region that was previously thought to contain only two DH domains as defined by FabA-homology. The expression of the protein fragment containing all four protein domains resulted in an active enzyme, while shorter protein fragments were not soluble. The tetradomain fragment was capable of catalyzing the conversion of crotonyl-CoA to β-hydroxybutyryl-CoA efficiently, as shown by UV absorbance change as well as by chromatographic retention of reaction products. Sequence alignments showed that the two novel domains contain as much sequence conservation as the FabA-homology domains, suggesting that they too may play a functional role in the overall reaction. Structure predictions revealed that all domains belong to the hotdog protein family: two of them contain the active site His70 residue present in FabA-like DHs, while the remaining two do not. Replacing the active site His residues in both FabA domains with Ala abolished the activity of the tetradomain fragment, indicating that the DH activity is contained within the FabA-homology regions. Taken together, these results provide a first glimpse into a rare arrangement of DH domains which constitute a defining feature of the PUFA synthases. PMID:23696301
d-Omix: a mixer of generic protein domain analysis tools.
Wichadakul, Duangdao; Numnark, Somrak; Ingsriswang, Supawadee
2009-07-01
Domain combination provides important clues to the roles of protein domains in protein function, interaction and evolution. We have developed a web server, d-Omix (a Mixer of Protein Domain Analysis Tools), intended as a unified platform to analyze, compare and visualize protein data sets in various aspects of protein domain combinations. With InterProScan files for protein sets of interest provided by users, the server incorporates four services for domain analyses. First, it constructs a protein phylogenetic tree based on a distance matrix calculated from protein domain architectures (DAs), allowing comparison with a sequence-based tree. Second, it calculates and visualizes the versatility, abundance and co-presence of protein domains via a domain graph. Third, it compares the similarity of proteins based on DA alignment. Fourth, it builds a putative protein network derived from domain-domain interactions from DOMINE. Users may select a variety of input data files and flexibly choose domain search tools (e.g. hmmpfam, superfamily) for a specific analysis. Results from d-Omix can be interactively explored and exported into various formats such as SVG, JPG, BMP and CSV. Users with only protein sequences can also prepare an InterProScan file using a service provided by the server. The d-Omix web server is freely available at http://www.biotec.or.th/isl/Domix.
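A distance matrix computed from domain architectures can be illustrated with a simple set-based measure. A minimal Python sketch, assuming each protein's DA is a list of domain identifiers and using the Jaccard distance, which ignores domain order and copy number; d-Omix's own DA distance is not specified here and may differ. The resulting matrix could be passed to any distance-based tree method such as neighbor joining.

```python
def jaccard_distance(da1, da2):
    """1 - |shared domains| / |all domains|, ignoring order and copy number."""
    s1, s2 = set(da1), set(da2)
    if not s1 and not s2:
        return 0.0
    return 1.0 - len(s1 & s2) / len(s1 | s2)

def da_distance_matrix(architectures):
    """architectures: {protein_name: [domain ids]} -> (sorted names, square matrix)."""
    names = sorted(architectures)
    matrix = [[jaccard_distance(architectures[a], architectures[b]) for b in names]
              for a in names]
    return names, matrix
```

Order-aware comparisons, such as the DA alignment service mentioned above, would distinguish architectures like kinase-SH2 from SH2-kinase, which this measure treats as identical.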
Selvin, Joseph; Sathiyanarayanan, Ganesan; Lipton, Anuj N.; Al-Dhabi, Naif Abdullah; Valan Arasu, Mariadhas; Kiran, George S.
2016-01-01
Marine actinobacteria producing important biological macromolecules, such as lipopeptide and glycolipid biosurfactants, were analyzed, and their potential linkage to type II polyketide synthase (PKS) genes was explored. A unique feature of type II PKS genes is their high amino acid (AA) sequence homology and conserved gene organization. These enzymes mediate the biosynthesis of polyketide natural products with enormous structural complexity and chemical nature through the combinatorial use of various domains. Therefore, deciphering how the order of the AA sequences encoded by PKS domains tailors the chemical structure of polyketide analogs remains a great challenge. The present work deals with an in vitro and in silico analysis of PKS type II genes from five actinobacterial species to correlate KS domain architecture with structural features. Our present analysis reveals the unique protein domain organization of iterative type II PKS and of the KS domain of marine actinobacteria. The findings of this study would have implications for metabolic pathway reconstruction and the design of semi-synthetic genomes to achieve rational design of novel natural products. PMID:26903957
Stolterfoht, Holly; Schwendenwein, Daniel; Sensen, Christoph W; Rudroff, Florian; Winkler, Margit
2017-09-10
Increasing demand for chemicals from renewable resources calls for the development of new biotechnological methods for the reduction of oxidized bio-based compounds. Enzymatic carboxylate reduction is highly selective, in terms of both chemo- and product selectivity, but few carboxylate reductase enzymes (CARs) have been identified at the sequence level to date. Thus far, their phylogeny is unexplored and very little is known about their structure-function relationship. CARs minimally contain an adenylation domain, a phosphopantetheinylation domain and a reductase domain. We have recently identified new enzymes of fungal origin, using similarity searches against genomic sequences from organisms in which aldehydes were detected upon incubation with carboxylic acids. Analysis of sequences with known CAR functionality and CAR enzymes recently identified in our laboratory suggests that the three-domain architecture mentioned above is modular. The construction of a distance tree with a subsequent 1000-replicate bootstrap analysis showed that the CAR sequences included in our study fall into four distinct subgroups (one of bacterial origin and three of fungal origin), each with a bootstrap value of 100%. The multiple sequence alignment of all experimentally confirmed CAR protein sequences revealed fingerprint sequences of residues which are likely to be involved in substrate and co-substrate binding and one of the three catalytic substeps, respectively. The fingerprint sequences broaden our understanding of the amino acids that might be essential for the reduction of organic acids to the corresponding aldehydes in CAR proteins. Copyright © 2017 Elsevier B.V. All rights reserved.
Liu, Liyun; Hastings, J. Woodland
2007-01-01
Noctiluca scintillans, a heterotrophic unarmored unicellular bioluminescent dinoflagellate, occurs widely in the oceans, often as a bloom. Molecular phylogenetic analysis based on 18S ribosomal DNA sequences has consistently placed this species on the basal branch of dinoflagellates. Here, we report that the structural organization of its luciferase gene is strikingly different from that of the seven luminous species previously characterized, all of which are photosynthetic. The Noctiluca gene codes for a polypeptide that consists of two distinct but contiguous domains. One, which is located in the N-terminal portion, is shorter than but similar in sequence to the individual domains of the three-domain luciferases found in all other luminous dinoflagellates studied. The other, situated in the C-terminal part, has sequence similarity to the luciferin-binding protein of the luminous dinoflagellate Lingulodinium polyedrum, encoded there by a separate gene. Western analysis shows that the native protein has the same size (≈100 kDa) as the heterologously expressed polypeptide, indicating that it is not a polyprotein. Thus, sequences found in two proteins in the L. polyedrum bioluminescence system are present in a single polypeptide in Noctiluca. PMID:17130452
A Protein Domain and Family Based Approach to Rare Variant Association Analysis.
Richardson, Tom G; Shihab, Hashem A; Rivas, Manuel A; McCarthy, Mark I; Campbell, Colin; Timpson, Nicholas J; Gaunt, Tom R
2016-01-01
It has become common practice to analyse large-scale sequencing data with statistical approaches based on the aggregation of rare variants within the same gene. We applied a novel approach to rare variant analysis by collapsing variants together using protein domain and family coordinates, regarded as a more discrete definition of a biologically functional unit. Using Pfam definitions, we collapsed rare variants (minor allele frequency ≤ 1%) together in three different ways: 1) variants within single genomic regions that map to individual protein domains; 2) variants within two individual protein domain regions that are predicted to be responsible for a protein-protein interaction; 3) all variants within combined regions from multiple genes coding for the same protein domain (i.e. protein families). A conventional collapsing analysis using gene coordinates was also undertaken for comparison. We used UK10K sequence data and investigated associations between regions of variants and lipid traits using the sequence kernel association test (SKAT). We observed no strong evidence of association between regions of variants based on Pfam domain definitions and lipid traits. Quantile-quantile plots illustrated that the overall distributions of p-values from the protein domain analyses were comparable to that of a conventional gene-based approach. Deviations from this distribution suggested that collapsing by either protein domain or gene definitions may be favourable depending on the trait analysed. We have collapsed rare variants together using protein domain and family coordinates to present an alternative to collapsing across conventionally used gene-based regions. Although no strong evidence of association was detected in these analyses, future studies may still find value in adopting these approaches to detect previously unidentified association signals.
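To make the collapsing strategies above concrete, here is a minimal Python sketch of strategies 1 and 3: grouping rare variants (MAF ≤ 1%) by single domain region, or pooling a Pfam family across genes. The input layout (tuples of chromosome, position, MAF and of Pfam ID, gene, chromosome, start, end) and the function name are assumptions for illustration; the SKAT test that would be run on each unit is not implemented here.

```python
from collections import defaultdict

MAF_MAX = 0.01  # rare-variant threshold from the abstract (MAF <= 1%)

def collapse(variants, domains, by_family=False):
    """Group rare variants into domain-level (or family-level) units.

    variants: iterable of (chrom, pos, maf)
    domains:  iterable of (pfam_id, gene, chrom, start, end), 1-based inclusive
    by_family=False -> one unit per (pfam_id, gene) region (strategy 1)
    by_family=True  -> one unit per pfam_id pooled across genes (strategy 3)
    """
    units = defaultdict(list)
    for chrom, pos, maf in variants:
        if maf > MAF_MAX:
            continue  # common variant, not collapsed
        for pfam_id, gene, d_chrom, start, end in domains:
            if chrom == d_chrom and start <= pos <= end:
                key = pfam_id if by_family else (pfam_id, gene)
                units[key].append((chrom, pos))
    return dict(units)
```

Each resulting unit would then be passed, together with genotypes and phenotypes, to an association test such as SKAT.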
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zelinka, L.; McCann, S.; Budde, J.
2011-08-05
Highlights: • Affinity purification of the autoimmune rippling muscle disease immunogenic domain of titin. • Partial sequence analysis confirms that the peptide is in the I band region of titin. • This region of human titin shows a high degree of homology to mouse titin N2-A. Abstract: Autoimmune rippling muscle disease (ARMD) is an autoimmune neuromuscular disease associated with myasthenia gravis (MG). Past studies in our laboratory identified a very high molecular weight skeletal muscle protein antigen recognized by ARMD patient antisera as the titin isoform. These past studies used antisera from ARMD and MG patients as probes to screen a human skeletal muscle cDNA library, and several pBluescript clones were found to express immunoreactive peptides. This study characterizes the products of subcloning the titin immunoreactive domain into pGEX-3X and the resulting fusion protein. Sequence analysis of the fusion gene indicates that the cloned titin domain (GenBank ID: EU428784) is in frame and is derived from an N2-A sequence spanning exons 248-250, a region that encodes the fibronectin III domain. PCR and EcoRI restriction mapping studies have demonstrated that the inserted cDNA is of the size predicted by bioinformatics analysis of the subclone. Expression of the fusion protein resulted in the isolation of a 52 kDa polypeptide, consistent with the size predicted from the inferred amino acid sequence. Immunoblot experiments on the fusion protein, using rippling muscle/myasthenia gravis antisera, demonstrate that only the titin domain is immunoreactive.
Van Damme, Els J. M.; Nakamura-Tsuruta, Sachiko; Smith, David F.; Ongenaert, Maté; Winter, Harry C.; Rougé, Pierre; Goldstein, Irwin J.; Mo, Hanqing; Kominami, Junko; Culerrier, Raphaël; Barre, Annick; Hirabayashi, Jun; Peumans, Willy J.
2007-01-01
A re-investigation of the occurrence and taxonomic distribution of proteins built up of protomers consisting of two tandemly arrayed domains equivalent to the GNA [Galanthus nivalis (snowdrop) agglutinin] revealed that these are widespread among monocotyledonous plants. Phylogenetic analysis of the available sequences indicated that these proteins do not represent a monophyletic group but most probably result from multiple independent domain duplication/in-tandem insertion events. To corroborate the relationship between inter-domain sequence divergence and the widening of specificity range, a detailed comparative analysis was made of the sequences and specificity of a set of two-domain GNA-related lectins. Glycan microarray analyses, frontal affinity chromatography and surface plasmon resonance measurements demonstrated that the two-domain GNA-related lectins acquired a marked diversity in carbohydrate-binding specificity that contrasts strikingly with the canonical exclusive specificity of their single-domain counterparts towards mannose. Moreover, it appears that most two-domain GNA-related lectins interact with both high-mannose and complex N-glycans and that this dual specificity relies on the simultaneous presence of at least two different, independently acting binding sites. The combined phylogenetic, specificity and structural data strongly suggest that plants used domain duplication followed by divergent evolution as a mechanism to generate multispecific lectins from a single mannose-binding domain. Given that the shift in specificity of some binding sites from high-mannose to complex-type N-glycans implies that the two-domain GNA-related lectins are primarily directed against typical animal glycans, it is tempting to speculate that plants developed two-domain GNA-related lectins for defence purposes. PMID:17288538
Unique Structural Features and Sequence Motifs of Proline Utilization A (PutA)
Singh, Ranjan K.; Tanner, John J.
2013-01-01
Proline utilization A proteins (PutAs) are bifunctional enzymes that catalyze the oxidation of proline to glutamate using spatially separated proline dehydrogenase and pyrroline-5-carboxylate dehydrogenase active sites. Here we use the crystal structure of the minimalist PutA from Bradyrhizobium japonicum (BjPutA) along with sequence analysis to identify unique structural features of PutAs. This analysis shows that PutAs have secondary structural elements and domains not found in the related monofunctional enzymes. Some of these extra features are predicted to be important for substrate channeling in BjPutA. Multiple sequence alignment analysis shows that some PutAs have a 17-residue conserved motif in the C-terminal 20–30 residues of the polypeptide chain. The BjPutA structure shows that this motif helps seal the internal substrate-channeling cavity from the bulk medium. Finally, it is shown that some PutAs have a 100–200 residue domain of unknown function in the C-terminus that is not found in minimalist PutAs. Remote homology detection suggests that this domain is homologous to the oligomerization beta-hairpin and Rossmann fold domain of BjPutA. PMID:22201760
Structural diversity of domain superfamilies in the CATH database.
Reeves, Gabrielle A; Dallman, Timothy J; Redfern, Oliver C; Akpor, Adrian; Orengo, Christine A
2006-07-14
The CATH database of domain structures has been used to explore the structural variation of homologous domains in 294 well populated domain structure superfamilies, each containing at least three sequence-diverse relatives. Our analyses confirm some previously detected trends relating sequence divergence to structural variation, but for a much larger dataset, and in some superfamilies the new data reveal exceptional structural variation. Use of a new algorithm (2DSEC) to analyse variability in secondary structure compositions across a superfamily sheds new light on how structures evolve. 2DSEC detects inserted secondary structures that embellish the core of conserved secondary structures found throughout the superfamily. Analysis showed that for 56% of highly populated superfamilies (>9 sequence-diverse relatives), there are twofold or greater increases in the numbers of secondary structures in some relatives. In some families fivefold increases occur, sometimes modifying the fold of the domain. Manual inspection of secondary structure insertions or embellishments in 48 particularly variable superfamilies revealed that although these insertions were usually discontiguous in the sequence, they were often co-located in 3D, resulting in a larger structural motif that often modified the geometry of the active site or the surface conformation, promoting diverse domain partnerships and protein interactions. These observations, supported by automatic analysis of all well populated CATH families, suggest that accretion of small secondary structure insertions may provide a simple mechanism for evolving new functions in diverse relatives. Some layered domain architectures (e.g. mainly-beta and alpha-beta sandwiches) that recur frequently in genomes exploit these types of embellishments more often to modify function. In these architectures, aggregation occurs most often at the edges, top or bottom of the beta-sheets.
Information on structural variability across domain superfamilies has been made available through the CATH Dictionary of Homologous Structures (DHS).
Papandreou, Nikolaos; Chomilier, Jacques
2008-01-01
The co-chaperone Hop [heat shock protein (HSP) organising protein] is known to bind both Hsp70 and Hsp90. Hop comprises three repeats of a tetratricopeptide repeat (TPR) domain, each consisting of three TPR motifs. The first and last TPR domains are followed by a domain containing several dipeptide (DP) repeats called the DP domain. These analyses suggest that the hop genes result from successive recombination events of an ancestral TPR–DP module. From a hydrophobic cluster analysis of homologous Hop protein sequences derived from gene families, we can postulate that shifts in the open reading frames are at the origin of the present sequences. Moreover, these shifts can be related to the presence or absence of biological function. We propose to extend the family of Hop co-chaperones into the kingdom of bacteria, as several structurally related genes have been identified by hydrophobic cluster analysis. We also provide evidence of common structural characteristics between hop and hip genes, suggesting a shared precursor of ancestral TPR–DP domains. Electronic supplementary material: The online version of this article (doi:10.1007/s12192-008-0083-8) contains supplementary material, which is available to authorized users. PMID:18987995
DOE Office of Scientific and Technical Information (OSTI.GOV)
Borziak, Kirill; Jouline, Igor B
2007-01-01
Motivation: Sensory domains that are conserved among Bacteria, Archaea and Eucarya are important detectors of signals common to living cells. Due to their high sequence divergence, sensory domains are difficult to identify. We systematically look for novel sensory domains using sensitive profile-based searches initiated with regions of signal transduction proteins where no known domains can be identified by current domain models. Results: Using profile searches followed by multiple sequence alignment, structure prediction, and domain architecture analysis, we have identified a novel sensory domain termed FIST, which is present in signal transduction proteins from Bacteria, Archaea and Eucarya. Remote similarity to a known ligand-binding fold and chromosomal proximity of FIST-encoding genes to those coding for proteins involved in amino acid metabolism and transport suggest that FIST domains bind small ligands, such as amino acids.
NASA Astrophysics Data System (ADS)
Rauf, Muhammad; Saeed, Nasir A.; Habib, Imran; Ahmed, Moddassir; Shahzad, Khurram; Mansoor, Shahid; Ali, Rashid
2017-02-01
Structure prediction can provide information about the function and active sites of a protein, which helps in designing new functional proteins. H+-pyrophosphatase is a transmembrane protein involved in establishing the proton motive force for active transport of Na+ across the membrane by Na+/H+ antiporters. A full-length novel H+-pyrophosphatase gene was isolated from the halophytic grass Leptochloa fusca using RT-PCR and RACE methods. The full-length LfVP1 gene sequence of 2292 nucleotides encodes a protein of 764 amino acids. The DNA and protein sequences were characterized using bioinformatics tools. Various important potential sites were predicted by the PROSITE webserver. Primary structural analysis showed LfVP1 to be a stable protein, and the grand average of hydropathy (GRAVY) indicated that the LfVP1 protein has good hydrosolubility. Secondary structure analysis showed that the LfVP1 protein sequence contains a significant proportion of alpha helix and random coil. Membrane topology prediction suggested the presence of 14 transmembrane domains, with the catalytic domain in TM3. The three-dimensional structure predicted from the LfVP1 protein sequence also indicated the presence of 14 transmembrane domains, and a hydrophobicity surface model showed amino acid hydrophobicity. The Ramachandran plot showed that 98% of amino acid residues were predicted to lie in the favored region.
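The GRAVY value cited above is conventionally the mean Kyte-Doolittle hydropathy over all residues. A minimal Python sketch of that calculation follows; the function name is hypothetical, and this is the standard formula rather than the specific tool the authors used.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(seq):
    """Grand average of hydropathy: sum of residue scores divided by length."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)
```

Negative values indicate an overall hydrophilic sequence and positive values an overall hydrophobic one.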
Classification and Lineage Tracing of SH2 Domains Throughout Eukaryotes.
Liu, Bernard A
2017-01-01
Today there is a rapidly expanding number of sequenced genomes. Cataloging protein interaction domains such as the Src Homology 2 (SH2) domain across these various genomes can be accomplished with ease due to existing algorithms and prediction models. An evolutionary analysis of SH2 domains provides a step towards understanding how SH2 proteins integrated with existing signaling networks to position phosphotyrosine signaling as a crucial driver of robust cellular communication networks in metazoans. However, organizing and tracing SH2 domains across organisms and understanding their evolutionary trajectory remains a challenge. This chapter describes several methodologies for analyzing the evolutionary trajectory of SH2 domains, including a global SH2 domain classification system, which facilitates annotation of new SH2 sequences and is essential for tracing the lineage of SH2 domains throughout eukaryote evolution. This classification utilizes a combination of sequence homology, protein domain architecture and the boundary positions between introns and exons within the SH2 domain or genes encoding these domains. Discrete SH2 families can then be traced across various genomes to provide insight into their origins. Furthermore, additional methods for examining potential mechanisms for divergence of SH2 domains, from structural changes to alterations in protein domain content and genome duplication, will be discussed. Therefore, a better understanding of SH2 domain evolution may enhance our insight into the emergence of phosphotyrosine signaling and the expansion of protein interaction domains.
Minakuchi, Kazunobu; Murata, Dai; Okubo, Yuji; Nakano, Yoshiyuki; Yoshida, Shinichi
2013-01-01
Protein A affinity chromatography is the standard purification process for the capture of therapeutic antibodies. The individual IgG-binding domains of protein A (E, D, A, B, C) have highly homologous amino acid sequences. From a previous report, it has been assumed that the C domain has superior resistance to alkaline conditions compared to the other domains. We investigated several properties of the C domain as an IgG-Fc capture ligand. Based on cleavage site analysis of a recombinant protein A using a protein sequencer, the C domain was found to be the only domain to have neither of the potential alkaline cleavage sites. Circular dichroism (CD) analysis also indicated that the C domain has good physicochemical stability. Additionally, we evaluated the amino acid substitutions at the Gly-29 position of the C domain, as the Z domain (an artificial B domain) acquired alkaline resistance through a G29A mutation. The G29A mutation proved to increase the alkaline resistance of the C domain, based on BIACORE analysis, although the improvement was significantly smaller than that observed for the B domain. Interestingly, a number of other amino acid mutations at the same position increased alkaline resistance more than did the G29A mutation. This result supports the notion that even a single mutation on the originally alkali-stable C domain would improve its alkaline stability. An engineered protein A based on this C domain is expected to show remarkable performance as an affinity ligand for immunoglobulin. PMID:23868198
Domain architecture conservation in orthologs
2011-01-01
Background As orthologous proteins are expected to retain function more often than other homologs, they are often used for functional annotation transfer between species. However, ortholog identification methods do not take into account changes in domain architecture, which are likely to modify a protein's function. By domain architecture we refer to the sequential arrangement of domains along a protein sequence. To assess the level of domain architecture conservation among orthologs, we carried out a large-scale study of domain architecture changes between human and 40 other species spanning the entire evolutionary range. We designed a score to measure domain architecture similarity and used it to analyze differences in domain architecture conservation between orthologs and paralogs relative to the conservation of primary sequence. We also statistically characterized the extents of different types of domain swapping events across pairs of orthologs and paralogs. Results The analysis shows that, for homologs that have diverged beyond a certain threshold, orthologs exhibit greater domain architecture conservation than paralogous homologs, even when differences in average sequence divergence are compensated for. We interpret this as an indication of a stronger selective pressure on orthologs than paralogs to retain the domain architecture required for the proteins to perform a specific function. In general, orthologs as well as the closest paralogous homologs have very similar domain architectures, even at large evolutionary separation. The most common domain architecture changes observed in both ortholog and paralog pairs involved insertion/deletion of new domains, while domain shuffling and segment duplication/deletion were very infrequent. Conclusions On the whole, our results support the hypothesis that function conservation between orthologs demands higher domain architecture conservation than other types of homologs, relative to primary sequence conservation.
This supports the notion that orthologs are functionally more similar than other types of homologs at the same evolutionary distance. PMID:21819573
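The change types named above (insertion/deletion of domains versus domain shuffling) can be illustrated on architectures written as ordered lists of domain names. The Python sketch below is a hypothetical, deliberately coarse classifier, not the similarity score designed by the authors; for example, a segment duplication would simply register here as an insertion.

```python
from collections import Counter

def is_subsequence(short, long):
    """True if `short` appears in `long` in order (gaps allowed)."""
    it = iter(long)
    return all(domain in it for domain in short)  # `in` consumes the iterator

def classify_change(arch_a, arch_b):
    """Coarsely label the difference between two ordered domain architectures."""
    if arch_a == arch_b:
        return "identical"
    if Counter(arch_a) == Counter(arch_b):
        return "shuffling"  # same domains, different order
    if is_subsequence(arch_a, arch_b) or is_subsequence(arch_b, arch_a):
        return "insertion/deletion"  # one architecture embeds the other
    return "other"
```

For instance, `["SH3", "SH2"]` versus `["SH3", "SH2", "Pkinase"]` is labelled an insertion/deletion, while `["A", "B"]` versus `["B", "A"]` is labelled shuffling.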
Directed evolution of the TALE N-terminal domain for recognition of all 5' bases.
Lamb, Brian M; Mercer, Andrew C; Barbas, Carlos F
2013-11-01
Transcription activator-like effector (TALE) proteins can be designed to bind virtually any DNA sequence. General guidelines for design of TALE DNA-binding domains suggest that the 5'-most base of the DNA sequence bound by the TALE (the N0 base) should be a thymine. We quantified the N0 requirement by analysis of the activities of TALE transcription factors (TALE-TF), TALE recombinases (TALE-R) and TALE nucleases (TALENs) with each DNA base at this position. In the absence of a 5' T, we observed decreases in activity of up to >1000-fold for TALE-TF, up to 100-fold for TALE-R and up to 10-fold for TALENs, compared with target sequences containing a 5' T. To develop TALE architectures that recognize all possible N0 bases, we used structure-guided library design coupled with TALE-R activity selections to evolve novel TALE N-terminal domains to accommodate any N0 base. A G-selective domain and broadly reactive domains were isolated and characterized. The engineered TALE domains selected in the TALE-R format demonstrated modularity and were active in TALE-TF and TALEN architectures. Evolved N-terminal domains provide effective and unconstrained TALE-based targeting of any DNA sequence as TALE binding proteins and designer enzymes.
Miller, Bradley R; Sundlov, Jesse A; Drake, Eric J; Makin, Thomas A; Gulick, Andrew M
2014-10-01
Nonribosomal peptide synthetases (NRPSs) are multimodular proteins capable of producing important peptide natural products. Using an assembly line process, the amino acid substrate and peptide intermediates are passed between the active sites of different catalytic domains of the NRPS while bound covalently to a peptidyl carrier protein (PCP) domain. Examination of the linker sequences that join the NRPS adenylation and PCP domains identified several conserved proline residues that are not found in standalone adenylation domains. We examined the roles of these proline residues and neighboring conserved sequences through mutagenesis and biochemical analysis of the reaction catalyzed by the adenylation domain and the fully reconstituted NRPS pathway. In particular, we identified a conserved LPxP motif at the start of the adenylation-PCP linker. The LPxP motif interacts with a region on the adenylation domain to stabilize a critical catalytic lysine residue belonging to the A10 motif that immediately precedes the linker. Further, this interaction with the C-terminal subdomain of the adenylation domain may coordinate movement of the PCP with the conformational change of the adenylation domain. Through this work, we extend the conserved A10 motif of the adenylation domain and identify residues that enable proper adenylation domain function. © 2014 Wiley Periodicals, Inc.
Vettore, André L.; da Silva, Felipe R.; Kemper, Edson L.; Souza, Glaucia M.; da Silva, Aline M.; Ferro, Maria Inês T.; Henrique-Silva, Flavio; Giglioti, Éder A.; Lemos, Manoel V.F.; Coutinho, Luiz L.; Nobrega, Marina P.; Carrer, Helaine; França, Suzelei C.; Bacci, Maurício; Goldman, Maria Helena S.; Gomes, Suely L.; Nunes, Luiz R.; Camargo, Luis E.A.; Siqueira, Walter J.; Van Sluys, Marie-Anne; Thiemann, Otavio H.; Kuramae, Eiko E.; Santelli, Roberto V.; Marino, Celso L.; Targon, Maria L.P.N.; Ferro, Jesus A.; Silveira, Henrique C.S.; Marini, Danyelle C.; Lemos, Eliana G.M.; Monteiro-Vitorello, Claudia B.; Tambor, José H.M.; Carraro, Dirce M.; Roberto, Patrícia G.; Martins, Vanderlei G.; Goldman, Gustavo H.; de Oliveira, Regina C.; Truffi, Daniela; Colombo, Carlos A.; Rossi, Magdalena; de Araujo, Paula G.; Sculaccio, Susana A.; Angella, Aline; Lima, Marleide M.A.; de Rosa, Vicente E.; Siviero, Fábio; Coscrato, Virginia E.; Machado, Marcos A.; Grivet, Laurent; Di Mauro, Sonia M.Z.; Nobrega, Francisco G.; Menck, Carlos F.M.; Braga, Marilia D.V.; Telles, Guilherme P.; Cara, Frank A.A.; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo
2003-01-01
To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged. PMID:14613979
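As a worked check on the figures above (our arithmetic, not stated in the abstract), the estimate of 33,620 unique genes corresponds to the reported 22% redundancy among the 43,141 assembled sequences, and the full-length fraction matches the stated 33%:

```latex
1 - \frac{33\,620}{43\,141} \approx 1 - 0.779 = 0.221 \approx 22\%,
\qquad
\frac{14\,409}{43\,141} \approx 0.334 \approx 33\%
```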
Image encryption using random sequence generated from generalized information domain
NASA Astrophysics Data System (ADS)
Xia-Yan, Zhang; Guo-Ji, Zhang; Xuan, Li; Ya-Zhou, Ren; Jie-Hua, Wu
2016-05-01
A novel image encryption method based on the random sequence generated from the generalized information domain and permutation-diffusion architecture is proposed. The random sequence is generated by reconstruction from the generalized information file and discrete trajectory extraction from the data stream. The trajectory address sequence is used to generate a P-box to shuffle the plain image while random sequences are treated as keystreams. A new factor called drift factor is employed to accelerate and enhance the performance of the random sequence generator. An initial value is introduced to make the encryption method an approximately one-time pad. Experimental results show that the random sequences pass the NIST statistical test with a high ratio and extensive analysis demonstrates that the new encryption scheme has superior security.
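The permutation-diffusion architecture described above can be sketched generically: a sequence of "addresses" is sorted to obtain a P-box that shuffles the pixels, and a keystream is then XORed in with chaining so that each cipher value depends on the previous one. In this Python sketch, `random.Random(seed)` is a hypothetical stand-in for the paper's generalized-information-domain generator (it is not cryptographically secure), and the drift factor and initial value are not modelled.

```python
import random

def keystreams(seed, n):
    """Derive a P-box and a byte keystream from one seeded generator.

    random.Random is a hypothetical stand-in for the paper's generator
    and is NOT suitable for real cryptography.
    """
    rng = random.Random(seed)
    addresses = [rng.random() for _ in range(n)]
    key_bytes = [rng.randrange(256) for _ in range(n)]
    # Sorting positions by their address value yields a permutation (P-box).
    pbox = sorted(range(n), key=addresses.__getitem__)
    return pbox, key_bytes

def encrypt(pixels, seed):
    pbox, ks = keystreams(seed, len(pixels))
    shuffled = [pixels[p] for p in pbox]   # permutation stage
    cipher, prev = [], 0
    for value, k in zip(shuffled, ks):     # diffusion stage (chained XOR)
        c = value ^ k ^ prev
        cipher.append(c)
        prev = c
    return cipher

def decrypt(cipher, seed):
    pbox, ks = keystreams(seed, len(cipher))
    shuffled, prev = [], 0
    for c, k in zip(cipher, ks):           # undo the chained XOR
        shuffled.append(c ^ k ^ prev)
        prev = c
    pixels = [0] * len(cipher)
    for dst, src in enumerate(pbox):       # undo the permutation
        pixels[src] = shuffled[dst]
    return pixels
```

Pixels are treated as integers in 0-255 in a flattened list; decrypting with the same seed recovers the original image exactly.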
ERIC Educational Resources Information Center
Wu, Yann-Shya
The purpose of this paper is to provide guidance for instructional sequencing in emotional literacy curricula. First, the concepts of instructional sequence and the problems involved with instructional sequence in the affective domain of learning are addressed. Then, through the analysis of the emotional literacy curriculum, Promoting Alternative…
Nodal domains of a non-separable problem—the right-angled isosceles triangle
NASA Astrophysics Data System (ADS)
Aronovitch, Amit; Band, Ram; Fajman, David; Gnutzmann, Sven
2012-03-01
We study the nodal set of eigenfunctions of the Laplace operator on the right-angled isosceles triangle. A local analysis of the nodal pattern provides an algorithm for computing the number ν_n of nodal domains for any eigenfunction. In addition, an exact recursive formula for the number of nodal domains is found to reproduce all existing data. Finally, we use the recursion formula to analyse a large sequence of nodal counts statistically. Our analysis shows that the distribution of nodal counts for this triangular shape has a much richer structure than the known cases of regular separable shapes or completely irregular shapes. Furthermore, we demonstrate that the nodal count sequence contains information about the periodic orbits of the corresponding classical ray dynamics.
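For orientation, the number of nodal domains can also be estimated by brute force, although this is not the local-analysis algorithm or recursive formula of the paper. The Python sketch below assumes the triangle 0 < y < x < 1 with Dirichlet boundary conditions, whose eigenfunctions are ψ(m,n)(x, y) = sin(mπx)sin(nπy) − sin(nπx)sin(mπy) for m > n ≥ 1 (eigenvalue π²(m² + n²)). It samples ψ on a grid and flood-fills connected same-sign cells; the function names and grid parameters are hypothetical, and a coarse grid can miscount where nodal lines pass close together.

```python
import math
from collections import deque

def psi(m, n, x, y):
    """Dirichlet eigenfunction on the triangle 0 < y < x < 1 (assumes m > n >= 1)."""
    return (math.sin(m * math.pi * x) * math.sin(n * math.pi * y)
            - math.sin(n * math.pi * x) * math.sin(m * math.pi * y))

def count_nodal_domains(m, n, grid=200, eps=1e-12):
    """Approximate the nodal count by flood-filling same-sign grid cells."""
    # Sample psi at cell centres strictly below the diagonal (j < i).
    sign = {}
    for i in range(grid):
        for j in range(i):
            value = psi(m, n, (i + 0.5) / grid, (j + 0.5) / grid)
            if abs(value) > eps:  # skip samples that sit on a nodal line
                sign[(i, j)] = value > 0
    seen = set()
    domains = 0
    for start in sign:
        if start in seen:
            continue
        domains += 1  # a new connected same-sign region
        seen.add(start)
        queue = deque([start])
        while queue:
            i, j = queue.popleft()
            for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if nb in sign and nb not in seen and sign[nb] == sign[(i, j)]:
                    seen.add(nb)
                    queue.append(nb)
    return domains
```

As a sanity check, ψ(2,1) has no interior zeros (a single domain), while ψ(3,1) vanishes inside the triangle only on the line x + y = 1, giving two domains.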
Shin, Junha; Lee, Insuk
2015-01-01
Phylogenetic profiling, a network inference method based on gene inheritance profiles, has been widely used to construct functional gene networks in microbes. However, its utility for network inference in higher eukaryotes has been limited. An improved algorithm with an in-depth understanding of pathway evolution may overcome this limitation. In this study, we investigated the effects of taxonomic structures on co-inheritance analysis using 2,144 reference species in four query species: Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, and Homo sapiens. We observed three clusters of reference species based on a principal component analysis of the phylogenetic profiles, which correspond to the three domains of life—Archaea, Bacteria, and Eukaryota—suggesting that pathways inherit primarily within specific domains or lower-ranked taxonomic groups during speciation. Hence, the co-inheritance pattern within a taxonomic group may be eroded by confounding inheritance patterns from irrelevant taxonomic groups. We demonstrated that co-inheritance analysis within domains substantially improved network inference not only in microbe species but also in the higher eukaryotes, including humans. Although we observed two sub-domain clusters of reference species within Eukaryota, co-inheritance analysis within these sub-domain taxonomic groups only marginally improved network inference. Therefore, we conclude that co-inheritance analysis within domains is the optimal approach to network inference with the given reference species. The construction of a series of human gene networks with increasing sample sizes of the reference species for each domain revealed that the size of the high-accuracy networks increased as additional reference species genomes were included, suggesting that within-domain co-inheritance analysis will continue to expand human gene networks as genomes of additional species are sequenced. 
Taken together, we propose that co-inheritance analysis within the domains of life will greatly potentiate the use of the expected onslaught of sequenced genomes in the study of molecular pathways in higher eukaryotes. PMID:26394049
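The idea of scoring co-inheritance within each domain of life, rather than across all reference species at once, can be illustrated with binary presence/absence profiles. This Python sketch uses Pearson correlation as the profile similarity; that choice, the data layout and the function names are assumptions for illustration and are not taken from the study.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant profile carries no co-inheritance signal
    return cov / math.sqrt(va * vb)

def coinheritance_by_domain(profile_a, profile_b, species_domain):
    """Score two genes' phylogenetic profiles separately within each domain.

    profile_x: dict species -> 1 (gene present) or 0 (absent)
    species_domain: dict species -> 'Archaea', 'Bacteria' or 'Eukaryota'
    """
    scores = {}
    for domain in sorted(set(species_domain.values())):
        species = [s for s, d in species_domain.items() if d == domain]
        a = [profile_a[s] for s in species]
        b = [profile_b[s] for s in species]
        scores[domain] = pearson(a, b)
    return scores
```

In a toy case where two genes co-occur perfectly in bacterial species but are mutually exclusive in eukaryotic ones, pooling all species gives a correlation of 0, whereas the per-domain scores are 1 and -1, which mirrors the confounding between taxonomic groups described above.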
Sammond, Deanne W.; Payne, Christina M.; Brunecky, Roman; Himmel, Michael E.; Crowley, Michael F.; Beckham, Gregg T.
2012-01-01
Cellulase enzymes deconstruct cellulose to glucose, and often consist of glycoside hydrolases (GHs) connected to carbohydrate-binding modules (CBMs) by glycosylated linkers. Although linker modifications can alter cellulase activity, the functional role of linkers beyond domain connectivity remains unknown. Here we investigate cellulase linkers connecting GH Family 6 or 7 catalytic domains to Family 1 or 2 CBMs, from both bacterial and eukaryotic cellulases, to identify conserved characteristics potentially related to function. Sequence analysis suggests that the linker lengths between structured domains are optimized based on the GH domain and CBM type, such that linker length may be important for activity. Longer linkers are observed in eukaryotic GH Family 6 cellulases compared to GH Family 7 cellulases. Bacterial GH Family 6 cellulases are found with structured domains in either N- to C-terminal order, and similar linker lengths suggest that domain order has no effect on linker length. O-glycosylation is uniformly distributed across linkers, suggesting that glycans are required along entire linker lengths for proteolysis protection and, as suggested by simulation, for extension. Sequence comparisons show that proline content for bacterial linkers is more than double that observed in eukaryotic linkers, but with fewer putative O-glycan sites, suggesting alternative mechanisms for extension. Conversely, near linker termini where linkers connect to structured domains, O-glycosylation sites are observed less frequently, whereas glycines are more prevalent, suggesting the need for flexibility to achieve proper domain orientations. Putative N-glycosylation sites are quite rare in cellulase linkers, while an N-P motif, which strongly disfavors the attachment of N-glycans, is commonly observed. These results suggest that linkers exhibit features that are likely tailored for optimal function, despite possessing low sequence identity.
This study suggests that cellulase linkers may exhibit function in enzyme action, and highlights the need for additional studies to elucidate cellulase linker functions. PMID:23139804
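The sequence features compared above (proline content, putative N-glycosylation sites and the N-P motif) are easy to scan for in a linker sequence. This Python sketch uses the standard N-X-S/T sequon with X not equal to proline as the definition of a putative N-glycosylation site; the function names are hypothetical, and O-glycosylation site prediction, which requires dedicated predictors, is not attempted.

```python
import re

# N-X-S/T sequon with X != P: the canonical consensus for N-linked glycosylation.
# The lookahead lets overlapping matches be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")
NP_MOTIF = re.compile(r"(?=NP)")

def sequon_positions(seq):
    """0-based positions of the N in each putative N-glycosylation sequon."""
    return [m.start() for m in SEQUON.finditer(seq.upper())]

def np_positions(seq):
    """0-based positions of N-P dipeptides."""
    return [m.start() for m in NP_MOTIF.finditer(seq.upper())]

def proline_fraction(seq):
    """Fraction of residues that are proline."""
    if not seq:
        raise ValueError("empty sequence")
    return seq.upper().count("P") / len(seq)
```

For the toy linker `ANPSANGTA`, the N at position 1 is followed by proline and so is reported only as an N-P motif, while the N at position 5 starts a sequon.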
Gupta, Anjali Bansal; Wee, Liang En; Zhou, Yi Ting; Hortsch, Michael; Low, Boon Chuan
2012-01-01
The CRAL_TRIO protein domain, which is unique to the Sec14 protein superfamily, binds to a diverse set of small lipophilic ligands. Similar domains are found in a range of different proteins including neurofibromatosis type-1, a Ras GTPase-activating Protein (RasGAP) and Rho guanine nucleotide exchange factors (RhoGEFs). Proteins containing this structural protein domain exhibit a low sequence similarity and ligand specificity while maintaining an overall characteristic three-dimensional structure. We have previously demonstrated that the BNIP-2 and Cdc42GAP Homology (BCH) protein domain, which shares a low sequence homology with the CRAL_TRIO domain, can serve as a regulatory scaffold that binds to Rho, RhoGEFs and RhoGAPs to control various cell signalling processes. In this work, we investigate 175 BCH domain-containing proteins from a wide range of different organisms. A phylogenetic analysis with ∼100 CRAL_TRIO and similar domains from eight representative species indicates a clear distinction of BCH-containing proteins as a novel subclass within the CRAL_TRIO/Sec14 superfamily. BCH-containing proteins contain a hallmark sequence motif R(R/K)h(R/K)(R/K)NL(R/K)xhhhhHPs (‘h’ is a large hydrophobic residue and ‘s’ is a small, weakly polar residue) and can be further subdivided into three unique subtypes associated with BNIP-2-N, macro- and RhoGAP-type protein domains. A previously unknown group of genes encoding ‘BCH-only’ domains is also identified in plants and arthropod species. Based on an analysis of their gene structure and their protein domain context, we hypothesize that BCH domain-containing genes evolved through gene duplication, intron insertions and domain swapping events. Furthermore, we explore the point of divergence between BCH and CRAL_TRIO proteins in relation to their ability to bind small GTPases, GAPs and GEFs and lipid ligands.
Our study suggests a need for a more extensive analysis of previously uncharacterized BCH, ‘BCH-like’ and CRAL_TRIO-containing proteins and their significance in regulating signaling events involving small GTPases. PMID:22479462
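The BCH hallmark motif is written in a compact notation that maps naturally onto a regular expression. The sketch below is an illustration only: the abstract defines ‘h’ only as "large and hydrophobic" and ‘s’ only as "small and weakly polar", so the concrete residue sets below are assumptions, ‘x’ is taken to mean any residue, and the helper name `motif_to_regex` is hypothetical.

```python
import re

# ASSUMPTION: concrete residue sets for the motif's lowercase symbols.
# The source only characterizes them qualitatively.
CLASSES = {
    "h": "[ILVMFWY]",  # hypothetical "large, hydrophobic" set
    "s": "[STAG]",     # hypothetical "small, weakly polar" set
    "x": ".",          # any residue
}


def motif_to_regex(motif: str) -> re.Pattern:
    """Translate notation like R(R/K)h... into a compiled regex.

    Uppercase letters are literal residues, "(A/B)" is a choice of residues,
    and lowercase letters are looked up in CLASSES.
    """
    out = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "(":
            j = motif.index(")", i)
            # "(R/K)" becomes the character class "[RK]"
            out.append("[" + motif[i + 1:j].replace("/", "") + "]")
            i = j + 1
            continue
        out.append(CLASSES.get(ch, ch))
        i += 1
    return re.compile("".join(out))


BCH_MOTIF = motif_to_regex("R(R/K)h(R/K)(R/K)NL(R/K)xhhhhHPs")
```

For example, `BCH_MOTIF.search(seq)` returns the first motif hit in a protein sequence `seq`, or `None`. Because the residue classes are guesses, any hits should be treated as candidates rather than as confirmed BCH domains.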
Structure and inhibition analysis of the mouse SAD-B C-terminal fragment.
Ma, Hui; Wu, Jing-Xiang; Wang, Jue; Wang, Zhi-Xin; Wu, Jia-Wei
2016-10-01
The SAD (synapses of amphids defective) kinases, including SAD-A and SAD-B, play important roles in the regulation of neuronal development, cell cycle, and energy metabolism. Our recent study of mouse SAD-A identified a unique autoinhibitory sequence (AIS), which binds at the junction of the kinase domain (KD) and the ubiquitin-associated (UBA) domain and exerts autoregulation in cooperation with UBA. Here, we report the crystal structure of the mouse SAD-B C-terminal fragment including the AIS and the kinase-associated domain 1 (KA1) at 2.8 Å resolution. The KA1 domain is structurally conserved, while the isolated AIS sequence is highly flexible and solvent-accessible. Our biochemical studies indicated that the SAD-B AIS exerts the same autoinhibitory role as that in SAD-A. We believe that the flexible isolated AIS sequence is readily available for interaction with KD-UBA and thus inhibits SAD-B activity.
Comparative analysis of the XopD T3S effector family in plant pathogenic bacteria
Kim, Jung-Gun; Taylor, Kyle W.; Mudgett, Mary Beth
2011-01-01
SUMMARY XopD is a type III effector protein that is required for Xanthomonas campestris pathovar vesicatoria (Xcv) growth in tomato. It is a modular protein consisting of an N-terminal DNA-binding domain, two EAR transcriptional repressor motifs, and a C-terminal SUMO protease. In tomato, XopD functions as a transcriptional repressor, resulting in the suppression of defense responses at late stages of infection. A survey of available genome sequences for phytopathogenic bacteria revealed that XopD homologs are limited to species within three genera of Proteobacteria: Xanthomonas, Acidovorax, and Pseudomonas. While the EAR motif(s) and SUMO protease domain are conserved in all the XopD-like proteins, the N-terminal domains vary in length and sequence identity. Comparative analysis of the DNA sequences surrounding xopD and xopD-like genes led to a revised annotation of the xopD gene. Edman degradation sequence analysis and functional complementation studies confirmed that the xopD gene from Xcv encodes a 760-amino-acid protein with a longer N-terminal domain than previously predicted. None of the XopD-like proteins studied complemented Xcv ΔxopD mutant phenotypes in tomato leaves, suggesting that the N-terminus of XopD defines functional specificity. Xcv ΔxopD strains expressing chimeric fusion proteins containing the N-terminus of XopD fused to the EAR motif(s) and SUMO protease domain of the XopD-like protein from Xanthomonas campestris pathovar campestris strain B100 were fully virulent in tomato, demonstrating that the N-terminus of XopD controls specificity in tomato. PMID:21726373
Carlow, Chevonne E; Faultless, J Trent; Lee, Christine; Siddiqua, Mahbuba; Edge, Alison; Nassuth, Annette
2017-09-01
The highly conserved CBF pathway is crucial in the regulation of plant responses to low temperatures. Extensive analysis of Arabidopsis CBF proteins revealed that their functions rely on several conserved amino acid domains, although the exact function of each domain is disputed. The question was what functions similar domains have in CBFs from overwintering woody plants such as Vitis, which likely have more complex regulation than the model plant Arabidopsis. A total of seven CBF genes were cloned and sequenced from V. riparia and the less frost-tolerant V. vinifera. The deduced species-specific amino acid sequences differ in only a few amino acids, mostly in non-conserved regions. Amino acid sequence comparison and phylogenetic analysis showed two distinct groups of Vitis CBFs: one group contains CBF1, CBF2, CBF3 and CBF8, and the other contains CBF4, CBF5 and CBF6. Transient transactivation assays showed that all Vitis CBFs except CBF5 activate via a CRT or DRE promoter element, with Vitis CBF3 and CBF4 preferring a CRT element. The hydrophobic domains at the C-terminal end of VrCBF6 were shown to be important for its transactivation strength. The putative nuclear localization domain of Vitis CBF1 was shown to be sufficient for nuclear localization, in contrast to previous reports for AtCBF1, and also important for transactivation. The latter finding highlights the value of careful analysis of domain functions instead of reliance on computer predictions and published data for other related proteins. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
Phylogenetic profiles reveal structural/functional determinants of TRPC3 signal-sensing antennae
Ko, Kyung Dae; Bhardwaj, Gaurav; Hong, Yoojin; Chang, Gue Su; Kiselyov, Kirill
2009-01-01
Biochemical assessment of channel structure/function is incredibly challenging. Developing computational tools that provide these data would enable translational research, accelerating mechanistic experimentation for the bench scientist studying ion channels. Starting with the premise that protein sequence encodes information about structure, function and evolution (SF&E), we developed a unified framework for inferring SF&E from sequence information using a knowledge-based approach. The Gestalt Domain Detection Algorithm-Basic Local Alignment Tool (GDDA-BLAST) provides phylogenetic profiles that can model, ab initio, SF&E relationships of biological sequences at the whole protein, single domain and single-amino acid level [1,2]. In our recent paper [4], we have applied GDDA-BLAST analysis to study canonical TRP (TRPC) channels [1] and empirically validated predicted lipid-binding and trafficking activities contained within the TRPC3 TRP_2 domain of unknown function. Overall, our in silico, in vitro, and in vivo experiments support a model in which TRPC3 has signal-sensing antennae which are adorned with lipid-binding, trafficking and calmodulin regulatory domains. In this Addendum, we correlate our functional domain analysis with the cryo-EM structure of TRPC3 [3]. In addition, we synthesize recent studies with our new findings to provide a refined model on the mechanism(s) of TRPC3 activation/deactivation. PMID:19704910
Goonesekere, Nalin C W; Shipely, Krysten; O'Connor, Kevin
2010-06-01
The Pfam database is an important tool in genome annotation, since it provides a collection of curated protein families. However, a subset of these families, known as domains of unknown function (DUFs), remains poorly characterized. We have linked sequences from DUF404, DUF407, DUF482, DUF608, DUF810, DUF853, DUF976 and DUF1111 to homologs in the PDB that lie within the midnight zone (9-20%) of sequence identity. These relationships were extended to provide functional annotation through sequence analysis and model building. Also described are examples of residue plasticity within enzyme active sites and of change of function within homologous sequences of a DUF. Copyright 2010 Elsevier Ltd. All rights reserved.
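The "midnight zone" is defined here in terms of pairwise percent identity. Percent identity has several competing conventions; the sketch below assumes it is computed over aligned columns where neither sequence has a gap, which may differ from the definition the authors used. The function names are illustrative, and only the 9-20% band comes from the text.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences.

    ASSUMPTION: columns containing a gap ('-') in either sequence are
    excluded from both numerator and denominator.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(x == y for x, y in pairs)
    return 100.0 * same / len(pairs)


def in_midnight_zone(identity: float) -> bool:
    """True if identity falls in the 9-20% band quoted in the abstract."""
    return 9.0 <= identity <= 20.0
```

Because the denominator choice changes the result, two tools can place the same pair inside or outside this band; reporting the convention alongside the number avoids ambiguity.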
Directed evolution of the TALE N-terminal domain for recognition of all 5′ bases
Lamb, Brian M.; Mercer, Andrew C.; Barbas, Carlos F.
2013-01-01
Transcription activator-like effector (TALE) proteins can be designed to bind virtually any DNA sequence. General guidelines for design of TALE DNA-binding domains suggest that the 5′-most base of the DNA sequence bound by the TALE (the N0 base) should be a thymine. We quantified the N0 requirement by analysis of the activities of TALE transcription factors (TALE-TF), TALE recombinases (TALE-R) and TALE nucleases (TALENs) with each DNA base at this position. In the absence of a 5′ T, we observed decreases in activity of up to >1000-fold for TALE-TF, up to 100-fold for TALE-R and up to 10-fold for TALENs, compared with target sequences containing a 5′ T. To develop TALE architectures that recognize all possible N0 bases, we used structure-guided library design coupled with TALE-R activity selections to evolve novel TALE N-terminal domains to accommodate any N0 base. A G-selective domain and broadly reactive domains were isolated and characterized. The engineered TALE domains selected in the TALE-R format demonstrated modularity and were active in TALE-TF and TALEN architectures. Evolved N-terminal domains provide effective and unconstrained TALE-based targeting of any DNA sequence as TALE binding proteins and designer enzymes. PMID:23980031
Murray, R; Pederson, K; Prosser, H; Muller, D; Hutchison, C A; Frelinger, J A
1988-01-01
We have used random oligonucleotide mutagenesis (or saturation mutagenesis) to create a library of point mutations in the alpha 1 protein domain of a Major Histocompatibility Complex (MHC) molecule. This protein domain is critical for T cell and B cell recognition. We altered the MHC class I H-2DP gene sequence such that synthetic mutant alpha 1 exons (270 bp of coding sequence), which contain mutations identified by sequence analysis, can replace the wild-type alpha 1 exon. The synthetic exons were constructed from twelve overlapping oligonucleotides and contained an average of 1.3 random point mutations per intact exon. DNA sequence analysis of mutant alpha 1 exons showed a point-mutation distribution that fits a Poisson distribution, which emphasizes the utility of this mutagenesis technique to "scan" a large protein sequence for important mutations. We report our use of saturation mutagenesis to scan an entire exon of the H-2DP gene, a cassette strategy to replace the wild-type alpha 1 exon with individual mutant alpha 1 exons, and analysis of mutant molecules expressed on the surface of transfected mouse L cells. PMID:2903482
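Because the mutation counts fit a Poisson distribution with a mean of 1.3 per exon, the expected composition of the library follows directly from that single parameter. The sketch below assumes mutations arise independently along the 270-bp exon (the premise behind a Poisson fit); the helper names are illustrative.

```python
import math

MEAN_MUTATIONS = 1.3  # average random point mutations per synthetic exon (from the text)


def poisson_pmf(k: int, lam: float = MEAN_MUTATIONS) -> float:
    """Probability that an exon carries exactly k point mutations."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def fraction_mutated(lam: float = MEAN_MUTATIONS) -> float:
    """Share of exons with at least one mutation, i.e. 1 - P(0)."""
    return 1.0 - poisson_pmf(0, lam)


def single_hit_share(lam: float = MEAN_MUTATIONS) -> float:
    """Among mutated exons, the share carrying exactly one mutation."""
    return poisson_pmf(1, lam) / fraction_mutated(lam)


if __name__ == "__main__":
    for k in range(5):
        print(f"P({k} mutations) = {poisson_pmf(k):.3f}")
    print(f"exons with >=1 mutation: {fraction_mutated():.3f}")
    print(f"single mutants among mutated exons: {single_hit_share():.3f}")
```

With a mean of 1.3, about 27% of synthetic exons are expected to carry no mutation (e^-1.3 ≈ 0.273), and about 49% of the mutated exons are expected to carry exactly one, which is the subset most useful for assigning a phenotype to a single residue change.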
A Fast Alignment-Free Approach for De Novo Detection of Protein Conserved Regions
Abnousi, Armen; Broschat, Shira L.; Kalyanaraman, Ananth
2016-01-01
Background Identifying conserved regions in protein sequences is a fundamental operation in numerous sequence-driven analysis pipelines. It is used to decode domain-rich regions within proteins, to compute protein clusters, to annotate sequence function, and to compute evolutionary relationships among protein sequences. A number of approaches exist for identifying and characterizing protein families based on their domains, and because domains represent conserved portions of a protein sequence, the primary computation involved in protein family characterization is the identification of such conserved regions. However, identifying conserved regions in large collections (millions) of protein sequences presents significant challenges. Methods In this paper we present a new, alignment-free method for detecting conserved regions in protein sequences called NADDA (No-Alignment Domain Detection Algorithm). Our method exploits the abundance of exact-matching short subsequences (k-mers) to quickly detect conserved regions, and uses machine learning to improve detection accuracy. We present a parallel implementation of NADDA using the MapReduce framework and show that our method is highly scalable. Results We have compared NADDA with the Pfam and InterPro databases. For known domains annotated by Pfam, accuracy is 83%, sensitivity 96%, and specificity 44%. For sequences with new domains not present in the training set, an average accuracy of 63% is achieved when compared to Pfam. Improved results when compared against InterPro demonstrate the ability of NADDA to capture conserved regions beyond those present in Pfam. We have also compared NADDA with ADDA and MKDOM2, assuming Pfam as ground truth. On average, NADDA shows comparable accuracy and more balanced sensitivity and specificity, and, being alignment-free, is significantly faster.
Excluding the one-time cost of training, runtimes on a single processor were 49 s, 10,566 s, and 456 s for NADDA, ADDA, and MKDOM2, respectively, for a data set of approximately 2500 sequences. PMID:27552220
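NADDA's core premise, that exact-matching k-mers shared by many sequences flag conserved regions, can be illustrated with a toy sketch. This is not NADDA: it omits the machine-learning step and the MapReduce parallelism, and the parameters `k` and `min_support` (as well as the function names) are hypothetical. A position is reported as conserved if it is covered by a k-mer that occurs in at least `min_support` distinct sequences.

```python
from collections import defaultdict


def shared_kmers(seqs, k):
    """Map each k-mer to the set of sequence indices that contain it."""
    index = defaultdict(set)
    for i, s in enumerate(seqs):
        for p in range(len(s) - k + 1):
            index[s[p:p + k]].add(i)
    return index


def conserved_regions(seqs, k=3, min_support=2):
    """Per sequence, return half-open (start, end) intervals covered by
    k-mers that occur in at least `min_support` distinct sequences."""
    index = shared_kmers(seqs, k)
    result = []
    for s in seqs:
        covered = [False] * len(s)
        for p in range(len(s) - k + 1):
            if len(index[s[p:p + k]]) >= min_support:
                for q in range(p, p + k):
                    covered[q] = True
        # collapse runs of covered positions into intervals
        regions, start = [], None
        for q, c in enumerate(covered):
            if c and start is None:
                start = q
            elif not c and start is not None:
                regions.append((start, q))
                start = None
        if start is not None:
            regions.append((start, len(s)))
        result.append(regions)
    return result
```

For example, with `seqs = ["AAAWCDEFGHAAA", "PPPWCDEFGHPPP", "MMMMMMMM"]`, `k=4` and `min_support=2`, only the shared stretch `WCDEFGH` is reported in the first two sequences. Counting each k-mer once per sequence (a set of indices) keeps internal repeats within one protein from masquerading as conservation; a real tool would still need a scoring step, which NADDA supplies with machine learning.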
Spencermartinsiella europaea gen. nov., sp. nov., a new member of the family Trichomonascaceae
USDA-ARS's Scientific Manuscript database
Ten strains of a novel heterothallic yeast species were isolated from rotten wood collected at different locations in Hungary. Analysis of gene sequences for the D1/D2 domain of the large subunit ribosomal RNA, as well as analysis of concatenated gene sequences for the nearly complete nuclear large...
Yue, Chen-Li; Shi, Jie-Ran; Shi, Chang-Hong; Zhang, Hai; Zhao, Lei; Zhang, Ting-Fen; Zhao, Yong; Xi, Li
2008-10-01
To express the Micrococcus luteus resuscitation-promoting factor (Rpf) domain and its mutants in prokaryotic cells, and to investigate their bioactivity. The genes for the Rpf domain and its mutants (E54K, E54A) were amplified by polymerase chain reaction (PCR) from the genome of Micrococcus luteus and cloned into the pMD18-T vector. After sequencing, the Rpf domain gene and its mutant genes were subcloned into the expression vector pGEX-4T-1 and transformed into E. coli DH5alpha. The expressed products were purified by affinity chromatography using GST Fusion Protein Purification beads. The target proteins were identified by SDS-PAGE analysis and by Western blot with monoclonal antibodies against the Rpf domain (mAb). The bioactivity of the proteins was analyzed by testing their ability to stimulate the resuscitation of Mycobacterium smegmatis. The sequences of the PCR products were identical to those of the Rpf domain and its mutant gene in GenBank. The relative molecular mass determined by SDS-PAGE analysis was consistent with previously reported values, and Western blot analysis confirmed specific binding of the Rpf domain mAb at 32 000. The purified GST-Rpf domain could stimulate resuscitation of Mycobacterium smegmatis. The replacements E54A and especially E54K inhibited Rpf resuscitation activity. The Rpf domain and two of its mutant proteins were obtained, and their effects on the resuscitation of dormant Mycobacterium smegmatis were clarified.
Wytynck, Pieter; Rougé, Pierre; Van Damme, Els J M
2017-11-01
Ribosome-inactivating proteins (RIPs) are cytotoxic enzymes capable of halting protein synthesis by irreversible modification of ribosomes. Although RIPs are widespread, they are not ubiquitous in the plant kingdom. The physiological importance of RIPs is not fully elucidated, but evidence suggests a role in the protection of the plant against biotic and abiotic stresses. Searches in the rice genome revealed a large and highly complex family of proteins with a RIP domain. A comparative analysis retrieved 38 RIP sequences from the genome sequence of Oryza sativa subspecies japonica and 34 sequences from the subspecies indica. The RIP sequences are scattered over different chromosomes but are mostly found on chromosome 3. The phylogenetic tree revealed pairwise clustering of RIPs from japonica and indica. Molecular modeling and sequence analysis yielded information on the catalytic site of the enzyme and suggested that a large proportion of the RIP domains probably possess N-glycosidase activity. Several RIPs are differentially expressed in plant tissues and in response to specific abiotic stresses. This study provides an overview of RIP motifs in rice and will help to understand their biological role(s) and evolutionary relationships. Copyright © 2017 Elsevier Ltd. All rights reserved.
Muangkram, Yuttamol; Amano, Akira; Wajjwalku, Worawidh; Pinyopummintr, Tanu; Thongtip, Nikorn; Kaolim, Nongnid; Sukmak, Manakorn; Kamolnorranath, Sumate; Siriaroonrat, Boripat; Tipkantha, Wanlaya; Maikaew, Umaporn; Thomas, Warisara; Polsrila, Kanda; Dongsaard, Kwanreaun; Sanannu, Saowaphang; Wattananorrasate, Anuwat
2017-07-01
The Asian tapir (Tapirus indicus) has been classified as Endangered on the IUCN Red List of Threatened Species (2008). Genetic diversity data provide important information for the management of captive breeding and conservation of this species. We analyzed mitochondrial control region (CR) sequences from 37 captive Asian tapirs in Thailand. Multiple alignment of the full-length CR sequences (1268 bp) showed that they comprise three domains, as described in other mammal species. Analysis of 16 parsimony-informative variable sites revealed 11 haplotypes. Furthermore, phylogenetic analysis using a median-joining network clearly showed three clades, consistent with our earlier cytochrome b gene study in this endangered species. The repetitive motif is located between the first and second conserved sequence blocks, similar to the Brazilian tapir. The most polymorphic site was located in the extended termination-associated sequences domain. The results could be applied to the future genetic management of captive and wild populations to maintain stable populations.
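A site is conventionally called parsimony-informative when at least two character states each occur in at least two sequences; singleton differences do not help discriminate between trees. The sketch below applies that definition to a set of aligned sequences. Treating only A, C, G and T as states (ignoring gaps and ambiguity codes) is an assumption, counting haplotypes as distinct aligned sequences is a simplification, and the function names are illustrative.

```python
from collections import Counter


def parsimony_informative_sites(alignment):
    """Return 0-based column indices that are parsimony-informative."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    informative = []
    for col in range(length):
        counts = Counter(seq[col].upper() for seq in alignment)
        # ASSUMPTION: only unambiguous nucleotides count as character states
        states = [b for b, n in counts.items() if b in "ACGT" and n >= 2]
        if len(states) >= 2:
            informative.append(col)
    return informative


def count_haplotypes(alignment):
    """Number of distinct aligned sequences (case-insensitive)."""
    return len({seq.upper() for seq in alignment})
```

For instance, in the alignment `["ACGTAG", "ACGTTG", "ATGCAG", "ATGCTC"]` the last column is variable but not informative, because the C state occurs only once.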
Orthologs in Arabidopsis thaliana of the Hsp70 interacting protein Hip
Webb, Mary Alice; Cavaletto, John M.; Klanrit, Preekamol; Thompson, Gary A.
2001-01-01
The Hsp70-interacting protein Hip binds to the adenosine triphosphatase domain of Hsp70, stabilizing it in the adenosine 5′-diphosphate–ligated conformation and promoting binding of target polypeptides. In mammalian cells, Hip is a component of the cytoplasmic chaperone heterocomplex that regulates signal transduction via interaction with hormone receptors and protein kinases. Analysis of the complete genome sequence of the model flowering plant Arabidopsis thaliana revealed 2 genes encoding Hip orthologs. The deduced sequence of AtHip-1 consists of 441 amino acid residues and is 42% identical to human Hip. AtHip-1 contains the same functional domains characterized in mammalian Hip, including an N-terminal dimerization domain, an acidic domain, 3 tetratricopeptide repeats flanked by a highly charged region, a series of degenerate GGMP repeats, and a C-terminal region similar to the Sti1/Hop/p60 protein. The deduced amino acid sequence of AtHip-2 consists of 380 amino acid residues. AtHip-2 consists of a truncated Hip-like domain that is 46% identical to human Hip, followed by a C-terminal domain related to thioredoxin. AtHip-2 is 63% identical to another Hip-thioredoxin protein recently identified in Vitis labrusca (grape). The truncated Hip domain in AtHip-2 includes the amino terminus, the acidic domain, and tetratricopeptide repeats with flanking charged region. Analyses of expressed sequence tag databases indicate that both AtHip-1 and AtHip-2 are expressed in A. thaliana and that orthologs of Hip are also expressed widely in other plants. The similarity between AtHip-1 and its mammalian orthologs is consistent with a similar role in plant cells. The sequence of AtHip-2 suggests the possibility of additional unique chaperone functions. PMID:11599566
Jiang, W; Gupta, D; Gallagher, D; Davis, S; Bhavanandan, V P
2000-04-01
We previously elucidated five distinct protein domains (I-V) of bovine submaxillary mucin, which is encoded by two genes, BSM1 and BSM2. Using Southern blot analysis, genomic cloning and sequencing of the BSM1 gene, we now show that the central domain (V) consists of approximately 55 tandem repeats of 329 amino acids and that domains III-V are encoded by a 58.4-kb exon, the largest exon known in any gene to date. The BSM1 gene was mapped by fluorescence in situ hybridization to the proximal half of chromosome 5 at bands q2.2-q2.3. The amino-acid sequences of six tandem repeats (two full and four partial) were found to share only 92-94% identity. We propose that the variability in the amino-acid sequences of the mucin tandem repeats is important for generating the combinatorial library of saccharides that is necessary for the protective function of mucins. The deduced peptide sequences of the central domain match those determined from the purified bovine submaxillary mucin and also show 68-94% identity to published peptide sequences of ovine submaxillary mucin. This indicates that the core protein of ovine submaxillary mucin is closely related to that of bovine submaxillary mucin and contains similar tandem repeats in the central domain. In contrast, the central domain of porcine submaxillary mucin is reported to consist of 81-amino-acid tandem repeats. However, both bovine and porcine submaxillary mucins contain similar N-terminal and C-terminal domains, and the corresponding genes are in the conserved linkage regions of the respective genomes.
Yasukawa, Hiro; Sato, Aya; Kita, Ayaka; Kodaira, Ken-Ichi; Iseki, Mineo; Takahashi, Tetsuo; Shibusawa, Mami; Watanabe, Masakatsu; Yagita, Kenji
2013-01-01
Complete genome sequencing of Naegleria gruberi has revealed that the organism encodes polypeptides similar to photoactivated adenylyl cyclases (PACs). Screening of the N. australiensis genome showed that this organism also encodes PAC-like polypeptides. Each of the Naegleria proteins consists of a "sensors of blue-light using FAD" domain (BLUF domain) and an adenylyl cyclase domain (AC domain). PAC activity of the Naegleria proteins was assayed by comparing the antibiotic sensitivity of Escherichia coli cells heterologously expressing the proteins in the dark and under blue-light irradiation. The antibiotics used in the assays were fosfomycin and fosmidomycin. E. coli cells expressing the Naegleria proteins showed increased sensitivity to both fosfomycin and fosmidomycin when incubated under blue light, indicating that the proteins functioned as PACs in the bacterial cells. Analysis of the N. fowleri genome revealed that the organism encodes a protein bearing an amino acid sequence similar to that of BLUF. A plasmid expressing a chimeric protein consisting of the BLUF-like sequence found in N. fowleri and the adenylyl cyclase domain of N. gruberi PAC was constructed to determine whether the BLUF-like sequence functioned as a sensor of blue light. E. coli cells expressing the chimeric protein showed increased sensitivity to fosfomycin and fosmidomycin when incubated under blue light. These results indicate that the BLUF-like sequence found in N. fowleri functions as a sensor of blue light.
Toward rules relating zinc finger protein sequences and DNA binding site preferences.
Desjarlais, J R; Berg, J M
1992-08-15
Zinc finger proteins of the Cys2-His2 type consist of tandem arrays of domains, where each domain appears to contact three adjacent base pairs of DNA through three key residues. We have designed and prepared a series of variants of the central zinc finger within the DNA binding domain of Sp1 by using information from an analysis of a large database of zinc finger protein sequences. Through systematic variations at two of the three contact positions, relatively specific recognition of sequences of the form 5'-GGGGN(G or T)GGG-3' has been achieved. These results provide the basis for rules that may develop into a code that will allow the design of zinc finger proteins with preselected DNA site specificity.
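The recognition site achieved for the Sp1 variants, 5'-GGGGN(G or T)GGG-3', can be written as a regular expression and scanned on both DNA strands. This sketch only locates matching sequences; it does not model binding affinity or specificity, and the function names are illustrative. A zero-width lookahead is used so that overlapping sites are all reported.

```python
import re

SITE_LEN = 9
# 5'-GGGG N (G|T) GGG-3'; the lookahead lets overlapping matches be found
SITE = re.compile(r"(?=(GGGG[ACGT][GT]GGG))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str):
    """Return sorted (forward-strand start, strand, site) tuples."""
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in SITE.finditer(seq)]
    n = len(seq)
    for m in SITE.finditer(revcomp(seq)):
        # map the reverse-strand start back to forward-strand coordinates
        hits.append((n - m.start() - SITE_LEN, "-", m.group(1)))
    return sorted(hits)
```

For example, `find_sites("TTGGGGATGGGTT")` reports one site on the "+" strand at position 2; scanning the reverse complement of that string reports the same site at position 2 on the "-" strand.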
Matrix metalloproteinases: structures, evolution, and diversification.
Massova, I; Kotra, L P; Fridman, R; Mobashery, S
1998-09-01
Comprehensive sequence alignments of 64 members of the family of matrix metalloproteinases (MMPs) were performed, first for the entire sequences and subsequently for the catalytic and hemopexin-like domains. The 64 MMPs were selected from plants, invertebrates, and vertebrates. The analyses disclosed that as many as 23 distinct subfamilies of these proteins are known to exist. Information from the sequence alignments was correlated with structures, both crystallographic and computational, of the catalytic domains for the 23 representative members of the MMP family. The metal binding sites and two loops containing variable amino acid sequences, which are important for substrate interactions, are surveyed and discussed. The collective data support the proposal that the assembly of the domains into multidomain enzymes was likely an early evolutionary event. This was followed by diversification, perhaps in parallel among the MMPs, on a subsequent evolutionary time scale. The analysis indicates that a retrograde structural simplification may have accounted for the evolution of MMPs with simple domain constituents, such as matrilysin, from the larger and more elaborate enzymes.
Jeon, Jouhyun; Arnold, Roland; Singh, Fateh; Teyra, Joan; Braun, Tatjana; Kim, Philip M
2016-04-01
The identification of structured units in a protein sequence is an important first step for most biochemical studies. Importantly for this study, the identification of stable structured regions is a crucial first step in generating novel synthetic antibodies. While many approaches to find domains or predict structured regions exist, important limitations remain, such as the optimization of domain boundaries and the lack of identification of non-domain structured units. Moreover, no integrated tool exists to find and optimize structural domains within protein sequences. Here, we describe a new tool, PAT (http://www.kimlab.org/software/pat), that can efficiently identify both domains (with optimized boundaries) and non-domain putative structured units. PAT automatically analyzes various structural properties, evaluates the folding stability, and reports possible structural domains in a given protein sequence. To evaluate the reliability of PAT, we applied it to identify antibody target molecules, based on the notion that soluble proteins with well-defined secondary and tertiary structures are appropriate target molecules for synthetic antibodies. PAT is an efficient and sensitive tool to identify structured units. A performance analysis shows that PAT can characterize structurally well-defined regions in a given sequence and outperforms other efforts to define reliable domain boundaries. In particular, PAT successfully identifies experimentally confirmed target molecules for antibody generation. PAT also offers pre-calculated results for 20,210 human proteins to accelerate common queries. PAT can therefore help to investigate large-scale structured domains and improve the success rate of synthetic antibody generation.
Albrecht, K. H.; Eicher, E. M.
1997-01-01
The Sry (sex determining region, Y chromosome) open reading frame from mice representing four species of the genus Mus was sequenced in an effort to understand the conditional dysfunction of some M. domesticus Sry alleles when present on the C57BL/6J inbred strain genetic background and to delimit the functionally important protein regions. Twenty-two Sry alleles were sequenced, most from wild-derived Y chromosomes, including 11 M. domesticus alleles, seven M. musculus alleles and two alleles each from the related species M. spicilegus and M. spretus. We found that the HMG domain (high mobility group DNA binding domain) and the unique regions are well conserved, while the glutamine repeat cluster (GRC) region is quite variable. No correlation was found between the predicted protein isoforms and the ability of a Sry allele to allow differentiation of ovarian tissue when on the C57BL/6J genetic background, strongly suggesting that the cause of this sex reversal is not the Sry protein itself, but rather the regulation of SRY expression. Furthermore, our interspecies sequence analysis provides compelling evidence that the M. musculus and M. domesticus SRY functional domain is contained in the first 143 amino acids, which includes the HMG domain and adjacent unique region (UR-2). PMID:9383069
1996-01-01
Mutations in the Caenorhabditis elegans gene unc-89 result in nematodes having disorganized muscle structure in which thick filaments are not organized into A-bands and there are no M-lines. Beginning with a partial cDNA from the C. elegans sequencing project, we have cloned and sequenced the unc-89 gene. An unc-89 allele, st515, was found to contain an 84-bp deletion and a 10-bp duplication, resulting in an in-frame stop codon within the predicted unc-89 coding sequence. Analysis of the complete coding sequence for unc-89 predicts a novel 6,632-amino-acid polypeptide consisting of sequence motifs that have been implicated in protein-protein interactions. UNC-89 begins with 67 residues of unique sequence, followed by SH3, dbl/CDC24 and PH domains, 7 immunoglobulin (Ig) domains and a putative KSP-containing multiphosphorylation domain, and ends with 46 Ig domains. A polyclonal antiserum raised to a portion of the unc-89-encoded sequence reacts with a twitchin-sized polypeptide from wild type, but with truncated polypeptides from st515 and from the amber allele e2338. By immunofluorescence microscopy, this antiserum localizes to the middle of A-bands, consistent with UNC-89 being a structural component of the M-line. Previous studies indicate that myofilament lattice assembly begins with positional cues laid down in the basement membrane and muscle cell membrane. We propose that the intracellular protein UNC-89 responds to these signals, localizes, and then participates in assembling an M-line. PMID:8603916
Conservation of tubulin-binding sequences in TRPV1 throughout evolution.
Sardar, Puspendu; Kumar, Abhishek; Bhandari, Anita; Goswami, Chandan
2012-01-01
Transient Receptor Potential Vanilloid subtype 1 (TRPV1), commonly known as the capsaicin receptor, can detect multiple stimuli, ranging from noxious compounds, low pH and temperature to electromagnetic waves of different ranges. In addition, this receptor is involved in multiple physiological and sensory processes. Therefore, TRPV1 function also directly influences adaptation and further evolution. The availability of various eukaryotic genome sequences in the public domain allows us to study the molecular evolution of the TRPV1 protein and the conservation of domains, motifs and interacting regions that are functionally important. Using statistical and bioinformatics tools, our analysis reveals that TRPV1 evolved approximately 420 million years ago (MYA). Our analysis also reveals that specific regions, domains and motifs of TRPV1 have undergone different selection pressures and thus show different levels of conservation. We found that the TRP box is the most conserved of all and thus has functional significance. Our results also indicate that the tubulin-binding sequences (TBS) have evolutionary significance, as these stretches are more conserved than many other essential regions of TRPV1. The overall distribution of positively charged residues within the TBS motifs is conserved throughout evolution. In silico analysis reveals that TBS-1 and TBS-2 of TRPV1 can form helical structures and may play an important role in TRPV1 function. Our analysis identifies the regions of TRPV1 that are important for its structure-function relationship. This analysis indicates that tubulin-binding sequence-1 (TBS-1) near the TRP box forms a potential helix and that the tubulin interactions with TRPV1 via TBS-1 have evolutionary significance. This interaction may be required for proper channel function and regulation and may also be significant in the context of Taxol®-induced neuropathy.
Konami, Y; Yamamoto, K; Osawa, T; Irimura, T
1995-04-01
The complete amino acid sequence of a lactose-binding Cytisus sessilifolius anti-H(O) lectin II (CSA-II) was determined using a protein sequencer. After digestion of CSA-II with endoproteinase Lys-C or Asp-N, the resulting peptides were purified by reversed-phase high performance liquid chromatography (HPLC) and then subjected to sequence analysis. Comparison of the complete amino acid sequence of CSA-II with the sequences of other leguminous seed lectins revealed regions of extensive homology. The amino acid sequence of a putative carbohydrate-binding domain of CSA-II was found to be similar to those of several anti-H(O) leguminous lectins, especially to that of the L-fucose-binding Ulex europaeus lectin I (UEA-I).
Gaines, William A.; Marcotte, William R.
2010-01-01
Spider dragline silk is primarily composed of proteins called major ampullate spidroins (MaSp) that consist of a large repeat array flanked by non-repetitive N- and C-terminal domains. Until recently, there has been little evidence for more than one gene encoding each of the two major spidroin silk proteins, MaSp1 and MaSp2. Here, we report the deduced N-terminal domain sequences for two distinct MaSp1 genes from Nephila clavipes (MaSp1A and MaSp1B) and for MaSp2. All three MaSp genes are co-expressed in the major ampullate gland. A search of the GenBank database also revealed two distinct MaSp1 C-terminal domain sequences. Sequencing confirmed that both MaSp1 genes are present in all seven Nephila clavipes spiders examined. The presence of nucleotide polymorphisms in these genes confirmed that MaSp1A and MaSp1B are distinct genetic loci and not merely alleles of the same gene. We have experimentally determined the transcription start sites for all three MaSp genes and established preliminary pairing between the two MaSp1 N- and C-terminal domains. Phylogenetic analysis of these new sequences and other published MaSp N- and C-terminal domain sequences illustrated that duplications of MaSp genes may be widespread among spider species. PMID:18828837
Faure, Guilhem; Callebaut, Isabelle
2013-07-15
Describing domain architecture is a critical step in the functional characterization of proteins. However, some orphan domains do not match any profile stored in dedicated domain databases and are therefore difficult to analyze. We present here a novel approach, called TREMOLO-HCA, for the analysis of orphan domain sequences, inspired by our experience in the use of Hydrophobic Cluster Analysis (HCA). Hidden relationships between protein sequences can be more easily identified from PSI-BLAST results by using information on domain architecture, HCA plots and the degree of conservation of amino acids that may participate in the protein core. This can reveal remote relationships with known families of domains, as illustrated here by the identification of a hidden Tudor tandem in the human BAHCC1 protein and a hidden ET domain in the Saccharomyces cerevisiae Taf14p and human AF9 proteins. The results obtained in this way are consistent with those provided by HHPRED, which is based on pairwise comparisons of HMMs. Our approach can, however, be applied even in the absence of domain profiles or known 3D structures for the identification of novel families of domains. It can also be used in reverse to refine domain profiles, by starting from known protein domain families and identifying highly divergent members hitherto considered orphans. We provide a possible integration of this approach in an open TREMOLO-HCA package, which is fully implemented in Python v2.7 and is available on request. Instructions are available at http://www.impmc.upmc.fr/∼callebau/tremolohca.html. Contact: isabelle.callebaut@impmc.upmc.fr. Supplementary data are available at Bioinformatics online.
Novel methodologies for spectral classification of exon and intron sequences
NASA Astrophysics Data System (ADS)
Kwan, Hon Keung; Kwan, Benjamin Y. M.; Kwan, Jennifer Y. Y.
2012-12-01
Digital processing of a nucleotide sequence requires it to be mapped to a numerical sequence, and the choice of nucleotide-to-numeric mapping affects how well its biological properties are preserved when moving from the nucleotide domain to the numerical domain. Digital spectral analysis of nucleotide sequences reveals a period-3 power spectral value that is more prominent in exon sequences than in intron sequences. The success of period-3-based exon and intron classification depends on the choice of a threshold value. The main purposes of this article are to introduce novel codes for 1-sequence numerical representations for spectral analysis and compare them with existing codes to determine an appropriate representation, and to introduce novel thresholding methods for more accurate period-3-based exon and intron classification of an unknown sequence. The main findings of this study are as follows. Among sixteen 1-sequence numerical representations, the K-Quaternary Code I offers attractive performance. A windowed 1-sequence numerical representation (with window lengths of 9, 15, and 24 bases) offers a possible speed gain over the non-windowed 4-sequence Voss representation, and this gain increases with sequence length. A winner threshold value (the best among two defined threshold values and one other threshold value) offers top precision for classifying an unknown sequence of specified fixed lengths. An interpolated winner threshold value, applicable to an unknown sequence of arbitrary length, can be estimated from the winner threshold values of fixed-length sequences with comparable performance. In general, precision increases as sequence length increases. The study contributes an effective spectral analysis of nucleotide sequences that better reveals embedded properties, and has potential applications in improved genome annotation.
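To make the period-3 idea concrete, here is a minimal Python/NumPy sketch of the non-windowed 4-sequence Voss representation mentioned above. Each nucleotide gets a binary indicator sequence, the four DFT power spectra are summed, and the value at frequency bin N/3 is compared with the average background power. The ratio definition, the function names and the threshold of 2.0 are illustrative assumptions. This is not the K-Quaternary Code or the winner-threshold procedure proposed in the article.

```python
import numpy as np

def voss_indicators(seq: str) -> np.ndarray:
    """Voss representation: a 4 x N 0/1 matrix, row k marking positions of 'ACGT'[k]."""
    seq = seq.upper()
    return np.array([[1.0 if base == nt else 0.0 for base in seq] for nt in "ACGT"])

def period3_ratio(seq: str) -> float:
    """Power at DFT bin N/3 divided by the mean power over bins 1..N-1 (DC bin excluded)."""
    n = len(seq)
    if n % 3 != 0:
        raise ValueError("length must be a multiple of 3 so that N/3 is an exact DFT bin")
    # Total power spectrum = sum of the power spectra of the four indicator sequences.
    power = np.sum(np.abs(np.fft.fft(voss_indicators(seq), axis=1)) ** 2, axis=0)
    return float(power[n // 3] / power[1:].mean())

def classify(seq: str, threshold: float) -> str:
    """Call a sequence 'exon' if its period-3 ratio exceeds the threshold, else 'intron'."""
    return "exon" if period3_ratio(seq) > threshold else "intron"

print(round(period3_ratio("ATG" * 30), 2))   # perfectly period-3 toy sequence
print(classify("AC" * 45, threshold=2.0))    # period-2 toy sequence; threshold is hypothetical
```

For the perfectly periodic toy sequence, all non-DC power sits in bins N/3 and 2N/3, so the ratio equals (N-1)/2 = 44.5 for N = 90. A real exon would give a much smaller and noisier value, which is why the choice of threshold matters.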
Insights into Hox protein function from a large scale combinatorial analysis of protein domains.
Merabet, Samir; Litim-Mecheri, Isma; Karlsson, Daniel; Dixit, Richa; Saadaoui, Mehdi; Monier, Bruno; Brun, Christine; Thor, Stefan; Vijayraghavan, K; Perrin, Laurent; Pradel, Jacques; Graba, Yacine
2011-10-01
Protein function is encoded within protein sequence and protein domains. However, how protein domains cooperate within a protein to modulate overall activity, and how this impacts functional diversification at the molecular and organism levels, remain largely unaddressed. Focusing on three domains of the central class Drosophila Hox transcription factor AbdominalA (AbdA), we used combinatorial domain mutations and most known AbdA developmental functions as biological readouts to investigate how protein domains collectively shape protein activity. The results uncover redundancy, interactivity, and multifunctionality of protein domains as salient features underlying overall AbdA protein activity, providing a means to understand functional diversity and accounting for the robustness of Hox-controlled developmental programs. Importantly, the results highlight context-dependency in protein domain usage and interaction, allowing major modifications in domains to be tolerated without general loss of function. The non-pleiotropic effects of domain mutations suggest that protein modification may contribute more broadly to the molecular changes underlying morphological diversification during evolution, which have so far been thought to rely largely on modification of gene cis-regulatory sequences.
PMID:22046139
Sun, Chia-Tsen; Chiang, Austin W T; Hwang, Ming-Jing
2017-10-27
Proteome-scale bioinformatics research is increasingly conducted as the number of completely sequenced genomes increases, but analysis of protein domains (PDs) usually relies on similarity in their amino acid sequences and/or three-dimensional structures. Here, we present results from a bi-clustering analysis of presence/absence data for 6,580 unique PDs in 2,134 species with sequenced genomes, thus covering complete sets of proteins, across the three superkingdoms of life: Bacteria, Archaea, and Eukarya. Our analysis revealed eight distinctive PD clusters which, following an analysis of enrichment of Gene Ontology functions and CATH classification of protein structures, were shown to exhibit taxa-characteristic structural and functional properties. For example, the largest cluster is ubiquitous in all three superkingdoms, constituting a set of 1,472 persistent domains that were created early in evolution, retained in living organisms, and characterized by basic cellular functions and ancient structural architectures. An Archaea and Eukarya bi-superkingdom cluster suggests that its PDs may have existed in the ancestor of these two superkingdoms, and other clusters are single-superkingdom- or taxon (e.g., Fungi)-specific. These results increase our appreciation of PD diversity and our knowledge of how PDs are used in species, with implications for species evolution.
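Presence/absence data of this kind can be represented as a mapping from each domain to the species that carry it. The Python sketch below does not reproduce the bi-clustering used in the study. It only groups domains by the set of superkingdoms in which they occur, which is enough to show how persistent (all three superkingdoms), bi-superkingdom and single-superkingdom patterns emerge from the data. The domain labels PD_1 to PD_3 and their occurrences are hypothetical; only the species-to-superkingdom assignments are real.

```python
from collections import defaultdict

def group_by_superkingdom_pattern(presence: dict, superkingdom_of: dict) -> dict:
    """Group domains by the set of superkingdoms in which at least one species carries them."""
    groups = defaultdict(set)
    for domain, species in presence.items():
        pattern = frozenset(superkingdom_of[s] for s in species)
        groups[pattern].add(domain)
    return dict(groups)

superkingdom_of = {
    "Escherichia coli": "Bacteria",
    "Methanocaldococcus jannaschii": "Archaea",
    "Saccharomyces cerevisiae": "Eukarya",
}
# Hypothetical domain labels and occurrences, for illustration only.
presence = {
    "PD_1": {"Escherichia coli", "Methanocaldococcus jannaschii", "Saccharomyces cerevisiae"},
    "PD_2": {"Methanocaldococcus jannaschii", "Saccharomyces cerevisiae"},
    "PD_3": {"Saccharomyces cerevisiae"},
}
for pattern, domains in group_by_superkingdom_pattern(presence, superkingdom_of).items():
    print(sorted(pattern), sorted(domains))
```

A true bi-clustering would additionally group species by their shared domain content, rather than fixing the species partition in advance as this sketch does.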
Bhore, Subhash J; Kassim, Amelia; Loh, Chye Ying; Shah, Farida H
2010-01-01
It is well known that the nutritional quality of American oil-palm (Elaeis oleifera) mesocarp oil is superior to that of African oil-palm (Elaeis guineensis Jacq. Tenera) mesocarp oil. It is therefore important to identify the genetic features underlying its superior value. This could be achieved by sequencing the oil-palm genome; however, the genome sequence is not available in the public domain due to commercial secrecy. Hence, we constructed a cDNA library and generated expressed sequence tags (3,205) from the mesocarp tissue of the American oil-palm. We continued to annotate each of these cDNAs after submitting them to GenBank/DDBJ/EMBL. A rough analysis turned our attention to the cDNA encoding the beta-carotene hydroxylase (Chyb) enzyme. We then fully sequenced both strands of this cDNA clone using M13 forward and reverse primers. The full nucleotide and protein sequences were further analyzed and annotated using various bioinformatics tools. The analysis showed the presence of a fatty acid hydroxylase superfamily domain in the protein sequence. Multiple sequence alignment of selected Chyb amino acid sequences from other plant species and algal members with E. oleifera Chyb using ClustalW, together with the subsequent phylogenetic analysis, suggests that Chyb from the monocotyledonous plant species Lilium hybrid, Crocus sativus and Zea mays is the most closely related to E. oleifera Chyb. This study reports the annotation of E. oleifera Chyb. Abbreviations ESTs - expressed sequence tags, EoChyb - Elaeis oleifera beta-carotene hydroxylase, MC - main cluster PMID:21364789
Sequence analysis of DBL2β domain of var gene of Indonesian Plasmodium falciparum
NASA Astrophysics Data System (ADS)
Sulistyaningsih, E.; Romadhon, B. D.; Palupi, I.; Hidayah, F.; Dewi, R.; Prasetyo, A.
2018-03-01
Malaria is a major health problem in tropical countries, including Indonesia. The deadliest agent is Plasmodium falciparum. In P. falciparum infection, PfEMP1 is thought to play an important role in the pathogenesis of malaria. PfEMP1 is encoded by the var gene family. It is a polymorphic protein whose extracellular portion contains three distinct types of binding domains: Duffy binding-like (DBL), cysteine-rich interdomain regions (CIDR) and C2. PfEMP1 varies in domain composition and binding specificity. This study explored the characteristics of Indonesian DBL2β var genes and investigated their role in malaria outcome. Twenty blood samples from patients with clinically mild to severe malaria in Jember, East Java, were collected for DNA extraction. Diagnosis was confirmed by Giemsa-stained thick blood smears. PCR was conducted using specific primers targeting the full length of DBL2β and yielded a single band of approximately 1.7 kb in one sample. This band was observed only in a severe malaria sample. Direct sequencing of the PCR product showed 74-99% similarity with previous sequences in GenBank. In conclusion, the DBL2β domain of the var gene in Indonesian isolates was 1603 nucleotides in length, and the presence of the DBL2β domain may be associated with the severity of malaria outcome.
Munde, Manoj; Poon, Gregory M. K.; Wilson, W. David
2013-01-01
Members of the ETS family of transcription factors regulate a functionally diverse array of genes. All ETS proteins share a structurally conserved but sequence-divergent DNA-binding domain, known as the ETS domain. Although the structure and thermodynamics of ETS-DNA complexes are well known, little is known about the kinetics of sequence recognition, a facet that offers potential insight into its molecular mechanism. We have characterized DNA binding by the ETS domain of PU.1 by biosensor-surface plasmon resonance (SPR). SPR analysis revealed a striking kinetic profile for DNA binding by the PU.1 ETS domain. At low salt concentrations, it binds high-affinity cognate DNA with a very slow association rate constant (≤10⁵ M⁻¹ s⁻¹), compensated by a correspondingly small dissociation rate constant. The kinetics are strongly salt-dependent but mutually balance to produce a relatively weak salt dependence of the equilibrium constant. This profile contrasts sharply with reported data for other ETS domains (e.g., Ets-1, TEL), for which high-affinity binding is driven by rapid association (>10⁷ M⁻¹ s⁻¹). We interpret this difference in terms of the hydration properties of ETS-DNA binding and propose that this family of DNA-binding domains employs at least two mechanisms of sequence recognition. Additionally, we use SPR to demonstrate the potential for pharmacological inhibition of sequence-specific ETS-DNA binding, using the minor groove binder distamycin as a model compound. Our work establishes SPR as a valuable technique for extending our understanding of the molecular mechanisms of ETS-DNA interactions as well as for developing potential small-molecule agents for biotechnological and therapeutic purposes. PMID:23416556
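As a worked illustration of how a slow association rate can be compensated by slow dissociation, recall that for a simple 1:1 binding reaction the equilibrium dissociation constant is the ratio of the two rate constants. The K_D of 1 nM used below is a hypothetical round number chosen for the arithmetic. It is not a value reported for PU.1 or for the other ETS domains.

$$
\begin{aligned}
K_D &= \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}} \quad\Longrightarrow\quad k_{\mathrm{off}} = K_D\,k_{\mathrm{on}}, \qquad t_{1/2} = \frac{\ln 2}{k_{\mathrm{off}}}\\
k_{\mathrm{on}} = 10^{5}\ \mathrm{M^{-1}\,s^{-1}}:&\quad k_{\mathrm{off}} = (10^{-9}\ \mathrm{M})(10^{5}\ \mathrm{M^{-1}\,s^{-1}}) = 10^{-4}\ \mathrm{s^{-1}}, \quad t_{1/2} \approx 6.9\times 10^{3}\ \mathrm{s}\ (\approx 1.9\ \mathrm{h})\\
k_{\mathrm{on}} = 10^{7}\ \mathrm{M^{-1}\,s^{-1}}:&\quad k_{\mathrm{off}} = (10^{-9}\ \mathrm{M})(10^{7}\ \mathrm{M^{-1}\,s^{-1}}) = 10^{-2}\ \mathrm{s^{-1}}, \quad t_{1/2} \approx 69\ \mathrm{s}\\
\frac{\partial \log K_D}{\partial \log[\mathrm{salt}]} &= \frac{\partial \log k_{\mathrm{off}}}{\partial \log[\mathrm{salt}]} - \frac{\partial \log k_{\mathrm{on}}}{\partial \log[\mathrm{salt}]}
\end{aligned}
$$

The last line shows why strongly salt-dependent rate constants can still yield a weakly salt-dependent equilibrium constant. When log k_off and log k_on shift by similar amounts in the same direction, their difference, log K_D, changes little.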
Molecular Cloning of Drebrin: Progress and Perspectives.
Kojima, Nobuhiko
2017-01-01
Chicken drebrin isoforms were first identified in the optic tectum of the developing brain. Although the time course of protein expression differed among the drebrin isoforms, biochemical analysis of the purified proteins suggested that their structures were similar. To determine their protein structures, drebrin cDNAs were cloned. Comparison of the cDNA sequences shows that all drebrin cDNAs are identical except for the presence or absence of internal insertion sequences. Chicken drebrins are now classified into three isoforms, namely drebrins E1, E2, and A. Genomic cloning demonstrated that the three isoforms are generated from a single drebrin gene by alternative splicing of the individual exons encoding the insertion sequences. This splicing is presumably regulated precisely in a cell-type-specific and developmental-stage-specific fashion. The drebrin protein is well conserved across vertebrate species, although mammalian drebrin has only two isoforms, drebrin E and drebrin A, whereas chicken drebrin has three. Drebrin belongs to the actin-depolymerizing factor homology (ADF-H) domain protein family. Besides the ADF-H domain, drebrin has other domains, including the actin-binding domain and Homer-binding motifs. The diversity of drebrin isoforms and their multiple domains may allow drebrin to interact differentially with the actin cytoskeleton and other intracellular proteins and to regulate diverse cellular processes.
The Replication Focus Targeting Sequence (RFTS) Domain Is a DNA-competitive Inhibitor of Dnmt1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Syeda, Farisa; Fagan, Rebecca L.; Wean, Matthew
Dnmt1 (DNA methyltransferase 1) is the principal enzyme responsible for maintenance of cytosine methylation at CpG dinucleotides in the mammalian genome. The N-terminal replication focus targeting sequence (RFTS) domain of Dnmt1 has been implicated in subcellular localization, protein association, and catalytic function. However, progress in understanding its function has been limited by the lack of assays for and a structure of this domain. Here, we show that the naked DNA- and polynucleosome-binding activities of Dnmt1 are inhibited by the RFTS domain, which functions by virtue of binding the catalytic domain to the exclusion of DNA. Kinetic analysis with a fluorogenic DNA substrate established the RFTS domain as a 600-fold inhibitor of Dnmt1 enzymatic activity. The crystal structure of the RFTS domain reveals a novel fold and supports a mechanism in which an RFTS-targeted Dnmt1-binding protein, such as Uhrf1, may activate Dnmt1 for DNA binding.
Comprehensive comparative analysis of kinesins in photosynthetic eukaryotes
Richardson, Dale N; Simmons, Mark P; Reddy, Anireddy SN
2006-01-01
Background Kinesins, a superfamily of molecular motors, use microtubules as tracks and transport diverse cellular cargoes. All kinesins contain a highly conserved ~350 amino acid motor domain. Previous analysis of the completed genome sequence of one flowering plant (Arabidopsis) resulted in the identification of 61 kinesins. The recent completion of genome sequencing of several photosynthetic and non-photosynthetic eukaryotes that belong to divergent lineages offers a unique opportunity to conduct a comprehensive comparative analysis of kinesins in plant and non-plant systems and infer their evolutionary relationships. Results We used the kinesin motor domain to identify kinesins in the completed genome sequences of 19 species, including 13 newly sequenced genomes. Among the newly analyzed genomes, six represent photosynthetic eukaryotes. A total of 529 kinesins were used to perform a comprehensive analysis of kinesins and to construct gene trees using Bayesian and parsimony approaches. The previously recognized 14 families of kinesins are resolved as distinct lineages in our inferred gene tree. At least three of the 14 kinesin families are not represented in flowering plants. Chlamydomonas, a green alga that is part of the lineage that includes land plants, has at least nine of the 14 known kinesin families. Seven of the ten families present in flowering plants are represented in Chlamydomonas, indicating that these families were retained in both the flowering-plant and green algae lineages. Conclusion The increase in the number of kinesins in flowering plants is due to the vast expansion of the Kinesin-14 and Kinesin-7 families. The Kinesin-14 family, which typically contains a C-terminal motor, has many plant kinesins that have the motor domain at the N terminus, in the middle, or at the C terminus. Several domains in kinesins are present exclusively in either plant or animal lineages.
Addition of novel domains to kinesins in lineage-specific groups contributed to the functional diversification of kinesins. Results from our gene-tree analyses indicate that there was tremendous lineage-specific duplication and diversification of kinesins in eukaryotes. Since the functions of only a few plant kinesins are reported in the literature, this comprehensive comparative analysis will be useful in designing functional studies with photosynthetic eukaryotes. PMID:16448571
Ievlev, Anton; Kalinin, Sergei V.
2015-05-28
Ferroelectric materials are broadly considered for information storage due to the extremely high storage and information processing densities they enable. To date, ferroelectric-based data storage has invariably relied on the formation of cylindrical domains, allowing for binary information encoding. Here we demonstrate and explore the potential of high-density encoding based on domain morphology. We explore domain morphogenesis during tip-induced polarization switching by sequences of positive and negative pulses in a lithium niobate single crystal and demonstrate the principle of information coding by the shape and size of the domains. We applied cross-correlation and neural network approaches for recognition of the switching sequence by the shape of the resulting domains and established optimal parameters for domain shape recognition. These studies both provide insight into the highly non-trivial mechanism of domain switching and potentially establish a new paradigm for multilevel information storage and content retrieval memories. Furthermore, this approach opens a pathway to the exploration of domain switching mechanisms via shape analysis.
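Recognizing the switching sequence from domain shape can be illustrated with the simplest form of cross-correlation matching: compare an observed domain image with a library of reference images and pick the best-scoring one. The Python/NumPy sketch below uses zero-lag normalized cross-correlation (no spatial shifts) on synthetic disk and ring shapes. The shapes, the pulse-sequence labels "+" and "+-", and the noise level are hypothetical, and the neural network approach mentioned above is not shown.

```python
import numpy as np

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-lag normalized cross-correlation (Pearson r) of two same-sized images."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def recognize(image: np.ndarray, templates: dict) -> str:
    """Return the label of the reference domain image that best matches `image`."""
    return max(templates, key=lambda label: ncc(image, templates[label]))

# Hypothetical reference domain shapes on a 21 x 21 grid: a disk and a ring.
yy, xx = np.mgrid[-10:11, -10:11]
r = np.hypot(xx, yy)
templates = {
    "+": (r <= 6).astype(float),                 # hypothetical: single positive pulse
    "+-": ((r >= 4) & (r <= 8)).astype(float),   # hypothetical: positive then negative pulse
}

rng = np.random.default_rng(0)
observed = templates["+"] + rng.normal(0.0, 0.1, size=r.shape)  # noisy disk-shaped domain
print(recognize(observed, templates))
```

Zero-lag correlation assumes the domains are centered at the same tip position. If they are not, a shift search (full cross-correlation) would be needed before comparing scores.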
Tu, Z; Hagedorn, H H
1997-02-01
Pyruvate carboxylase (PC, pyruvate: carbon dioxide ligase [ADP-forming], EC 6.4.1.1) was purified from the yellow fever mosquito, Aedes aegypti. The purified PC showed two polypeptides of similar M(r) (133 and 128 kDa). The N-terminal sequences of both polypeptides were very similar, if not identical. A polyclonal antiserum against the 133 kDa polypeptide cross-reacted strongly with the 128 kDa polypeptide. PC was found in all tissues examined. Using a semi-quantitative Western blot assay, PC was shown to be concentrated in the indirect flight muscles and fat body preparations. The ratios of the 133 to 128 kDa polypeptides differed among tissues and in an Aedes albopictus cell line. The indirect flight muscle was the only tissue in which the 128 kDa polypeptide was more abundant, while both the midgut and the cell line showed almost exclusively the 133 kDa polypeptide. Both polypeptides were present in varying amounts in brain, Malpighian tubule, ovary and fat body preparations. The two isoforms of PC could play different roles in the flight muscle and other tissues. Clones covering a complete cDNA of A. aegypti PC were obtained using a directional approach. The 3952 bp nucleotide sequence, including a 3585 bp coding region, was determined from these cDNA clones. The deduced 1195 amino acid sequence has a calculated M(r) of 132,200. A putative mitochondrial targeting sequence was identified by comparing the deduced amino acid sequence with the N-terminal sequences of the mature protein. The presence of a mitochondrial targeting sequence indicates that the mosquito PC encoded by the cloned cDNA may be localized in the mitochondria. After the targeting sequence, three functional domains were identified in the following order: biotin carboxylase (BC), carboxyltransferase (CT) and biotin carboxyl carrier protein (BCCP). The mosquito PC showed very high similarity to PCs from other sources (55.1-75.2% identity).
Genomic Southern analysis indicated that there could be two similar PC genes or a single PC gene with allelic polymorphism in the A. aegypti genome. The evolutionary relationship of PCs among different organisms was consistent with the accepted evolutionary relationship of their host organisms. The evolution of the domain structures of the biotin-dependent carboxylases including PC was also investigated. This analysis indicates that biotin-dependent carboxylases evolved from a common origin. The analysis also provides evidence for early gene duplication events that shaped the family of biotin-dependent carboxylases. Clear evidence for the coevolution of BC and BCCP domains is presented, although they are associated with very different CT domains and the relative position of the three functional domains varies between members of the biotin-dependent carboxylases.
Transcriptomics of the Bed Bug (Cimex lectularius)
Rajarapu, Swapna P.; Jones, Susan C.; Mittapalli, Omprakash
2011-01-01
Background Bed bugs (Cimex lectularius) are blood-feeding insects poised to become one of the major pests in households throughout the United States. Resistance of C. lectularius to insecticides/pesticides is one factor thought to be involved in its sudden resurgence. Despite its high-impact status, scant knowledge exists at the genomic level for C. lectularius. Hence, we subjected the C. lectularius transcriptome to 454 pyrosequencing in order to identify potential genes involved in pesticide resistance. Methodology and Principal Findings Using 454 pyrosequencing, we obtained a total of 216,419 reads with 79,596,412 bp, which were assembled into 35,646 expressed sequence tags (3902 contigs and 31744 singletons). Nearly 85.9% of the C. lectularius sequences showed similarity to insect sequences, but 44.8% of the deduced proteins of C. lectularius did not show similarity with sequences in the GenBank non-redundant database. KEGG analysis revealed putative members of several detoxification pathways involved in pesticide resistance. Lamprin domains, Protein Kinase domains, Protein Tyrosine Kinase domains and cytochrome P450 domains were among the top Pfam domains predicted for the C. lectularius sequences. An initial assessment of putative defense genes, including a cytochrome P450 and a glutathione-S-transferase (GST), revealed high transcript levels for the cytochrome P450 (CYP9) in pesticide-exposed versus pesticide-susceptible C. lectularius populations. A significant number of single nucleotide polymorphisms (296) and microsatellite loci (370) were predicted in the C. lectularius sequences. Furthermore, 59 putative sequences of Wolbachia were retrieved from the database. Conclusions To our knowledge this is the first study to elucidate the genetic makeup of C. lectularius. This pyrosequencing effort provides clues to the identification of potential detoxification genes involved in pesticide resistance of C. lectularius and lays the foundation for future functional genomics studies. PMID:21283830
Cerenius, Lage; Liu, Haipeng; Zhang, Yanjiao; Rimphanitchayakit, Vichien; Tassanakajon, Anchalee; Gunnar Andersson, M; Söderhäll, Kenneth; Söderhäll, Irene
2010-01-01
Crustacean hemocytes were found to produce a large number of transcripts coding for Kazal-type proteinase inhibitors (KPIs). A detailed study performed with the crayfish Pacifastacus leniusculus and the shrimp Penaeus monodon revealed the presence of at least 26 and 20 different Kazal domains, respectively, in the hemocyte KPIs. Comparisons with KPIs from other taxa indicate that the sequences of these domains evolve rapidly. A few positions were conserved, e.g., six invariant cysteines present in all domain sequences, whereas the P1 amino acid, a determinant of substrate specificity, was highly variable. A study of a single crayfish suggested that considerable sequence variability exists among the hemocyte KPIs even at the individual level. Expression analysis of four crayfish KPI transcripts in hematopoietic tissue cells and different hemocyte types suggests that some of these KPIs are likely to be involved in hematopoiesis or hemocyte release, as they were produced only in particular hemocyte types or maturation stages.
Casillas, Rosario; Tabernero, David; Gregori, Josep; Belmonte, Irene; Cortese, Maria Francesca; González, Carolina; Riveiro-Barciela, Mar; López, Rosa Maria; Quer, Josep; Esteban, Rafael; Buti, Maria; Rodríguez-Frías, Francisco
2018-01-01
AIM To determine the variability/conservation of the domain of hepatitis B virus (HBV) preS1 region that interacts with sodium-taurocholate cotransporting polypeptide (hereafter, NTCP-interacting domain) and the prevalence of the rs2296651 polymorphism (S267F, NTCP variant) in a Spanish population. METHODS Serum samples from 246 individuals were included and divided into 3 groups: patients with chronic HBV infection (CHB) (n = 41, 73% Caucasians), patients with resolved HBV infection (n = 100, 100% Caucasians) and an HBV-uninfected control group (n = 105, 100% Caucasians). Variability/conservation of the amino acid (aa) sequences of the NTCP-interacting domain, (aa 2-48 in viral genotype D) and a highly conserved preS1 domain associated with virion morphogenesis (aa 92-103 in viral genotype D) were analyzed by next-generation sequencing and compared in 18 CHB patients with viremia > 4 log IU/mL. The rs2296651 polymorphism was determined in all individuals in all 3 groups using an in-house real-time PCR melting curve analysis. RESULTS The HBV preS1 NTCP-interacting domain showed a high degree of conservation among the examined viral genomes especially between aa 9 and 21 (in the genotype D consensus sequence). As compared with the virion morphogenesis domain, the NTCP-interacting domain had a smaller proportion of HBV genotype-unrelated changes comprising > 1% of the quasispecies (25.5% vs 31.8%), but a larger proportion of genotype-associated viral polymorphisms (34% vs 27.3%), according to consensus sequences from GenBank patterns of HBV genotypes A to H. Variation/conservation in both domains depended on viral genotype, with genotype C being the most highly conserved and genotype E the most variable (limited finding, only 2 genotype E included). Of note, proline residues were highly conserved in both domains, and serine residues showed changes only to threonine or tyrosine in the virion morphogenesis domain. 
The rs2296651 polymorphism was not detected in any participant. CONCLUSION In our CHB population, the NTCP-interacting domain was highly conserved, particularly the proline residues and essential amino acids related to the NTCP interaction, and the prevalence of rs2296651 was low/null. PMID:29456407
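The "> 1% of the quasispecies" criterion amounts to a per-position frequency filter over sequencing reads. The following is a minimal Python sketch, assuming the reads are already translated and aligned to a reference without insertions or deletions. The function name, toy sequences and read counts are hypothetical, and the study's actual next-generation sequencing pipeline is not reproduced.

```python
from collections import Counter

def minor_variants(reads: list, reference: str, min_freq: float = 0.01) -> list:
    """List non-reference residues whose frequency at a position exceeds min_freq.

    Assumes every read is aligned to `reference` without indels.
    Returns tuples of (1-based position, reference residue, variant residue, frequency).
    """
    n = len(reads)
    hits = []
    for i, ref_aa in enumerate(reference):
        counts = Counter(read[i] for read in reads)
        for aa, c in sorted(counts.items()):
            freq = c / n
            if aa != ref_aa and freq > min_freq:
                hits.append((i + 1, ref_aa, aa, freq))
    return hits

reads = ["NPLG"] * 196 + ["NSLG"] * 3 + ["NPLA"]  # hypothetical 200-read toy data
print(minor_variants(reads, "NPLG"))
```

The P-to-S change at position 2 (3 of 200 reads, 1.5%) passes the filter, while the G-to-A change at position 4 (1 of 200 reads, 0.5%) does not.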
Characterization of Urtica dioica agglutinin isolectins and the encoding gene family.
Does, M P; Ng, D K; Dekker, H L; Peumans, W J; Houterman, P M; Van Damme, E J; Cornelissen, B J
1999-01-01
Urtica dioica agglutinin (UDA) has previously been found in roots and rhizomes of stinging nettles as a mixture of UDA-isolectins. Protein and cDNA sequencing have shown that mature UDA is composed of two hevein domains and is processed from a precursor protein. The precursor contains a signal peptide, two in-tandem hevein domains, a hinge region and a carboxyl-terminal chitinase domain. Genomic fragments encoding precursors for UDA-isolectins have been amplified by five independent polymerase chain reactions on genomic DNA from stinging nettle ecotype Weerselo. One amplified gene was completely sequenced. As compared to the published cDNA sequence, the genomic sequence contains, besides two basepair substitutions, two introns located at the same positions as in other plant chitinases. By partial sequence analysis of 40 amplified genes, 16 different genes were identified which encode seven putative UDA-isolectins. The deduced amino acid sequences share 78.9-98.9% identity. In extracts of roots and rhizomes of stinging nettle ecotype Weerselo six out of these seven isolectins were detected by mass spectrometry. One of them is an acidic form, which has not been identified before. Our results demonstrate that UDA is encoded by a large gene family.
Pan-Cancer Analysis of Mutation Hotspots in Protein Domains.
Miller, Martin L; Reznik, Ed; Gauthier, Nicholas P; Aksoy, Bülent Arman; Korkut, Anil; Gao, Jianjiong; Ciriello, Giovanni; Schultz, Nikolaus; Sander, Chris
2015-09-23
In cancer genomics, recurrence of mutations in independent tumor samples is a strong indicator of functional impact. However, rare functional mutations can escape detection by recurrence analysis owing to lack of statistical power. We enhance statistical power by extending the notion of recurrence of mutations from single genes to gene families that share homologous protein domains. Domain mutation analysis also sharpens the functional interpretation of the impact of mutations, as domains more succinctly embody function than entire genes. By mapping mutations in 22 different tumor types to equivalent positions in multiple sequence alignments of domains, we confirm well-known functional mutation hotspots, identify uncharacterized rare variants in one gene that are equivalent to well-characterized mutations in another gene, detect previously unknown mutation hotspots, and provide hypotheses about molecular mechanisms and downstream effects of domain mutations. With the rapid expansion of cancer genomics projects, protein domain hotspot analysis will likely provide many more leads linking mutations in proteins to the cancer phenotype. Copyright © 2015 Elsevier Inc. All rights reserved.
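Mapping mutations to "equivalent positions in multiple sequence alignments of domains" amounts to converting each gene's residue number into an alignment column and then counting mutations per column across all family members. The Python sketch below shows that bookkeeping under simplifying assumptions: residue numbers are taken relative to the start of the aligned domain, "-" and "." mark gaps, and the gene names, alignment and mutations are hypothetical. It omits the statistical test that would be needed to call a column a hotspot.

```python
from collections import Counter

def residue_to_column(aligned: str) -> dict:
    """Map 1-based residue numbers of a gapped sequence to 0-based alignment columns."""
    mapping, residue = {}, 0
    for column, char in enumerate(aligned):
        if char not in "-.":          # '-' and '.' are treated as gaps
            residue += 1
            mapping[residue] = column
    return mapping

def column_hotspots(alignment: dict, mutations: dict) -> Counter:
    """Pool mutations from all genes sharing a domain into per-column counts."""
    counts = Counter()
    for gene, positions in mutations.items():
        column_of = residue_to_column(alignment[gene])
        for pos in positions:
            if pos in column_of:      # skip positions outside the aligned domain
                counts[column_of[pos]] += 1
    return counts

# Hypothetical domain alignment and mutated residue numbers (relative to domain start).
alignment = {"GENE_A": "MK-LV", "GENE_B": "MKQLV", "GENE_C": "-KQLI"}
mutations = {"GENE_A": [3], "GENE_B": [4], "GENE_C": [3]}
print(column_hotspots(alignment, mutations).most_common(1))  # [(3, 3)]
```

Each gene contributes only one mutation, yet all three fall in column 3. This pooling effect gives domain-level recurrence analysis more statistical power than gene-level analysis.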
Voels, Brent; Wang, Liping; Sens, Donald A; Garrett, Scott H; Zhang, Ke; Somji, Seema
2017-05-25
The third isoform of the metallothionein (MT3) gene family has been shown to be overexpressed in most ductal breast cancers. A previous study has shown that the stable transfection of MCF-7 cells with the MT3 gene inhibits cell growth. The goal of the present study was to determine the effects of the unique C-terminal and N-terminal sequences of MT3 on the phenotypic properties and gene expression profiles of MCF-7 cells. MCF-7 cells were transfected with various metallothionein gene constructs in which the unique MT3 C- and N-terminal domains were inserted or removed. Global gene expression analysis was performed on the MCF-7 cells containing the various constructs, and the expression of the unique C- and N-terminal domains of MT3 was correlated with the phenotypic properties of the cells. The results of the present study demonstrate that the C-terminal sequence of MT3, in the absence of the N-terminal sequence, induces dome formation in MCF-7 cells, which in cell cultures is the phenotypic manifestation of a cell's ability to perform vectorial active transport. Global gene expression analysis demonstrated that the increased expression of the GAGE gene family correlated with dome formation. Expression of the C-terminal domain induced GAGE gene expression, whereas the N-terminal domain inhibited it, and the inhibitory effect of the N-terminal domain was dominant over the C-terminal domain of MT3. Transfection with the metallothionein 1E gene increased the expression of GAGE genes. In addition, both the C- and the N-terminal sequences of the MT3 gene had growth inhibitory properties, which correlated with an increased expression of the interferon alpha-inducible protein 6. Our study shows that the C-terminal domain of MT3 confers dome formation in MCF-7 cells and the presence of this domain induces expression of the GAGE family of genes. 
The differential effects of MT3 and metallothionein 1E on the expression of GAGE genes suggest unique roles for these genes in the development and progression of breast cancer. The finding that interferon alpha-inducible protein 6 expression is associated with the ability of MT3 to inhibit growth needs further investigation.
Structural and sequencing analysis of local target DNA recognition by MLV integrase.
Aiyer, Sriram; Rossi, Paolo; Malani, Nirav; Schneider, William M; Chandar, Ashwin; Bushman, Frederic D; Montelione, Gaetano T; Roth, Monica J
2015-06-23
Target-site selection by retroviral integrase (IN) proteins profoundly affects viral pathogenesis. We describe the solution nuclear magnetic resonance structure of the Moloney murine leukemia virus (M-MLV) IN C-terminal domain (CTD) and a structural homology model of the catalytic core domain (CCD). In solution, the isolated MLV IN CTD adopts an SH3 domain fold flanked by a C-terminal unstructured tail. We generated a concordant MLV IN CCD structural model using SWISS-MODEL, MMM-tree and I-TASSER. Using the X-ray crystal structure of the prototype foamy virus IN target capture complex together with our MLV domain structures, residues within the CCD α2 helical region and the CTD β1-β2 loop were predicted to bind target DNA. The role of these residues was analyzed in vivo through point mutants and motif interchanges. Viable viruses with substitutions at the IN CCD α2 helical region and the CTD β1-β2 loop were tested for effects on integration target site selection. Next-generation sequencing and analysis of integration target sequences indicate that the CCD α2 helical region, in particular P187, interacts with the sequences distal to the scissile bonds, whereas the CTD β1-β2 loop binds to sequences proximal to them. These findings validate our structural model and disclose IN-DNA interactions relevant to target site selection. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Pudupakam, Raghavendra Sumanth; Raghunath, Shobana; Pudupakam, Meghanath; Daggupati, Sreenivasulu
2017-03-01
Sequence analysis and phylogenetic studies based on the non-structural protein-3 (NS3) gene are important in understanding the evolution and epidemiology of bluetongue virus (BTV). This study was aimed at characterizing the NS3 gene sequence of Indian BTV serotype-2 (BTV2) to elucidate its genetic relationship to global BTV isolates. The NS3 gene of BTV2 was amplified from infected BHK-21 cell cultures, cloned and subjected to sequence analysis. The generated NS3 gene sequence was compared with the corresponding sequences of different BTV serotypes across the world, and a phylogenetic relationship was established. The NS3 gene of BTV2 showed moderate levels of variability in comparison to different BTV serotypes, with nucleotide sequence identities ranging from 81% to 98%. The region showed high sequence homology of 93-99% at the amino acid level with various BTV serotypes. The PPXY/PTAP late domain motifs, glycosylation sites, hydrophobic domains, and the amino acid residues critical for virus-host interactions were conserved in the NS3 protein. Phylogenetic analysis revealed that BTV isolates segregate into four topotypes and that the Indian BTV2 in subclade IA is closely related to Asian and Australian origin strains. Analysis of the NS3 gene indicated that the Indian BTV2 isolate is closely related to strains from Asia and Australia, suggesting a common origin of infection. Although the pattern of evolution of the BTV2 isolate is different from other global isolates, the deduced amino acid sequence of the NS3 protein demonstrated high molecular stability.
Pudupakam, Raghavendra Sumanth; Raghunath, Shobana; Pudupakam, Meghanath; Daggupati, Sreenivasulu
2017-01-01
Aim: Sequence analysis and phylogenetic studies based on the non-structural protein-3 (NS3) gene are important in understanding the evolution and epidemiology of bluetongue virus (BTV). This study was aimed at characterizing the NS3 gene sequence of Indian BTV serotype-2 (BTV2) to elucidate its genetic relationship to global BTV isolates. Materials and Methods: The NS3 gene of BTV2 was amplified from infected BHK-21 cell cultures, cloned and subjected to sequence analysis. The generated NS3 gene sequence was compared with the corresponding sequences of different BTV serotypes across the world, and a phylogenetic relationship was established. Results: The NS3 gene of BTV2 showed moderate levels of variability in comparison to different BTV serotypes, with nucleotide sequence identities ranging from 81% to 98%. The region showed high sequence homology of 93-99% at the amino acid level with various BTV serotypes. The PPXY/PTAP late domain motifs, glycosylation sites, hydrophobic domains, and the amino acid residues critical for virus-host interactions were conserved in the NS3 protein. Phylogenetic analysis revealed that BTV isolates segregate into four topotypes and that the Indian BTV2 in subclade IA is closely related to Asian and Australian origin strains. Conclusion: Analysis of the NS3 gene indicated that the Indian BTV2 isolate is closely related to strains from Asia and Australia, suggesting a common origin of infection. Although the pattern of evolution of the BTV2 isolate is different from other global isolates, the deduced amino acid sequence of the NS3 protein demonstrated high molecular stability. PMID:28435199
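The identity ranges quoted for NS3 (81% to 98% at the nucleotide level, 93-99% at the amino acid level) are pairwise percent identities. The abstract does not say how alignment gaps were handled. The Python sketch below therefore assumes the two sequences are already aligned and leaves gapped columns out of the denominator. Other conventions, such as dividing by the shorter ungapped length, give different numbers.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity of two sequences that are already aligned.

    Columns with a gap in either sequence are left out of the denominator;
    this is one of several conventions in use.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if gap not in (x, y)]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)

print(round(percent_identity("ATGCTA", "ATGATA"), 1))  # 83.3
print(percent_identity("AT-GC", "ATAGC"))              # 100.0
```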
Serine protease-related proteins in the malaria mosquito, Anopheles gambiae.
Cao, Xiaolong; Gulati, Mansi; Jiang, Haobo
2017-09-01
Insect serine proteases (SPs) and serine protease homologs (SPHs) participate in digestion, defense, development, and other physiological processes. In mosquitoes, some clip-domain SPs and SPHs (i.e. CLIPs) have been investigated for possible roles in antiparasitic responses. In a recent test aimed at improving the quality of gene models in the Anopheles gambiae genome using RNA-seq data, we observed various discrepancies between gene models in AgamP4.5 and corresponding sequences selected from those modeled by Cufflinks, Trinity and Bridger. Here we report a comparative analysis of the 337 SP-related proteins in A. gambiae by examining their domain structures, sequence diversity, chromosomal locations, and expression patterns. One hundred and ten CLIPs contain 1 to 5 clip domains in addition to their protease domains (PDs) or non-catalytic, protease-like domains (PLDs). They are divided into five subgroups: CLIPAs (22) are clip(1-5)-PLD; CLIPBs (29), CLIPCs (12) and CLIPDs (14) are mainly clip-PD; most CLIPEs (33) have a domain structure of PD/PLD-PLD-clip-PLD(0-1). While expression of the CLIP genes in group-1 is generally low and detected in various tissue- and stage-specific RNA-seq libraries, some putative GPs/GPHs (i.e. single-domain gut SPs/SPHs) in group-2 are highly expressed in midgut, whole larva or whole adult libraries. In comparison, 46 SPs, 26 SPHs, and 37 multi-domain SPs/SPHs (i.e. PD/PLD-PLD(≥1)) in group-3 do not seem to be specifically expressed in the digestive tract. There are 16 SPs and 2 SPHs containing other types of putative regulatory domains (e.g. LDLa, CUB, Gd). Of the 337 SP and SPH genes, 159 were sorted into 46 groups (2-8 members/group) based on similar phylogenetic tree position, chromosomal location, and expression profile. This information and analysis, including improved gene models and protein sequences, constitute a solid foundation for functional analysis of the SP-related proteins in A. gambiae. Copyright © 2017 Elsevier Ltd. All rights reserved.
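The CLIP architecture notation reads as a pattern over an ordered list of domains. In it, clip(1-5) means one to five clip domains and PLD(0-1) means an optional trailing PLD. The Python sketch below transcribes those patterns into regular expressions. The list encoding and pattern names are assumptions for illustration, and the check covers domain order only. The abstract itself indicates that architecture alone does not settle subgroup membership, since CLIPB-D are only "mainly" clip-PD and only "most" CLIPEs fit their pattern.

```python
import re

# Architectures are written N- to C-terminus with the labels used in the text.
# The patterns transcribe clip(1-5)-PLD, clip-PD and PD/PLD-PLD-clip-PLD(0-1);
# the dictionary keys are illustrative names, not official classifications.
ARCHITECTURE_PATTERNS = {
    "CLIPA-like": r"(clip-){1,5}PLD",
    "CLIPB/C/D-like": r"clip-PD",
    "CLIPE-like": r"(PD|PLD)-PLD-clip(-PLD)?",
}

def architecture_matches(domains):
    """Return the names of all patterns that the whole architecture matches."""
    arch = "-".join(domains)
    return [name for name, pattern in ARCHITECTURE_PATTERNS.items()
            if re.fullmatch(pattern, arch)]

print(architecture_matches(["clip", "clip", "PLD"]))       # ['CLIPA-like']
print(architecture_matches(["PD", "PLD", "clip", "PLD"]))  # ['CLIPE-like']
print(architecture_matches(["PD"]))                        # []
```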
Evolutionary versatility of eukaryotic protein domains revealed by their bigram networks
2011-01-01
Background Protein domains are globular structures of independently folded polypeptides that exert catalytic or binding activities. Their sequences are recognized as evolutionary units that, through genome recombination, constitute protein repertoires of linkage patterns. Via mutations, domains acquire modified functions that contribute to the fitness of cells and organisms. Recent studies have addressed the evolutionary selection that may have shaped the functions of individual domains and the emergence of particular domain combinations, which led to new cellular functions in multi-cellular animals. This study focuses on modeling domain linkage globally and investigates evolutionary implications that may be revealed by novel computational analysis. Results A survey of 77 completely sequenced eukaryotic genomes implies a potential hierarchical and modular organization of biological functions in most living organisms. Domains in a genome or multiple genomes are modeled as a network of hetero-duplex covalent linkages, termed bigrams. A novel computational technique is introduced to decompose such networks, whereby the notion of domain "networking versatility" is derived and measured. The most and least "versatile" domains (termed "core domains" and "peripheral domains" respectively) are examined both computationally via sequence conservation measures and experimentally using selected domains. Our study suggests that such a versatility measure extracted from the bigram networks correlates with the adaptivity of domains during evolution, where the network core domains are highly adaptive, in significant contrast to the network peripheral domains. Conclusions Domain recombination has played a major part in the evolution of eukaryotes, contributing to genome complexity. From a systems point of view, as the result of selection and constant refinement, networks of domain linkage are structured in a hierarchical modular fashion. 
Domains with a high degree of networking versatility appear to be evolutionarily adaptive, potentially through functional innovations. Domain bigram networks are informative as a model of biological functions. The networking versatility indices extracted from such networks for individual domains reflect the strength of evolutionary selection that the domains have experienced. PMID:21849086
Evolutionary versatility of eukaryotic protein domains revealed by their bigram networks.
Xie, Xueying; Jin, Jing; Mao, Yongyi
2011-08-18
Protein domains are globular structures of independently folded polypeptides that exert catalytic or binding activities. Their sequences are recognized as evolutionary units that, through genome recombination, constitute protein repertoires of linkage patterns. Via mutations, domains acquire modified functions that contribute to the fitness of cells and organisms. Recent studies have addressed the evolutionary selection that may have shaped the functions of individual domains and the emergence of particular domain combinations, which led to new cellular functions in multi-cellular animals. This study focuses on modeling domain linkage globally and investigates evolutionary implications that may be revealed by novel computational analysis. A survey of 77 completely sequenced eukaryotic genomes implies a potential hierarchical and modular organization of biological functions in most living organisms. Domains in a genome or multiple genomes are modeled as a network of hetero-duplex covalent linkages, termed bigrams. A novel computational technique is introduced to decompose such networks, whereby the notion of domain "networking versatility" is derived and measured. The most and least "versatile" domains (termed "core domains" and "peripheral domains" respectively) are examined both computationally via sequence conservation measures and experimentally using selected domains. Our study suggests that such a versatility measure extracted from the bigram networks correlates with the adaptivity of domains during evolution, where the network core domains are highly adaptive, in significant contrast to the network peripheral domains. Domain recombination has played a major part in the evolution of eukaryotes, contributing to genome complexity. From a systems point of view, as the result of selection and constant refinement, networks of domain linkage are structured in a hierarchical modular fashion. 
Domains with a high degree of networking versatility appear to be evolutionarily adaptive, potentially through functional innovations. Domain bigram networks are informative as a model of biological functions. The networking versatility indices extracted from such networks for individual domains reflect the strength of evolutionary selection that the domains have experienced.
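In this model a bigram is a covalent linkage between two adjacent domains in one protein. The Python toy below builds such a network from domain architectures listed N- to C-terminus and counts each domain's distinct linkage partners. Reading "hetero-duplex" as "two different domains" is an interpretation, and the architectures are invented. The partner count is only a simple degree measure, not the networking versatility index that the authors derive from their network decomposition.

```python
from collections import defaultdict

def domain_bigrams(architectures):
    """Collect ordered pairs of adjacent, different domains (N- to C-terminus)."""
    bigrams = set()
    for domains in architectures:
        for left, right in zip(domains, domains[1:]):
            if left != right:  # skip tandem repeats of the same domain
                bigrams.add((left, right))
    return bigrams

def distinct_partners(bigrams):
    """Count distinct linkage partners per domain, ignoring bigram direction."""
    partners = defaultdict(set)
    for a, b in bigrams:
        partners[a].add(b)
        partners[b].add(a)
    return {domain: len(p) for domain, p in partners.items()}

# Invented architectures; Kinase-Kinase is a homo-duplex and adds no bigram.
architectures = [["SH3", "SH2", "Kinase"], ["SH2", "PH"], ["Kinase", "Kinase"]]
print(distinct_partners(domain_bigrams(architectures)))
```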
Alcántara, Cristina; Sarmiento-Rubiano, Luz Adriana; Monedero, Vicente; Deutscher, Josef; Pérez-Martínez, Gaspar; Yebra, María J.
2008-01-01
Sequence analysis of the five genes (gutRMCBA) downstream from the previously described sorbitol-6-phosphate dehydrogenase-encoding Lactobacillus casei gutF gene revealed that they constitute a sorbitol (glucitol) utilization operon. The gutRM genes encode putative regulators, while the gutCBA genes encode the EIIC, EIIBC, and EIIA proteins of a phosphoenolpyruvate-dependent sorbitol phosphotransferase system (PTS^Gut). The gut operon is transcribed as a polycistronic gutFRMCBA messenger, the expression of which is induced by sorbitol and repressed by glucose. gutR encodes a transcriptional regulator with two PTS-regulated domains, a galactitol-specific EIIB-like domain (EIIB^Gat domain) and a mannitol/fructose-specific EIIA-like domain (EIIA^Mtl domain). Its inactivation abolished gut operon transcription and sorbitol uptake, indicating that it acts as a transcriptional activator. In contrast, cells carrying a gutB mutation expressed the gut operon constitutively, but they failed to transport sorbitol, indicating that EIIBC^Gut negatively regulates GutR. A footprint analysis showed that GutR binds to a 35-bp sequence upstream from the gut promoter. A sequence comparison with the presumed promoter region of gut operons from various firmicutes revealed a GutR consensus motif that includes an inverted repeat. The regulation mechanism of the L. casei gut operon is therefore likely to be operative in other firmicutes. Finally, gutM codes for a conserved protein of unknown function present in all sequenced gut operons. A gutM mutant, the first constructed in a firmicute, showed drastically reduced gut operon expression and sorbitol uptake, indicating a regulatory role also for GutM. PMID:18676710
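An inverted repeat such as the one in the GutR consensus motif is a stretch of DNA followed, after a short spacer, by its reverse complement. The brute-force Python scan below illustrates that idea and is not the method used in the study. The arm length, the spacer range, and the requirement for an exact match with no mismatches are all arbitrary assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def inverted_repeats(seq, arm=6, max_spacer=10):
    """Return (start, spacer) pairs where an arm of `arm` bases is followed,
    after 0..max_spacer bases, by its exact reverse complement."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * arm + 1):
        target = reverse_complement(seq[i:i + arm])
        for spacer in range(max_spacer + 1):
            j = i + arm + spacer
            if seq[j:j + arm] == target:  # short slices at the end never match
                hits.append((i, spacer))
    return hits

# Toy sequence: the arm TTGACA, a 4-base spacer, then its reverse complement TGTCAA.
site = "GG" + "TTGACA" + "AAAA" + "TGTCAA" + "CC"
print((2, 4) in inverted_repeats(site))  # True
```

A longer repeat is reported several times, once per nested arm (here also as (1, 6) and (0, 8)), so overlapping hits would normally be merged before interpretation.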
GFam: a platform for automatic annotation of gene families.
Sasidharan, Rajkumar; Nepusz, Tamás; Swarbreck, David; Huala, Eva; Paccanaro, Alberto
2012-10-01
We have developed GFam, a platform for automatic annotation of gene/protein families. GFam provides a framework for genome initiatives and model organism resources to build domain-based families and derive meaningful functional labels, and it offers a seamless approach to propagating functional annotation across periodic genome updates. GFam is a hybrid approach: it uses a greedy algorithm to chain component domains from the InterPro annotation provided by its 12 member resources, followed by a sequence-based connected component analysis of un-annotated sequence regions, to derive a consensus domain architecture for each sequence and subsequently generate families based on common architectures. For the Arabidopsis proteome, our integrated approach increases sequence coverage by 7.2 percentage points and residue coverage by 14.6 percentage points relative to the best single constituent database within InterPro. The true power of GFam lies in maximizing the annotation provided by the different InterPro data sources, which offer resource-specific coverage for different regions of a sequence. GFam's capability to capture higher sequence and residue coverage can be useful for genome annotation, comparative genomics and functional studies. GFam is general-purpose software and can be used for any collection of protein sequences. The software is open source and can be obtained from http://www.paccanarolab.org/software/gfam/.
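The greedy chaining step, which merges member-database hits into one non-overlapping domain architecture, can be sketched together with the residue-coverage measure. The abstract does not give GFam's scoring or overlap rules. This Python sketch assumes that hits are ranked by E-value and that no overlap is tolerated. The hit tuples and labels are hypothetical, and this is not GFam's code.

```python
def greedy_chain(hits):
    """Select non-overlapping domain hits greedily, best E-value first.

    hits: list of (start, end, label, evalue) with 1-based inclusive
    coordinates. Returns the kept hits ordered by start position.
    """
    chosen = []
    for hit in sorted(hits, key=lambda h: h[3]):
        start, end = hit[0], hit[1]
        # keep the hit only if it overlaps none of the hits kept so far
        if all(end < c[0] or start > c[1] for c in chosen):
            chosen.append(hit)
    return sorted(chosen, key=lambda h: h[0])

def residue_coverage(hits, length):
    """Fraction of the sequence's residues covered by at least one hit."""
    covered = set()
    for start, end, *_ in hits:
        covered.update(range(start, end + 1))
    return len(covered) / length

hits = [(1, 50, "dom_A", 1e-20), (40, 90, "dom_B", 1e-5), (60, 120, "dom_C", 1e-10)]
chain = greedy_chain(hits)
print([h[2] for h in chain])         # ['dom_A', 'dom_C']
print(residue_coverage(chain, 200))  # 0.555
```

Residue coverage, as computed here, is the fraction of residues annotated. Sequence coverage is instead the fraction of sequences that carry at least one annotated domain.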
Kathiravan, P; Goyal, S; Kataria, R S; Mishra, B P; Jayakumar, S; Joshi, B K
2011-01-01
The present study was undertaken to characterize the structure of the S100A8 gene and its promoter in water buffalo and yak. Sequence data of 2.067 kb, 2.071 kb, and 2.052 kb, covering the complete S100A8 gene including the 5' flanking region, were generated for river buffalo, swamp buffalo, and yak, respectively. BLAST analysis of the coding DNA sequences (CDS) of the S100A8 gene revealed 95% homology of the buffalo sequence with cattle, 85% with pig and horse, 83% with dog, 72-73% with murines, and around 79% with primates and humans. Phylogenetic analysis of the predicted CDS revealed distinct clustering of murines, primates, and domestic animals, with bovines and bubalines forming a subcluster among farm animals. In silico translation of the predicted CDS revealed a sequence of 89 amino acids with 7 amino acid changes between cattle and buffalo and 2 changes between cattle and yak. A Pfam family search revealed the N-terminal calcium-binding domain and the noncanonical EF-hand domain in the carboxy terminus, with more variations being observed in the N-terminal domain among different species. Two amino acid changes observed in the carboxy-terminal EF-hand domain resulted in altered secondary structure of the yak S100A8 protein. Analysis of the S100A8 gene promoter revealed 14 putative motifs for transcription factor binding sites. Two putative motifs, viz. C/EBP and v-Myb, were found to be absent in swamp buffalo as compared to river buffalo and cattle. Differences in the structure of the S100A8 protein and the transcription factor binding sites identified in the present study need to be analyzed further for their functional significance in yak and swamp buffalo, respectively. Copyright © Taylor & Francis Group, LLC
Rybarczyk-Mydłowska, Katarzyna; Maboreke, Hazel Ruvimbo; van Megen, Hanny; van den Elsen, Sven; Mooyman, Paul; Smant, Geert; Bakker, Jaap; Helder, Johannes
2012-11-21
Plant parasitic nematodes are unusual Metazoans as they are equipped with genes that allow for symbiont-independent degradation of plant cell walls. Among the cell wall-degrading enzymes, glycoside hydrolase family 5 (GHF5) cellulases are relatively well characterized, especially for high impact parasites such as root-knot and cyst nematodes. Interestingly, ancestors of extant nematodes most likely acquired these GHF5 cellulases from a prokaryote donor by one or multiple lateral gene transfer events. To obtain insight into the origin of GHF5 cellulases among evolutionary advanced members of the order Tylenchida, cellulase biodiversity data from less distal family members were collected and analyzed. Single nematodes were used to obtain (partial) genomic sequences of cellulases from representatives of the genera Meloidogyne, Pratylenchus, Hirschmanniella and Globodera. Combined Bayesian analysis of ≈ 100 cellulase sequences revealed three types of catalytic domains (A, B, and C). Represented by 84 sequences, type B is numerically dominant, and the overall topology of the catalytic domain type shows remarkable resemblance with trees based on neutral (= pathogenicity-unrelated) small subunit ribosomal DNA sequences. Bayesian analysis further suggested a sister relationship between the lesion nematode Pratylenchus thornei and all type B cellulases from root-knot nematodes. Yet, the relationship between the three catalytic domain types remained unclear. Superposition of intron data onto the cellulase tree suggests that types B and C are related, and together distinct from type A that is characterized by two unique introns. All Tylenchida members investigated here harbored one or multiple GHF5 cellulases. Three types of catalytic domains are distinguished, and the presence of at least two types is relatively common among plant parasitic Tylenchida. 
Analysis of coding sequences of cellulases suggests that root-knot and cyst nematodes did not acquire this gene directly by lateral gene transfer. More likely, these genes were passed on by ancestors of a family nowadays known as the Pratylenchidae.
The Metarhizium anisopliae trp1 gene: cloning and regulatory analysis.
Staats, Charley Christian; Silva, Marcia Suzana Nunes; Pinto, Paulo Marcos; Vainstein, Marilene Henning; Schrank, Augusto
2004-07-01
The trp1 gene from the entomopathogenic fungus Metarhizium anisopliae, cloned by heterologous hybridization with the plasmid carrying the trpC gene from Aspergillus nidulans, was characterized by sequencing. The predicted translation product has the conserved catalytic domains of glutamine amidotransferase (G domain), indoleglycerolphosphate synthase (C domain), and phosphoribosyl anthranilate isomerase (F domain) organized as NH2-G-C-F-COOH. The ORF is interrupted by a single intron of 60 nt that is conserved in position relative to trp genes from Ascomycetes and in length relative to Basidiomycetes species. RT-PCR analysis suggests constitutive expression of the trp1 gene in M. anisopliae.
Chandra, Saket; Kazmi, Andaleeb Z; Ahmed, Zainab; Roychowdhury, Gargi; Kumari, Veena; Kumar, Manish; Mukhopadhyay, Kunal
2017-07-01
NB-ARC domain-containing resistance genes from the wheat genome were identified, characterized and localized on chromosome arms that displayed differential yet positive response during incompatible and compatible leaf rust interactions. Wheat (Triticum aestivum L.) is an important cereal crop; however, its production is affected severely by numerous diseases including rusts. An efficient, cost-effective and ecologically viable approach to controlling pathogens is host resistance. In wheat, high numbers of resistance loci are present, but only a few have been identified and cloned. A comprehensive analysis of the NB-ARC-containing genes in the complete wheat genome was accomplished in this study. Complete NB-ARC-encoding genes were mined from the Ensembl Plants database, and 604 NB-ARC-containing sequences were predicted using an HMM approach. Genome-wide analysis of orthologous clusters in the NB-ARC-containing sequences of wheat and other members of the Poaceae family revealed maximum homology with Oryza sativa indica and Brachypodium distachyon. The identification of overlap between orthologous clusters enabled the elucidation of the function and evolution of resistance proteins. The distributions of the NB-ARC domain-containing sequences were found to be balanced among the three wheat sub-genomes. Wheat chromosome arms 4AL and 7BL had the most NB-ARC domain-containing contigs. The spatio-temporal expression profiling studies exemplified the positive role of these genes in resistant and susceptible wheat plants during incompatible and compatible interaction in response to the leaf rust pathogen Puccinia triticina. Two NB-ARC domain-containing sequences were modelled in silico, cloned and sequenced to analyze their fine structures. The data obtained in this study will augment the isolation, characterization and application of NB-ARC resistance genes in marker-assisted selection-based breeding programs for improving rust resistance in wheat.
Vidal, R; González, R; Gil, F
2015-06-10
Innate pathway activation is fundamental for early anti-viral defense in fish, but currently there is insufficient understanding of how salmonid fish identify viral molecules and activate these pathways. The Toll-like receptor (TLR) is believed to play a crucial role in host defense against pathogenic microbes in the innate immune system. In the present study, the full-length cDNA of Salmo salar TLR3 (ssTLR3) was cloned. The ssTLR3 cDNA sequence was 6071 bp long, containing an open reading frame of 2754 bp and encoding 917 amino acids. The TLR group motifs, such as leucine-rich repeat (LRR) domains and Toll-interleukin-1 receptor (TIR) domains, were maintained in ssTLR3, with sixteen LRR domains and one TIR domain. In contrast to the TATA-less TLR3 promoters described in rainbow trout and mouse, we found a putative TATA box in the proximal promoter region 29 bp upstream of the transcription start point of ssTLR3. Multiple-sequence alignment analysis of the ssTLR3 protein-coding sequence with other known TLR3 sequences showed the sequence to be conserved among all species analyzed, implying that the function of TLR3 has been sustained throughout evolution. The ssTLR3 mRNA expression patterns were measured using real-time PCR. The results revealed that TLR3 is widely expressed in various healthy tissues. Individuals challenged with infectious pancreatic necrosis virus and immunostimulated with polyinosinic:polycytidylic acid exhibited increased expression of TLR3 at the mRNA level, indicating that ssTLR3 may be involved in pathogen recognition in the early innate immune system.
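Locating a putative TATA box relative to the transcription start point can be sketched as a consensus scan. The Python example below assumes the commonly cited consensus TATAWAWR (W = A/T, R = A/G) and measures the distance from the start of each match to the start point. The abstract does not state which consensus or distance convention the authors used, and the promoter here is a toy sequence.

```python
import re

# TATAWAWR (W = A/T, R = A/G) is a commonly cited TATA-box consensus; the
# lookahead lets overlapping matches be reported.
TATA_BOX = re.compile(r"(?=(TATA[AT]A[AT][AG]))")

def tata_distances(promoter, tss_index):
    """Distances in bp from each TATA-like match start to the transcription
    start point, for matches upstream of it. `tss_index` is the 0-based index
    of the start point within `promoter`."""
    promoter = promoter.upper()
    return [tss_index - m.start() for m in TATA_BOX.finditer(promoter)
            if m.start() < tss_index]

# Toy promoter: TATA-like element beginning at index 10, start point at index 39.
promoter = "G" * 10 + "TATAAAAG" + "C" * 21 + "ACGTA"
print(tata_distances(promoter, 39))  # [29]
```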
Cloning and analysis of DnaJ family members in the silkworm, Bombyx mori.
Li, Yinü; Bu, Cuiyu; Li, Tiantian; Wang, Shibao; Jiang, Feng; Yi, Yongzhu; Yang, Huipeng; Zhang, Zhifang
2016-01-15
Heat shock proteins (Hsps) are involved in a variety of critical biological functions, including protein folding, degradation, translocation, and macromolecule assembly, and act as molecular chaperones during periods of stress by binding to other proteins. Using expressed sequence tag (EST) and silkworm (Bombyx mori) transcriptome databases, we identified 27 cDNA sequences encoding the conserved J domain, which is found in DnaJ-type Hsps. Of the 27 J domain-containing sequences, 25 were complete cDNA sequences. We divided them into three types according to the number and presence of conserved domains. By analyzing the gene structures, intron numbers, and conserved domains and constructing a phylogenetic tree, we found that the DnaJ family had undergone convergent evolution, obtaining new domains to expand the diversity of its family members. The acquisition of the new DnaJ domains most likely occurred prior to the evolutionary divergence of prokaryotes and eukaryotes. The expression of DnaJ genes in the silkworm was generally higher in the fat body. The tissue distribution of DnaJ1 proteins was detected by western blotting, demonstrating that in the fifth-instar larvae, the DnaJ1 proteins were expressed at their highest levels in hemocytes, followed by the fat body and head. We also found that the DnaJ1 transcripts were likely differentially translated in different tissues. Using immunofluorescence cytochemistry, we revealed that in the blood cells, DnaJ1 was mainly localized in the cytoplasm. Copyright © 2015 Elsevier B.V. All rights reserved.
Evolutionary analysis of a novel zinc ribbon in the N-terminal region of threonine synthase.
Kaur, Gurmeet; Subramanian, Srikrishna
2017-10-18
Threonine synthase (TS) catalyzes the terminal reaction in the biosynthetic pathway of threonine and requires pyridoxal phosphate as a cofactor. TSs share a common catalytic domain with other fold-type II PALP-dependent enzymes. TSs are broadly grouped into two classes based on their sequence, quaternary structure, and enzyme regulation. We report the presence of a novel zinc ribbon domain in the N-terminal region preceding the catalytic core in TS. The zinc ribbon domain is present in TSs belonging to both classes. Our sequence analysis reveals that archaeal TSs possess all the zinc-chelating residues needed to bind a metal ion, whereas these residues are lacking in the structurally characterized homologs. Phylogenetic analysis suggests that TSs with an N-terminal zinc ribbon likely represent the ancestral state of the enzyme, while TSs without a zinc ribbon must have diverged later in specific lineages. The zinc ribbon and its N- and C-terminal extensions are important for enzyme stability, activity and regulation. It is likely that the zinc ribbon domain is involved in higher-order oligomerization or in mediating interactions with other biomolecules, leading to the formation of larger metabolic complexes.
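Zinc ribbons typically bind the metal through two zinc knuckles, each usually contributing a CXXC pair. A crude way to look for candidate chelating cysteines in an N-terminal region is a regular-expression scan for two such knuckles. The Python sketch below assumes cysteine-only ligands and an arbitrary spacing of 8-30 residues between knuckles. It will miss ribbons that use histidine or other spacings, and it is not the authors' method.

```python
import re

def knuckle_pairs(protein, min_gap=8, max_gap=30):
    """Return (start, end) spans with two CXXC knuckles separated by
    min_gap..max_gap residues; `end` is exclusive. Gap bounds are assumptions."""
    pattern = re.compile(r"(?=(C..C.{%d,%d}C..C))" % (min_gap, max_gap))
    return [(m.start(), m.start() + len(m.group(1)))
            for m in pattern.finditer(protein.upper())]

# Toy sequence: CPKC at index 2, a 10-residue spacer, then CGRC.
toy = "MK" + "CPKC" + "A" * 10 + "CGRC" + "LL"
print(knuckle_pairs(toy))  # [(2, 20)]
```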
The murine Cd48 gene: allelic polymorphism in the IgV-like region.
Cabrero, J G; Freeman, G J; Reiser, H
1998-12-01
The murine CD48 molecule is a member of the immunoglobulin superfamily that regulates the activation of T lymphocytes. Prior cloning experiments using mRNA from two different mouse strains had yielded discrepant sequences within the IgV-like domain of murine CD48. To resolve this issue, we have directly sequenced genomic DNA of 10 laboratory strains and two inbred strains of wild origin. The results of our analysis reveal an allelic polymorphism within the IgV-like domain of murine CD48.
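Once strain sequences are aligned, finding an allelic polymorphism means listing the alignment columns where more than one residue or base occurs. The Python sketch below assumes pre-aligned sequences of equal length, and the strain names and sequences are hypothetical.

```python
def polymorphic_sites(aligned):
    """aligned: dict strain -> aligned sequence (all the same length).
    Return {column: {residue: [strains]}} for columns with >1 distinct residue."""
    lengths = {len(s) for s in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    sites = {}
    for col in range(lengths.pop()):
        alleles = {}
        for strain, seq in aligned.items():
            alleles.setdefault(seq[col], []).append(strain)
        if len(alleles) > 1:  # more than one variant at this column
            sites[col] = alleles
    return sites

aligned = {"strain_1": "ACGT", "strain_2": "ACGA", "strain_3": "ACGT"}
print(polymorphic_sites(aligned))  # {3: {'T': ['strain_1', 'strain_3'], 'A': ['strain_2']}}
```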
Common fold in helix–hairpin–helix proteins
Shao, Xuguang; Grishin, Nick V.
2000-01-01
Helix–hairpin–helix (HhH) is a widespread motif involved in non-sequence-specific DNA binding. The majority of HhH motifs function as DNA-binding modules; however, some of them are used to mediate protein–protein interactions or have acquired enzymatic activity by incorporating catalytic residues (DNA glycosylases). From sequence and structural analysis of HhH-containing proteins we conclude that most HhH motifs are integrated as a part of a five-helical domain, termed the (HhH)2 domain here. It typically consists of two consecutive HhH motifs that are linked by a connector helix and displays pseudo-2-fold symmetry. (HhH)2 domains show clear structural integrity and a conserved hydrophobic core composed of seven residues, one residue from each α-helix and each hairpin, and deserve recognition as a distinct protein fold. In addition to known HhH motifs in the structures of RuvA, RadA, MutY and DNA-polymerases, we have detected new HhH motifs in sterile alpha motif and barrier-to-autointegration factor domains, the α-subunit of Escherichia coli RNA-polymerase, DNA-helicase PcrA and DNA glycosylases. Statistically significant sequence similarity of HhH motifs and pronounced structural conservation argue for homology between (HhH)2 domains in different protein families. Our analysis helps to clarify how non-symmetric protein motifs bind to the double helix of DNA through the formation of a pseudo-2-fold symmetric (HhH)2 functional unit. PMID:10908318
Comparative analyses of putative toxin gene homologs from an Old World viper, Daboia russelii
Krishnan, Neeraja M.
2017-01-01
Availability of snake genome sequences has opened up exciting areas of research on comparative genomics and gene diversity. One of the challenges in studying snake genomes is the acquisition of biological material from live animals, especially from the venomous ones, making the process cumbersome and time-consuming. Here, we report comparative sequence analyses of putative toxin gene homologs from Russell’s viper (Daboia russelii) using whole-genome sequencing data obtained from shed skin. When compared with the major venom proteins in Russell’s viper studied previously, we found 45–100% sequence similarity between the venom proteins and their putative homologs in the skin. Additionally, comparative analyses of 20 putative toxin gene family homologs provided evidence of unique sequence motifs in nerve growth factor (NGF), platelet-derived growth factor (PDGF), Kunitz/bovine pancreatic trypsin inhibitor (Kunitz BPTI), cysteine-rich secretory proteins, antigen 5, and pathogenesis-related 1 proteins (CAP) and cysteine-rich secretory protein (CRISP). In those derived proteins, in silico structure-based analysis identified V11 and T35 in the NGF domain; F23 and A29 in the PDGF domain; N69, K2 and A5 in the CAP domain; and Q17 in the CRISP domain as responsible for differences in the largest pockets across the protein domain structures in crotalines, viperines and elapids. Similarly, residues F10, Y11 and E20 appear to play an important role in the protein structures across the Kunitz protein domain of viperids and elapids. Our study highlights the usefulness of shed skin in obtaining good quality high-molecular weight DNA for comparative genomic studies, and provides evidence towards the unique features and evolution of putative venom gene homologs in vipers. PMID:29230357
Sequence Diversity Diagram for comparative analysis of multiple sequence alignments.
Sakai, Ryo; Aerts, Jan
2014-01-01
The sequence logo is a graphical representation of a set of aligned sequences, commonly used to depict conservation of amino acid or nucleotide sequences. Although it effectively communicates the amount of information present at every position, this visual representation falls short when the domain task is to compare two or more sets of aligned sequences. We present a new visual representation called a Sequence Diversity Diagram and validate our design choices with a case study. Our software was developed using the open-source programming environment Processing. It loads multiple sequence alignment FASTA files and a configuration file, which can be modified as needed to change the visualization. The redesigned figure improves the visual comparison of two or more sets, and additionally encodes information on sequential position conservation. In our case study of the adenylate kinase lid domain, the Sequence Diversity Diagram reveals unexpected patterns and new insights, for example the identification of subgroups within the protein subfamily. Our future work will integrate this visual encoding into interactive visualization tools to support higher-level data exploration tasks.
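The per-position quantity that a sequence logo displays can be computed directly. The Python sketch below is a minimal illustration, assuming gaps are ignored, a 20-letter protein alphabet and no small-sample correction; it computes the standard Shannon information content of each column and then a position-wise difference between two alignments. That difference is a simple stand-in for comparing sets, not the Sequence Diversity Diagram's own encoding, and the function names and toy alignments are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content in bits of one alignment column (the stack height in a sequence logo).

    Gap characters ('-') are ignored and no small-sample correction is applied.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def information_profile(alignment, alphabet_size=20):
    """Per-position information content for a list of equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [column_information([seq[i] for seq in alignment], alphabet_size)
            for i in range(length)]


def compare_profiles(set_a, set_b, alphabet_size=20):
    """Position-wise difference in information content between two alignments of equal width."""
    profile_a = information_profile(set_a, alphabet_size)
    profile_b = information_profile(set_b, alphabet_size)
    if len(profile_a) != len(profile_b):
        raise ValueError("alignments must have the same number of columns")
    return [a - b for a, b in zip(profile_a, profile_b)]


# Hypothetical toy subfamilies: column 0 is conserved in both sets, column 1 only in set_b.
set_a = ["AC", "AD"]
set_b = ["AC", "AC"]
print(compare_profiles(set_a, set_b))
```

A negative value marks a position that is less conserved in the first set than in the second; a plain difference discards which residues are conserved, which is one of the shortcomings a dedicated comparison diagram aims to address.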
Cingulin Contains Globular and Coiled-Coil Domains and Interacts with ZO-1, ZO-2, ZO-3, and Myosin
Cordenonsi, Michelangelo; D'Atri, Fabio; Hammar, Eva; Parry, David A.D.; Kendrick-Jones, John; Shore, David; Citi, Sandra
1999-01-01
We characterized the sequence and protein interactions of cingulin, an Mr 140–160-kD phosphoprotein localized on the cytoplasmic surface of epithelial tight junctions (TJ). The derived amino acid sequence of a full-length Xenopus laevis cingulin cDNA shows globular head (residues 1–439) and tail (1,326–1,368) domains and a central α-helical rod domain (440–1,325). Sequence analysis, electron microscopy, and pull-down assays indicate that the cingulin rod is responsible for the formation of coiled-coil parallel dimers, which can further aggregate through intermolecular interactions. Pull-down assays from epithelial, insect cell, and reticulocyte lysates show that an NH2-terminal fragment of cingulin (1–378) interacts in vitro with ZO-1 (Kd ∼5 nM), ZO-2, ZO-3, myosin, and AF-6, but not with symplekin, and a COOH-terminal fragment (377–1,368) interacts with myosin and ZO-3. ZO-1 and ZO-2 immunoprecipitates contain cingulin, suggesting in vivo interactions. Full-length cingulin, but not NH2-terminal and COOH-terminal fragments, colocalizes with endogenous cingulin in transfected MDCK cells, indicating that sequences within both head and rod domains are required for TJ localization. We propose that cingulin is a functionally important component of TJ, linking the submembrane plaque domain of TJ to the actomyosin cytoskeleton. PMID:10613913
Jahandideh, Samad; Srinivasasainagendra, Vinodh; Zhi, Degui
2012-11-07
RNA-protein interaction plays an important role in various cellular processes, such as protein synthesis, gene regulation, post-transcriptional gene regulation, alternative splicing, and infection by RNA viruses. In this study, using the Gene Ontology Annotation (GOA) and Structural Classification of Proteins (SCOP) databases, we designed an automatic procedure to capture structurally solved RNA-binding protein domains in different subclasses. We then applied tuned multi-class SVM (TMCSVM), Random Forest (RF), and multi-class ℓ1/ℓq-regularized logistic regression (MCRLR) to analyze and classify RNA-binding protein domains based on a comprehensive set of sequence and structural features, and compared the prediction accuracy of these three state-of-the-art methods. In our results, TMCSVM outperforms the other methods, suggesting its potential as a useful tool for the multi-class prediction of RNA-binding protein domains. MCRLR, on the other hand, by elucidating the contribution of each feature to predictive accuracy across RNA-binding protein domain subclasses, provides some biological insight into the roles of sequence and structure in protein-RNA interactions.
Pagès, Sandrine; Bélaïch, Anne; Fierobe, Henri-Pierre; Tardif, Chantal; Gaudin, Christian; Bélaïch, Jean-Pierre
1999-01-01
The gene encoding the scaffolding protein of the cellulosome from Clostridium cellulolyticum, whose partial sequence was published earlier (S. Pagès, A. Bélaïch, C. Tardif, C. Reverbel-Leroy, C. Gaudin, and J.-P. Bélaïch, J. Bacteriol. 178:2279–2286, 1996; C. Reverbel-Leroy, A. Bélaïch, A. Bernadac, C. Gaudin, J. P. Bélaïch, and C. Tardif, Microbiology 142:1013–1023, 1996), was completely sequenced. The corresponding protein, CipC, is composed of a cellulose binding domain at the N terminus followed by one hydrophilic domain (HD1), seven highly homologous cohesin domains (cohesin domains 1 to 7), a second hydrophilic domain, and a final cohesin domain (cohesin domain 8) which is only 57 to 60% identical to the seven other cohesin domains. In addition, a second gene located 8.89 kb downstream of cipC was found to encode a three-domain protein, called ORFXp, which includes a cohesin domain. By using antiserum raised against the latter, it was observed that ORFXp is associated with the membrane of C. cellulolyticum and is not detected in the cellulosome fraction. Western blot and BIAcore experiments indicate that cohesin domains 1 and 8 from CipC recognize the same dockerins and have similar affinity for CelA (Ka = 4.8 × 10⁹ M⁻¹) whereas the cohesin from ORFXp, although it is also able to bind all cellulosome components containing a dockerin, has a 19-fold lower Ka for CelA (2.6 × 10⁸ M⁻¹). Taken together, these data suggest that ORFXp may play a role in cellulosome assembly. PMID:10074072
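The two association constants can be compared directly and converted to dissociation constants with Kd = 1/Ka. The short Python sketch below only redoes that arithmetic on the values quoted above; the function and variable names are illustrative.

```python
def dissociation_constant(ka_per_molar):
    """Convert an association constant Ka (M^-1) into a dissociation constant Kd (M): Kd = 1 / Ka."""
    return 1.0 / ka_per_molar


KA_CIPC = 4.8e9   # cohesin domains 1 and 8 of CipC binding CelA, in M^-1
KA_ORFXP = 2.6e8  # ORFXp cohesin binding CelA, in M^-1

fold = KA_CIPC / KA_ORFXP
kd_cipc_nm = dissociation_constant(KA_CIPC) * 1e9    # convert M to nM
kd_orfxp_nm = dissociation_constant(KA_ORFXP) * 1e9

print(f"Ka ratio: {fold:.1f}")
print(f"Kd (CipC cohesins): {kd_cipc_nm:.2f} nM; Kd (ORFXp cohesin): {kd_orfxp_nm:.2f} nM")
```

The ratio comes out at about 18.5 (Kd of roughly 0.21 nM versus 3.8 nM), in line with the reported 19-fold difference once rounding of the two published Ka values is allowed for.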
Phylogenetic and Protein Sequence Analysis of Bacterial Chemoreceptors.
Ortega, Davi R; Zhulin, Igor B
2018-01-01
Identifying chemoreceptors in sequenced bacterial genomes, revealing their domain architecture, inferring their evolutionary relationships, and comparing them to chemoreceptors of known function have become important steps in genome annotation and chemotaxis research. Here, we describe bioinformatics procedures that enable such analyses, using two closely related bacterial genomes as examples.
Detecting Coevolution in and among Protein Domains
Yeang, Chen-Hsiang; Haussler, David
2007-01-01
Correlated changes of nucleic or amino acids have provided strong information about the structures and interactions of molecules. Despite the rich literature in coevolutionary sequence analysis, previous methods often have to trade off between generality, simplicity, phylogenetic information, and specific knowledge about interactions. Furthermore, despite the evidence of coevolution in selected protein families, a comprehensive screen of coevolution among all protein domains is still lacking. We propose an augmented continuous-time Markov process model for sequence coevolution. The model can handle different types of interactions, incorporates phylogenetic information and sequence substitution, has only one extra free parameter, and requires no knowledge about interaction rules. We apply this model in large-scale screens of the entire protein domain database (Pfam). Strikingly, across 0.1 trillion tests, the majority of the inferred coevolving protein domains are functionally related, and the coevolving amino acid residues are spatially coupled. Moreover, many of the coevolving positions are located at functionally important sites of proteins and protein complexes, such as the subunit linkers of superoxide dismutase, the tRNA binding sites of ribosomes, the DNA binding region of RNA polymerase, and the active and ligand binding sites of various enzymes. The results suggest that sequence coevolution reflects structural and functional constraints on proteins. The intricate relations between sequence coevolution and various selective constraints are worth pursuing at a deeper level. PMID:17983264
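Mutual information between alignment columns is a common, simple baseline for spotting correlated substitutions. It is not the authors' augmented continuous-time Markov model: it ignores phylogeny, which is one of the limitations that model is designed to address. The Python sketch below, with hypothetical function names and a toy alignment, scores every column pair and ranks them.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (bits) between two alignment columns, skipping rows with a gap in either."""
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


def top_coupled_pairs(alignment, k=3):
    """Rank all column pairs of an alignment by mutual information, highest first."""
    length = len(alignment[0])
    columns = [[seq[i] for seq in alignment] for i in range(length)]
    scores = []
    for i in range(length):
        for j in range(i + 1, length):
            scores.append((mutual_information(columns[i], columns[j]), i, j))
    scores.sort(reverse=True)
    return scores[:k]


# Hypothetical toy alignment: columns 0 and 1 co-vary (A with K, G with E); column 2 varies independently.
toy_alignment = ["AKL", "AKV", "GEL", "GEV"]
print(top_coupled_pairs(toy_alignment))
```

Because raw mutual information also rises with shared ancestry and with column variability, practical screens add phylogenetic or background corrections; the model described above addresses this by building phylogeny into the substitution process.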
Bai, Donglin
2016-02-01
A gap junction (GJ) channel is formed by the docking of two GJ hemichannels, each of which is a hexamer of connexins. All connexin genes have been identified in the human, mouse, and rat genomes, and their homologous genes in many other vertebrates are available in public databases. The protein sequences of a given connexin align well across species, with high sequence identity. Domains in closely related connexins and several residues in all known connexins are also well conserved. These conserved residues form signatures (also known as sequence logos) in these domains and are likely to serve important biological functions. In this review, the sequence logos of individual connexins, of groups of connexins with common ancestors, and of all connexins are analyzed to visualize natural evolutionary variations and the hot spots for human disease-linked mutations. Several gap junction domains are homologous and likely form similar structures essential for their function. The availability of a high-resolution Cx26 GJ structure and of the homology models subsequently derived for other connexin GJ channels has advanced our understanding of sequence logos at the level of the three-dimensional GJ structure, thus facilitating the understanding of how disease-linked connexin mutants might impair GJ structure and function. This knowledge will enable the design of complementary variants to rescue disease-linked mutants. Copyright © 2015 Elsevier Ltd. All rights reserved.
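Locating well-conserved residues in an alignment and checking whether disease-linked mutations fall on them can be sketched in a few lines of Python. This is a minimal illustration, assuming the connexin sequences are already aligned; the conservation threshold, function names, toy sequences and mutation positions are all hypothetical.

```python
from collections import Counter


def conserved_positions(alignment, min_fraction=1.0):
    """List (position, residue, fraction) for columns whose most common residue reaches min_fraction.

    Positions are 0-based. Gaps ('-') never count as the conserved residue but do count
    toward the total, so a gapped column cannot be fully conserved.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment if seq[i] != "-")
        if not counts:
            continue
        residue, count = counts.most_common(1)[0]
        fraction = count / len(alignment)
        if fraction >= min_fraction:
            hits.append((i, residue, fraction))
    return hits


def mutations_at_conserved_sites(alignment, mutated_positions, min_fraction=1.0):
    """Subset of mutated alignment positions (0-based) that fall in conserved columns."""
    conserved = {pos for pos, _, _ in conserved_positions(alignment, min_fraction)}
    return sorted(set(mutated_positions) & conserved)


# Hypothetical aligned fragments and hypothetical mutation positions.
alignment = ["MDWGTL", "MDWSAL", "MDWGAL"]
print(conserved_positions(alignment))
print(mutations_at_conserved_sites(alignment, [1, 3, 5]))
```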
Conservation and diversification of Msx protein in metazoan evolution.
Takahashi, Hirokazu; Kamiya, Akiko; Ishiguro, Akira; Suzuki, Atsushi C; Saitou, Naruya; Toyoda, Atsushi; Aruga, Jun
2008-01-01
Msx (/msh) family genes encode homeodomain (HD) proteins that control ontogeny in many animal species. We compared the structures of Msx genes from a wide range of Metazoa (Porifera, Cnidaria, Nematoda, Arthropoda, Tardigrada, Platyhelminthes, Mollusca, Brachiopoda, Annelida, Echiura, Echinodermata, Hemichordata, and Chordata) to understand the role of these genes in phylogeny. Exon-intron boundary analysis suggested that the position of the intron located N-terminal to the HDs is widely conserved in all the genes examined, including those of cnidarians. Amino acid (aa) sequence comparison revealed three new evolutionarily conserved domains, as well as very strong conservation of the HDs. Two of the three domains were associated with Groucho-like protein binding in both a vertebrate and a cnidarian Msx homolog, suggesting that the interaction between Groucho-like proteins and Msx proteins was established in eumetazoan ancestors. Pairwise comparison among the collected HDs and their C-flanking aa sequences revealed that the degree of sequence conservation varied depending on the animal taxa from which the sequences were derived. Highly conserved Msx genes were identified in the Vertebrata, Cephalochordata, Hemichordata, Echinodermata, Mollusca, Brachiopoda, and Anthozoa. The wide distribution of the conserved sequences in the animal phylogenetic tree suggests that metazoan ancestors had already acquired the set of conserved domains found in current Msx family genes. Interestingly, although strongly conserved sequences were recovered from the Vertebrata, Cephalochordata, and Anthozoa, the sequences from the Urochordata and Hydrozoa showed weak conservation. Because Vertebrata-Cephalochordata-Urochordata and Anthozoa-Hydrozoa represent sister groups within the Chordata and Cnidaria, respectively, Msx sequence diversification may have occurred differentially in the course of evolution. We speculate that selective loss of the conserved domains in Msx family proteins contributed to the diversification of animal body organization.
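Pairwise comparison of collected HD sequences rests on a simple identity measure over aligned positions. Below is a minimal Python sketch, assuming the sequences are already aligned and counting identity only over columns where neither sequence has a gap (one of several conventions). The taxon names and sequences are hypothetical placeholders, not real Msx homeodomains.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b):
    """Percent identity of two aligned sequences, counted over columns where neither has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        return 0.0
    identical = sum(1 for a, b in compared if a == b)
    return 100.0 * identical / len(compared)


def pairwise_identity(named_seqs):
    """Percent identity for every unordered pair in a {name: aligned_sequence} mapping."""
    return {
        (x, y): percent_identity(named_seqs[x], named_seqs[y])
        for x, y in combinations(sorted(named_seqs), 2)
    }


# Hypothetical aligned fragments from three taxa.
toy = {"taxon_A": "RKPRTPF", "taxon_B": "RKPRTAF", "taxon_C": "RK-RSAY"}
for pair, pid in pairwise_identity(toy).items():
    print(pair, round(pid, 1))
```

Excluding gapped columns keeps fragments from being penalized for missing sequence but can inflate identity for short overlaps; dividing by the full alignment length is the stricter alternative.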
Xu, Xinran; Chen, Xiangdong; Yu, Wumengxiao; Liu, Yu; Zhang, Weiwei; Lan, Jin
2017-08-01
Blue light plays an important role during the growth of Ganoderma lucidum, one of the best-known medicinal macrofungi in China. In the present study, we cloned Glwc-1 and Glwc-2, homologues of the blue light photoreceptors Ncwc-1 and Ncwc-2 of Neurospora crassa, from G. lucidum. The deduced amino acid sequence of Glwc-1 contained the same functional domains as NcWC-1, including the LOV, PAS B, PAS C, and PAC domains. Like NcWC-2, the deduced amino acid sequence of Glwc-2 contained a PAS domain and a GATA-type zinc finger (Znf) domain. Phylogenetic analysis based on fungal WC-1 and WC-2 sequences supported the identification of GlWC-1 and GlWC-2 as blue light receptors. The expression patterns of Glwc-1 and Glwc-2 indicated that they might play an important role in the primordium differentiation process of G. lucidum, and external blue light stimulation increased the expression of both genes. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Basic Tilted Helix Bundle - a new protein fold in human FKBP25/FKBP3 and HectD1.
Helander, Sara; Montecchio, Meri; Lemak, Alexander; Farès, Christophe; Almlöf, Jonas; Yi, Yanjun; Yee, Adelinda; Arrowsmith, Cheryl; DhePaganon, Sirano; Sunnerhagen, Maria
2014-04-25
In this paper, we describe the structure of an N-terminal domain motif in nuclear-localized FKBP25(1–73), a member of the FKBP family, together with the structure of a sequence-related subdomain of the E3 ubiquitin ligase HectD1 that we show belongs to the same fold. This motif adopts a compact five-helix bundle, which we name the Basic Tilted Helix Bundle (BTHB) domain. A positively charged surface patch, structurally centered on the tilted helix H4, is conserved in both FKBP25 and HectD1, suggesting a conserved functional role. We provide a detailed comparative analysis of the structures and sequence similarities of the two proteins, and an analysis of the interaction with the proposed FKBP25-binding protein YY1. We suggest that the basic motif in BTHB is involved in the observed DNA binding of FKBP25, and that the function of this domain can be affected by regulatory YY1 binding and/or interactions with adjacent domains. Copyright © 2014 Elsevier Inc. All rights reserved.
Subramanian, Sundar Raman; Singam, Ettayapuram Ramaprasad Azhagiya; Berinski, Michael; Subramanian, Venkatesan; Wade, Rebecca C
2016-08-25
Sequence-specific cleavage of collagen by mammalian collagenase plays a pivotal role in cell function. Collagenases are matrix metalloproteinases that cleave the peptide bond at a specific position on fibrillar collagen. The collagenase Hemopexin-like (HPX) domain has been proposed to be responsible for substrate recognition, but the mechanism by which collagenases identify the cleavage site on fibrillar collagen is not clearly understood. In this study, Brownian dynamics simulations coupled with atomic-detail and coarse-grained molecular dynamics simulations were performed to dock matrix metalloproteinase-1 (MMP-1) on a collagen IIIα1 triple helical peptide. We find that the HPX domain recognizes the collagen triple helix at a conserved R-X11-R motif C-terminal to the cleavage site, and that the HPX domain of the collagenase is guided to this site electrostatically. The binding of the HPX domain between the two arginine residues is energetically stabilized by hydrophobic contacts with collagen. On the basis of the simulations and an analysis of the sequences and structural flexibility of collagen and collagenase, we propose a mechanistic scheme by which MMP-1 can recognize and bind collagen for proteolysis.
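Scanning a sequence for the motif mentioned above is a one-line regular expression. The Python sketch below assumes that "R-X11-R" denotes two arginines separated by exactly 11 residues of any type (the usual reading of this notation, not spelled out in the abstract); the function name and toy sequence are hypothetical. A motif scan only finds candidate sites and says nothing about the electrostatic guidance or binding energetics that the simulations address.

```python
import re

# Assumed reading of the "R-X11-R" notation: two arginines separated by exactly
# 11 residues of any type. The lookahead lets overlapping occurrences be reported.
R_X11_R = re.compile(r"(?=(R.{11}R))")


def find_r_x11_r(sequence):
    """Return (start, segment) for every R-X11-R occurrence in a one-letter sequence (0-based starts)."""
    return [(m.start(), m.group(1)) for m in R_X11_R.finditer(sequence.upper())]


# Hypothetical collagen-like toy sequence (not the collagen III alpha1 peptide used in the study).
toy = "GPP" + "R" + "GPAGPAGPKGE" + "R" + "GPP"
print(find_r_x11_r(toy))
```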
Miguel, Célia; Simões, Marta; Oliveira, Maria Margarida; Rocheta, Margarida
2008-11-01
Retroviruses differ from retrotransposons in their infective capacity, which depends critically on the encoded envelope. Some plant retroelements contain domains reminiscent of the env of animal retroviruses, but the elements of this kind described to date are restricted to angiosperms. We show here the first evidence of putative env-like gene sequences in a gymnosperm species, Pinus pinaster (maritime pine). Using degenerate primers for conserved domains of the RNaseH gene, we identified three clones from putative envelope-like retrotransposons (PpRT2, PpRT3, and PpRT4). The env-like sequences of the P. pinaster clones are predicted to encode proteins with transmembrane domains. These sequences showed identity scores of up to 30% with env-like sequences from different organisms. A phylogenetic analysis based on an alignment of the deduced amino acid sequences revealed that these clones cluster with env-containing plant retrotransposons, as well as with retrotransposons from invertebrates. The differences among the sequences of the maritime pine clones isolated here suggest the existence of different putative classes of env-like retroelements. The first identification of env-like genes in a gymnosperm species may support an ancestral presence of retroviruses in plants, shedding light on their role in plant evolution.
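Degenerate primers are usually written with IUPAC ambiguity codes, and the number of distinct oligonucleotides in the mixture grows multiplicatively with each ambiguous position. The abstract does not give the primer sequences, so the Python sketch below uses a hypothetical primer; it only shows how degeneracy is counted and how a degenerate primer expands into concrete sequences.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])
    return total


def expand(primer):
    """List every concrete sequence represented by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer.upper()))]


# Hypothetical primers, not the ones used in the study.
print(degeneracy("GAYMGNTA"), expand("ARY"))
```

Designers typically keep degeneracy low near the 3′ end, where mismatches most strongly reduce priming efficiency.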
Domain atrophy creates rare cases of functional partial protein domains.
Prakash, Ananth; Bateman, Alex
2015-04-30
Protein domains display a range of structural diversity, with numerous additions and deletions of secondary structural elements between related domains. We have observed a small number of cases of surprising, large-scale deletions of core elements of structural domains. We propose a new concept called domain atrophy, where protein domains lose a significant number of core structural elements. Here, we implement a new pipeline to systematically identify new cases of domain atrophy across all known protein sequences. The output of this pipeline was carefully checked by hand, which filtered out partial domain instances that were unlikely to represent true domain atrophy because of misannotations or unannotated sequence fragments. We identify 75 cases of domain atrophy, of which eight are found in a three-dimensional protein structure and 67 have been inferred by mapping to a known homologous structure. Domains with structural variations include ancient folds such as the TIM-barrel and Rossmann folds. Most of these domains show structural loss that does not affect their functional sites. Our analysis has significantly increased the number of known cases of domain atrophy. We discuss specific instances of domain atrophy and find that a compensatory mechanism has often helped to maintain the stability of the partial domain. Our study indicates that although domain atrophy is an extremely rare phenomenon, protein domains can under certain circumstances tolerate extreme mutations, giving rise to partial but functional domains.
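A first-pass filter for partial domain instances can be expressed as a coverage test on HMM hits. The Python sketch below is an illustration under stated assumptions, not the authors' pipeline: the hit fields, the hypothetical 0.7 coverage cut-off and the toy hits are invented for the example. As the abstract stresses, flagged hits would still need manual checking, because misannotations and unannotated fragments also produce low coverage.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    protein: str
    domain: str
    hmm_from: int      # first matched model position (1-based)
    hmm_to: int        # last matched model position (1-based, inclusive)
    model_length: int  # total length of the domain model

    @property
    def coverage(self) -> float:
        """Fraction of the domain model covered by this hit."""
        return (self.hmm_to - self.hmm_from + 1) / self.model_length


def candidate_partial_domains(hits, max_coverage=0.7):
    """Hits covering less than max_coverage of their model; candidates for manual review only."""
    return [hit for hit in hits if hit.coverage < max_coverage]


# Hypothetical hits against a hypothetical 250-position model.
hits = [
    DomainHit("protein_1", "TIM_barrel_like", 1, 250, 250),   # full-length hit
    DomainHit("protein_2", "TIM_barrel_like", 40, 140, 250),  # covers 101 of 250 positions
]
print([h.protein for h in candidate_partial_domains(hits)])
```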
A high level interface to SCOP and ASTRAL implemented in python.
Casbon, James A; Crooks, Gavin E; Saqi, Mansoor A S
2006-01-10
Benchmarking algorithms in structural bioinformatics often involves the construction of datasets of proteins with given sequence and structural properties. The SCOP database is a manually curated structural classification which groups together proteins on the basis of structural similarity. The ASTRAL compendium provides non-redundant subsets of SCOP domains on the basis of sequence similarity, such that no two domains in a given subset share more than a defined degree of sequence similarity. Taken together, these two resources provide a 'ground truth' for assessing structural bioinformatics algorithms. We present a small, easy-to-use API written in Python to enable the construction of datasets from these resources. We have designed a set of Python modules to provide an abstraction of the SCOP and ASTRAL databases. The modules are designed to work as part of the Biopython distribution. Python users can now manipulate and use the SCOP hierarchy from within Python programs, and use ASTRAL to return sequences of domains in SCOP, as well as clustered representations of SCOP from ASTRAL. The modules make the analysis and generation of datasets for use in structural genomics easier and more principled.
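The idea behind an ASTRAL-style non-redundant subset, in which no two retained domains exceed a similarity threshold, can be illustrated with a greedy filter. This Python sketch is a simplification and neither ASTRAL's actual procedure nor the Biopython SCOP/ASTRAL API. It uses difflib's `SequenceMatcher` ratio as a stand-in for proper alignment-based identity and assumes the input is already ordered by preference (for example, by structure quality). Domain identifiers and sequences are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(seq_a, seq_b):
    """Crude similarity in [0, 1] (difflib ratio); a stand-in for alignment-based sequence identity."""
    return SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


def nonredundant_subset(domains, max_similarity=0.4):
    """Greedily keep domains so that no two kept sequences exceed max_similarity.

    domains: list of (identifier, sequence), assumed ordered from most to least preferred.
    """
    kept = []
    for identifier, seq in domains:
        if all(similarity(seq, other) <= max_similarity for _, other in kept):
            kept.append((identifier, seq))
    return [identifier for identifier, _ in kept]


# Hypothetical domains: dom2 is nearly identical to dom1, dom3 is unrelated.
toy_domains = [("dom1", "AAAAAAAAAA"), ("dom2", "AAAAAAAAAT"), ("dom3", "CCCCCCCCCC")]
print(nonredundant_subset(toy_domains, max_similarity=0.5))
```

Because the filter is greedy, the order of the input decides which member of a redundant group survives, which is why a preference ordering is assumed.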
A de novo redesign of the WW domain
Kraemer-Pecore, Christina M.; Lecomte, Juliette T.J.; Desjarlais, John R.
2003-01-01
We have used a sequence prediction algorithm and a novel sampling method to design protein sequences for the WW domain, a small β-sheet motif. The procedure, referred to as SPANS, designs sequences to be compatible with an ensemble of closely related polypeptide backbones, mimicking the inherent flexibility of proteins. Two designed sequences (termed SPANS-WW1 and SPANS-WW2), using only naturally occurring L-amino acids, were selected for study, and the corresponding polypeptides were prepared in Escherichia coli. Circular dichroism data suggested that both purified polypeptides adopted secondary structure features related to those of the target without the aid of disulfide bridges or bound cofactors. The structure exhibited by SPANS-WW2 melted cooperatively as the temperature of the solution was raised. Further analysis of this polypeptide by proton nuclear magnetic resonance spectroscopy demonstrated that at 5°C it folds into a structure closely resembling a natural WW domain. This achievement constitutes one of a small number of successful de novo protein designs through fully automated computational methods and highlights the feasibility of including backbone flexibility in the design strategy. PMID:14500877
Structure-Based Phylogenetic Analysis of the Lipocalin Superfamily.
Lakshmi, Balasubramanian; Mishra, Madhulika; Srinivasan, Narayanaswamy; Archunan, Govindaraju
2015-01-01
Lipocalins constitute a superfamily of extracellular proteins found in all three kingdoms of life. Although very divergent in sequence and function, they show remarkable similarity in 3-D structure. Lipocalins bind and transport small hydrophobic molecules. Earlier sequence-based phylogenetic studies of lipocalins highlighted their long evolutionary history. However, the molecular and structural basis of their functional diversity is not completely understood. The main objective of the present study is to understand the functional diversity of lipocalins using a structure-based phylogenetic approach. Our analysis of 39 protein domains from the lipocalin superfamily suggests that the clusters obtained by structure-based phylogeny correspond well with functional diversity. Detailed analysis of each cluster and sub-cluster reveals that the 39 lipocalin domains cluster according to their mode of ligand binding, even though the clustering was performed on the basis of gross domain structure. The outliers in the phylogenetic tree are often from single-member families. The structure-based phylogenetic approach has also provided pointers for assigning putative functions to domains of unknown function in the lipocalin family. The approach employed here could be used in future for the functional identification of new lipocalin proteins, and may be extended to other protein families whose members show poor sequence similarity but high structural similarity.
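One common way to turn pairwise structural distances into a tree is UPGMA (average-linkage) clustering. The abstract does not say which distance measure or tree-building method was used, so the Python sketch below only illustrates the general idea: the distance matrix and labels are hypothetical, and the result is a nested tuple without branch lengths.

```python
def upgma(labels, matrix):
    """Cluster items with UPGMA from a symmetric distance matrix; return the tree as nested tuples."""
    clusters = [(label, 1) for label in labels]  # (subtree, number of leaves)
    dist = [list(row) for row in matrix]
    while len(clusters) > 1:
        n = len(clusters)
        # Closest pair of current clusters (the first pair wins on ties).
        i, j = min(
            ((a, b) for a in range(n) for b in range(a + 1, n)),
            key=lambda pair: dist[pair[0]][pair[1]],
        )
        (tree_i, size_i), (tree_j, size_j) = clusters[i], clusters[j]
        # Leaf-count-weighted average distance from the merged cluster to every cluster.
        merged = [(dist[i][k] * size_i + dist[j][k] * size_j) / (size_i + size_j) for k in range(n)]
        keep = [k for k in range(n) if k not in (i, j)]
        dist = [[dist[a][b] for b in keep] + [merged[a]] for a in keep]
        dist.append([merged[k] for k in keep] + [0.0])
        clusters = [clusters[k] for k in keep] + [((tree_i, tree_j), size_i + size_j)]
    return clusters[0][0]


# Hypothetical structural distances between four lipocalin-like domains.
labels = ["lcn_1", "lcn_2", "lcn_3", "lcn_4"]
matrix = [
    [0.0, 2.0, 6.0, 6.0],
    [2.0, 0.0, 6.0, 6.0],
    [6.0, 6.0, 0.0, 4.0],
    [6.0, 6.0, 4.0, 0.0],
]
print(upgma(labels, matrix))
```

UPGMA assumes roughly constant rates of change along all lineages; neighbor-joining is the usual alternative when that assumption is doubtful.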
Rivero, Francisco; Muramoto, Tetsuya; Meyer, Ann-Kathrin; Urushihara, Hideko; Uyeda, Taro QP; Kitayama, Chikako
2005-01-01
Background Formins are multidomain proteins defined by a conserved FH2 (formin homology 2) domain with actin nucleation activity preceded by a proline-rich FH1 (formin homology 1) domain. Formins act as profilin-modulated processive actin nucleators conserved throughout a wide range of eukaryotes. Results We present a detailed sequence analysis of the 10 formins (ForA to J) identified in the genome of the social amoeba Dictyostelium discoideum. With the exception of ForI and ForC, all formins conform to the domain structure GBD/FH3-FH1-FH2-DAD, where DAD is the Diaphanous autoinhibition domain and GBD/FH3 is the Rho GTPase-binding domain/formin homology 3 domain, which we propose to represent a single domain. ForC lacks an FH1 domain, ForI lacks recognizable GBD/FH3 and DAD domains, and ForA, E and J have additional unique domains. To establish the relationship between formins of Dictyostelium and other organisms, we constructed a phylogenetic tree based on an alignment of FH2 domains. Real-time PCR was used to study the expression pattern of formin genes. Expression of forC, D, I and J increased during the transition to multicellular stages, while the remaining genes displayed less marked developmental variation. During sexual development, expression of forH and forI increased significantly in fusion-competent cells. Conclusion Our analysis allows some preliminary insight into the functionality of Dictyostelium formins: all isoforms might display actin nucleation activity and, with the exception of ForI, might also be susceptible to autoinhibition and to regulation by Rho GTPases. The architecture GBD/FH3-FH1-FH2-DAD appears common to almost all Dictyostelium, fungal and metazoan formins, for which we propose the name conventional formins, and implies a common regulatory mechanism. PMID:15740615
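Checking whether a formin conforms to the conventional GBD/FH3-FH1-FH2-DAD architecture amounts to an ordered subsequence test on its domain list. The Python sketch below is a hypothetical helper, not a tool from the study; the example architectures are simplified to list only the differences stated in the abstract, and "unique" is a placeholder whose position is an assumption.

```python
CONVENTIONAL = ("GBD/FH3", "FH1", "FH2", "DAD")


def is_conventional(architecture):
    """True if the conventional domains occur in order (N to C); extra domains anywhere are allowed."""
    remaining = iter(architecture)
    # Each membership test consumes the iterator up to the match, so order is enforced.
    return all(domain in remaining for domain in CONVENTIONAL)


def missing_domains(architecture):
    """Conventional domains absent from an architecture (ignoring order)."""
    return [domain for domain in CONVENTIONAL if domain not in architecture]


# Simplified architectures based only on the differences stated in the abstract.
examples = {
    "conventional formin": ["GBD/FH3", "FH1", "FH2", "DAD"],
    "with an extra domain (like ForA, E, J)": ["GBD/FH3", "FH1", "FH2", "DAD", "unique"],
    "lacking FH1 (like ForC)": ["GBD/FH3", "FH2", "DAD"],
    "lacking GBD/FH3 and DAD (like ForI)": ["FH2"],
}
for name, arch in examples.items():
    print(name, is_conventional(arch), missing_domains(arch))
```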
Formin homology 2 domains occur in multiple contexts in angiosperms
Cvrčková, Fatima; Novotný, Marian; Pícková, Denisa; Žárský, Viktor
2004-01-01
Background Involvement of conserved molecular modules and cellular mechanisms in the widely diversified processes of eukaryotic cell morphogenesis raises an intriguing question: how do similar proteins contribute to dissimilar morphogenetic outputs? Formins (FH2 proteins) play a central part in the control of actin organization and dynamics, providing a good example of the evolutionarily versatile use of a conserved protein domain in the context of a variety of lineage-specific structural and signalling interactions. Results In order to identify possible plant-specific sequence features within the FH2 protein family, we performed a detailed analysis of angiosperm formin-related sequences available in public databases, with particular focus on the complete Arabidopsis genome and the nearly finished rice genome sequence. This led to revision of the current annotation of half of the 22 Arabidopsis formin-related genes. Comparative analysis of the two plant genomes revealed good conservation of the previously described two subfamilies of plant formins (Class I and Class II), as well as several subfamilies within them that appear to predate the separation of monocot and dicot plants. Moreover, a number of plant Class II formins share an additional conserved domain, related to the protein phosphatase/tensin/auxilin fold. However, considerable inter-species variability limits the generalization of functional conclusions reached on a single species such as Arabidopsis. Conclusions The plant-specific domain context of the conserved FH2 domain, as well as plant-specific features of the domain itself, may reflect distinct functional requirements in plant cells. The variability of formin structures found in plants far exceeds that known from both fungi and metazoans, suggesting a possible contribution of FH2 proteins to the evolution of the plant type of multicellularity. PMID:15256004
Peng, Fred Y; Weselake, Randall J
2013-05-01
The plant-specific B3 superfamily of transcription factors has diverse functions in plant growth and development. Using a genome-wide domain analysis, we identified 92, 187, 58, 90, 81, 55, and 77 B3 transcription factor genes in the sequenced genomes of Arabidopsis, Brassica rapa, castor bean (Ricinus communis), cocoa (Theobroma cacao), soybean (Glycine max), maize (Zea mays), and rice (Oryza sativa), respectively. The B3 superfamily has expanded substantially during evolution in eudicots, particularly in Brassicaceae, compared with the monocots in the analysis. We observed domain duplication in some of these B3 proteins, forming more complex domain architectures than currently understood. We found that the length of B3 domains varies widely, which may affect the exact number of α-helices and β-sheets in their core structure and possibly have functional implications. Analysis of public microarray data indicated that most of the B3 gene pairs encoding Arabidopsis-rice orthologs are preferentially expressed in different tissues, suggesting different roles in these two species. Using ESTs in crops, we identified many B3 genes preferentially expressed in reproductive tissues. In a sequence-based quantitative trait loci analysis in rice and maize, we found many B3 genes associated with traits such as grain yield, seed weight and number, and protein content. Our results provide a framework for future studies into the function of B3 genes in different phases of plant development, especially those related to traits in major crops.
Taggart, David J.; Dayeh, Daniel M.; Fredrickson, Saul W.; Suo, Zucai
2014-01-01
The X-family DNA polymerases λ (Polλ) and β (Polβ) possess similar 5′-2-deoxyribose-5-phosphate lyase (dRPase) and polymerase domains. Besides these domains, Polλ also possesses a BRCA1 C-terminal (BRCT) domain and a proline-rich domain at its N terminus. However, it is unclear how these non-enzymatic domains contribute to the unique biological functions of Polλ. Here, we used primer extension assays and a newly developed high-throughput short oligonucleotide sequencing assay (HT-SOSA) to compare the efficiency of lesion bypass and the fidelity of human Polβ, Polλ and two N-terminal deletion constructs of Polλ during the bypass of either an abasic site or an 8-oxo-7,8-dihydro-2′-deoxyguanosine (8-oxodG) lesion. We demonstrate that the BRCT domain of Polλ enhances the efficiency of abasic site bypass by approximately 1.6-fold. In contrast, deletion of the N-terminal domains of Polλ did not affect the efficiency of 8-oxodG bypass relative to nucleotide incorporation opposite undamaged dG. HT-SOSA analysis demonstrated that Polλ and Polβ preferentially generated −1 or −2 frameshift mutations when bypassing an abasic site, and that the single or double base deletion frequency was highly sequence dependent. Interestingly, the BRCT and proline-rich domains of Polλ cooperatively promoted the generation of −2 frameshift mutations when the abasic site was situated within a sequence context susceptible to homology-driven primer realignment. Furthermore, both N-terminal domains of Polλ increased the generation of −1 frameshift mutations during 8-oxodG bypass and influenced the frequency of substitution mutations produced by Polλ opposite the 8-oxodG lesion. Overall, our data support a model wherein the BRCT and proline-rich domains of Polλ act cooperatively to promote primer/template realignment between DNA strands of limited sequence homology. This function of the N-terminal domains may facilitate the role of Polλ as a gap-filling polymerase within the non-homologous end joining pathway. PMID:25108835
Strickland, Michelle; Tudorica, Victor; Řezáč, Milan; Thomas, Neil R; Goodacre, Sara L
2018-06-01
Spiders produce multiple silks with different physical properties that allow them to occupy a diverse range of ecological niches, including the underwater environment. Despite this functional diversity, past molecular analyses show a high degree of amino acid sequence similarity between C-terminal regions of silk genes that appear to be independent of the physical properties of the resulting silks; instead, this domain is crucial to the formation of silk fibers. Here, we present an analysis of the C-terminal domain of all known types of spider silk and include silk sequences from the spider Argyroneta aquatica, which spins the majority of its silk underwater. Our work indicates that spiders have retained a highly conserved mechanism of silk assembly, despite the extraordinary diversification of species, silk types and applications of silk over 350 million years. Sequence analysis of the silk C-terminal domain across the entire gene family shows the conservation of two uncommon amino acids that are implicated in the formation of a salt bridge, a functional bond essential to protein assembly. This conservation extends to the novel sequences isolated from A. aquatica. This finding is relevant to research regarding the artificial synthesis of spider silk, suggesting that synthesis of all silk types will be possible using a single process.
Exploring the dark foldable proteome by considering hydrophobic amino acids topology
Bitard-Feildel, Tristan; Callebaut, Isabelle
2017-01-01
The protein universe corresponds to the set of all proteins found in all organisms. One way to explore it is through the domain content of proteins. However, parts of sequences, and many entire sequences, remain un-annotated despite a converging number of domain families. The un-annotated part of the protein universe is referred to as the dark proteome and remains poorly characterized. In this study, we quantify the amount of foldable domains within the dark proteome using the hydrophobic cluster analysis methodology. These un-annotated foldable domains were grouped using a combination of remote homology searches and domain annotations, leading to the definition of different levels of darkness. The dark foldable domains were analyzed to understand what makes them different from domains stored in databases, and thus difficult to annotate. The un-annotated domains of the dark proteome universe display specific features relative to database domains: shorter length, non-canonical content and particular topology in hydrophobic residues, higher propensity for disorder, and higher energy. These features make them hard to relate to known families. Based on these observations, we emphasize that domain annotation methodologies can still be improved to fully apprehend and decipher the molecular evolution of the protein universe. PMID:28134276
From Binding-Induced Dynamic Effects in SH3 Structures to Evolutionary Conserved Sectors.
Zafra Ruano, Ana; Cilia, Elisa; Couceiro, José R; Ruiz Sanz, Javier; Schymkowitz, Joost; Rousseau, Frederic; Luque, Irene; Lenaerts, Tom
2016-05-01
Src Homology 3 domains are ubiquitous small interaction modules known to act as docking sites and regulatory elements in a wide range of proteins. Prior experimental NMR work on the SH3 domain of Src showed that ligand binding induces long-range dynamic changes consistent with an induced-fit mechanism. The identification of the residues that participate in this mechanism produces a chart that allows for the exploration of the regulatory role of such domains in the activity of the encompassing protein. Here we show that a computational approach focusing on the changes in side chain dynamics upon ligand binding identifies equivalent long-range effects in the Src SH3 domain. Mutation of a subset of the predicted residues elicits long-range effects on the binding energetics, emphasizing the relevance of these positions in the definition of intramolecular cooperative networks of signal transduction in this domain. We find further support for this mechanism through the analysis of seven other publicly available SH3 domain structures whose sequences represent diverse SH3 classes. By comparing the eight predictions, we find that, in addition to a dynamic pathway that is relatively conserved throughout all SH3 domains, there are dynamic aspects specific to each domain and homologous subgroups. Our work shows, for the first time from a structural perspective, which transduction mechanisms are common between a subset of closely related and distal SH3 domains, while at the same time highlighting the differences in signal transduction that make each family member unique. These results resolve the missing link between structural predictions of dynamic changes and the domain sectors recently identified for SH3 domains through sequence analysis. PMID:27213566
Structure, organization and expression of common carp (Cyprinus carpio L.) SLP-76 gene.
Huang, Rong; Sun, Xiao-Feng; Hu, Wei; Wang, Ya-Ping; Guo, Qiong-Lin
2008-05-01
SLP-76 is an important member of the SLP-76 family of adapters and plays a key role in TCR signaling and T cell function. A partial cDNA sequence of SLP-76 of common carp (Cyprinus carpio L.) was isolated from a thymus cDNA library by suppression subtractive hybridization (SSH). Subsequently, the full-length cDNA of carp SLP-76 was obtained by 3' RACE and 5' RACE. The full-length cDNA of carp SLP-76 was 2007 bp, consisting of a 5'-terminal untranslated region (UTR) of 285 bp, a 3'-terminal UTR of 240 bp, and an open reading frame of 1482 bp. Sequence comparison showed that the deduced amino acid sequence of carp SLP-76 had an overall similarity of 34-73% to those of homologues from other species, and that it was composed of an NH2-terminal domain, a central proline-rich domain, and a C-terminal SH2 domain. Amino acid sequence analysis indicated the existence of a Gads binding site R-X-X-K, a 10-aa-long sequence that binds to the SH3 domain of LCK in vitro, and three conserved tyrosine-containing sequences in the NH2-terminal domain. We then used PCR to obtain a genomic DNA fragment covering the entire coding region of carp SLP-76. In the 9.2-kb genomic sequence, twenty-one exons and twenty introns were identified. RT-PCR results showed that carp SLP-76 was expressed predominantly in hematopoietic tissues and was upregulated in the thymus of four-month-old carp compared to one-year-old carp. RT-PCR and virtual northern hybridization results showed that carp SLP-76 was also upregulated in the thymus of four-month-old GH transgenic carp. These results suggest that the expression level of the SLP-76 gene may be related to thymocyte development in teleosts.
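As a quick consistency check on the reported carp SLP-76 lengths, the two UTRs and the open reading frame should sum to the full-length cDNA, and the ORF should be a whole number of codons. The sketch below uses hypothetical helper names of our own; the remark that the ORF length may include the stop codon is an assumption, since the abstract does not state the deduced protein length.

```python
def check_cdna_length(utr5: int, orf: int, utr3: int, total: int) -> bool:
    """Return True if 5'-UTR + ORF + 3'-UTR equals the full cDNA length (bp)."""
    return utr5 + orf + utr3 == total


def codons_in_orf(orf_bp: int) -> int:
    """Number of codons in an ORF; raises if the length is not a multiple of 3."""
    if orf_bp % 3 != 0:
        raise ValueError(f"ORF length {orf_bp} bp is not a multiple of 3")
    return orf_bp // 3


# Lengths reported for carp SLP-76 (bp)
print(check_cdna_length(285, 1482, 240, 2007))  # True
# 494 codons; this would be 493 residues if the ORF includes the stop codon (assumption)
print(codons_in_orf(1482))
```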
Domain Evolution and Functional Diversification of Sulfite Reductases
NASA Astrophysics Data System (ADS)
Dhillon, Ashita; Goswami, Sulip; Riley, Monica; Teske, Andreas; Sogin, Mitchell
2005-02-01
Sulfite reductases are key enzymes of assimilatory and dissimilatory sulfur metabolism, which occur in diverse bacterial and archaeal lineages. They share a highly conserved domain "C-X5-C-n-C-X3-C" for binding siroheme and iron-sulfur clusters that facilitate electron transfer to the substrate. For each sulfite reductase cluster, the siroheme-binding domain is positioned slightly differently at the N-terminus of dsrA and dsrB, while in the assimilatory proteins the siroheme domain is located at the C-terminus. Our sequence and phylogenetic analysis of the siroheme-binding domain shows that sulfite reductase sequences diverged from a common ancestor into four separate clusters (aSir, alSir, dsr, and asrC) that are biochemically distinct; each serves a different assimilatory or dissimilatory role in sulfur metabolism. The phylogenetic distribution and functional grouping in sulfite reductase clusters (dsrA and dsrB vs. aSir, asrC, and alSir) suggest that their functional diversification during evolution may have preceded the bacterial/archaeal divergence.
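The "C-X5-C-n-C-X3-C" notation can be read as a simple sequence pattern and scanned with a regular expression. The sketch below assumes that X stands for any residue and that "n" means a variable-length spacer of zero or more residues; the paper does not give the spacer bounds, and real annotation of siroheme-binding domains relies on curated profiles rather than a bare regex. The test sequence is synthetic.

```python
import re

# Assumed reading of "C-X5-C-n-C-X3-C": X = any residue, n = a spacer of
# zero or more residues (lazy, so the shortest spacer is tried first).
SIROHEME_MOTIF = re.compile(r"C[A-Z]{5}C[A-Z]*?C[A-Z]{3}C")


def find_motif(seq: str) -> list:
    """Return 0-based, half-open (start, end) spans of non-overlapping matches."""
    return [m.span() for m in SIROHEME_MOTIF.finditer(seq.upper())]


# Synthetic sequence: C at 2, five residues, C at 8, spacer GGG, C at 12, TTT, C at 16
print(find_motif("AACAAAAACGGGCTTTCAA"))  # [(2, 17)]
```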
Sumer, Huseyin; Craig, Jeffrey M.; Sibson, Mandy; Choo, K.H. Andy
2003-01-01
Human neocentromeres are fully functional centromeres that arise at previously noncentromeric regions of the genome. We have tested a rapid procedure of genomic array analysis of chromosome scaffold/matrix attachment regions (S/MARs), involving the isolation of S/MAR DNA and hybridization of this DNA to a genomic BAC/PAC array. Using this procedure, we have defined a 2.5-Mb domain of S/MAR-enriched chromatin that fully encompasses a previously mapped centromere protein-A (CENP-A)-associated domain at a human neocentromere. We have independently verified this procedure using a previously established fluorescence in situ hybridization method on salt-treated metaphase chromosomes. In silico sequence analysis of the S/MAR-enriched and surrounding regions has revealed no outstanding sequence-related predisposition. This study defines the S/MAR-enriched domain of a higher eukaryotic centromere and provides a method that has broad application for the mapping of S/MAR attachment sites over large genomic regions or throughout a genome. PMID:12840048
Distribution, genetic diversity and recombination analysis of Citrus tristeza virus of India
USDA-ARS's Scientific Manuscript database
Citrus tristeza virus (CTV) isolates representing all the citrus growing geographical zones of India were analyzed for sequence of the 5'ORF1a fragments of the partial LProI domain and for the coat protein (CP) gene. The sequences were compared with previously reported Indian and CTV genotypes from...
He, Ping; Tan, De-Li; Liu, Hong-Xiang; Lv, Feng-Lin; Wu, Wei
2015-04-01
The short isoform of the Rho guanine nucleotide exchange factor ARHGEF5, known as TIM, plays diverse roles in, for example, tumorigenesis, neuronal development and Src-induced podosome formation through the activation of its substrates, the Rho family of GTPases. This activation is auto-inhibited by a putative helix N-terminal to the DH domain of TIM, which is stabilized by the intramolecular interaction of the C-terminal SH3 domain with a poly-proline sequence between the putative helix and the DH domain. In this study, we systematically investigated the structural basis, energetic landscape and biological implications of TIM auto-inhibition using atomistic molecular dynamics simulations and binding free energy analysis. The computational study revealed that binding of the SH3 domain to the poly-proline sequence is a prerequisite for the stabilization of TIM auto-inhibition. Thus, targeting the SH3 domain with competitors of the poly-proline sequence could be a promising strategy to relieve the auto-inhibitory state of TIM. With this in mind, we rationally designed a number of peptide aptamers to competitively inhibit the SH3 domain, based on a modeled TIM structure and computationally generated data. Peptide binding tests and guanine nucleotide exchange analysis confirmed that these designed peptides both bind the SH3 domain potently and effectively activate the TIM-catalyzed RhoA exchange reaction. Interestingly, a positive correlation between peptide affinity and induced exchange activity was observed. In addition, separately mutating to Ala any of three conserved residues in a designed peptide (Pro49, Pro52 and Lys54, which are required for peptide recognition by the SH3 domain) completely abolished the capability of this peptide to activate TIM. Taken together, these results suggest an intrinsic relationship between peptide binding to the SH3 domain and the activation of TIM. Copyright © 2015 Elsevier B.V. and Société Française de Biochimie et Biologie Moléculaire (SFBBM). All rights reserved.
Sahnoun, Mouna; Jemli, Sonia; Trabelsi, Sahar; Ayadi, Leila; Bejar, Samir
2016-01-01
We previously reported that Aspergillus oryzae strain S2 produced two α-amylase isoforms, named AmyA and AmyB, with apparent molecular masses by SDS-PAGE of 50 and 42 kDa, respectively. Yet AmyB has a higher catalytic efficiency. By monitoring α-amylase production in both the presence and absence of different protease inhibitors, we detected an in vivo chymotrypsin proteolysis process that generates AmyB. The A. oryzae S2 α-amylase gene was amplified, cloned and sequenced. Sequence analysis revealed nine exons, eight introns and an open reading frame of 1500 bp encoding the AmyA isoform. Amino acid sequence analysis revealed a potential chymotrypsin cleavage site at Y371, likely to be the C-terminal end of AmyB, and two other potential sites at Y359 and F379. A zymogram with a high acrylamide concentration highlighted two other α-amylases of close apparent molecular mass, termed AmyB1 and AmyB2, of 40 kDa and 43 kDa. These isoforms could be generated by secondary cuts at Y359 and F379, respectively. Molecular modeling showed that AmyB preserves the (β/α)8 barrel domain and domain B but lacks the C-terminal domain C. Contact map analysis and docking studies strongly suggested higher activity and substrate binding affinity for AmyB than for AmyA, as previously shown experimentally. This could be explained by easier accessibility of the catalytic cleft. PMID:27101008
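Candidate sites such as Y359, Y371 and F379 can be listed by scanning a sequence for residues that chymotrypsin prefers. The sketch below uses a deliberately simplified rule that we assume for illustration: cleave after F, Y or W unless the next residue is P. Real chymotrypsin specificity is broader (for example, slower cleavage after L or M) and context-dependent. The function name and the test peptide are hypothetical, not taken from the A. oryzae S2 sequence.

```python
from typing import List

# Simplified chymotrypsin rule (assumption): cleave C-terminal to F, Y or W,
# but not when the following residue is proline.
AROMATIC = {"F", "Y", "W"}


def chymotrypsin_sites(seq: str) -> List[str]:
    """Return candidate sites labelled as residue + 1-based position, e.g. 'Y3'."""
    seq = seq.upper()
    sites = []
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        # Skip the final residue: there is no peptide bond after it to cleave
        if aa in AROMATIC and nxt and nxt != "P":
            sites.append(f"{aa}{i + 1}")
    return sites


# F6 is followed by P7, so it is not reported
print(chymotrypsin_sites("GAYKLFPSWAG"))  # ['Y3', 'W9']
```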
Matthews, R J; Cahir, E D; Thomas, M L
1990-01-01
Protein-tyrosine-phosphatases (protein-tyrosine-phosphate phosphohydrolase, EC 3.1.3.48) have been implicated in the regulation of cell growth; however, to date few tyrosine phosphatases have been characterized. To identify additional family members, the cDNA for the human tyrosine phosphatase leukocyte common antigen (LCA; CD45) was used to screen a mouse pre-B-cell cDNA library under low stringency. Two cDNA clones were isolated, and sequence analysis predicts a protein sequence of 793 amino acids. We have named the molecule LRP (LCA-related phosphatase). RNA transfer analysis indicates that the cDNAs were derived from a 3.2-kilobase mRNA. The LRP mRNA is transcribed in a wide variety of tissues. The predicted protein structure can be divided into the following structural features: a short 19-amino acid leader sequence, an exterior domain of 123 amino acids that is predicted to be highly glycosylated, a 24-amino acid membrane-spanning region, and a 627-amino acid cytoplasmic region. The cytoplasmic region contains two domains of approximately 260 amino acids, each with homology to the tyrosine phosphatase family. One of the cDNA clones differed in that it had a 108-base-pair insertion that, while preserving the reading frame, would disrupt the first protein-tyrosine-phosphatase domain. Analysis of genomic DNA indicates that the insertion is due to an alternatively spliced exon. LRP appears to be evolutionarily conserved, as a putative homologue has been identified in the invertebrate Styela plicata. PMID:2162042
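The segment lengths given for LRP account exactly for the predicted 793-residue protein, and the 108-bp insertion preserves the reading frame because 108 is a multiple of 3. The short check below only restates numbers from the abstract; the variable names are our own.

```python
# Predicted LRP segment lengths (amino acids), as reported
segments = {
    "leader": 19,
    "exterior": 123,
    "transmembrane": 24,
    "cytoplasmic": 627,
}
total = sum(segments.values())
print(total)  # 793, matching the predicted protein length

# A 108-bp insertion is in frame if its length is divisible by 3
insertion_bp = 108
print(insertion_bp % 3 == 0, insertion_bp // 3)  # True 36 -> adds 36 codons
```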
Auditory sequence analysis and phonological skill
Grube, Manon; Kumar, Sukhbinder; Cooper, Freya E.; Turton, Stuart; Griffiths, Timothy D.
2012-01-01
This work tests the relationship between auditory and phonological skill in a non-selected cohort of 238 school students (age 11) with the specific hypothesis that sound-sequence analysis would be more relevant to phonological skill than the analysis of basic, single sounds. Auditory processing was assessed across the domains of pitch, time and timbre; a combination of six standard tests of literacy and language ability was used to assess phonological skill. A significant correlation between general auditory and phonological skill was demonstrated, plus a significant, specific correlation between measures of phonological skill and the auditory analysis of short sequences in pitch and time. The data support a limited but significant link between auditory and phonological ability with a specific role for sound-sequence analysis, and provide a possible new focus for auditory training strategies to aid language development in early adolescence. PMID:22951739
Origins and Structural Properties of Novel and De Novo Protein Domains During Insect Evolution.
Klasberg, Steffen; Bitard-Feildel, Tristan; Callebaut, Isabelle; Bornberg-Bauer, Erich
2018-05-26
Over long time scales, protein evolution is characterised by modular rearrangements of protein domains. Such rearrangements are mainly caused by gene duplication, fusion and terminal losses. To better understand the mechanisms of domain emergence, we investigated 32 insect genomes covering a speciation gradient ranging from ~2 to ~390 million years. We use established domain models and foldable domains delineated by Hydrophobic Cluster Analysis (HCA), which does not require homologous sequences, to also identify domains that have likely arisen de novo, i.e. from previously non-coding DNA. Our results indicate that most novel domains emerge terminally, as they originate from ORF extensions, while fewer arise in middle positions, resulting from exonisation of intronic or intergenic regions. Many novel domains rapidly migrate between terminal and middle positions and between single- and multi-domain arrangements. Young domains, such as most HCA-defined domains, are under strong selection pressure, as they show signals of purifying selection. De novo domains, whether linked to ancient domains or defined by HCA, have higher degrees of intrinsic disorder and of disorder-to-order transition upon binding than ancient domains. However, the DNA sequences corresponding to novel domains of de novo origin could only rarely be found in sister genomes. We conclude that novel domains are often recruited by other proteins and undergo important structural modifications shortly after their emergence, but evolve too fast to be characterised by cross-species comparisons alone. This article is protected by copyright. All rights reserved.
Ndhlovu, Andrew; Durand, Pierre M.; Hazelhurst, Scott
2015-01-01
The evolutionary rate at codon sites across protein-coding nucleotide sequences represents a valuable tier of information for aligning sequences, inferring homology and constructing phylogenetic profiles. However, a comprehensive resource for cataloguing the evolutionary rate at codon sites and their corresponding nucleotide and protein domain sequence alignments has not been developed. To address this gap in knowledge, EvoDB (an Evolutionary rates DataBase) was compiled. Nucleotide sequences and their corresponding protein domain data, including the associated seed alignments from the PFAM-A (protein family) database, were used to estimate evolutionary rate (ω = dN/dS) profiles at codon sites for each entry. EvoDB contains 98.83% of the gapped nucleotide sequence alignments and 97.1% of the evolutionary rate profiles for the corresponding information in PFAM-A. As the identification of codon sites under positive selection and their position in a sequence profile is usually the most sought-after information for molecular evolutionary biologists, evolutionary rate profiles were determined under the M2a model using the CODEML algorithm in the PAML (Phylogenetic Analysis by Maximum Likelihood) suite of software. Validation of nucleotide sequences against amino acid data was implemented to ensure high data quality. EvoDB is a catalogue of the evolutionary rate profiles and provides the corresponding phylogenetic trees, PFAM-A alignments and annotated accession identifier data. In addition, the database can be explored and queried using known evolutionary rate profiles to identify domains under similar evolutionary constraints and pressures. EvoDB is a resource for evolutionary and phylogenetic studies and presents a tier of information untapped by current databases. © The Author(s) 2015. Published by Oxford University Press. Database URL: http://www.bioinf.wits.ac.za/software/fire/evodb PMID:26140928
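The quantity catalogued by EvoDB, ω = dN/dS, is conventionally read as follows: ω > 1 suggests positive selection, ω < 1 purifying selection, and ω ≈ 1 neutral evolution. The sketch below only illustrates that reading from given dN and dS values. It is not CODEML's M2a procedure, which fits site classes by maximum likelihood over a tree. The function names, the tolerance and the example rates are hypothetical.

```python
def omega(dn: float, ds: float) -> float:
    """Ratio of nonsynonymous to synonymous substitution rates (dN/dS)."""
    if ds <= 0:
        raise ValueError("dS must be positive for omega to be defined")
    return dn / ds


def selection_regime(w: float, tol: float = 1e-9) -> str:
    """Conventional reading of omega: >1 positive, <1 purifying, ~1 neutral."""
    if w > 1 + tol:
        return "positive"
    if w < 1 - tol:
        return "purifying"
    return "neutral"


# Illustrative, made-up rate pairs (not values from EvoDB)
for dn, ds in [(0.02, 0.10), (0.30, 0.10), (0.10, 0.10)]:
    print(dn, ds, selection_regime(omega(dn, ds)))
```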
NASA Astrophysics Data System (ADS)
Liao, Zhijun; Wang, Xinrui; Zeng, Yeting; Zou, Quan
2016-12-01
The Dishevelled/EGL-10/Pleckstrin (DEP) domain-containing (DEPDC) proteins have seven members. However, whether this superfamily can be distinguished from other proteins based only on amino acid sequences remains unknown. Here, we describe a computational method to segregate DEPDCs and non-DEPDCs. First, we examined the Pfam numbers of the known DEPDCs and used the longest sequence for each Pfam to construct a phylogenetic tree. Subsequently, we extracted 188-dimensional (188D) and 20D features of DEPDCs and non-DEPDCs and classified them with a random forest classifier. We also mined the motifs of human DEPDCs to find the related domains. Finally, we experimentally verified human DEPDC expression at the mRNA level in hepatocellular carcinoma (HCC) and adjacent normal tissues. The phylogenetic analysis showed that the DEPDC superfamily can be divided into three clusters. Moreover, both the 188D and 20D features can be used to effectively distinguish the two protein types. Motif analysis revealed that the DEP and RhoGAP domains were common in human DEPDCs, and DEPDCs were widely expressed in human HCC and adjacent tissues. However, their regulation was not identical. In conclusion, we successfully constructed a binary classifier for DEPDCs and experimentally verified their expression in human HCC tissues.
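The abstract does not define the 20D features. A common 20-dimensional sequence descriptor is amino acid composition, the fraction of each standard residue, and the sketch below assumes that interpretation. It only shows how such a feature vector might be computed per sequence before it is passed to a classifier such as a random forest. The function name is hypothetical, and the 188D features are not reproduced here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, alphabetical


def aa_composition(seq: str) -> list:
    """20-dimensional amino acid composition; non-standard letters are ignored."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


features = aa_composition("MAAC")
print(len(features))  # 20
```

Each protein then becomes one fixed-length row, which is the input shape a random forest (or any other standard classifier) expects, regardless of sequence length.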
Zhao, Chunqing; Feng, Xiaoyun; Tang, Tao; Qiu, Lihong
2015-01-01
Cytochrome P450 monooxygenases (CYPs), an enzyme superfamily, are widely distributed in organisms and play a vital role in the metabolism of exogenous and endogenous compounds by interacting with their obligatory redox partner, CYP reductase (CPR). A novel CYP gene (CYP9A11) and a CPR gene from the agricultural pest insect Spodoptera exigua were cloned and characterized. The complete cDNA sequences of SeCYP9A11 and SeCPR are 1,931 and 3,919 bp in length, respectively, and contain open reading frames of 1,593 and 2,070 nucleotides, respectively. Analysis of the putative protein sequences indicated that SeCYP9A11 contains a heme-binding domain and the unique characteristic sequence (SRFALCE) of the CYP9 family, in addition to a signal peptide and a transmembrane segment at the N-terminus. Alignment analysis revealed that SeCYP9A11 shares the highest sequence similarity (66.54%) with CYP9A13 from Mamestra brassicae. The putative protein sequence of SeCPR has all of the classical CPR features, such as an N-terminal membrane anchor; three conserved domains, namely the flavin adenine dinucleotide (FAD), flavin mononucleotide (FMN), and nicotinamide adenine dinucleotide phosphate (NADPH) domains; and characteristic binding motifs. Phylogenetic analysis revealed that SeCPR shares the highest identity (95.21%) with HaCPR. The SeCYP9A11 and SeCPR genes were detected in the midgut, fat body, and cuticle tissues, and throughout all developmental stages of S. exigua. The mRNA levels of SeCYP9A11 and SeCPR decreased markedly after exposure to the plant secondary metabolites quercetin and tannin. These results on the SeCYP9A11 and SeCPR genes provide a foundation for further study of the S. exigua P450 system. PMID:26320261
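Figures such as the 95.21% identity between SeCPR and HaCPR are computed from a pairwise alignment. The sketch below shows one common convention: identical positions divided by alignment columns in which neither sequence has a gap. Other conventions divide by alignment length or by the shorter sequence, and "similarity" (as in the 66.54% figure) usually also counts conservative substitutions, which this code does not. The function name and the toy aligned strings are hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(aln_a.upper(), aln_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("alignment has no ungapped columns")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Toy alignment: 6 of 7 ungapped columns identical
print(round(percent_identity("SRFALCE", "SRFAVCE"), 2))  # 85.71
```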
Domain-specific learning of grammatical structure in musical and phonological sequences.
Bly, Benjamin Martin; Carrión, Ricardo E; Rasch, Björn
2009-01-01
Artificial grammar learning depends on acquisition of abstract structural representations rather than domain-specific representational constraints, or so many studies tell us. Using an artificial grammar task, we compared learning performance in two stimulus domains in which respondents have differing tacit prior knowledge. We found that despite grammatically identical sequence structures, learning was better for harmonically related chord sequences than for letter name sequences or harmonically unrelated chord sequences. We also found transfer effects within the musical and letter name tasks, but not across the domains. We conclude that knowledge acquired in implicit learning depends not only on abstract features of structured stimuli, but that the learning of regularities is in some respects domain-specific and strongly linked to particular features of the stimulus domain.
The sequence, structure and evolutionary features of HOTAIR in mammals
2011-01-01
Background An increasing number of long noncoding RNAs (lncRNAs) have been identified recently. Unlike other lncRNAs that function in cis to regulate local gene expression, the newly identified HOTAIR is located between HoxC11 and HoxC12 in the human genome and regulates HoxD expression in multiple tissues. Like the well-characterised lncRNA Xist, HOTAIR binds to polycomb proteins to methylate histones at multiple HoxD loci, but unlike Xist, many details of its structure and function, as well as its trans regulation, remain unclear. Moreover, HOTAIR is involved in the aberrant regulation of gene expression in cancer. Results To identify conserved domains in HOTAIR and study the phylogenetic distribution of this lncRNA, we used Infernal to search the genomes of 10 mammalian and 3 non-mammalian vertebrates for matches to its 6 exons and to the two conserved domains within the 1800 bp exon6. There was just one high-scoring hit for each mammal, but many low-scoring hits were found in both mammals and non-mammalian vertebrates. These hits and their flanking genes in four placental mammals and platypus were examined to determine whether HOTAIR contained elements shared by other lncRNAs. Several of the hits were within unknown transcripts or ncRNAs, many were within introns of, or antisense to, protein-coding genes, and conservation of the flanking genes was observed only between human and chimpanzee. Phylogenetic analysis revealed discrete evolutionary dynamics for orthologous sequences of HOTAIR exons. Exon1 at the 5' end and a domain in exon6 near the 3' end, which contain domains that bind to multiple proteins, have evolved faster in primates than in other mammals. Structures were predicted for exon1, two domains of exon6 and the full HOTAIR sequence. The sequence and structure of two fragments, in exon1 and in domain B of exon6 respectively, were found to occur robustly in the predicted structures of exon1, domain B of exon6 and the full HOTAIR in mammals.
Conclusions HOTAIR exists in mammals, has poorly conserved sequences and considerably conserved structures, and has evolved faster than nearby HoxC genes. Exons of HOTAIR show distinct evolutionary features, and a 239 bp domain in the 1804 bp exon6 is especially conserved. These features, together with the absence of some exons and sequences in mouse, rat and kangaroo, suggest ab initio generation of HOTAIR in marsupials. Structure prediction identifies two fragments in the 5' end exon1 and the 3' end domain B of exon6, with sequence and structure invariably occurring in various predicted structures of exon1, the domain B of exon6 and the full HOTAIR. PMID:21496275
Bhattacharya, D; Steinkötter, J; Melkonian, M
1993-12-01
Centrin (= caltractin) is a ubiquitous, cytoskeletal protein which is a member of the EF-hand superfamily of calcium-binding proteins. A centrin-coding cDNA was isolated and characterized from the prasinophyte green alga Scherffelia dubia. Centrin PCR amplification primers were used to isolate partial, homologous cDNA sequences from the green algae Tetraselmis striata and Spermatozopsis similis. Annealing analyses suggested that centrin is a single-copy-coding region in T. striata, S. similis and the other green algae studied. Centrin-coding regions from S. dubia, S. similis and T. striata encode four colinear EF-hand domains which putatively bind calcium. Phylogenetic analyses, including homologous sequences from Chlamydomonas reinhardtii and the land plant Atriplex nummularia, demonstrate that the domains of centrins are congruent and arose from the two-fold duplication of an ancestral EF hand, with Domains 1 and 3, and Domains 2 and 4, clustering together. The domains of centrins are also congruent with those of calmodulins, demonstrating that, like calmodulin, centrin is an ancient protein which arose within the ancestor of all eukaryotes via gene duplication. Phylogenetic relationships inferred from centrin-coding region comparisons mirror the results of small subunit ribosomal RNA sequence analyses, suggesting that centrin-coding regions are useful evolutionary markers within the green algae.
Barrera, Daniel; Valdecantos, Pablo A; García, E Vanesa; Miceli, Dora C
2012-02-01
The glycoprotein envelope surrounding the Bufo arenarum egg exists in different functional forms. Conversion between types involves proteolysis of specific envelope glycoproteins. When the egg is released from the ovary, the envelope cannot be penetrated by sperm. Conversion to a penetrable state occurs during passage through the pars recta portion of the oviduct, where oviductin, a serine protease with trypsin-like substrate specificity, hydrolyzes two kinds of envelope glycoproteins: gp84 and gp55. The nucleotide sequence of a 3203 bp B. arenarum oviductin cDNA was obtained. The deduced amino acid sequence showed a complete open reading frame encoding 980 amino acids. B. arenarum oviductin is a multi-domain protein with a protease domain at the N-terminal region followed by two CUB domains and, toward the C-terminal region, another protease domain, which lacks an active-site histidine, and one CUB domain. Expression of ovochymase 2, the mammalian ortholog of amphibian oviductin, was assayed in the mouse female reproductive tract. Ovochymase 2 mRNA was undetectable in the mouse oviduct but was strongly expressed in the uterus. The phylogenetic relationship between oviductin and ovochymase 2 opens the possibility of understanding the role of this enzyme in mammalian reproduction.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karpinets, Tatiana V; Park, Byung; Syed, Mustafa H
2010-01-01
The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite rich and invaluable information stored in the database, software tools utilizing this information for annotation of newly sequenced genomes by CAZy families are limited. We have employed two annotation approaches to fill the gap between manually curated high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome or metagenome sequencing projects. The first approach is based on a similarity search against the entire non-redundant sequences of the CAZy database. The second approach performs annotation using links or correspondences between the CAZy families and protein family domains. The links were discovered using the association rule learning algorithm applied to sequences from the CAZy database. The approaches complement each other and in combination achieved high specificity and sensitivity when cross-evaluated with the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. The capability of the proposed framework to predict the function of unknown protein domains (DUF) and of hypothetical proteins in the genome of Neurospora crassa is demonstrated. The framework is implemented as a Web service, the CAZymes Analysis Toolkit (CAT), and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi.
Park, Byung H; Karpinets, Tatiana V; Syed, Mustafa H; Leuze, Michael R; Uberbacher, Edward C
2010-12-01
The Carbohydrate-Active Enzyme (CAZy) database provides a rich set of manually annotated enzymes that degrade, modify, or create glycosidic bonds. Despite rich and invaluable information stored in the database, software tools utilizing this information for annotation of newly sequenced genomes by CAZy families are limited. We have employed two annotation approaches to fill the gap between manually curated high-quality protein sequences collected in the CAZy database and the growing number of other protein sequences produced by genome or metagenome sequencing projects. The first approach is based on a similarity search against the entire nonredundant sequences of the CAZy database. The second approach performs annotation using links or correspondences between the CAZy families and protein family domains. The links were discovered using the association rule learning algorithm applied to sequences from the CAZy database. The approaches complement each other and in combination achieved high specificity and sensitivity when cross-evaluated with the manually curated genomes of Clostridium thermocellum ATCC 27405 and Saccharophagus degradans 2-40. The capability of the proposed framework to predict the function of unknown protein domains and of hypothetical proteins in the genome of Neurospora crassa is demonstrated. The framework is implemented as a Web service, the CAZymes Analysis Toolkit, and is available at http://cricket.ornl.gov/cgi-bin/cat.cgi.
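The second approach above links CAZy families to protein family domains through association rules. As a rough illustration only, the sketch below counts the support and confidence of candidate "domain implies family" rules, assuming each annotated sequence is reduced to a set of domain identifiers plus one CAZy family label. The function name, the toy domain and family labels, and the default thresholds are all hypothetical; the rule-learning algorithm and parameters actually used in CAT are not reproduced here.

```python
from collections import Counter


def mine_domain_family_rules(records, min_support=2, min_confidence=0.8):
    """Return sorted rules (domain, family, support, confidence).

    records: iterable of (set_of_domains, family_label) pairs (hypothetical format).
    support    = number of sequences containing both the domain and the family label
    confidence = support / number of sequences containing the domain
    """
    domain_counts = Counter()  # sequences containing each domain
    pair_counts = Counter()    # sequences containing each (domain, family) pair
    for domains, family in records:
        for domain in set(domains):
            domain_counts[domain] += 1
            pair_counts[(domain, family)] += 1
    rules = []
    for (domain, family), support in pair_counts.items():
        confidence = support / domain_counts[domain]
        if support >= min_support and confidence >= min_confidence:
            rules.append((domain, family, support, confidence))
    return sorted(rules)


# Toy training set with hypothetical labels: DomA always co-occurs with FamX,
# whereas DomB is spread over three different families.
records = [
    ({"DomA"}, "FamX"),
    ({"DomA", "DomB"}, "FamX"),
    ({"DomA"}, "FamX"),
    ({"DomB"}, "FamY"),
    ({"DomB"}, "FamZ"),
]
print(mine_domain_family_rules(records))  # [('DomA', 'FamX', 3, 1.0)]
```

A new sequence carrying DomA would then be assigned to FamX, while DomB alone is too ambiguous to pass the confidence threshold; this is the intuition behind using high-confidence rules for annotation.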
Atomic interaction networks in the core of protein domains and their native folds.
Soundararajan, Venkataramanan; Raman, Rahul; Raguram, S; Sasisekharan, V; Sasisekharan, Ram
2010-02-23
Vastly divergent sequences populate a majority of protein folds. In the quest to identify features that are conserved within protein domains belonging to the same fold, we set out to examine the entire protein universe on a fold-by-fold basis. We report that the atomic interaction networks in the solvent-unexposed core of protein domains are fold-conserved, extraordinary sequence divergence notwithstanding. Further, we find that this feature, termed the protein core atomic interaction network (or PCAIN), is significantly distinguishable across different folds, thus appearing to be a "signature" of a domain's native fold. As part of this study, we computed the PCAINs for 8698 representative protein domains from families across the 1018 known protein folds to construct our seed database, and an automated framework was developed for PCAIN-based characterization of the protein fold universe. A test set of randomly selected domains that are not in the seed database was classified with over 97% accuracy, independent of sequence divergence. As an application of this novel fold signature, a PCAIN-based scoring scheme was developed for comparative (homology-based) structure prediction, with 1-2 Å (mean 1.61 Å) Cα RMSD generally observed between computed structures and reference crystal structures. Our results are consistent across the full spectrum of test domains, including those from recent CASP experiments, and most notably in the 'twilight' and 'midnight' zones, where <30% and <10% target-template sequence identity prevails (mean twilight RMSD of 1.69 Å). We further demonstrate the utility of the PCAIN protocol for deriving biological insight into protein structure-function relationships by modeling the structure of the YopM effector novel E3 ligase (NEL) domain from the plague-causing bacterium Yersinia pestis and discussing its implications for host adaptive and innate immune modulation by the pathogen.
Considering the several high-throughput, sequence-identity-independent applications demonstrated in this work, we suggest that the PCAIN is a fundamental fold feature that could be a valuable addition to the arsenal of protein modeling and analysis tools.
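The PCAIN idea, a network of atomic contacts restricted to the solvent-unexposed core of a domain, can be illustrated with a toy sketch. It assumes atoms are given as (atom_id, (x, y, z), is_buried) tuples with burial already determined (for example from a solvent-accessibility calculation, not shown), and it treats two buried atoms as interacting when they lie within a hypothetical 4.5 Å cutoff. The published method's atom typing, cutoffs, and network-comparison scoring are not reproduced here.

```python
import math
from itertools import combinations


def core_contact_network(atoms, cutoff=4.5):
    """Build an undirected contact network among buried atoms only.

    atoms: list of (atom_id, (x, y, z), is_buried) tuples (hypothetical format).
    cutoff: contact distance threshold in angstroms (illustrative value).
    Returns a dict mapping each buried atom_id to the set of its contact partners.
    """
    core = [(atom_id, xyz) for atom_id, xyz, buried in atoms if buried]
    network = {atom_id: set() for atom_id, _ in core}
    for (id_a, xyz_a), (id_b, xyz_b) in combinations(core, 2):
        if math.dist(xyz_a, xyz_b) <= cutoff:
            network[id_a].add(id_b)
            network[id_b].add(id_a)
    return network


# Toy example: OH1 is solvent-exposed, so it is excluded even though it is close to CA1.
atoms = [
    ("CA1", (0.0, 0.0, 0.0), True),
    ("CB1", (3.0, 0.0, 0.0), True),
    ("CG1", (10.0, 0.0, 0.0), True),
    ("OH1", (1.0, 0.0, 0.0), False),
]
print(core_contact_network(atoms))
```

Comparing such networks between domains (rather than comparing sequences) is what would allow a fold signature to be detected despite low sequence identity.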
Atomic Interaction Networks in the Core of Protein Domains and Their Native Folds
Soundararajan, Venkataramanan; Raman, Rahul; Raguram, S.; Sasisekharan, V.; Sasisekharan, Ram
2010-01-01
Vastly divergent sequences populate a majority of protein folds. In the quest to identify features that are conserved within protein domains belonging to the same fold, we set out to examine the entire protein universe on a fold-by-fold basis. We report that the atomic interaction networks in the solvent-unexposed core of protein domains are fold-conserved, extraordinary sequence divergence notwithstanding. Further, we find that this feature, termed the protein core atomic interaction network (or PCAIN), is significantly distinguishable across different folds, thus appearing to be a “signature” of a domain's native fold. As part of this study, we computed the PCAINs for 8698 representative protein domains from families across the 1018 known protein folds to construct our seed database, and an automated framework was developed for PCAIN-based characterization of the protein fold universe. A test set of randomly selected domains that are not in the seed database was classified with over 97% accuracy, independent of sequence divergence. As an application of this novel fold signature, a PCAIN-based scoring scheme was developed for comparative (homology-based) structure prediction, with 1–2 Å (mean 1.61 Å) Cα RMSD generally observed between computed structures and reference crystal structures. Our results are consistent across the full spectrum of test domains, including those from recent CASP experiments, and most notably in the ‘twilight’ and ‘midnight’ zones, where <30% and <10% target-template sequence identity prevails (mean twilight RMSD of 1.69 Å). We further demonstrate the utility of the PCAIN protocol for deriving biological insight into protein structure-function relationships by modeling the structure of the YopM effector novel E3 ligase (NEL) domain from the plague-causing bacterium Yersinia pestis and discussing its implications for host adaptive and innate immune modulation by the pathogen.
Considering the several high-throughput, sequence-identity-independent applications demonstrated in this work, we suggest that the PCAIN is a fundamental fold feature that could be a valuable addition to the arsenal of protein modeling and analysis tools. PMID:20186337
DOE Office of Scientific and Technical Information (OSTI.GOV)
Helander, Sara; Montecchio, Meri; Lemak, Alexander
Highlights: • We describe the structure of a novel fold in FKBP25 and HectD. • The new fold is named the Basic Tilted Helix Bundle (BTHB) domain. • A conserved basic surface patch is presented, suggesting a functional role. - Abstract: In this paper, we describe the structure of an N-terminal domain motif in nuclear-localized FKBP25(1–73), a member of the FKBP family, together with the structure of a sequence-related subdomain of the E3 ubiquitin ligase HectD1 that we show belongs to the same fold. This motif adopts a compact 5-helix bundle, which we name the Basic Tilted Helix Bundle (BTHB) domain. A positively charged surface patch, structurally centered around the tilted helix H4, is present and conserved in both FKBP25 and HectD1, suggesting a conserved functional role. We provide a detailed comparative analysis of the structures of the two proteins and their sequence similarities, and an analysis of the interaction with the proposed FKBP25-binding protein YY1. We suggest that the basic motif in BTHB is involved in the observed DNA binding of FKBP25, and that the function of this domain can be affected by regulatory YY1 binding and/or interactions with adjacent domains.
NIDO, AMOP and vWD domains of MUC4 play synergic role in MUC4 mediated signaling
Liu, Xian; Xie, Kun-Ling; Tang, Jie; Jiang, Kui-Rong; Gao, Wen-Tao; Tian, Lei; Zhang, Kai; Xu, Ze-Kuan; Miao, Yi
2017-01-01
MUC4 mucin is well known as an important potential target for overcoming pancreatic cancer. Three unique domains (NIDO, AMOP, and vWD), whose roles are unclear, are present in MUC4 but not in other membrane-bound mucins. Our previous studies first reported that its splice variant MUC4/Y can serve as a model of MUC4 in pancreatic cancer, because the MUC4 gene fragment is more than 30 kb, too large to clone and express in eukaryotic cells. More importantly, because MUC4/Y has a gene sequence of manageable length, models of MUC4/Y (MUC4) lacking the unique domains can easily be constructed for research. The present study investigates the individual roles of the unique NIDO, AMOP, and vWD domains, and their synergistic effect, in MUC4 (MUC4/Y)-mediated functions and mechanisms, using a series of in vitro assays, sequence-based transcriptome analysis, validation by qRT-PCR and Western blot, and systematic comparative analysis. Our results demonstrate that: 1) the NIDO, AMOP, and vWD domains, individually or in synergy, play significant roles in the MUC4/Y-mediated malignant functions of pancreatic cancer and in the downstream molecular mechanisms, particularly MUC4/Y-triggered malignancy-related positive feedback loops; 2) the synergistic roles of the three unique domains in MUC4/Y-mediated functions and mechanisms are more prominent than those of any individual domain, because their synergy has more marked effects on the MUC4/Y-mediated signaling hub. Thus, enhancing the reversal effects of domain deletion and disrupting the synergy among the domains will help block MUC4/Y (MUC4) from triggering various oncogenic signaling pathways. PMID:28060749
NIDO, AMOP and vWD domains of MUC4 play synergic role in MUC4 mediated signaling.
Zhu, Yi; Zhang, Jing-Jing; Peng, Yun-Peng; Liu, Xian; Xie, Kun-Ling; Tang, Jie; Jiang, Kui-Rong; Gao, Wen-Tao; Tian, Lei; Zhang, Kai; Xu, Ze-Kuan; Miao, Yi
2017-02-07
MUC4 mucin is well known as an important potential target for overcoming pancreatic cancer. Three unique domains (NIDO, AMOP, and vWD), whose roles are unclear, are present in MUC4 but not in other membrane-bound mucins. Our previous studies first reported that its splice variant MUC4/Y can serve as a model of MUC4 in pancreatic cancer, because the MUC4 gene fragment is more than 30 kb, too large to clone and express in eukaryotic cells. More importantly, because MUC4/Y has a gene sequence of manageable length, models of MUC4/Y (MUC4) lacking the unique domains can easily be constructed for research. The present study investigates the individual roles of the unique NIDO, AMOP, and vWD domains, and their synergistic effect, in MUC4 (MUC4/Y)-mediated functions and mechanisms, using a series of in vitro assays, sequence-based transcriptome analysis, validation by qRT-PCR and Western blot, and systematic comparative analysis. Our results demonstrate that: 1) the NIDO, AMOP, and vWD domains, individually or in synergy, play significant roles in the MUC4/Y-mediated malignant functions of pancreatic cancer and in the downstream molecular mechanisms, particularly MUC4/Y-triggered malignancy-related positive feedback loops; 2) the synergistic roles of the three unique domains in MUC4/Y-mediated functions and mechanisms are more prominent than those of any individual domain, because their synergy has more marked effects on the MUC4/Y-mediated signaling hub. Thus, enhancing the reversal effects of domain deletion and disrupting the synergy among the domains will help block MUC4/Y (MUC4) from triggering various oncogenic signaling pathways.
Wolffe, E J; Gause, W C; Pelfrey, C M; Holland, S M; Steinberg, A D; August, J T
1990-01-05
We describe the isolation and sequencing of a cDNA encoding mouse Pgp-1. An oligonucleotide probe corresponding to the NH2-terminal sequence of the purified protein was synthesized by the polymerase chain reaction and used to screen a mouse macrophage lambda gt11 library. A cDNA clone with an insert of 1.2 kilobases was selected and sequenced. In Northern blot analysis, only cells expressing Pgp-1 contained mRNA species that hybridized with this Pgp-1 cDNA. The nucleotide sequence of the cDNA has a single open reading frame that yields a protein-coding sequence of 1076 base pairs followed by a 132-base pair 3'-untranslated sequence that includes a putative polyadenylation signal but no poly(A) tail. The translated sequence comprises a 13-amino acid signal peptide followed by a polypeptide core of 345 residues corresponding to an Mr of 37,800. Portions of the deduced amino acid sequence were identical to those obtained by amino acid sequence analysis from the purified glycoprotein, confirming that the cDNA encodes Pgp-1. The predicted structure of Pgp-1 includes an NH2-terminal extracellular domain (residues 14-265), a transmembrane domain (residues 266-286), and a cytoplasmic tail (residues 287-358). Portions of the mouse Pgp-1 sequence are highly similar to that of the human CD44 cell surface glycoprotein implicated in cell adhesion. The protein also shows sequence similarity to the proteoglycan tandem repeat sequences found in cartilage link protein and cartilage proteoglycan core protein which are thought to be involved in binding to hyaluronic acid.
Functional metagenomics reveals novel β-galactosidases not predictable from gene sequences.
Cheng, Jiujun; Romantsov, Tatyana; Engel, Katja; Doxey, Andrew C; Rose, David R; Neufeld, Josh D; Charles, Trevor C
2017-01-01
The techniques of metagenomics have allowed researchers to access the genomic potential of uncultivated microbes, but there remain significant barriers to determination of gene function based on DNA sequence alone. Functional metagenomics, in which DNA is cloned and expressed in surrogate hosts, can overcome these barriers and make important contributions to the discovery of novel enzymes. In this study, a soil metagenomic library carried in an IncP cosmid was used for functional complementation for β-galactosidase activity in both Sinorhizobium meliloti (α-Proteobacteria) and Escherichia coli (γ-Proteobacteria) backgrounds. One β-galactosidase, encoded by six overlapping clones that were selected in both hosts, was identified as a member of glycoside hydrolase family 2. We could not identify ORFs obviously encoding possible β-galactosidases in 19 other sequenced clones that were only able to complement S. meliloti. Three of these ORFs, which showed low sequence identity to known glycoside hydrolases other than β-galactosidases, were examined further. Biochemical analysis confirmed that all three encoded β-galactosidase activity. Lac36W_ORF11 and Lac161_ORF7 had conserved domains but lacked similarity to known glycoside hydrolases. Lac161_ORF10 had neither conserved domains nor similarity to known glycoside hydrolases. Bioinformatic analysis and structural modeling suggested that the Lac161_ORF10 protein represents a novel enzyme family with a five-bladed propeller glycoside hydrolase domain. By discovering founding members of three novel β-galactosidase families, we have reinforced the value of functional metagenomics for isolating novel genes that could not have been predicted from DNA sequence analysis alone.
Emergence and subsequent functional specialization of kindlins during evolution of cell adhesiveness
Meller, Julia; Rogozin, Igor B.; Poliakov, Eugenia; Meller, Nahum; Bedanov-Pack, Mark; Plow, Edward F.; Qin, Jun; Podrez, Eugene A.; Byzova, Tatiana V.
2015-01-01
Kindlins are integrin-interacting proteins essential for integrin-mediated cell adhesiveness. In this study, we focused on the evolutionary origin and functional specialization of kindlins as a part of the evolutionary adaptation of cell adhesive machinery. Database searches revealed that many members of the integrin machinery (including talin and integrins) existed before kindlin emergence in evolution. Among the analyzed species, all metazoan lineages—but none of the premetazoans—had at least one kindlin-encoding gene, whereas talin was present in several premetazoan lineages. Kindlin appears to originate from a duplication of the sequence encoding the N-terminal fragment of talin (the talin head domain) with a subsequent insertion of the PH domain of separate origin. Sequence analysis identified a member of the actin filament–associated protein 1 (AFAP1) superfamily as the most likely origin of the kindlin PH domain. The functional divergence between kindlin paralogues was assessed using the sequence swap (chimera) approach. Comparison of kindlin 2 (K2)/kindlin 3 (K3) chimeras revealed that the F2 subdomain, in particular its C-terminal part, is crucial for the differential functional properties of K2 and K3. The presence of this segment enables K2 but not K3 to localize to focal adhesions. Sequence analysis of the C-terminal part of the F2 subdomain of K3 suggests that insertion of a variable glycine-rich sequence in vertebrates contributed to the loss of constitutive K3 targeting to focal adhesions. Thus emergence and subsequent functional specialization of kindlins allowed multicellular organisms to develop additional tissue-specific adaptations of cell adhesiveness. PMID:25540429
Heinz, Eva; Stubenrauch, Christopher J.; Grinter, Rhys; Croft, Nathan P.; Purcell, Anthony W.; Strugnell, Richard A.; Dougan, Gordon; Lithgow, Trevor
2016-01-01
The bacterial cell surface proteins intimin and invasin are virulence factors that share a common domain structure and bind selectively to host cell receptors in the course of bacterial pathogenesis. The β-barrel domains of intimin and invasin show significant sequence and structural similarities. Conversely, a variety of proteins with sometimes limited sequence similarity have also been annotated as “intimin-like” and “invasin” in genome datasets, while other recent work on apparently unrelated virulence-associated proteins ultimately revealed similarities to intimin and invasin. Here we characterize the sequence and structural relationships across this complex protein family. Surprisingly, intimins and invasins represent a very small minority of the sequence diversity in what was previously termed the “intimin/invasin protein family”. Analysis of the assembly pathway for expression of the classic intimin, EaeA, and of FdeC, a characteristic example of the most prevalent members of the group, revealed dependence on the translocation and assembly module as a common feature of both proteins. While the majority of the sequences in the grouping are most similar to FdeC, a further widespread group comprises two-partner secretion systems that use the β-barrel domain as the delivery device for secretion of a variety of virulence factors. This comprehensive analysis supports the adoption of the “inverse autotransporter protein family” as the most accurate nomenclature for the family and, in turn, has important consequences for our overall understanding of the Type V secretion systems of bacterial pathogens. PMID:27190006
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Suhkmann; Zhang, Ziming; Upchurch, Sean
2004-04-16
ARID is a homologous family of DNA-binding domains that occur in DNA-binding proteins from a wide variety of species, ranging from yeast to nematodes, insects, mammals and plants. SWI1, a member of the SWI/SNF protein complex that is involved in chromatin remodeling during transcription, contains the ARID motif. The ARID domain of human SWI1 (also known as p270) does not select for a specific DNA sequence from a random sequence pool. The lack of sequence specificity shown by the SWI1 ARID domain stands in contrast to the other characterized ARID domains, which recognize specific AT-rich sequences. We have solved the three-dimensional structure of human SWI1 ARID using solution NMR methods. In addition, we have characterized non-specific DNA binding by the SWI1 ARID domain. Results from this study indicate that a long, flexible internal loop in the ARID motif is likely to be important for sequence-specific DNA recognition. The structure of the human SWI1 ARID domain also represents a distinct structural subfamily. Studies of ARID indicate that the boundaries of the structural and functional DNA-binding domains can extend beyond the sequence-homologous region in a homologous family of proteins. Structural studies of homologous domains such as the ARID family of DNA-binding domains should provide information to better predict the boundaries of structural and functional domains in structural genomics studies. Key Words: ARID, SWI1, NMR, structural genomics, protein-DNA interaction.
Shilling, F M; Krätzschmar, J; Cai, H; Weskamp, G; Gayko, U; Leibow, J; Myles, D G; Nuccitelli, R; Blobel, C P
1997-06-15
Proteins containing a membrane-anchored metalloprotease domain, a disintegrin domain, and a cysteine-rich region (MDC proteins) are thought to play an important role in mammalian fertilization, as well as in somatic cell-cell interactions. We have identified PCR sequence tags encoding the disintegrin domain of five distinct MDC proteins from Xenopus laevis testis cDNA. Four of these sequence tags (xMDC9, xMDC11.1, xMDC11.2, and xMDC13) showed strong similarity to known mammalian MDC proteins, whereas the fifth (xMDC16) apparently represents a novel family member. Northern blot analysis revealed that the mRNA for xMDC16 was only expressed in testis, and not in heart, muscle, liver, ovaries, or eggs, whereas the mRNAs corresponding to the four other PCR products were expressed in testis and in some or all somatic tissues tested. The xMDC16 protein sequence, as predicted from the full-length cDNA, contains a metalloprotease domain with the active-site sequence HEXXH, a disintegrin domain, a cysteine-rich region, an EGF repeat, a transmembrane domain, and a short cytoplasmic tail. To study a potential role for these xMDC proteins in fertilization, peptides corresponding to the predicted integrin-binding domain of each protein were tested for their ability to inhibit X. laevis fertilization. Cyclic and linear xMDC16 peptides inhibited fertilization in a concentration-dependent manner, whereas xMDC16 peptides that were scrambled or had certain amino acid replacements in the predicted integrin-binding domain did not affect fertilization. Cyclic and linear xMDC9 peptides and linear xMDC13 peptides also inhibited fertilization similarly to xMDC16 peptides, whereas peptides corresponding to the predicted integrin-binding site of xMDC11.1 and xMDC11.2 did not. These results are discussed in the context of a model in which multiple MDC protein-receptor interactions are necessary for fertilization to occur.
HRV Analysis to Identify Stages of Home-based Telerehabilitation Exercise.
Jeong, In Cheol; Finkelstein, Joseph
2014-01-01
Spectral analysis of heart rate variability (HRV) has been widely used to investigate the activity of the autonomic nervous system. Previous studies demonstrated the potential of time-domain analysis of short-term heart rate sequences for continuous monitoring of physiological stress levels; however, the value of frequency-domain HRV parameters for monitoring cycling exercise has not been established. The goal of this study was to assess whether frequency-domain HRV parameters differ depending on the stage of cycling exercise. We compared major HRV parameters in the high, low and very low frequency ranges during rest, peak exercise, and recovery. Our results indicated that frequency-domain indices are responsive to the different phases of a cycling exercise program and have potential for monitoring autonomic balance and stress levels as part of a tailored home-based telerehabilitation program.
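Frequency-domain HRV parameters of the kind compared in this study are band-limited powers of the heart-rate spectrum. The sketch below shows one textbook way to obtain them, assuming the RR-interval series has already been resampled to a uniform grid (e.g. 4 Hz) and using commonly cited band limits (VLF 0.0033-0.04 Hz, LF 0.04-0.15 Hz, HF 0.15-0.40 Hz). These limits, the plain DFT periodogram, and the function name `hrv_band_powers` are assumptions, not necessarily the study's own processing pipeline.

```python
import math

# Commonly used short-term HRV bands in Hz (assumed; the study's exact limits may differ).
BANDS = {"VLF": (0.0033, 0.04), "LF": (0.04, 0.15), "HF": (0.15, 0.40)}


def hrv_band_powers(tachogram, fs):
    """Integrate a one-sided periodogram of a uniformly resampled RR series over HRV bands.

    tachogram: RR intervals (e.g. in seconds) already interpolated to a uniform grid.
    fs: resampling frequency in Hz.
    Uses a plain O(n^2) DFT for clarity rather than an FFT or Welch estimate.
    """
    n = len(tachogram)
    mean = sum(tachogram) / n
    x = [v - mean for v in tachogram]  # remove the DC component
    df = fs / n                        # frequency resolution of the DFT
    powers = {name: 0.0 for name in BANDS}
    for k in range(1, n // 2):         # skip the DC and Nyquist bins
        f = k * df
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        psd = 2 * (re * re + im * im) / (fs * n)  # one-sided power spectral density
        for name, (lo, hi) in BANDS.items():
            if lo <= f < hi:
                powers[name] += psd * df  # integrate PSD over the bin width
    powers["LF/HF"] = powers["LF"] / powers["HF"] if powers["HF"] > 0 else float("inf")
    return powers


# Synthetic 128 s tachogram at 4 Hz: 0.8 s mean RR with a 0.25 Hz (respiratory-range) oscillation.
fs = 4.0
tachogram = [0.8 + 0.05 * math.sin(2 * math.pi * 0.25 * t / fs) for t in range(512)]
powers = hrv_band_powers(tachogram, fs)
print(round(powers["HF"], 6))  # 0.00125
```

All of the oscillation's variance (0.05² / 2) lands in the HF band, so the LF/HF ratio is close to zero. A plain DFT is used for transparency; in practice an FFT-based or Welch estimator would normally replace it, and windowing would be needed when oscillations do not fall exactly on a frequency bin.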
Das, Debanu; Finn, Robert D; Abdubek, Polat; Astakhova, Tamara; Axelrod, Herbert L; Bakolitsa, Constantina; Cai, Xiaohui; Carlton, Dennis; Chen, Connie; Chiu, Hsiu-Ju; Chiu, Michelle; Clayton, Thomas; Deller, Marc C; Duan, Lian; Ellrott, Kyle; Farr, Carol L; Feuerhelm, Julie; Grant, Joanna C; Grzechnik, Anna; Han, Gye Won; Jaroszewski, Lukasz; Jin, Kevin K; Klock, Heath E; Knuth, Mark W; Kozbial, Piotr; Sri Krishna, S; Kumar, Abhinav; Lam, Winnie W; Marciano, David; Miller, Mitchell D; Morse, Andrew T; Nigoghossian, Edward; Nopakun, Amanda; Okach, Linda; Puckett, Christina; Reyes, Ron; Tien, Henry J; Trame, Christine B; van den Bedem, Henry; Weekes, Dana; Wooten, Tiffany; Xu, Qingping; Yeh, Andrew; Zhou, Jiadong; Hodgson, Keith O; Wooley, John; Elsliger, Marc-André; Deacon, Ashley M; Godzik, Adam; Lesley, Scott A; Wilson, Ian A
2010-01-01
Sufu (Suppressor of Fused), a two-domain protein, plays a critical role in regulating Hedgehog signaling and is conserved from flies to humans. A few bacterial Sufu-like proteins have previously been identified based on sequence similarity to the N-terminal domain of eukaryotic Sufu proteins, but none have been structurally or biochemically characterized and their function in bacteria is unknown. We have determined the crystal structure of a more distantly related Sufu-like homolog, NGO1391 from Neisseria gonorrhoeae, at 1.4 Å resolution, which provides the first biophysical characterization of a bacterial Sufu-like protein. The structure revealed a striking similarity to the N-terminal domain of human Sufu (r.m.s.d. of 2.6 Å over 93% of the NGO1391 protein), despite an extremely low sequence identity of ∼15%. Subsequent sequence analysis revealed that NGO1391 defines a new subset of smaller, Sufu-like proteins that are present in ∼200 bacterial species and has resulted in expansion of the SUFU (PF05076) family in Pfam. PMID:20836087
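The structural similarity above is quoted as an r.m.s.d. over aligned residues, i.e. RMSD = sqrt((1/N) Σ |a_i - b_i|²) for N paired Cα positions. A minimal sketch of that formula follows, assuming the two coordinate lists are already paired residue by residue and optimally superposed; the superposition step (e.g. a Kabsch fit) is not shown, and the function name is illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points.

    Assumes the point sets are already paired residue-by-residue and optimally
    superposed; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Two residues, each displaced by 1 Å along z: RMSD is 1 Å.
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))  # 1.0
```

Restricting the input to aligned residues mirrors the "over 93% of the NGO1391 protein" qualifier: unaligned residues are simply left out of both lists.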
Heinz, Eva; Lithgow, Trevor
2014-01-01
Members of the Omp85/TpsB protein superfamily are ubiquitously distributed in Gram-negative bacteria and function in protein translocation (e.g., FhaC) or the assembly of outer membrane proteins (e.g., BamA). Several recent findings suggest a further level of variation in the superfamily, including the identification of the novel membrane protein assembly factor TamA and the protein translocase PlpD. To investigate the diversity and the causal evolutionary events, we undertook a comprehensive comparative sequence analysis of the Omp85/TpsB proteins. A total of 10 protein subfamilies were apparent, distinguished by their domain structure and sequence signatures. In addition to the proteins FhaC, BamA, and TamA, for which structural and functional information is available, there are families of proteins with as yet undescribed domain architectures linked to the Omp85 β-barrel domain. This study brings a classification structure to a dynamic protein superfamily of high interest, given its essential function in Gram-negative bacteria as well as its diverse domain architecture, and we discuss several scenarios for the putative functions of these as yet undescribed proteins. PMID:25101071
2010-01-01
Background Expansins form a large multi-gene family found in wheat and other cereal genomes that are involved in the expansion of cell walls as a tissue grows. The expansin family can be divided into two main groups, namely, alpha-expansin (EXPA) and beta-expansin proteins (EXPB), with the EXPB group being of particular interest as group 1 pollen allergens. Results In this study, three beta-expansin genes were identified and characterized from a newly sequenced region of the Triticum aestivum cv. Chinese Spring chromosome 3B physical map at the Sr2 locus (FPC contig ctg11). The analysis of a 357 kb sub-sequence of FPC contig ctg11 identified one of these beta-expansin genes as TaEXPB11, originally identified as a cDNA from the wheat cv. Wyuna. Through the analysis of intron sequences of the three wheat cv. Chinese Spring genes, we propose that two of these beta-expansin genes are duplications of the TaEXPB11 gene. Comparative sequence analysis with two other wheat cultivars (cv. Westonia and cv. Hope) and a Triticum aestivum var. spelta line validated the identification of the Chinese Spring variant of TaEXPB11. The expression in maternal and grain tissues was confirmed by examining EST databases and carrying out RT-PCR experiments. Detailed examination of the position of TaEXPB11 relative to the locus encoding Sr2 disease resistance ruled out the possibility of this gene directly contributing to the resistance phenotype. Conclusions Through 3-D structural protein comparisons with Zea mays EXPB1, we proposed that variations within the coding sequence of TaEXPB11 in wheats may produce a functional change within features such as domain 1, related to possible involvement in cell wall structure, and domain 2, defining the pollen allergen domain and binding to IgE protein.
The variation established in this gene suggests it is a clearly identifiable member of a gene family and reflects the dynamic features of the wheat genome as it adapted to a range of different environments and uses. Accession Numbers: ctg11 = FN564426. Survey sequences of TaEXPB11ws and TsEXPB11 are provided on request. PMID:20507562
Hashemi, Seirana; Nowzari Dalini, Abbas; Jalali, Adrin; Banaei-Moghaddam, Ali Mohammad; Razaghi-Moghadam, Zahra
2017-08-16
Discriminating driver mutations from those that play no role in cancer is a severe bottleneck in elucidating the molecular mechanisms underlying cancer development. Since protein domains represent functional regions within proteins, mutations in them may disturb protein function. Therefore, studying mutations at the domain level may allow researchers to assess the functional impact of mutations more accurately. This article presents a comprehensive study that maps mutations from 29 cancer types to both sequence- and structure-based domains. Statistical analysis was performed to identify candidate domains in which mutations occur with high statistical significance. For each cancer type, the corresponding type-specific domains were distinguished among all candidate domains. Subsequently, cancer type-specific domains facilitated the identification of specific proteins for each cancer type. In addition, interactome analysis of the specific proteins of each cancer type showed high levels of interconnectivity among them, which implies a functional relationship. To evaluate the role of mitochondrial genes, stem cell-specific genes and DNA repair genes in cancer development, their mutation frequency was determined via further analysis. This study provides researchers with a publicly available data repository for studying both CATH and Pfam domain regions on protein-coding genes. Moreover, the associations between different groups of genes/domains and various cancer types have been clarified. The work is available at http://www.cancerouspdomains.ir .
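The abstract does not say which statistical test was used to flag domains with significantly many mutations. As a hypothetical illustration only, one common way to frame such a test is a one-sided binomial test: under a null model in which mutations fall uniformly along a protein, each mutation lands inside a domain with probability equal to the domain's share of the protein length. The function names and the uniform-placement null model below are assumptions, not the authors' method.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def domain_enrichment_pvalue(
    domain_mutations: int,
    total_mutations: int,
    domain_length: int,
    protein_length: int,
) -> float:
    """One-sided p-value that a domain carries more mutations than expected.

    Hypothetical null model: every mutation falls uniformly at random along
    the protein, so it lands inside the domain with probability
    domain_length / protein_length.
    """
    if not 0 < domain_length <= protein_length:
        raise ValueError("domain must be non-empty and fit inside the protein")
    p_in_domain = domain_length / protein_length
    return binomial_tail(domain_mutations, total_mutations, p_in_domain)


if __name__ == "__main__":
    # Made-up numbers: 8 of 10 mutations fall in a 50-residue domain
    # of a 500-residue protein.
    print(domain_enrichment_pvalue(8, 10, 50, 500))  # roughly 3.7e-07
```

When many domains are tested this way, the resulting p-values would still need a multiple-testing correction before calling any domain significant.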
Vaira, A M; Accotto, G P; Costantini, A; Milne, R G
2003-06-01
A 4018-nucleotide sequence was obtained for RNA 1 of Ranunculus white mottle virus (RWMV), genus Ophiovirus, representing an incomplete ORF of 1339 aa. Amino acid sequence analysis revealed significant similarities with RNA polymerases of viruses in the family Rhabdoviridae and a conserved domain of 685 aa, corresponding to the RdRp domain of those in the order Mononegavirales. Phylogenetic analysis indicated that the genus Ophiovirus is not related to the genus Tenuivirus or the family Bunyaviridae, with which it has been linked, and probably deserves a special taxonomic position within a new family. A pair of degenerate primers was designed from a consensus sequence obtained from a relatively conserved region in the RNA 1 of two members of the genus, Citrus psorosis virus (CPsV) and RWMV. The primers, used in RT-PCR experiments, amplified a 136 bp DNA fragment from all three recognized members of the genus, i.e. CPsV, RWMV and Tulip mild mottle mosaic virus (TMMMV), and from two tentative ophioviruses from lettuce and freesia. The amplified DNAs were sequenced and compared with the corresponding sequences of CPsV and RWMV, and phylogenetic relationships were evaluated. Assays using extracts from plants infected by viruses belonging to the genera Tospovirus, Tenuivirus, Rhabdovirus and Varicosavirus indicated that the primers are genus-specific.
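The primer sequences themselves are not given in the abstract. The sketch below only illustrates the general idea: a degenerate primer written in IUPAC ambiguity codes can be expanded into a regular expression and scanned against a template, and its degeneracy (the number of distinct oligonucleotides in the primer mix) is the product of the number of bases each position allows. The primer and template strings are hypothetical, and the scan checks exact matches on the given strand only.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_to_regex(primer: str) -> str:
    """Translate a degenerate primer into an equivalent regex pattern."""
    return "".join(IUPAC[base] for base in primer.upper())


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base].strip("[]"))
    return total


def binding_positions(primer: str, template: str) -> list[int]:
    """0-based start positions where the primer matches the template exactly.

    The zero-width lookahead lets overlapping matches be reported as well.
    """
    pattern = re.compile(f"(?={primer_to_regex(primer)})")
    return [m.start() for m in pattern.finditer(template.upper())]


if __name__ == "__main__":
    primer = "GAYTGGCARAA"  # hypothetical degenerate primer
    print(degeneracy(primer))                            # 4
    print(binding_positions(primer, "TTGATTGGCAGAACC"))  # [2]
```

A real primer check would also scan the reverse complement and tolerate mismatches; exact matching keeps the sketch short.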
Zhao, A; Guo, A; Liu, Z; Pape, L
1997-01-01
The coding sequences for a Schizosaccharomyces pombe sequence-specific DNA binding protein, Reb1p, have been cloned. The predicted S. pombe Reb1p is 24-29% identical to mouse TTF-1 (transcription termination factor-1) and Saccharomyces cerevisiae REB1 protein, both of which direct termination of RNA polymerase I catalyzed transcripts. The S. pombe Reb1 cDNA encodes a predicted polypeptide of 504 amino acids with a predicted molecular weight of 58.4 kDa. The S. pombe Reb1p is unusual in that the bipartite DNA binding motif identified originally in S. cerevisiae and Kluyveromyces lactis REB1 proteins is uninterrupted; thus S. pombe Reb1p may contain the smallest natural REB1-homologous DNA binding domain. Its genomic coding sequences were shown to be interrupted by two introns. A recombinant histidine-tagged Reb1 protein bearing the rDNA binding domain binds two homologous, sequence-specific sites in the S. pombe rDNA intergenic spacer, located between 289 and 480 nt downstream of the end of the approximately 25S rRNA coding sequences. Each binding site is 13-14 bp downstream of two of the three proposed in vivo termination sites. The core of this 17 bp site, AGGTAAGGGTAATGCAC, is specifically protected by Reb1p in footprinting analysis. PMID:9016645
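To make the reported core site concrete, the sketch below scans a DNA sequence for exact copies of the 17 bp core AGGTAAGGGTAATGCAC on both strands. Exact matching is a simplifying assumption (protein binding may tolerate some variation), and the demo sequence is synthetic, not S. pombe rDNA.

```python
# Core Reb1p-protected site reported in the abstract.
REB1_CORE = "AGGTAAGGGTAATGCAC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence (case-insensitive input)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_site(seq: str, site: str = REB1_CORE) -> list[tuple[int, str]]:
    """Return sorted (0-based start, strand) pairs for exact matches of site.

    A '-' hit means the reverse complement of site occurs in seq,
    i.e. the site itself lies on the opposite strand.
    """
    seq = seq.upper()
    hits = []
    for strand, query in (("+", site.upper()), ("-", reverse_complement(site))):
        start = seq.find(query)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(query, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    demo = "TT" + REB1_CORE + "GG" + reverse_complement(REB1_CORE)
    print(find_site(demo))  # [(2, '+'), (21, '-')]
```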
Shin, Dong-Ho; Webb, Barbara M; Nakao, Miki; Smith, Sylvia L
2009-07-01
Complement factor I is a crucial regulator of mammalian complement activity. Very little is known of complement regulators in non-mammalian species. We isolated and sequenced four highly similar complement factor I cDNAs from the liver of the nurse shark (Ginglymostoma cirratum), designated as GcIf-1, GcIf-2, GcIf-3 and GcIf-4 (previously referred to as nsFI-a, -b, -c and -d) which encode 689, 673, 673 and 657 amino acid residues, respectively. They share 95% (
Shin, Dong-Ho; Webb, Barbara M.; Nakao, Miki; Smith, Sylvia L.
2009-01-01
Complement factor I is a crucial regulator of mammalian complement activity. Very little is known of complement regulators in non-mammalian species. We isolated and sequenced four highly similar complement factor I cDNAs from the liver of the nurse shark (Ginglymostoma cirratum), designated as GcIf-1, GcIf-2, GcIf-3 and GcIf-4 (previously referred to as nsFI-a, -b, -c and -d) which encode 689, 673, 673 and 657 amino acid residues, respectively. They share 95% (≤) amino acid identities with each other, and 35.4-39.6% and 62.8-65.9% with factor I of mammals and banded houndshark (Triakis scyllium), respectively. The modular structure of the GcIf is similar to that of mammals with one notable exception, the presence of a novel shark-specific sequence between the leader peptide (LP) and the factor I membrane attack complex (FIMAC) domain. The cDNA sequences differ only in the size and composition of the shark-specific region (SSR). Sequence analysis of each SSR has identified within the region two novel short sequences (SS1 and SS2) and three repeat sequences (RS1, 2 and 3). Genomic analysis has revealed the existence of three introns between the leader peptide and the FIMAC domain, tentatively designated intron 1, intron 2, and intron 3, which span 4067, 2293 and 2082 bp, respectively. Southern blot analysis suggests the presence of a single gene copy for each cDNA type. Phylogenetic analysis suggests that complement factor I of cartilaginous fish diverged prior to the emergence of mammals. All four GcIf cDNA species are expressed in four different tissues and the liver is the main tissue in which expression level of all four is high. This suggests that the expression of GcIf isotypes is tissue-dependent. PMID:19423168
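Percent-identity figures like these depend on how the denominator is defined, which the abstract does not state. The sketch below assumes two sequences already aligned to equal length with '-' marking gaps, and counts identity over columns where both sequences have a residue. This is one common convention among several, and the example sequences are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are skipped under this convention
        compared += 1
        matches += a == b  # True counts as 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    print(percent_identity("MKT-LV", "MKSALV"))  # 80.0
```

Dividing by the full alignment length, or by the length of the shorter sequence, would give different values for the same pair, so reported identities are only comparable when the convention is known.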
Ferles, Christos; Beaufort, William-Scott; Ferle, Vanessa
2017-01-01
The present study devises mapping methodologies and projection techniques that visualize biological sequence data clustering results. The Sequence Data Density Display (SDDD) and Sequence Likelihood Projection (SLP) visualizations represent the input symbolic sequences in a lower-dimensional space in such a way that the clusters and relations of data elements are depicted graphically. Both operate in combination with the Self-Organizing Hidden Markov Model Map (SOHMMM). The resulting unified framework can analyze raw sequence data automatically and directly, with little or no prior information/domain knowledge.
Reductive evolution and the loss of PDC/PAS domains from the genus Staphylococcus
2013-01-01
Background The Per-Arnt-Sim (PAS) domain represents a ubiquitous structural fold that is involved in bacterial sensing and adaptation systems, including several virulence-related functions. Although PAS domains and the subclass of PhoQ-DcuS-CitA (PDC) domains have a common structure, there is limited amino acid sequence similarity. To gain greater insight into the evolution of PDC/PAS domains present in the bacterial kingdom, and in staphylococci in particular, the PDC/PAS domains from the genomic sequences of 48 bacteria, representing 5 phyla, were identified using a sensitive search method based on HMM-to-HMM comparisons (HHblits). Results A total of 1,007 PAS domains and 686 PDC domains distributed over 1,174 proteins were identified. For 28 Gram-positive bacteria, the distribution, organization, and molecular evolution of PDC/PAS domains were analyzed in greater detail, with a special emphasis on the genus Staphylococcus. Compared to other bacteria, the staphylococci have relatively few proteins (6–9) containing PDC/PAS domains. As a general rule, the staphylococcal genomes examined in this study contain a core group of seven PDC/PAS domain-containing proteins consisting of WalK, SrrB, PhoR, ArlS, HssS, NreB, and GdpP. The exceptions to this rule are: 1) S. saprophyticus lacks the core NreB protein; 2) S. carnosus has two additional PAS domain-containing proteins; 3) S. epidermidis, S. aureus, and S. pseudintermedius have an additional protein with two PDC domains that is predicted to code for a sensor histidine kinase; 4) S. lugdunensis has an additional PDC-containing protein predicted to be a sensor histidine kinase. Conclusions This comprehensive analysis demonstrates that variation in PDC/PAS domains among bacteria shows limited correlation with genome size or pathogenicity; however, our analysis established that bacteria having a motile phase in their life cycle have significantly more PDC/PAS-containing proteins.
In addition, our analysis revealed a tremendous amount of variation in the number of PDC/PAS-containing proteins within genera. This variation extended to the Staphylococcus genus, which had between 6 and 9 PDC/PAS proteins and some of these appear to be previously undescribed signaling proteins. This latter point is important because most staphylococcal proteins that contain PDC/PAS domains regulate virulence factor synthesis or antibiotic resistance. PMID:23902280
Reductive evolution and the loss of PDC/PAS domains from the genus Staphylococcus.
Shah, Neethu; Gaupp, Rosmarie; Moriyama, Hideaki; Eskridge, Kent M; Moriyama, Etsuko N; Somerville, Greg A
2013-07-31
The Per-Arnt-Sim (PAS) domain represents a ubiquitous structural fold that is involved in bacterial sensing and adaptation systems, including several virulence-related functions. Although PAS domains and the subclass of PhoQ-DcuS-CitA (PDC) domains have a common structure, there is limited amino acid sequence similarity. To gain greater insight into the evolution of PDC/PAS domains present in the bacterial kingdom, and in staphylococci in particular, the PDC/PAS domains from the genomic sequences of 48 bacteria, representing 5 phyla, were identified using a sensitive search method based on HMM-to-HMM comparisons (HHblits). A total of 1,007 PAS domains and 686 PDC domains distributed over 1,174 proteins were identified. For 28 Gram-positive bacteria, the distribution, organization, and molecular evolution of PDC/PAS domains were analyzed in greater detail, with a special emphasis on the genus Staphylococcus. Compared to other bacteria, the staphylococci have relatively few proteins (6-9) containing PDC/PAS domains. As a general rule, the staphylococcal genomes examined in this study contain a core group of seven PDC/PAS domain-containing proteins consisting of WalK, SrrB, PhoR, ArlS, HssS, NreB, and GdpP. The exceptions to this rule are: 1) S. saprophyticus lacks the core NreB protein; 2) S. carnosus has two additional PAS domain-containing proteins; 3) S. epidermidis, S. aureus, and S. pseudintermedius have an additional protein with two PDC domains that is predicted to code for a sensor histidine kinase; 4) S. lugdunensis has an additional PDC-containing protein predicted to be a sensor histidine kinase. This comprehensive analysis demonstrates that variation in PDC/PAS domains among bacteria shows limited correlation with genome size or pathogenicity; however, our analysis established that bacteria having a motile phase in their life cycle have significantly more PDC/PAS-containing proteins.
In addition, our analysis revealed a tremendous amount of variation in the number of PDC/PAS-containing proteins within genera. This variation extended to the Staphylococcus genus, which had between 6 and 9 PDC/PAS proteins and some of these appear to be previously undescribed signaling proteins. This latter point is important because most staphylococcal proteins that contain PDC/PAS domains regulate virulence factor synthesis or antibiotic resistance.
De Silva, Jeremy Ryan; Lau, Yee Ling; Fong, Mun Yik
2017-01-03
The simian malaria parasite Plasmodium knowlesi has been reported to cause significant numbers of human infections in South East Asia. Its merozoite surface protein-3 (MSP3) belongs to a multi-gene family of proteins first found in Plasmodium falciparum. Several studies have evaluated P. falciparum MSP3 as a potential vaccine candidate. However, to date no detailed studies have been carried out on the P. knowlesi MSP3 gene (pkmsp3). The present study investigates the genetic diversity and haplotype groups of pkmsp3 in P. knowlesi clinical samples from Peninsular Malaysia. Blood samples were collected from P. knowlesi malaria patients over a period of 4 years (2008-2012). The pkmsp3 gene of the isolates was amplified via PCR, and subsequently cloned and sequenced. The full-length pkmsp3 sequence was divided into Domain A and Domain B. Natural selection, genetic diversity, and haplotypes of pkmsp3 were analysed using the MEGA6 and DnaSP ver. 5.10.00 programmes. From 23 samples, 48 pkmsp3 sequences were successfully obtained. At the nucleotide level, 101 synonymous and 238 non-synonymous mutations were observed. Tests of neutrality were not significant for the full-length, Domain A or Domain B sequences. However, the dN/dS ratio of Domain B indicates purifying selection for this domain. Analysis of the deduced amino acid sequences revealed 42 different haplotypes. Neighbour-Joining phylogenetic tree and haplotype network analyses revealed that the haplotypes clustered into two distinct groups. A moderate level of genetic diversity was observed in pkmsp3, and only the C-terminal region (Domain B) appeared to be under purifying selection. The separation of pkmsp3 into two haplotype groups provides further evidence of the existence of two distinct P. knowlesi types or lineages. Future studies should investigate the diversity of pkmsp3 among P. knowlesi isolates in North Borneo, where large numbers of human knowlesi malaria infections still occur.
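The dN/dS estimates above were computed with MEGA6 and DnaSP. As a simplified, hypothetical illustration of how such a ratio is formed and read (not a reimplementation of either program), the sketch below assumes the numbers of synonymous and non-synonymous sites and differences have already been counted, as in a Nei-Gojobori-style analysis. It applies the Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3) to each proportion of differences and takes the ratio; a value below 1 is the usual signature of purifying selection. The counts in the example are made up.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """dN/dS from pre-counted differences and sites (hypothetical inputs)."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("no synonymous divergence: dN/dS is undefined")
    return dn / ds


if __name__ == "__main__":
    # Made-up counts: 10 differences over 300 non-synonymous sites,
    # 10 differences over 100 synonymous sites.
    print(round(dn_ds(10, 300, 10, 100), 3))  # 0.318, i.e. below 1
```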
Genome Sequences of Three Cluster AU Arthrobacter Phages, Caterpillar, Nightmare, and Teacup
Adair, Tamarah L.; Stowe, Emily; Pizzorno, Marie C.; Krukonis, Gregory; Harrison, Melinda; Garlena, Rebecca A.; Russell, Daniel A.; Jacobs-Sera, Deborah
2017-01-01
ABSTRACT Caterpillar, Nightmare, and Teacup are cluster AU siphoviral phages isolated from enriched soil on Arthrobacter sp. strain ATCC 21022. These genomes are 58 kbp long with an average G+C content of 50%. Sequence analysis predicts 86 to 92 protein-coding genes, including a large number of small proteins with predicted transmembrane domains. PMID:29122860
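The G+C content quoted here is the fraction of G and C among the nucleotides of a genome. A minimal sketch, assuming ambiguous bases such as N are excluded from the denominator; the example strings are synthetic, not phage sequence.

```python
def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous A/C/G/T bases (case-insensitive)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt


if __name__ == "__main__":
    print(gc_content("ATGC"))      # 0.5
    print(gc_content("GGCCNNAT"))  # 4 of 6 counted bases, about 0.667
```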
NASA Technical Reports Server (NTRS)
Kopczynski, E. D.; Bateson, M. M.; Ward, D. M.
1994-01-01
When PCR was used to recover small-subunit (SSU) rRNA genes from a hot spring cyanobacterial mat community, chimeric SSU rRNA sequences which exhibited little or no secondary structural abnormality were recovered. They were revealed as chimeras of SSU rRNA genes of uncultivated species through separate phylogenetic analysis of short sequence domains.
Law, Yee-Song; Gudimella, Ranganath; Song, Beng-Kah; Ratnam, Wickneswari; Harikrishna, Jennifer Ann
2012-01-01
Many of the plant leucine rich repeat receptor-like kinases (LRR-RLKs) have been found to regulate signaling during plant defense processes. In this study, we selected and sequenced an LRR-RLK gene, designated as Oryza rufipogon receptor-like protein kinase 1 (OrufRPK1), located within yield QTL yld1.1 from the wild rice Oryza rufipogon (accession IRGC105491). A 2055 bp coding region and two exons were identified. Southern blotting determined OrufRPK1 to be a single copy gene. Sequence comparison with cultivated rice orthologs (OsI219RPK1, OsI9311RPK1 and OsJNipponRPK1, respectively derived from O. sativa ssp. indica cv. MR219, O. sativa ssp. indica cv. 9311 and O. sativa ssp. japonica cv. Nipponbare) revealed the presence of 12 single nucleotide polymorphisms (SNPs) with five non-synonymous substitutions, and 23 insertion/deletion sites. The biological role of OrufRPK1 as a defense-related LRR-RLK is proposed on the basis of cDNA sequence characterization, domain subfamily classification, structural prediction of extracellular domains, cluster analysis and comparative gene expression. PMID:22942769
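The abstract does not describe how the SNPs were classified. A common approach, sketched here as an assumption rather than the authors' pipeline, is to compare aligned, ungapped coding sequences codon by codon under the standard genetic code and call a changed codon synonymous if it still encodes the same amino acid. Indels (such as the 23 insertion/deletion sites reported) are out of scope for this sketch, and a codon with several differences is counted once. The example sequences are made up.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_changes(ref: str, alt: str) -> dict[str, int]:
    """Count changed codons as synonymous or non-synonymous.

    Assumes ref and alt are aligned, gap-free coding sequences in frame.
    """
    if len(ref) != len(alt) or len(ref) % 3:
        raise ValueError("sequences must be equal length and a multiple of 3")
    counts = {"synonymous": 0, "non-synonymous": 0}
    for pos in range(0, len(ref), 3):
        r, a = ref[pos:pos + 3].upper(), alt[pos:pos + 3].upper()
        if r == a:
            continue  # unchanged codon
        same_aa = CODON_TABLE[r] == CODON_TABLE[a]
        counts["synonymous" if same_aa else "non-synonymous"] += 1
    return counts


if __name__ == "__main__":
    # CTG->CTA keeps Leu (synonymous); AAA->GAA changes Lys to Glu.
    print(classify_codon_changes("ATGCTGAAA", "ATGCTAGAA"))
    # {'synonymous': 1, 'non-synonymous': 1}
```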
Wolf, Maxim Y; Wolf, Yuri I; Koonin, Eugene V
2008-01-01
Background Proteins show a broad range of evolutionary rates. Understanding the factors that are responsible for the characteristic rate of evolution of a given protein is arguably one of the major goals of evolutionary biology. A long-standing general assumption used to be that the evolution rate is primarily determined by the specific functional constraints that affect the given protein. These constraints were traditionally thought to depend both on the specific features of the protein's structure and on its biological role. The advent of systems biology brought about new types of data, such as expression level and protein-protein interactions, and, unexpectedly, a variety of correlations between protein evolution rate and these variables have been observed. The strongest connections by far were repeatedly seen between protein sequence evolution rate and the expression level of the respective gene. It has been hypothesized that this link is due to selection for robustness of the protein structure to mistranslation-induced misfolding, which is particularly important for highly expressed proteins and is the dominant determinant of the sequence evolution rate. Results This work is an attempt to assess the relative contributions of protein domain structure and function, on the one hand, and expression level, on the other hand, to the rate of sequence evolution. To this end, we performed a genome-wide analysis of the effect of the fusion of a pair of domains in multidomain proteins on the difference in the domain-specific evolutionary rates. The mistranslation-induced misfolding hypothesis would predict that, within multidomain proteins, fused domains, on average, should evolve at substantially closer rates than the same domains in different proteins because, within a multidomain protein, all domains are translated at the same rate.
We performed a comprehensive comparison of the evolutionary rates of mammalian and plant protein domains that are either joined in multidomain proteins or contained in distinct proteins. Substantial homogenization of evolutionary rates in multidomain proteins was, indeed, observed in both animals and plants, although highly significant differences between domain-specific rates remained. The contributions of the translation rate, as determined by the effect of the fusion of a pair of domains within a multidomain protein, and intrinsic, domain-specific structural-functional constraints appear to be comparable in magnitude. Conclusion Fusion of domains in a multidomain protein results in substantial homogenization of the domain-specific evolutionary rates but significant differences between domain-specific evolution rates remain. Thus, the rate of translation and intrinsic structural-functional constraints both exert sizable and comparable effects on sequence evolution. Reviewers This article was reviewed by Sergei Maslov, Dennis Vitkup, Claus Wilke (nominated by Orly Alter), and Allan Drummond (nominated by Joel Bader). For the full reviews, please go to the Reviewers' Reports section. PMID:18840284
Comparative Analysis of Transcription Factors Families across Fungal Tree of Life
DOE Office of Scientific and Technical Information (OSTI.GOV)
Salamov, Asaf; Grigoriev, Igor
2015-03-19
Transcription factors (TFs) are proteins that regulate the transcription of genes by binding to specific DNA sequences. Based on the literature (Shelest, 2008; Weirauch and Hughes, 2011), we collected and manually curated a list of DNA-binding domain (DBD) Pfam domains (62 DBD domains in total). We then examined the distribution of TFs in 395 fungal genomes and, additionally, in plant genomes (Phytozome), prokaryotes (IMG), and some animal/metazoan and protist genomes.
Reddy, G; Nanduri, V B; Basu, A; Modak, M J
1991-08-20
Treatment of murine leukemia virus reverse transcriptase (MuLV RT) with potassium ferrate, an oxidizing agent known to oxidize amino acids involved in phosphate binding domains of proteins, results in the irreversible inactivation of both the DNA polymerase and the RNase H activities. Significant protection from ferrate-mediated inactivation is observed in the presence of template-primer but not in the presence of substrate deoxynucleoside triphosphates. Furthermore, ferrate-treated enzyme loses template-primer binding activity as judged by UV-mediated cross-linking of radiolabeled DNA. Comparative tryptic peptide mapping by reverse-phase HPLC of native and ferrate-oxidized enzyme indicated the presence of two new peptides eluting at 38 and 57 min and a significant loss of a peptide eluting at 74 min. Purification, amino acid composition, and sequencing of these affected peptides revealed that they correspond to amino acid residues 285-295, 630-640, and 586-599, respectively, in the primary amino acid sequence of MuLV RT. These results indicate that the domains constituted by the above peptides are important for the template-primer binding function in MuLV RT. Peptide I is located in the polymerase domain whereas peptides II and III are located in the RNase H domain. Amino acid sequence analysis of peptides I and II suggested Lys-285 and Cys-635 as the probable sites of ferrate action.
Protein Information Resource: a community resource for expert annotation of protein data
Barker, Winona C.; Garavelli, John S.; Hou, Zhenglin; Huang, Hongzhan; Ledley, Robert S.; McGarvey, Peter B.; Mewes, Hans-Werner; Orcutt, Bruce C.; Pfeiffer, Friedhelm; Tsugita, Akira; Vinayaka, C. R.; Xiao, Chunlin; Yeh, Lai-Su L.; Wu, Cathy
2001-01-01
The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-International databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP. PMID:11125041
The Thiamin Pyrophosphate-Motif
NASA Technical Reports Server (NTRS)
Dominiak, Paulina M.; Ciszak, Ewa M.
2003-01-01
Using databases, the authors have identified a common thiamin pyrophosphate (TPP)-motif in the family of functionally diverse TPP-dependent enzymes. This common motif consists of a multimeric organization of subunits, two catalytic centers, a common amino acid sequence, and specific contacts that provide a flip-flop, or alternate site, mechanism of action. Each catalytic center [PP:PYR] is formed at the interface of the PP-domain binding the magnesium ion, pyrophosphate and aminopyrimidine ring of TPP, and the PYR-domain binding the aminopyrimidine ring of that cofactor. A pair of these catalytic centers constitutes the catalytic core [PP:PYR]* within these enzymes. Analysis of the structural elements of this catalytic core reveals a novel definition of the common amino acid sequences, which are GX@&(G)@XXGQ and GDGX25-30 within the PP-domain, and E&(G)@XXG@ within the PYR-domain, where Q corresponds to a hydrophobic amino acid. This TPP-motif provides a novel tool for annotation of TPP-dependent enzymes useful in advancing functional proteomics.
Hopple, J S; Vilgalys, R
1999-10-01
Phylogenetic relationships were investigated in the mushroom genus Coprinus based on sequence data from the nuclear encoded large-subunit rDNA gene. Forty-seven species of Coprinus and 19 additional species from the families Coprinaceae, Strophariaceae, Bolbitiaceae, Agaricaceae, Podaxaceae, and Montagneaceae were studied. A total of 1360 sites was sequenced across seven divergent domains and intervening sequences. A total of 302 phylogenetically informative characters was found. Ninety-eight percent of the average divergence between taxa was located within the divergent domains, with domains D2 and D8 being most divergent and domains D7 and D10 the least divergent. An empirical test of phylogenetic signal among divergent domains also showed that domains D2 and D3 had the lowest levels of homoplasy. Two equally most parsimonious trees were resolved using Wagner parsimony. A character-state weighted analysis produced 12 equally most parsimonious trees similar to those generated by Wagner parsimony. Phylogenetic analyses employing topological constraints suggest that none of the major taxonomic systems proposed for subgeneric classification is able to completely reflect phylogenetic relationships in Coprinus. A strict consensus integration of the two Wagner trees demonstrates the problematic nature of choosing outgroups within dark-spored mushrooms. The genus Coprinus is found to be polyphyletic and is separated into three distinct clades. Most Coprinus taxa belong to the first two clades, which together form a larger monophyletic group with Lacrymaria and Psathyrella in basal positions. A third clade contains members of Coprinus section Comati as well as the genus Leucocoprinus, Podaxis pistillaris, Montagnea arenaria, and Agaricus pocillator. This third clade is separated from the other species of Coprinus by members of the families Strophariaceae and Bolbitiaceae and the genus Panaeolus. Copyright 1999 Academic Press.
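A site is conventionally called parsimony-informative when at least two different character states each occur in at least two taxa; only such sites can favour one tree topology over another under parsimony. The sketch below lists them for an aligned set of sequences, treating '-', '?', and 'N' as missing data (an assumption about how gaps and ambiguities are handled). The toy alignment is not from the study.

```python
from collections import Counter


def parsimony_informative_sites(alignment: list[str], missing: str = "-?N") -> list[int]:
    """0-based columns where >= 2 states each occur in >= 2 sequences."""
    if not alignment:
        return []
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    informative = []
    for col in range(lengths.pop()):
        # Tally character states in this column, ignoring missing data.
        states = Counter(
            seq[col].upper() for seq in alignment if seq[col].upper() not in missing
        )
        if sum(1 for n in states.values() if n >= 2) >= 2:
            informative.append(col)
    return informative


if __name__ == "__main__":
    toy = ["AAGT", "AAGT", "ACTT", "ACTA"]
    # Column 0 is constant and column 3 has a singleton A, so neither counts.
    print(parsimony_informative_sites(toy))  # [1, 2]
```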
Machnicka, Magdalena A; Kaminska, Katarzyna H; Dunin-Horkawicz, Stanislaw; Bujnicki, Janusz M
2015-10-23
GmrSD is a modification-dependent restriction endonuclease that specifically targets and cleaves glucosylated hydroxymethylcytosine (glc-HMC) modified DNA. It is encoded either as two separate single-domain GmrS and GmrD proteins or as a single protein carrying both domains. Previous studies suggested that GmrS acts as an endonuclease and NTPase whereas GmrD binds DNA. In this work we applied homology detection, sequence conservation analysis, fold recognition and homology modeling methods to study sequence-structure-function relationships in the GmrSD restriction endonuclease family. We also analyzed the phylogeny and genomic context of the family members. Results of our comparative genomics study show that GmrS exhibits similarity to proteins from the ParB/Srx fold, which can have both NTPase and nuclease activity. In contrast to previous studies, however, we also attribute nuclease activity to GmrD, as we found it to contain the HNH endonuclease motif. We revealed residues potentially important for structure and function in both domains. Moreover, we found that GmrSD systems exist predominantly as a fused, double-domain form rather than as a heterodimer and that their homologs are often encoded in regions enriched in defense and gene mobility-related elements. Finally, phylogenetic reconstructions of GmrS and GmrD domains revealed that they coevolved and only a few GmrSD systems appear to be assembled from distantly related GmrS and GmrD components. Our study provides insight into sequence-structure-function relationships in the as yet poorly characterized family of Type IV restriction enzymes. Comparative genomics allowed us to propose a possible role for the GmrD domain in the function of the GmrSD enzyme, as well as possible active sites of both the GmrS and GmrD domains. The presented results can guide further experimental characterization of these enzymes.
Cuadrat, Rafael R. C.; Cury, Juliano C.; Dávila, Alberto M. R.
2015-01-01
Marine environments harbor a wide range of microorganisms from the three domains of life. These microorganisms have great potential to enable discovery of new enzymes and bioactive compounds for industrial use. However, only ~1% of microorganisms from the environment can currently be identified through cultured isolates, limiting the discovery of new compounds. To overcome this limitation, a metagenomics approach has been widely adopted for biodiversity studies on samples from marine environments. In this study, we screened metagenomes in order to estimate the potential for new natural compound synthesis mediated by diversity in the Polyketide Synthase (PKS) and Nonribosomal Peptide Synthetase (NRPS) genes. The samples were collected from the Praia dos Anjos (Angel’s Beach) surface water, Arraial do Cabo (Rio de Janeiro state, Brazil), an environment affected by upwelling. In order to evaluate the potential for screening natural products in Arraial do Cabo samples, we used KS (keto-synthase) and C (condensation) domains (from PKS and NRPS, respectively) to build hidden Markov models (HMMs). From both samples, a total of 84 KS and 46 C novel domain sequences were obtained, showing the potential of this environment for the discovery of new genes of biotechnological interest. These domains were classified by phylogenetic analysis; this was the first study conducted to screen PKS and NRPS genes in an upwelling-affected sample. PMID:26633360
Modular protein domains: an engineering approach toward functional biomaterials.
Lin, Charng-Yu; Liu, Julie C
2016-08-01
Protein domains and peptide sequences are a powerful tool for conferring specific functions to engineered biomaterials. Protein sequences with a wide variety of functionalities, including structure, bioactivity, protein-protein interactions, and stimuli responsiveness, have been identified, and advances in molecular biology continue to pinpoint new sequences. Protein domains can be combined to make recombinant proteins with multiple functionalities. The high fidelity of the protein translation machinery results in exquisite control over the sequence of recombinant proteins and the resulting properties of protein-based materials. In this review, we discuss protein domains and peptide sequences in the context of functional protein-based materials, composite materials, and their biological applications. Copyright © 2016 Elsevier Ltd. All rights reserved.
Evidence for two koi herpesvirus (KHV) genotypes in South Korea.
Kim, Hyoung Jun; Kwon, Se Ryun
2013-06-13
The geographic distribution of koi herpesvirus (KHV) has recently been analyzed by polymerase chain reaction (PCR, based on the alleles of 3 domains) and sequence analysis using 3 regions of KHV genomic DNA (SphI-5, 9/5, and the thymidine kinase gene). In this study, samples from 6 carp showing symptoms of KHV infection in 2008 were examined for the presence of KHV by using PCR and cell culture isolation methods. KHV was detected in 2 (Pyeongtaek and Buan) of the samples. Sequence analysis revealed that the genotype of the KHV PT-08 isolate was Asia genotype variant 1 (A1), and the genotype of the KHV BA-08 isolate was European genotype variant 4 (E4). In addition, PCR patterns and sequence analysis based on the alleles of 3 domains of an alternate KHV classification system confirmed that the genotype of the KHV PT-08 isolate was CyHV3-J, and the genotype of the KHV BA-08 isolate was CyHV3-third genotype. To our knowledge, this is the first study to demonstrate the presence of 2 genotypes of KHV (genotype A1/CyHV3-J; genotype E4/CyHV3-third genotype) in South Korea.
EventThread: Visual Summarization and Stage Analysis of Event Sequence Data.
Guo, Shunan; Xu, Ke; Zhao, Rongwen; Gotz, David; Zha, Hongyuan; Cao, Nan
2018-01-01
Event sequence data such as electronic health records, a person's academic records, or car service records, are ordered series of events which have occurred over a period of time. Analyzing collections of event sequences can reveal common or semantically important sequential patterns. For example, event sequence analysis might reveal frequently used care plans for treating a disease, typical publishing patterns of professors, and the patterns of service that result in a well-maintained car. It is challenging, however, to visually explore large numbers of event sequences, or sequences with large numbers of event types. Existing methods focus on extracting explicitly matching patterns of events using statistical analysis to create stages of event progression over time. However, these methods fail to capture latent clusters of similar but not identical evolutions of event sequences. In this paper, we introduce a novel visualization system named EventThread which clusters event sequences into threads based on tensor analysis and visualizes the latent stage categories and evolution patterns by interactively grouping the threads by similarity into time-specific clusters. We demonstrate the effectiveness of EventThread through usage scenarios in three different application domains and via interviews with an expert user.
Wang, Yunsheng; Zhou, Lijuan; Li, Dazhi; Dai, Liangying; Lawton-Rauh, Amy; Srimani, Pradip K.; Duan, Yongping; Luo, Feng
2015-01-01
In this study, we identified and compared nucleotide-binding site (NBS) domain-containing genes from three Citrus genomes (C. clementina, C. sinensis from USA and C. sinensis from China). Phylogenetic analysis of all Citrus NBS genes across these three genomes revealed three groups of approximately equal size: one group that contains the Toll-Interleukin receptor (TIR) domain, and two different non-TIR groups in which most of the proteins contain the Coiled Coil (CC) domain. Motif analysis confirmed that the two groups of CC-containing NBS genes are from different evolutionary origins. We partitioned NBS genes into clades using NBS domain sequence distances and found that most clades include NBS genes from all three Citrus genomes. This suggests that the three Citrus genomes have similar numbers and types of NBS genes. We also mapped the re-sequenced reads of three pomelo and three mandarin genomes onto the C. sinensis genome. We found that most NBS genes of the hybrid C. sinensis genome have corresponding homologous genes in both pomelo and mandarin genomes. The homologous NBS genes in pomelo and mandarin suggest that the parental species of C. sinensis may contain similar types of NBS genes. This explains why the hybrid C. sinensis and the original C. clementina have similar types of NBS genes in this study. Furthermore, we found that sequence variation among Citrus NBS genes was shaped by multiple independent and shared accelerated mutation accumulation events among different groups of NBS genes and in different Citrus genomes. Our comparative analyses yield valuable insight into the structure, organization and evolution of NBS genes in Citrus genomes. In addition, our comprehensive analysis showed that the non-TIR NBS genes can be divided into two groups that come from different evolutionary origins. This provides new insights into non-TIR genes, which have not received much attention. PMID:25811466
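The abstract does not state which distance measure or clustering rule was used to partition NBS genes into clades. As a minimal sketch under that uncertainty, the snippet below assumes pre-aligned NBS domain sequences, uses the uncorrected p-distance (columns with a gap in either sequence are skipped), and groups sequences by single linkage under a distance cutoff. `p_distance`, `threshold_clades`, and the cutoff are hypothetical illustrations, not the authors' procedure.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 1.0


def threshold_clades(seqs: dict[str, str], cutoff: float) -> list[set[str]]:
    """Single-linkage grouping: any pair closer than `cutoff` ends up in the same clade."""
    parent = {name: name for name in seqs}

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps the union-find trees shallow
            n = parent[n]
        return n

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) < cutoff:
            parent[find(a)] = find(b)  # merge the two clades

    clades: dict[str, set[str]] = {}
    for name in seqs:
        clades.setdefault(find(name), set()).add(name)
    return list(clades.values())
```

Single linkage is used here only because it is the simplest threshold rule; clades in a phylogenetic study would normally be read from a tree.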
Anantharaman, Vivek; Makarova, Kira S; Burroughs, A Maxwell; Koonin, Eugene V; Aravind, L
2013-06-15
Background The major role of enzymatic toxins that target nucleic acids in biological conflicts at all levels has become increasingly apparent thanks in large part to the advances of comparative genomics. Typically, toxins evolve rapidly hampering the identification of these proteins by sequence analysis. Here we analyze an unexpectedly widespread superfamily of toxin domains most of which possess RNase activity. Results The HEPN superfamily is comprised of all α-helical domains that were first identified as being associated with DNA polymerase β-type nucleotidyltransferases in prokaryotes and animal Sacsin proteins. Using sensitive sequence and structure comparison methods, we vastly extend the HEPN superfamily by identifying numerous novel families and by detecting diverged HEPN domains in several known protein families. The new HEPN families include the RNase LS and LsoA catalytic domains, KEN domains (e.g. RNaseL and Ire1) and the RNase domains of RloC and PrrC. The majority of HEPN domains contain conserved motifs that constitute a metal-independent endoRNase active site. Some HEPN domains lacking this motif probably function as non-catalytic RNA-binding domains, such as in the case of the mannitol repressor MtlR. Our analysis shows that HEPN domains function as toxins that are shared by numerous systems implicated in intra-genomic, inter-genomic and intra-organismal conflicts across the three domains of cellular life. In prokaryotes HEPN domains are essential components of numerous toxin-antitoxin (TA) and abortive infection (Abi) systems and in addition are tightly associated with many restriction-modification (R-M) and CRISPR-Cas systems, and occasionally with other defense systems such as Pgl and Ter. We present evidence of multiple modes of action of HEPN domains in these systems, which include direct attack on viral RNAs (e.g. LsoA and RNase LS) in conjunction with other RNase domains (e.g. 
a novel RNase H fold domain, NamA), suicidal or dormancy-inducing attack on self RNAs (RM systems and possibly CRISPR-Cas systems), and suicidal attack coupled with direct interaction with phage components (Abi systems). These findings are compatible with the hypothesis on coupling of pathogen-targeting (immunity) and self-directed (programmed cell death and dormancy induction) responses in the evolution of robust antiviral strategies. We propose that altruistic cell suicide mediated by HEPN domains and other functionally similar RNases was essential for the evolution of kin and group selection and cell cooperation. HEPN domains were repeatedly acquired by eukaryotes and incorporated into several core functions such as endonucleolytic processing of the 5.8S-25S/28S rRNA precursor (Las1), a novel ER membrane-associated RNA degradation system (C6orf70), and sensing of unprocessed transcripts at the nuclear periphery (Swt1). Multiple lines of evidence suggest that, similar to prokaryotes, HEPN proteins were recruited to antiviral, antitransposon, apoptotic systems or RNA-level response to unfolded proteins (Sacsin and KEN domains) in several groups of eukaryotes. Conclusions Extensive sequence and structure comparisons reveal unexpectedly broad presence of the HEPN domain in an enormous variety of defense and stress response systems across the tree of life. In addition, HEPN domains have been recruited to perform essential functions, in particular in eukaryotic rRNA processing. These findings are expected to stimulate experiments that could shed light on diverse cellular processes across the three domains of life. Reviewers This article was reviewed by Martijn Huynen, Igor Zhulin and Nick Grishin. PMID:23768067
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hou, Xiaomin; Meehan, Edward J.; Xie, Jieming
2008-10-27
A novel type 1 ribosome-inactivating protein (RIP) designated cucurmosin was isolated from the sarcocarp of Cucurbita moschata (pumpkin). Besides rRNA N-glycosidase activity, cucurmosin exhibits strong cytotoxicity toward three cancer cell lines of human and murine origin, but low toxicity to normal cells. Plant genomic DNA extracted from tender leaves was amplified by PCR using primers based on the N-terminal protein sequence and on the C-terminal sequence derived from X-ray data. The complete mature protein sequence was obtained from N-terminal protein sequencing and partial DNA sequencing, and was confirmed by high-resolution crystal structure analysis. The crystal structure of cucurmosin has been determined at 1.04 Å, a resolution never before achieved for any RIP. The structure contains two domains: a large N-terminal domain composed of seven α-helices and eight β-strands, and a smaller C-terminal domain consisting of three α-helices and two β-strands. The high-resolution structure established a glycosylation pattern of GlcNAc2Man3Xyl. Asn225 was identified as a glycosylation site. Residues Tyr70, Tyr109, Glu158 and Arg161 define the active site of cucurmosin as an RNA N-glycosidase. The structural basis of the difference in cytotoxicity between cucurmosin and trichosanthin is discussed.
Alternative dimerization interfaces in the glucocorticoid receptor-α ligand binding domain.
Bianchetti, Laurent; Wassmer, Bianca; Defosset, Audrey; Smertina, Anna; Tiberti, Marion L; Stote, Roland H; Dejaegere, Annick
2018-04-30
Nuclear hormone receptors (NRs) constitute a large family of multi-domain ligand-activated transcription factors. Dimerization is essential for their regulation, and both the DNA binding domain (DBD) and the ligand binding domain (LBD) are implicated in dimerization. Intriguingly, the glucocorticoid receptor-α (GRα) presents a DBD dimeric architecture similar to that of the homologous estrogen receptor-α (ERα), but an atypical dimeric architecture for the LBD. The physiological relevance of the proposed GRα LBD dimer is a subject of debate. We analyzed all GRα LBD homodimers observed in crystals using an energetic analysis based on the PISA and MM/PBSA methods, together with a sequence conservation analysis, using the ERα LBD dimer as a reference point. Several dimeric assemblies were observed for GRα LBD. The assembly generally taken to be physiologically relevant showed weak binding free energy and no significant residue conservation at the contact interface, while an alternative homodimer mediated by both helix 9 and C-terminal residues showed significant binding free energy and residue conservation. However, none of the GRα LBD assemblies found in crystals are as stable or conserved as the canonical ERα LBD dimer. The GRα C-terminal sequence (F-domain) forms a steric obstacle to the canonical dimer assembly in all available structures. Our analysis calls for a re-examination of the currently accepted GRα homodimer structure and experimental investigations of the alternative architectures. This work questions the validity of the currently accepted architecture. This has implications for interpreting physiological data and for therapeutic design pertaining to glucocorticoid research. Copyright © 2018. Published by Elsevier B.V.
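For orientation, the MM/PBSA binding free energy mentioned above is conventionally written as below. This is the standard textbook form, not necessarily the exact protocol of the study; for a homodimer, A and B are the two copies of the LBD, angle brackets denote averages over simulation snapshots, E_MM is the molecular-mechanics energy, G_PB the Poisson–Boltzmann polar solvation term, G_SA the nonpolar (surface-area) solvation term, and the entropy term is often omitted in practice. More negative values indicate stronger association. PISA, by contrast, is a separate interface-analysis method and is not part of this expression.

```latex
\Delta G_{\mathrm{bind}} \approx \langle G_{AB} \rangle - \langle G_{A} \rangle - \langle G_{B} \rangle,
\qquad
G_{X} = E_{\mathrm{MM}} + G_{\mathrm{PB}} + G_{\mathrm{SA}} - T S_{X}
```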
Oncodomains: A protein domain-centric framework for analyzing rare variants in tumor samples
Peterson, Thomas A.; Park, Junyong
2017-01-01
The fight against cancer is hindered by its highly heterogeneous nature. Genome-wide sequencing studies have shown that individual malignancies contain many mutations that range from those commonly found in tumor genomes to rare somatic variants present only in a small fraction of lesions. Such rare somatic variants dominate the landscape of genomic mutations in cancer, yet efforts to correlate somatic mutations found in one or few individuals with functional roles have been largely unsuccessful. Traditional methods for identifying somatic variants that drive cancer are ‘gene-centric’ in that they consider only somatic variants within a particular gene and make no comparison to other similar genes in the same family that may play a similar role in cancer. In this work, we present oncodomain hotspots, a new ‘domain-centric’ method for identifying clusters of somatic mutations across entire gene families using protein domain models. Our analysis confirms that our approach creates a framework for leveraging structural and functional information encapsulated by protein domains into the analysis of somatic variants in cancer, enabling the assessment of even rare somatic variants by comparison to similar genes. Our results reveal a vast landscape of somatic variants that act at the level of domain families altering pathways known to be involved with cancer such as protein phosphorylation, signaling, gene regulation, and cell metabolism. Due to oncodomain hotspots’ unique ability to assess rare variants, we expect our method to become an important tool for the analysis of sequenced tumor genomes, complementing existing methods. PMID:28426665
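The "domain-centric" idea can be illustrated by pooling mutations from different genes onto the positions of a shared domain model, so that rare variants in related genes reinforce each other. The sketch below assumes a precomputed residue-to-domain-position mapping (for example, from aligning each gene's domain hit to the domain HMM) and uses a plain count threshold in place of a statistical hotspot test; `domain_hotspots`, its inputs, and `min_count` are hypothetical, not the published method.

```python
from collections import Counter


def domain_hotspots(mutations, residue_to_domain_pos, min_count=2):
    """Pool mutations from many genes onto shared domain-model positions.

    mutations: iterable of (gene, protein_position) pairs.
    residue_to_domain_pos: {gene: {protein_position: domain_model_position}},
        assumed to come from aligning each gene's domain to a common model
        (hypothetical input; the alignment step is not shown).
    Returns {domain_model_position: count} for positions hit at least min_count times.
    """
    counts = Counter()
    for gene, pos in mutations:
        dpos = residue_to_domain_pos.get(gene, {}).get(pos)
        if dpos is not None:  # mutations outside the aligned domain are ignored
            counts[dpos] += 1
    return {d: c for d, c in counts.items() if c >= min_count}
```

For example, a single mutation at domain position 2 in each of two genes yields a pooled count of 2 at that position, even though neither gene would look recurrent on its own.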
Predictive and comparative analysis of Ebolavirus proteins
Cong, Qian; Pei, Jimin; Grishin, Nick V
2015-01-01
Ebolavirus is the pathogen responsible for Ebola Hemorrhagic Fever (EHF). This disease has a high fatality rate and has recently reached historically unprecedented epidemic proportions in West Africa. Of the 5 known Ebolavirus species, only Reston ebolavirus (RESTV) has lost human pathogenicity, while retaining the ability to cause EHF in long-tailed macaques. Considerable effort has been devoted to determining the three-dimensional (3D) structures of Ebolavirus proteins, studying their interactions with host proteins, and identifying the functional motifs in these viral proteins. Here, in light of these experimental results, we apply computational analysis to predict the 3D structures and functional sites for Ebolavirus protein domains with unknown structure, including a zinc-finger domain of VP30, the RNA-dependent RNA polymerase catalytic domain and a methyltransferase domain of protein L. In addition, we compare sequences of proteins that interact with Ebolavirus proteins from RESTV-resistant primates with those from RESTV-susceptible monkeys. The host proteins that interact with GP and VP35 show an elevated level of sequence divergence between the RESTV-resistant and RESTV-susceptible species, suggesting that they may be responsible for host specificity. Meanwhile, we detect variable positions in protein sequences that are likely associated with the loss of human pathogenicity in RESTV, map them onto the 3D structures and compare their positions to known functional sites. VP35 and VP30 are significantly enriched in these potential pathogenicity determinants, and the clustering of such positions on the surfaces of VP35 and GP suggests possible uncharacterized interaction sites with host proteins that contribute to the virulence of Ebolavirus. PMID:26158395
Pang, Erli; Wu, Xiaomei; Lin, Kui
2016-06-01
Protein evolution plays an important role in the evolution of each genome. Because proteins are functional molecules, their different parts or sites are generally subject to different selective constraints, particularly purifying selection. Most previous studies of protein evolution considered individual proteins in their entirety or compared protein-coding sequences with non-coding sequences; less attention has been paid to the evolution of different parts within each protein of a given genome. To address this, we used PfamA annotation of all human proteins to split each protein sequence into two parts: domains and unassigned regions. Single nucleotide polymorphisms (SNPs) in protein-coding sequences from the 1000 Genomes Project were then classified as occurring either within protein domains or within unassigned regions. With this classification, we found that the density of synonymous SNPs within domains is significantly greater than that within unassigned regions, whereas the density of non-synonymous SNPs shows the opposite pattern. We also found signatures of purifying selection on both domains and unassigned regions, and the selective strength on domains is significantly greater than that on unassigned regions. In addition, among all human protein sequences, there are 117 PfamA domains in which no SNPs are found. Our results highlight an important aspect of protein domains and may contribute to our understanding of protein evolution.
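The domain/unassigned split can be sketched as follows for a single protein. This is a simplified illustration under stated assumptions: domain hits are given as 1-based inclusive residue intervals (e.g. from PfamA), SNP positions are mapped to residues, and density is expressed as SNPs per residue rather than per synonymous or non-synonymous site as a population-genetic analysis would normally use. `snp_densities` is a hypothetical helper.

```python
def snp_densities(protein_length, domains, snp_positions):
    """Return (SNPs per domain residue, SNPs per unassigned residue) for one protein.

    domains: list of (start, end) 1-based inclusive intervals; overlaps are merged.
    snp_positions: list of 1-based residue positions (assumed to lie within the protein)
        for one SNP class, e.g. only synonymous SNPs.
    """
    in_domain = set()
    for start, end in domains:
        in_domain.update(range(start, end + 1))  # a set merges overlapping hits
    n_domain = len(in_domain)
    n_unassigned = protein_length - n_domain
    hits_domain = sum(1 for p in snp_positions if p in in_domain)
    hits_unassigned = len(snp_positions) - hits_domain
    density_domain = hits_domain / n_domain if n_domain else float("nan")
    density_unassigned = hits_unassigned / n_unassigned if n_unassigned else float("nan")
    return density_domain, density_unassigned
```

Running the function separately for synonymous and non-synonymous SNPs gives the two pairs of densities that the comparison in the abstract is based on.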
The binding of TIA-1 to RNA C-rich sequences is driven by its C-terminal RRM domain
Cruz-Gallardo, Isabel; Aroca, Ángeles; Gunzburg, Menachem J; Sivakumaran, Andrew; Yoon, Je-Hyun; Angulo, Jesús; Persson, Cecilia; Gorospe, Myriam; Karlsson, B Göran; Wilce, Jacqueline A; Díaz-Moreno, Irene
2014-01-01
T-cell intracellular antigen-1 (TIA-1) is a key DNA/RNA binding protein that regulates translation by sequestering target mRNAs in stress granules (SG) in response to stress conditions. TIA-1 possesses three RNA recognition motifs (RRM) along with a glutamine-rich domain, with the central domains (RRM2 and RRM3) acting as RNA binding platforms. While the RRM2 domain, which displays high affinity for U-rich RNA sequences, is primarily responsible for interaction with RNA, the contribution of RRM3 to RNA binding and the target RNA sequences it preferentially binds are still unknown. Here we combined nuclear magnetic resonance (NMR) and surface plasmon resonance (SPR) techniques to elucidate the sequence specificity of TIA-1 RRM3. With a novel approach using saturation transfer difference NMR (STD-NMR) to quantify protein–nucleic acid interactions, we demonstrate that isolated RRM3 binds to both C- and U-rich stretches with micromolar affinity. In combination with RRM2 and in the context of full-length TIA-1, RRM3 significantly enhanced binding to RNA, particularly to cytosine-rich RNA oligos, as assessed by biotinylated RNA pull-down analysis. Our findings provide new insight into the role of RRM3 in regulating TIA-1 binding to C-rich stretches, which are abundant at the 5′ TOPs (5′ terminal oligopyrimidine tracts) of mRNAs whose translation is repressed under stress conditions. PMID:24824036
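For readers unfamiliar with STD-NMR quantification, the standard quantities are given below for orientation; the abstract does not specify the authors' exact protocol. I_0 and I_sat are the signal intensities of a ligand (here an RNA oligo) in the off- and on-resonance spectra, [L]_T/[P]_T is the ligand excess over protein (here RRM3), A_STD^0 is the initial slope of the STD build-up curve measured at each ligand concentration, and fitting it to a Langmuir isotherm yields the dissociation constant K_D.

```latex
A_{\mathrm{STD}} = \frac{I_0 - I_{\mathrm{sat}}}{I_0}\cdot\frac{[L]_{\mathrm{T}}}{[P]_{\mathrm{T}}},
\qquad
A_{\mathrm{STD}}^{0}([L]) = \frac{\alpha_{\max}\,[L]}{K_D + [L]}
```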
Automatic prediction of protein domains from sequence information using a hybrid learning system.
Nagarajan, Niranjan; Yona, Golan
2004-06-12
We describe a novel method for detecting the domain structure of a protein from sequence information alone. The method is based on analyzing multiple sequence alignments that are derived from a database search. Multiple measures are defined to quantify the domain information content of each position along the sequence and are combined into a single predictor using a neural network. The output is further smoothed and post-processed using a probabilistic model to predict the most likely transition positions between domains. The method was assessed using the domain definitions in SCOP and CATH for proteins of known structure and was compared with several other existing methods. Our method performs well both in terms of accuracy and sensitivity. It improves significantly over the best methods available, even some of the semi-manual ones, while being fully automatic. Our method can also be used to suggest and verify domain partitions based on structural data. A few examples of predicted domain definitions and alternative partitions, as suggested by our method, are also discussed. An online domain-prediction server is available at http://biozon.org/tools/domains/
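The final step described above (smoothing per-position scores and locating the most likely domain transitions) can be illustrated with a deliberately simpler substitute. The sketch below assumes a per-position "boundary score" is already available (high = likely boundary), replaces the neural network and probabilistic model with a centered moving average and peak picking, and enforces a minimum domain length. `smooth`, `predict_boundaries`, and all parameter values are hypothetical, not the published method.

```python
def smooth(scores, window=5):
    """Centered moving average; the window shrinks at the sequence ends."""
    half = window // 2
    out = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out


def predict_boundaries(scores, window=5, threshold=0.5, min_domain=30):
    """Return 0-based positions of smoothed local maxima above `threshold`.

    Candidates closer than `min_domain` residues to either end, or to a stronger
    boundary already accepted, are discarded.
    """
    s = smooth(scores, window)
    peaks = [i for i in range(1, len(s) - 1)
             if s[i] >= threshold and s[i] >= s[i - 1] and s[i] > s[i + 1]]
    boundaries = []
    for i in sorted(peaks, key=lambda k: -s[k]):  # consider the strongest peaks first
        if i < min_domain or len(s) - i < min_domain:
            continue  # would leave a fragment shorter than a plausible domain
        if all(abs(i - b) >= min_domain for b in boundaries):
            boundaries.append(i)
    return sorted(boundaries)
```

A 100-residue profile with a single elevated stretch around residue 50 yields one boundary at position 50; the minimum-domain constraint is a crude stand-in for the length priors a probabilistic post-processing model would encode.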
Lentes, K U; Mathieu, E; Bischoff, R; Rasmussen, U B; Pavirani, A
1993-01-01
Current methods for comparative analyses of protein sequences are 1D-alignments of amino acid sequences based on the maximization of amino acid identity (homology) and the prediction of secondary structure elements. This method has a major drawback once amino acid identity drops below 20-25%, since maximization of a homology score does not take into account any structural information. A new technique called Hydrophobic Cluster Analysis (HCA) has been developed by Lemesle-Varloot et al. (Biochimie 72, 555-574, 1990). It consists of comparing several sequences simultaneously and combining homology detection with secondary structure analysis. HCA is primarily based on the detection and comparison of structural segments constituting the hydrophobic core of globular protein domains, with or without transmembrane domains. We have applied HCA to the analysis of different families of G-protein coupled receptors, such as catecholamine receptors as well as peptide hormone receptors. Using HCA, the thrombin receptor, a new and as yet unique member of the family of G-protein coupled receptors, can be clearly classified as being closely related to the family of neuropeptide receptors rather than to the catecholamine receptors, for which the shape of the hydrophobic clusters and the length of the third cytoplasmic loop are very different. Furthermore, the potential of HCA to predict relationships between new putative and already characterized members of this family of receptors will be presented.
Costanzi, Stefano; Skorski, Matthew; Deplano, Alessandro; Habermehl, Brett; Mendoza, Mary; Wang, Keyun; Biederman, Michelle; Dawson, Jessica; Gao, Jia
2016-11-01
In the present work, we quantitatively studied the modellability of the inactive state of Class A G protein-coupled receptors (GPCRs). Specifically, we constructed models of one of the Class A GPCRs for which structures solved in the inactive state are available, namely the β2AR, using as templates each of the other class members for which structures solved in the inactive state are also available. Our results showed a detectable linear correlation between model accuracy and model/template sequence identity. This suggests that the likely accuracy of the homology models that can be built for a given receptor can generally be forecast on the basis of the available templates. We also probed whether sequence alignments that allow gaps within the transmembrane domains, to account for structural irregularities, afford better models than classical alignment procedures that do not allow gaps within such domains. Our results indicate that, although the overall differences are very subtle, the inclusion of internal gaps within the transmembrane domains has a noticeable beneficial effect on the local structural accuracy of the domain in question. Copyright © 2016 Elsevier Inc. All rights reserved.
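The "linear correlation between model accuracy and model/template sequence identity" amounts to an ordinary least-squares fit plus a Pearson correlation coefficient. A minimal, dependency-free sketch follows; expressing accuracy as an RMSD to the crystal structure (lower = better, so a negative correlation is expected) is an assumption for illustration, and `linear_fit` is a hypothetical helper, not the authors' analysis code.

```python
from math import sqrt


def linear_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x, plus Pearson r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)          # spread of x
    syy = sum((yi - my) ** 2 for yi in y)          # spread of y
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # co-variation
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r
```

Given a fitted line, the expected accuracy for a new receptor can be read off from its best template's sequence identity, which is the kind of forecast the abstract describes; the fit says nothing about the scatter around the line, which should be reported alongside it.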
SOBA: sequence ontology bioinformatics analysis.
Moore, Barry; Fan, Guozhen; Eilbeck, Karen
2010-07-01
The advent of cheaper, faster sequencing technologies has pushed the task of sequence annotation from the exclusive domain of large-scale multi-national sequencing projects to that of research laboratories and small consortia. The bioinformatics burden placed on these laboratories, some with very little programming experience, can be daunting. Fortunately, there exist software libraries and pipelines designed with these groups in mind, to ease the transition from an assembled genome to an annotated and accessible genome resource. We have developed the Sequence Ontology Bioinformatics Analysis (SOBA) tool to provide a simple statistical and graphical summary of an annotated genome. We envisage its use during annotation jamborees, genome comparison and for use by developers for rapid feedback during annotation software development and testing. SOBA also provides annotation consistency feedback to ensure correct use of terminology within annotations, and guides users to add new terms to the Sequence Ontology when required. SOBA is available at http://www.sequenceontology.org/cgi-bin/soba.cgi.
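A SOBA-style summary starts from counting features by their Sequence Ontology type. The sketch below tallies column 3 (the type column) of GFF3 records and flags types missing from a supplied term list. The term list used in practice should come from the Sequence Ontology itself; any small set passed in here is a hypothetical stand-in, and this is not SOBA's own code.

```python
from collections import Counter


def feature_type_counts(gff3_lines):
    """Count GFF3 features by their type (column 3)."""
    counts = Counter()
    for line in gff3_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # GFF3 allows an embedded FASTA section after this directive
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # this sketch silently skips malformed records
        counts[cols[2]] += 1
    return counts


def unknown_types(counts, known_terms):
    """List feature types that are not in the supplied set of accepted terms."""
    return sorted(t for t in counts if t not in known_terms)
```

Flagging unknown types is a crude version of the terminology-consistency feedback described above: a misspelled type such as `genee` shows up immediately.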
Genome-wide analysis of the Zn(II)2Cys6 zinc cluster-encoding gene family in Aspergillus flavus
USDA-ARS's Scientific Manuscript database
Proteins with a Zn(II)2Cys6 domain, Cys-X2-Cys-X6-Cys-X5-12-Cys-X2-Cys-X6-9-Cys (hereafter, referred to as the C6 domain), form a subclass of zinc finger proteins found exclusively in fungi and yeast. Genome sequence databases of Saccharomyces cerevisiae and Candida albicans have provided an overvie...
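The C6 signature quoted above translates directly into a regular expression: each fixed spacer X_n becomes `.{n}` and each range X_m-n becomes `.{m,n}`. The following is a minimal scanning sketch; the function name is illustrative, and matches are reported as non-overlapping 0-based, end-exclusive spans.

```python
import re

# Cys-X2-Cys-X6-Cys-X5-12-Cys-X2-Cys-X6-9-Cys, with "." standing for any residue X.
C6_PATTERN = re.compile(r"C.{2}C.{6}C.{5,12}C.{2}C.{6,9}C")


def find_c6_motifs(protein):
    """Return non-overlapping (start, end) spans of C6 motifs (0-based, end-exclusive).

    The ranged spacers are greedy, so when several cysteines could satisfy the
    pattern the regex engine prefers the longest spacers that still match.
    """
    return [(m.start(), m.end()) for m in C6_PATTERN.finditer(protein.upper())]
```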
Marck, Christian; Grosjean, Henri
2002-01-01
From 50 genomes of the three domains of life (7 eukarya, 13 archaea, and 30 bacteria), we extracted, analyzed, and compared over 4,000 sequences corresponding to cytoplasmic, nonorganellar tRNAs. For each genome, the complete set of tRNAs required to read the 61 sense codons was identified, which revealed three major anticodon-sparing strategies. Other features and sequence peculiarities analyzed are the following: (1) fit to the standard cloverleaf structure, (2) characteristic consensus sequences for elongator and initiator tDNAs, (3) frequencies of bases at each sequence position, (4) type and frequencies of conserved 2D and 3D base pairs, (5) anticodon/tDNA usages and anticodon-sparing strategies, (6) identification of the tRNA-Ile with anticodon CAU reading AUA, (7) size of variable arm, (8) occurrence and location of introns, (9) occurrence of 3'-CCA and 5'-extra G encoded at the tDNA level, and (10) distribution of the tRNA genes in genomes and their mode of transcription. Among all tRNA isoacceptors, we found that initiator tDNA-iMet is the most conserved across the three domains, yet domain-specific signatures exist. Also, depending on which tRNA feature is considered (5'-extra G encoded in tDNAs-His, AUA codon read by tRNA-Ile with anticodon CAU, presence of intron, absence of "two-out-of-three" reading mode and short V-arm in tDNA-Tyr), Archaea group with either Bacteria or Eukarya. No features shared by Eukarya and Bacteria but absent from Archaea could be identified. Thus, from the tRNomic point of view, Archaea appears as an "intermediate domain" between Eukarya and Bacteria. PMID:12403461
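The figure of 61 sense codons follows from the 64 triplets of the standard genetic code minus its three stop codons (UAA, UAG, UGA). The sketch below enumerates them and derives each codon's Watson-Crick anticodon, written 5' to 3'. It deliberately ignores wobble pairing and modified bases, so it cannot account for cases like the tRNA-Ile with anticodon CAU reading AUA, which depends on modification of the wobble nucleotide.

```python
from itertools import product

STOP_CODONS = {"UAA", "UAG", "UGA"}  # stop codons of the standard genetic code
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def sense_codons():
    """All 64 RNA triplets minus the three standard stop codons."""
    return ["".join(t) for t in product("UCAG", repeat=3) if "".join(t) not in STOP_CODONS]


def anticodon(codon):
    """Watson-Crick anticodon written 5'->3' (reverse complement of the codon).

    No wobble or base-modification rules are modelled.
    """
    return codon.upper().translate(COMPLEMENT)[::-1]
```

For instance, `anticodon("AUG")` returns `"CAU"`, the anticodon that under plain Watson-Crick pairing reads AUG rather than AUA.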
Miao, L X; Jiang, M; Zhang, Y C; Yang, X F; Zhang, H Q; Zhang, Z F; Wang, Y Z; Jiang, G H
2016-08-05
The MLO (powdery mildew locus O) gene family is important in resistance to powdery mildew (PM). In this study, all members of the MLO family were identified and analyzed in the strawberry (Fragaria vesca) genome. The strawberry genome contains at least 20 members of the MLO family; their protein sequences range from 171 to 1485 amino acids, and the genes contain 0-34 introns. Chromosomal localization showed that the MLOs were unevenly distributed across all chromosomes except chromosome 4. The greatest number of MLOs (seven) was found on chromosome 3. A phylogenetic tree showed that the MLOs were divided into seven groups (I-VII), four of which consisted of MLOs from strawberry, Arabidopsis thaliana, rice, and maize, suggesting that these genes may have evolved after the divergence of monocots and dicots. Multiple sequence alignment showed that strawberry MLO candidates related to powdery mildew resistance possessed seven highly conserved transmembrane domains, a calmodulin-binding domain, and two conserved regions, all of which are important domains for powdery mildew resistance genes. Expressed sequence tag analysis revealed that the MLOs were induced by multiple abiotic stressors, including low and high temperature, drought, and high salinity. These findings will contribute to the functional characterization of MLOs related to PM susceptibility, and will assist in the development of disease resistance in strawberries.
Aguilar-Hernández, Victor; Aguilar-Henonin, Laura; Guzmán, Plinio
2011-01-01
Ubiquitin-ligases or E3s are components of the ubiquitin proteasome system (UPS) that coordinate the transfer of ubiquitin to the target protein. A major class of ubiquitin-ligases consists of RING-finger domain proteins that include the substrate recognition sequences in the same polypeptide; these are known as single-subunit RING finger E3s. We are studying a particular family of RING finger E3s, named ATL, whose members contain a transmembrane domain and the RING-H2 finger domain; none of the members of the family contains any other previously described domain. Although a few members in A. thaliana and O. sativa have been studied, the role of this family in the plant life cycle remains unclear. To provide tools that advance the functional analysis of this family, we have undertaken a phylogenetic analysis of ATLs in twenty-four plant genomes. ATLs were found in all 24 plant species analyzed, in numbers ranging from 20-28 in two basal species to 162 in soybean. Analysis of ATLs arrayed in tandem indicates that sets of genes are expanding in a species-specific manner. To gain insight into the domain architecture of ATLs, we generated 75 pHMM LOGOs from 1815 ATLs, and identified potential protein-protein interaction regions by means of yeast two-hybrid assays. Several ATLs were found to interact with DSK2a/ubiquilin through a region at the amino-terminal end, suggesting that this is a widespread interaction that may assist in the mode of action of ATLs; the region was traced to a distinct sequence LOGO. Our analysis provides significant observations on the evolution and expansion of the ATL family, in addition to information on the domain structure of this class of ubiquitin-ligases that may be involved in plant adaptation to environmental stress.
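A sequence logo scales each column by its information content. The profile-HMM logos mentioned above are derived from HMM emission probabilities rather than raw counts, so as a simpler stand-in the sketch computes per-column information directly from a protein alignment: log2(20) minus the Shannon entropy of the observed residues. Gaps are ignored, and no small-sample correction or pseudocounts are applied; both are simplifying assumptions.

```python
import math
from collections import Counter


def column_information(alignment):
    """Per-column information content (bits) of an aligned set of protein sequences.

    Gaps ('-') are ignored; no pseudocounts or small-sample correction (assumption).
    """
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to equal length")
    max_bits = math.log2(20)  # maximum entropy for a 20-letter alphabet
    info = []
    for j in range(length):
        counts = Counter(s[j] for s in alignment if s[j] != "-")
        total = sum(counts.values())
        if total == 0:
            info.append(0.0)  # all-gap column carries no residue information
            continue
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        info.append(max_bits - entropy)
    return info
```

A fully conserved column reaches about 4.32 bits; a column split evenly between two residues loses exactly one bit.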
Hrle, Ajla; Maier, Lisa-Katharina; Sharma, Kundan; Ebert, Judith; Basquin, Claire; Urlaub, Henning; Marchfelder, Anita; Conti, Elena
2014-01-01
Upon pathogen invasion, bacteria and archaea activate an RNA-interference-like mechanism termed CRISPR (clustered regularly interspaced short palindromic repeats). A large family of Cas (CRISPR-associated) proteins mediates the different stages of this sophisticated immune response. Bioinformatic studies have classified the Cas proteins into families, according to their sequences and respective functions. These range from the insertion of the foreign genetic elements into the host genome to the activation of the interference machinery as well as target degradation upon attack. Cas7 family proteins are central to the type I and type III interference machineries as they constitute the backbone of the large interference complexes. Here we report the crystal structure of Thermofilum pendens Csc2, a Cas7 family protein of type I-D. We found that Csc2 forms a core RRM-like domain, flanked by three peripheral insertion domains: a lid domain, a Zinc-binding domain and a helical domain. Comparison with other Cas7 family proteins reveals a set of similar structural features both in the core and in the peripheral domains, despite the absence of significant sequence similarity. T. pendens Csc2 binds single-stranded RNA in vitro in a sequence-independent manner. Using a crosslinking - mass-spectrometry approach, we mapped the RNA-binding surface to a positively charged surface patch on T. pendens Csc2. Thus our analysis of the key structural and functional features of T. pendens Csc2 highlights recurring themes and evolutionary relationships in type I and type III Cas proteins.
Al-Qahtani, Ahmed A; Abdel-Muhsin, Abdel-Muhsin A; Dajem, Saad M Bin; AlSheikh, Adel Ali H; Bohol, Marie Fe F; Al-Ahdal, Mohammed N; Putaporntip, Chaturong; Jongwutiwes, Somchai
2016-04-01
The apical membrane antigen 1 of Plasmodium falciparum (PfAMA1) plays a crucial role in erythrocyte invasion and is a target of protective antibodies. Although domain I of PfAMA1 has been considered a promising vaccine component, extensive sequence diversity in this domain could compromise an effective vaccine design. To explore the extent of sequence diversity in domain I of PfAMA1, P. falciparum-infected blood samples from Saudi Arabia collected between 2007 and 2009 were analyzed and compared with those from worldwide parasite populations. Forty-six haplotypes and a novel codon change (M190V) were found among Saudi Arabian isolates. The haplotype diversity (0.948±0.004) and nucleotide diversity (0.0191±0.0008) were comparable to those from African hyperendemic countries. Positive selection in domain I of PfAMA1 in the Saudi Arabian parasite population was observed because nonsynonymous nucleotide substitutions per nonsynonymous site (dN) significantly exceeded synonymous nucleotide substitutions per synonymous site (dS), and Tajima's D and its related statistics deviated significantly from neutrality in the positive direction. Despite a relatively low prevalence of malaria in Saudi Arabia, a minimum of 17 recombination events occurred in domain I. Genetic differentiation was significant between P. falciparum in Saudi Arabia and parasites from other geographic origins. Several shared or closely related haplotypes were found among parasites from different geographic areas, suggesting that a vaccine derived from multiple shared epitopes could be effective across endemic countries. Copyright © 2016 Elsevier B.V. All rights reserved.
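The diversity and neutrality statistics named above have standard definitions: haplotype diversity Hd = n/(n-1) * (1 - sum of squared haplotype frequencies), nucleotide diversity pi as the mean number of pairwise differences per site, and Tajima's D comparing mean pairwise differences with the Watterson estimate S/a1. The sketch below assumes aligned, gap-free sequences of equal length. It is not the software used in the study, and it omits dN/dS, which needs codon-aware counting.

```python
import math
from collections import Counter
from itertools import combinations


def _pairwise_diffs(seqs):
    """Number of differing sites for every unordered pair of sequences."""
    return [sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)]


def haplotype_diversity(seqs):
    n = len(seqs)
    freqs = [c / n for c in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs):
    """Mean pairwise differences divided by sequence length (per-site pi)."""
    diffs = _pairwise_diffs(seqs)
    return sum(diffs) / len(diffs) / len(seqs[0])


def tajimas_d(seqs):
    """Tajima's D; returns NaN when there are no segregating sites."""
    n = len(seqs)
    diffs = _pairwise_diffs(seqs)
    k = sum(diffs) / len(diffs)  # mean pairwise differences
    s = sum(len(set(col)) > 1 for col in zip(*seqs))  # segregating sites
    if s == 0:
        return float("nan")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

A positive D, as reported for domain I, means that average pairwise diversity exceeds what the number of segregating sites predicts under the standard neutral model.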
Ghouila, Amel; Florent, Isabelle; Guerfali, Fatma Zahra; Terrapon, Nicolas; Laouini, Dhafer; Yahia, Sadok Ben; Gascuel, Olivier; Bréhélin, Laurent
2014-01-01
Identification of protein domains is a key step for understanding protein function. Hidden Markov Models (HMMs) have proved to be a powerful tool for this task. The Pfam database notably provides a large collection of HMMs which are widely used for the annotation of proteins in sequenced organisms. This is done via sequence/HMM comparisons. However, this approach may lack sensitivity when searching for domains in divergent species. Recently, methods for HMM/HMM comparisons have been proposed and proved to be more sensitive than sequence/HMM approaches in certain cases. However, these approaches are usually not used for protein domain discovery at a genome scale, and the benefit that could be expected from their utilization for this problem has not been investigated. Using proteins of P. falciparum and L. major as examples, we investigate the extent to which HMM/HMM comparisons can identify new domain occurrences not already identified by sequence/HMM approaches. We show that although HMM/HMM comparisons are much more sensitive than sequence/HMM comparisons, they are not sufficiently accurate to be used as a standalone complement of sequence/HMM approaches at the genome scale. Hence, we propose to use domain co-occurrence — the general domain tendency to preferentially appear along with some favorite domains in the proteins — to improve the accuracy of the approach. We show that the combination of HMM/HMM comparisons and co-occurrence domain detection boosts protein annotations. At an estimated False Discovery Rate of 5%, it revealed 901 and 1098 new domains in Plasmodium and Leishmania proteins, respectively. Manual inspection of part of these predictions shows that it contains several domain families that were missing in the two organisms. All new domain occurrences have been integrated in the EuPathDomains database, along with the GO annotations that can be deduced. PMID:24901648
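The co-occurrence idea can be sketched as a filter. First count how often pairs of domains appear together in known architectures. Then keep a candidate domain predicted by HMM/HMM comparison only if it co-occurs often enough with a domain already annotated on the same protein. This is an illustrative simplification, not the authors' statistical procedure, which is calibrated to an estimated False Discovery Rate. The function names, the `min_count` threshold and the domain labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(architectures):
    """Count unordered domain pairs that appear together in the same protein."""
    counts = Counter()
    for domains in architectures:
        for a, b in combinations(sorted(set(domains)), 2):
            counts[(a, b)] += 1
    return counts


def supported_candidates(known, candidates, counts, min_count=1):
    """Keep candidate domains that co-occur with at least one known domain.

    `known` are domains already annotated on the protein (e.g. by sequence/HMM
    search); `candidates` are extra hits, e.g. from HMM/HMM comparison.
    """
    kept = []
    for cand in candidates:
        for dom in known:
            pair = tuple(sorted((cand, dom)))
            if counts.get(pair, 0) >= min_count:
                kept.append(cand)
                break
    return kept
```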
A fully automatic evolutionary classification of protein folds: Dali Domain Dictionary version 3
Dietmann, Sabine; Park, Jong; Notredame, Cedric; Heger, Andreas; Lappe, Michael; Holm, Liisa
2001-01-01
The Dali Domain Dictionary (http://www.ebi.ac.uk/dali/domain) is a numerical taxonomy of all known structures in the Protein Data Bank (PDB). The taxonomy is derived fully automatically from measurements of structural, functional and sequence similarities. Here, we report the extension of the classification to match the traditional four hierarchical levels corresponding to: (i) supersecondary structural motifs (attractors in fold space), (ii) the topology of globular domains (fold types), (iii) remote homologues (functional families) and (iv) homologues with sequence identity above 25% (sequence families). The computational definitions of attractors and functional families are new. In September 2000, the Dali classification contained 10 531 PDB entries comprising 17 101 chains, which were partitioned into five attractor regions, 1375 fold types, 2582 functional families and 3724 domain sequence families. Sequence families were further associated with 99 582 unique homologous sequences in the HSSP database, which increases the number of effectively known structures several-fold. The resulting database contains the description of protein domain architecture, the definition of structural neighbours around each known structure, the definition of structurally conserved cores and a comprehensive library of explicit multiple alignments of distantly related protein families. PMID:11125048
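The lowest level of the hierarchy groups homologues with more than 25% sequence identity. One common way to form such groups is to compute pairwise identity from alignments and then take connected components (single linkage) of the graph of pairs above the threshold. Here identity is measured over columns where neither sequence has a gap, which is only one of several conventions. This is not Dali's own procedure.

```python
def percent_identity(aln_a, aln_b):
    """Identity (%) over alignment columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def single_linkage(ids, identities, threshold=25.0):
    """Group ids into families linked by any pair with identity > threshold.

    `identities` maps (id_a, id_b) -> percent identity.
    """
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), ident in identities.items():
        if ident > threshold:
            parent[find(a)] = find(b)  # union the two families
    families = {}
    for i in ids:
        families.setdefault(find(i), []).append(i)
    return sorted(sorted(f) for f in families.values())
```

Single linkage lets families grow by chaining, so two members of one family may themselves share less than 25% identity.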
Worley, K C; Wiese, B A; Smith, R F
1995-09-01
BEAUTY (BLAST enhanced alignment utility) is an enhanced version of the NCBI's BLAST data base search tool that facilitates identification of the functions of matched sequences. We have created new data bases of conserved regions and functional domains for protein sequences in NCBI's Entrez data base, and BEAUTY allows this information to be incorporated directly into BLAST search results. A Conserved Regions Data Base, containing the locations of conserved regions within Entrez protein sequences, was constructed by (1) clustering the entire data base into families, (2) aligning each family using our PIMA multiple sequence alignment program, and (3) scanning the multiple alignments to locate the conserved regions within each aligned sequence. A separate Annotated Domains Data Base was constructed by extracting the locations of all annotated domains and sites from sequences represented in the Entrez, PROSITE, BLOCKS, and PRINTS data bases. BEAUTY performs a BLAST search of those Entrez sequences with conserved regions and/or annotated domains. BEAUTY then uses the information from the Conserved Regions and Annotated Domains data bases to generate, for each matched sequence, a schematic display that allows one to directly compare the relative locations of (1) the conserved regions, (2) annotated domains and sites, and (3) the locally aligned regions matched in the BLAST search. In addition, BEAUTY search results include World-Wide Web hypertext links to a number of external data bases that provide a variety of additional types of information on the function of matched sequences. This convenient integration of protein families, conserved regions, annotated domains, alignment displays, and World-Wide Web resources greatly enhances the biological informativeness of sequence similarity searches. 
BEAUTY searches can be performed remotely on our system using the "BCM Search Launcher" World-Wide Web pages (URL is <http://gc.bcm.tmc.edu:8088/search-launcher/launcher.html>).
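BEAUTY's schematic relates the BLAST-aligned region of each hit to conserved regions and annotated domains on the same sequence. At the core of such a display are an interval-overlap test and a scaled text track. The sketch below assumes 1-based inclusive coordinates; the function names and the feature tuples are hypothetical and do not reflect BEAUTY's data format.

```python
def overlapping_features(hit_start, hit_end, features):
    """Return (name, overlap_start, overlap_end) for features overlapping a hit.

    `features` holds (name, start, end) tuples; coordinates are 1-based and
    inclusive (assumption) on the matched database sequence.
    """
    out = []
    for name, start, end in features:
        lo, hi = max(hit_start, start), min(hit_end, end)
        if lo <= hi:
            out.append((name, lo, hi))
    return out


def ascii_track(seq_len, intervals, symbol="#", width=50):
    """Draw 1-based inclusive intervals on a fixed-width line scaled to seq_len."""
    line = ["-"] * width
    for start, end in intervals:
        first = (start - 1) * width // seq_len
        last = max(first, (end * width - 1) // seq_len)
        for k in range(first, min(last, width - 1) + 1):
            line[k] = symbol
    return "".join(line)
```

Printing one track for the aligned region and one per domain, all at the same width, gives a crude version of a side-by-side comparison of relative locations.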
The Human Transcript Database: A Catalogue of Full Length cDNA Inserts
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bouckk John; Michael McLeod; Kim Worley
1999-09-10
The BCM Search Launcher provided improved access to web-based sequence analysis services during the granting period and beyond. The Search Launcher web site grouped analysis procedures by function and supplied default parameters that gave reasonable search results for most applications. For instance, most queries were automatically masked for repeat sequences prior to sequence database searches to avoid spurious matches. In addition to web-based access and arrangements that made the functions easier to use, the BCM Search Launcher provided unique value-added applications like the BEAUTY sequence database search tool, which combined information about protein domains and sequence database search results to give an enhanced, more complete picture of the reliability and relative value of the information reported. This enhanced search tool made evaluating search results more straightforward and consistent. Some of the favorite features of the web site are the sequence utilities and the batch client functionality that allows processing of multiple samples from the command line interface. One measure of the success of the BCM Search Launcher is the number of sites that have adopted the models first developed on the site. The graphic display on the BLAST search from the NCBI web site is one such outgrowth, as is the display of protein domain search results within BLAST search results, and the design of the Biology Workbench application. The logs of usage and comments from users confirm the great utility of this resource.
Gene and domain duplication in the chordate Otx gene family: insights from amphioxus Otx.
Williams, N A; Holland, P W
1998-05-01
We report the genomic organization and deduced protein sequence of a cephalochordate member of the Otx homeobox gene family (AmphiOtx) and show its probable single-copy state in the genome. We also present molecular phylogenetic analysis indicating that there was a single ancestral Otx gene in the first chordates, which was duplicated in the vertebrate lineage after it had split from the lineage leading to the cephalochordates. Duplication of a C-terminal protein domain has occurred specifically in the vertebrate lineage, strengthening the case for a single Otx gene in an ancestral chordate whose gene structure has been retained in an extant cephalochordate. Comparative analysis of protein sequences and published gene expression patterns suggests that the ancestral chordate Otx gene had roles in patterning the anterior mesendoderm and central nervous system. These roles were elaborated following Otx gene duplication in vertebrates, accompanied by regulatory and structural divergence, particularly of Otx1 descendant genes.
NASA Astrophysics Data System (ADS)
Thumb, Werner; Graf, Christine; Parslow, Tristram; Schneider, Rainer; Auer, Manfred
1999-11-01
The interaction of the human immunodeficiency virus type 1 (HIV-1) regulatory protein Rev with cellular cofactors is crucial for the viral life cycle. The HIV-1 Rev transactivation domain is functionally interchangeable with analogous regions of Rev proteins of other retroviruses, suggesting common folding patterns. In order to obtain experimental evidence for similar structural features mediating protein-protein contacts, we investigated activation domain peptides from HIV-1, HIV-2, VISNA virus, feline immunodeficiency virus (FIV) and equine infectious anemia virus (EIAV) by CD spectroscopy, secondary structure prediction and sequence analysis. Although different in polarity and hydrophobicity, all peptides showed a similar behavior with respect to solution conformation, concentration dependence and variations in ionic strength and pH. Temperature studies revealed an unusual induction of β-structure with rising temperatures in all activation domain peptides. The high stability of β-structure in this region was demonstrated in three different peptides of the activation domain of HIV-1 Rev in solutions containing 40% hexafluoropropanol, a reagent usually known to induce α-helix formation in amino acid sequences. Sequence alignments revealed similarities between the polar effector domains from FIV and EIAV and the leucine-rich (hydrophobic) effector domains found in HIV-1, HIV-2 and VISNA. Studies on activation domain peptides of two dominant negative HIV-1 Rev mutants, M10 and M32, pointed towards different reasons for their biological behavior. Whereas the peptide containing the M10 mutation (L78E79→D78L79) showed wild-type structure, the M32 mutant peptide (L78L81L83→A78A81A83) revealed a different protein fold to be the reason for the disturbed binding to cellular cofactors.
From our data, we conclude that the activation domains of Rev proteins from different viral origins adopt a similar fold and that a β-structural element is involved in binding to a cellular cofactor.
Ribeiro, José R de A; Carvalho, Patrícia M B de; Cabral, Anderson de S; Macrae, Andrew; Mendonça-Hagler, Leda C S; Berbara, Ricardo L L; Hagler, Allen N
2011-10-01
A novel yeast species within the Metschnikowiaceae is described based on a strain from the sugarcane (Saccharum sp.) rhizoplane of an organically managed farm in Rio de Janeiro, Brazil. Sequence analysis of the D1/D2 domain of the large subunit ribosomal RNA gene showed that the closest related species were Candida tsuchiyae and Candida thailandica, with 86.2% and 86.7% sequence identity, respectively. All three are anamorphs in the Clavispora opuntiae clade. The name Candida middelhoveniana sp. nov. is proposed to accommodate this highly divergent organism with the type strain Instituto de Microbiologia, Universidade Federal do Rio de Janeiro (IMUFRJ) 51965(T) (=Centraalbureau voor Schimmelcultures (CBS) 12306(T), Universidade Federal de Minas Gerais (UFMG)-70(T), DBVPG 8031(T)) and the GenBank/EMBL/DDBJ accession number for the D1/D2 domain LSU rDNA sequence is FN428871. The Mycobank deposit number is MB 519801.
Sequence analysis of RNase MRP RNA reveals its origination from eukaryotic RNase P RNA
Zhu, Yanglong; Stribinskis, Vilius; Ramos, Kenneth S.; Li, Yong
2006-01-01
RNase MRP is a eukaryote-specific endoribonuclease that generates RNA primers for mitochondrial DNA replication and processes precursor rRNA. RNase P is a ubiquitous endoribonuclease that cleaves precursor tRNA transcripts to produce their mature 5′ termini. We found extensive sequence homology of catalytic domains and specificity domains between their RNA subunits in many organisms. In Candida glabrata, the internal loop of helix P3 is 100% conserved between MRP and P RNAs. The helix P8 of MRP RNA from microsporidia Encephalitozoon cuniculi is identical to that of P RNA. Sequence homology can be widely spread over the whole molecule of MRP RNA and P RNA, such as those from Dictyostelium discoideum. These conserved nucleotides between the MRP and P RNAs strongly support the hypothesis that the MRP RNA is derived from the P RNA molecule in early eukaryote evolution. PMID:16540690
Complete complementary DNA-derived amino acid sequence of canine cardiac phospholamban.
Fujii, J; Ueno, A; Kitano, K; Tanaka, S; Kadoma, M; Tada, M
1987-01-01
Complementary DNA (cDNA) clones specific for phospholamban of sarcoplasmic reticulum membranes have been isolated from a canine cardiac cDNA library. The amino acid sequence deduced from the cDNA sequence indicates that phospholamban consists of 52 amino acid residues and lacks an amino-terminal signal sequence. The protein has an inferred molecular weight of 6,080, in agreement with its apparent monomeric molecular weight of 6,000 estimated previously by sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Phospholamban contains two distinct domains, a hydrophilic region at the amino terminus (domain I) and a hydrophobic region at the carboxy terminus (domain II). We propose that domain I is localized at the cytoplasmic surface and offers phosphorylatable sites, whereas domain II is anchored in the sarcoplasmic reticulum membrane. PMID:3793929
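A two-domain picture like this one (a hydrophilic N-terminus followed by a hydrophobic C-terminus) is what a sliding-window hydropathy plot is designed to show. The sketch uses the Kyte-Doolittle scale, a common choice that the abstract does not name. The window length is a free parameter; windows of around 19 residues are often used to look for membrane-spanning segments.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Mean Kyte-Doolittle score for each full window along the sequence.

    Raises KeyError for non-standard residue letters (e.g. X or B).
    """
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window for i in range(len(seq) - window + 1)]
```

Negative values along the first windows and positive values toward the end would match the proposed cytoplasmic domain I and membrane-anchored domain II.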
Dhir, Somdutta; Pacurar, Mircea; Franklin, Dino; Gáspári, Zoltán; Kertész-Farkas, Attila; Kocsor, András; Eisenhaber, Frank; Pongor, Sándor
2010-11-01
SBASE is a project initiated to detect known domain types and predict domain architectures using sequence similarity searching (Simon et al., Protein Seq Data Anal, 5: 39-42, 1992; Pongor et al., Nucl. Acids Res. 21:3111-3115, 1992). The current approach uses a curated collection of domain sequences (the SBASE domain library) and standard similarity search algorithms, followed by postprocessing based on simple statistics of the domain similarity network (http://hydra.icgeb.trieste.it/sbase/). It is especially useful for detecting rare, atypical examples of known domain types that are sometimes missed even by more sophisticated methodologies. This approach requires neither multiple alignment nor machine learning techniques, and can be a useful complement to other domain detection methodologies. This article gives an overview of the project history as well as of the concepts and principles developed within the project.
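SBASE's postprocessing rests on statistics of the similarity network, and the abstract does not spell these out. To illustrate the general idea of labelling a query from its similarity hits, the sketch swaps in a different, simpler technique: a score-weighted vote over the top-k hits against a labelled domain library. The hit format, `top_k` and the domain labels are hypothetical.

```python
from collections import defaultdict


def vote_domain_type(hits, top_k=5, min_score=0.0):
    """Score-weighted vote over the top-k library hits.

    `hits` is a list of (domain_type, similarity_score) pairs for one query.
    Returns (winning_type, share_of_total_score), or None if no hit has a
    score above `min_score`.
    """
    ranked = sorted((h for h in hits if h[1] > min_score), key=lambda h: h[1], reverse=True)[:top_k]
    if not ranked:
        return None
    totals = defaultdict(float)
    for dom_type, score in ranked:
        totals[dom_type] += score
    best = max(totals, key=totals.get)
    return best, totals[best] / sum(totals.values())
```

Because the decision comes only from pairwise similarity scores to labelled library members, no multiple alignment or trained model is needed, which is the property the abstract emphasizes.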
Harris, Golda G.; Lombardi, Patrick M.; Pemberton, Travis A.; Matsui, Tsutomu; Weiss, Thomas M.; Cole, Kathryn E.; Köksal, Mustafa; Murphy, Frank V.; Vedula, L. Sangeetha; Chou, Wayne K.W.; Cane, David E.; Christianson, David W.
2015-01-01
Geosmin synthase from Streptomyces coelicolor (ScGS) catalyzes an unusual, metal-dependent terpenoid cyclization and fragmentation reaction sequence. Two distinct active sites are required for catalysis: the N-terminal domain catalyzes the ionization and cyclization of farnesyl diphosphate to form germacradienol and inorganic pyrophosphate (PPi), and the C-terminal domain catalyzes the protonation, cyclization, and fragmentation of germacradienol to form geosmin and acetone through a retro-Prins reaction. A unique αα domain architecture is predicted for ScGS based on amino acid sequence: each domain contains the metal-binding motifs typical of a class I terpenoid cyclase, and each domain requires Mg2+ for catalysis. Here, we report the X-ray crystal structure of the unliganded N-terminal domain of ScGS and the structure of its complex with 3 Mg2+ ions and alendronate. These structures highlight conformational changes required for active site closure and catalysis. Although neither full-length ScGS nor constructs of the C-terminal domain could be crystallized, homology models of the C-terminal domain were constructed based on ~36% sequence identity with the N-terminal domain. Small-angle X-ray scattering experiments yield low resolution molecular envelopes into which the N-terminal domain crystal structure and the C-terminal domain homology model were fit, suggesting possible αα domain architectures as frameworks for bifunctional catalysis. PMID:26598179
Genes encoding calmodulin-binding proteins in the Arabidopsis genome
NASA Technical Reports Server (NTRS)
Reddy, Vaka S.; Ali, Gul S.; Reddy, Anireddy S N.
2002-01-01
Analysis of the recently completed Arabidopsis genome sequence indicates that approximately 31% of the predicted genes could not be assigned to functional categories, as they do not show any sequence similarity with proteins of known function from other organisms. Calmodulin (CaM), a ubiquitous and multifunctional Ca(2+) sensor, interacts with a wide variety of cellular proteins and modulates their activity/function in regulating diverse cellular processes. However, the primary amino acid sequence of the CaM-binding domain in different CaM-binding proteins (CBPs) is not conserved. One way to identify most of the CBPs in the Arabidopsis genome is by protein-protein interaction-based screening of expression libraries with CaM. Here, using a mixture of radiolabeled CaM isoforms from Arabidopsis, we screened several expression libraries prepared from flower meristem, seedlings, or tissues treated with hormones, an elicitor, or a pathogen. Sequence analysis of 77 positive clones that interact with CaM in a Ca(2+)-dependent manner revealed 20 CBPs, including 14 previously unknown CBPs. In addition, by searching the Arabidopsis genome sequence with the newly identified and known plant or animal CBPs, we identified a total of 27 CBPs. Among these, 16 CBPs are represented by families with 2-20 members in each family. Gene expression analysis revealed that CBPs and CBP paralogs are expressed differentially. Our data suggest that Arabidopsis has a large number of CBPs including several plant-specific ones. Although CaM is highly conserved between plants and animals, only a few CBPs are common to both plants and animals. Analysis of Arabidopsis CBPs revealed the presence of a variety of interesting domains. Our analyses identified several hypothetical proteins in the Arabidopsis genome as CaM targets, suggesting their involvement in Ca(2+)-mediated signaling networks.
Mitsuda, Nobutaka; Hisabori, Toru; Takeyasu, Kunio; Sato, Masa H
2004-07-01
A 38-bp pollen-specific cis-acting region of the AVP1 gene is involved in the expression of the Arabidopsis thaliana V-PPase during pollen development. Here, we report the isolation and structural characterization of AtVOZ1 and AtVOZ2, novel transcription factors that bind to the 38-bp cis-acting region of the A. thaliana V-PPase gene, AVP1. AtVOZ1 and AtVOZ2 show 53% amino acid sequence similarity. Homologs of AtVOZ1 and AtVOZ2 are found in various vascular plants as well as a moss, Physcomitrella patens. Promoter-beta-glucuronidase reporter analysis shows that AtVOZ1 is specifically expressed in the phloem tissue and AtVOZ2 is strongly expressed in the root. In vivo transient effector-reporter analysis in A. thaliana suspension-cultured cells demonstrates that AtVOZ1 and AtVOZ2 function as transcriptional activators in the Arabidopsis cell. Two conserved regions, termed Domain-A and Domain-B, were identified from an alignment of AtVOZ proteins and their homologs from O. sativa and P. patens. AtVOZ2 binds as a dimer to the specific palindromic sequence GCGTNx7ACGC via Domain-B, which comprises a functional, novel zinc-coordinating motif and a conserved basic region. Domain-B is shown to function as both the DNA-binding and the dimerization domain of AtVOZ2. Given the highly conserved nature of this domain among all identified VOZ proteins, we conclude that Domain-B is responsible for the DNA binding and dimerization of all VOZ-family proteins, and we designate it the VOZ-domain.
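The VOZ binding site GCGTNx7ACGC can be scanned for with a regular expression. Because ACGC is the reverse complement of GCGT, the site reads the same on both strands apart from the 7-nt spacer, so scanning one strand finds sites in either orientation. The function names below are hypothetical, and a lookahead is used so that overlapping sites are all reported.

```python
import re

# GCGT, any 7 nucleotides, ACGC; the lookahead allows overlapping matches.
VOZ_SITE = re.compile(r"(?=(GCGT[ACGT]{7}ACGC))")


def find_voz_sites(dna):
    """Return (0-based start, site) for every candidate VOZ site on this strand."""
    return [(m.start(), m.group(1)) for m in VOZ_SITE.finditer(dna.upper())]


def reverse_complement(dna):
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]
```

Any site found on one strand also matches when the reverse complement of the sequence is scanned, which is a quick check of the palindrome property.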
The signaling helix: a common functional theme in diverse signaling proteins
Anantharaman, Vivek; Balaji, S; Aravind, L
2006-01-01
Background The mechanism by which signals are transmitted between receptor and effector domains in multi-domain signaling proteins is poorly understood. Results Using sensitive sequence analysis methods, we identify a conserved helical segment of around 40 residues in a wide range of signaling proteins, including numerous sensor histidine kinases such as Sln1p, and receptor guanylyl cyclases such as the atrial natriuretic peptide receptor and nitric oxide receptors. We term this helical segment the signaling (S)-helix and present evidence that it forms a novel parallel coiled-coil element, distinct from previously known helical segments in signaling proteins, such as the Dimerization-Histidine phosphotransfer module of histidine kinases, the intracellular domains of the chemotaxis receptors, inter-GAF domain helical linkers and the α-helical HAMP module. Analysis of domain architectures allowed us to reconstruct the domain-neighborhood graph for the S-helix, which showed that the S-helix almost always occurs between two signaling domains. Several striking patterns in the domain neighborhood of the S-helix also became evident from the graph. It most often separates diverse N-terminal sensory domains from various C-terminal catalytic signaling domains such as histidine kinases, cNMP cyclase, PP2C phosphatases, NtrC-like AAA+ ATPases and diguanylate cyclases. It may also occur between two sensory domains, such as PAS domains, and occasionally between a DNA-binding HTH domain and a sensory domain. The sequence conservation pattern of the S-helix revealed the presence of a unique constellation of polar residues in the dimer-interface positions within the central heptad of the coiled-coil formed by the S-helix. Conclusion Combining these observations with previously reported mutagenesis studies on different S-helix-containing proteins, we suggest that it functions as a switch that prevents constitutive activation of linked downstream signaling domains. 
However, when ligand binding or other sensory inputs induce specific conformational changes in a linked upstream domain, the S-helix transmits the signal to the downstream domain. Thus, the S-helix represents one of the most prevalent functional themes involved in the flow of signals between modules in diverse prokaryote-type multi-domain signaling proteins. Reviewers This article was reviewed by Frank Eisenhaber, Arcady Mushegian and Sandor Pongor. PMID:16953892
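In a parallel coiled coil, residues repeat in heptads whose positions are labeled a to g, and the a and d positions face the partner helix, forming the dimer interface. The sketch below shows what "polar residues in the dimer-interface positions" means operationally: given a sequence and an assumed heptad register, it lists the a/d residues and flags the polar ones. The peptide, the register start, the helper name `interface_residues`, and the polar residue set are hypothetical choices for illustration; in practice the register would come from a coiled-coil predictor or an alignment, not be supplied by hand.

```python
HEPTAD = "abcdefg"
# Assumption: a simple set of polar/charged amino acids (one-letter codes).
POLAR = set("STNQHKRDE")

def interface_residues(seq: str, register_start: str = "a") -> list[tuple[int, str, str, bool]]:
    """Return (index, heptad position, residue, is_polar) for every a/d residue.

    register_start is the heptad position assigned to seq[0] (hypothetical input).
    """
    offset = HEPTAD.index(register_start)
    result = []
    for i, residue in enumerate(seq.upper()):
        position = HEPTAD[(i + offset) % 7]
        if position in ("a", "d"):  # core positions at the dimer interface
            result.append((i, position, residue, residue in POLAR))
    return result

# Hypothetical two-heptad peptide with an Asn at an 'a' position.
peptide = "IEALKNKNEQLESR"
for idx, pos, aa, polar in interface_residues(peptide):
    print(idx, pos, aa, "polar" if polar else "apolar")
```

Shifting `register_start` changes which residues land at a and d, which is why the register assignment, rather than the scan itself, is the hard part of such an analysis.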
Characterization of a novel organic solute transporter homologue from Clonorchis sinensis
Dai, Fuhong; Lee, Ji-Yun; Pak, Jhang Ho; Sohn, Woon-Mok
2018-01-01
Clonorchis sinensis is a liver fluke that can dwell in the bile ducts of mammals. Bile acid transporters maintain bile acid homeostasis in C. sinensis, because bile acids can induce physiological changes in the fluke or harm its survival. The organic solute transporter (OST) mainly transports bile acids and belongs to the SLC51 subfamily of solute carrier transporters. OST plays a critical role in the recirculation of bile acids in higher animals. In this study, we cloned the full-length cDNA of the 480-amino acid OST from C. sinensis (CsOST). Genomic analysis revealed 11 exons and nine introns. The CsOST protein had a ‘Solute_trans_a’ domain with 67% homology to Schistosoma japonicum OST. For further analysis, the CsOST protein sequence was split into an ordered domain at the N-terminus (CsOST-N) and a disordered domain at the C-terminus (CsOST-C). The tertiary structure of each domain was built using a threading-based method and determined by manual comparison. In a phylogenetic tree, the CsOST-N domain fell within the OSTα clade and CsOST-C within the OSTβ clade. These two domains were more highly conserved with the OST α- and β-subunits at the structural level than at the sequence level. These findings suggest that CsOST comprises both the OST α- and β-subunits. CsOST was localized in the oral and ventral suckers and in the mesenchymal tissues abundant around the intestine, vitelline glands, uterus, and testes. This study provides fundamental data for further understanding of OST homologues in other flukes. PMID:29702646
Prokaryotic ancestry of eukaryotic protein networks mediating innate immunity and apoptosis.
Dunin-Horkawicz, Stanislaw; Kopec, Klaus O; Lupas, Andrei N
2014-04-03
Protein domains characteristic of eukaryotic innate immunity and apoptosis have many prokaryotic counterparts of unknown function. By reconstructing interactomes computationally, we found that bacterial proteins containing these domains are part of a network that also includes other domains not hitherto associated with immunity. This network is connected to the network of prokaryotic signal transduction proteins, such as histidine kinases and chemoreceptors. The network varies considerably in domain composition and degree of paralogy, even between strains of the same species, and its repetitive domains have often been amplified recently, with individual repeats sharing up to 100% sequence identity. Both phenomena are evidence of considerable evolutionary pressure and are thus compatible with a role in the "arms race" between host and pathogen. To investigate the relationship of this network to its eukaryotic counterparts, we performed a cluster analysis of organisms based on a census of its constituent domains across all fully sequenced genomes. We obtained a large central cluster of mainly unicellular organisms, from which multicellular organisms radiate out in two main directions. One is taken by multicellular bacteria, primarily cyanobacteria and actinomycetes, and plants form an extension of this direction, connected via the basal, unicellular cyanobacteria. The second main direction is taken by animals and fungi, which form separate branches with a common root in the α-proteobacteria of the central cluster. This analysis supports the notion that the innate immunity networks of eukaryotes originated from their endosymbionts and that increases in the complexity of these networks accompanied the emergence of multicellularity. Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.
Myocilin, a Component of a Membrane-Associated Protein Complex Driven by a Homologous Q-SNARE Domain
Dismuke, W. Michael; McKay, Brian S.; Stamer, W. Daniel
2012-01-01
Myocilin is a widely expressed protein with no known function; however, mutations in myocilin appear to manifest uniquely as ocular hypertension and the blinding disease glaucoma. Using the protein homology/analogy recognition engine (PHYRE), we find that the olfactomedin domain of myocilin is similar in sequence motif and structure to a six-bladed kelch repeat motif, based on the known crystal structures of such proteins. Additionally, using sequence analysis, we identify a coiled-coil segment of myocilin with homology to human Q-SNARE proteins. Using COS-7 cells expressing full-length human myocilin and a version lacking the C-terminal olfactomedin domain, we identified a membrane-associated protein complex containing myocilin by hydrodynamic analysis. The myocilin construct that included the coiled-coil but lacked the olfactomedin domain formed complexes similar to those of the full-length protein, indicating that the coiled-coil domain of myocilin is sufficient for myocilin to bind to the large detergent-resistant complex. In human retina and retinal pigment epithelium, which express myocilin, we detected the protein in a large, SDS-resistant, membrane-associated complex. We characterized the hydrodynamic properties of myocilin in human tissues as either a 15S complex with Mr = 405,000–440,000, indicating a slightly elongated globular shape similar to known SNARE complexes, or a 6.4S dimer with Mr = 108,000. By identifying the Q-SNARE homology within the second coil of myocilin and documenting its participation in a SNARE-like complex, we provide evidence of a SNARE domain-containing protein associated with a human disease. PMID:22463803
The chordate proteome history database.
Levasseur, Anthony; Paganini, Julien; Dainat, Jacques; Thompson, Julie D; Poch, Olivier; Pontarotti, Pierre; Gouret, Philippe
2012-01-01
The chordate proteome history database (http://ioda.univ-provence.fr) comprises some 20,000 evolutionary analyses of proteins from chordate species. Our main objective was to characterize and study the evolutionary histories of the chordate proteome and, in particular, to detect genomic events and to enable automatic functional searches. First, phylogenetic analyses based on high-quality multiple sequence alignments and a robust phylogenetic pipeline were performed for the whole protein and for each individual domain. Novel approaches were developed to identify orthologs/paralogs and to predict gene duplication/gain/loss events and the occurrence of new protein architectures (domain gains, losses and shuffling). These important genetic events were localized on the phylogenetic trees and on the genomic sequence. Second, the phylogenetic trees were enhanced by the creation of phylogroups, whereby groups of orthologous sequences created using OrthoMCL were corrected based on the phylogenetic trees; gene family size and gene gain/loss in a given lineage could be deduced from the phylogroups. For each ortholog group obtained from the phylogenetic or the phylogroup analysis, functional information and expression data can be retrieved. Database searches can be performed easily using biological objects (protein identifier, keyword or domain), but can also be based on events; e.g., domain exchange events can be retrieved. To our knowledge, this is the first database that links group clustering, phylogeny and automatic functional searches with the detection of important events occurring during genome evolution, such as the appearance of a new domain architecture.
Tabatabaee, Akram; Siadat, Seyed Davar; Moosavi, Seyed Fazllolah; Aghasadeghi, Mohammad Reza; Memarnejadian, Arash; Pouriayevali, Mohammad Hassan; Yavari, Neda
2013-01-01
Background Nontypeable Haemophilus influenzae (NTHi) is a common cause of respiratory tract disease and initiates infection by colonizing the nasopharynx. The Haemophilus influenzae (H. influenzae) Hap adhesin is an autotransporter protein that promotes initial interaction with human epithelial cells. Hap protein contains a 110 kDa internal passenger domain called “HapS” and a 45 kDa C-terminal translocator domain called “Hapβ”. Hap adhesive activity has recently been linked to its Cell Binding Domain (CBD), which resides within the 311 C-terminal residues of the internal passenger domain of the protein. Furthermore, immunization with this CBD protein has been shown to prevent bacterial nasopharyngeal colonization in animal models. Methods To provide sufficient amounts of pure HapS protein for vaccine studies, we sought to develop a highly optimized system to overexpress and purify the protein in large quantities. To this end, the pET24a-cbd plasmid harboring the cbd sequence from NTHi ATCC49766 was constructed, and its expression was optimized by testing various expression parameters such as growth medium, induction temperature, IPTG inducer concentration, induction stage and duration. SDS-PAGE and Western blotting were used for protein analysis and confirmation, and the expressed protein was then easily purified via immobilized metal affinity chromatography (IMAC) using Ni-NTA columns. Results The highest expression level of the target protein was achieved when CBD-expressing E. coli BL21 (DE3) cells were grown at 37°C in 2xTY medium with 1.0 mM IPTG at mid-log phase (OD600 = 0.6) for 5 h. Alignment of the amino acid sequence of the expressed CBD protein with 3 previously published CBD sequences showed more than 97% identity, and antigenicity plot analysis further revealed 9 antigenic domains that appeared to be well conserved among the analyzed CBD sequences. 
Conclusion Given the high similarity between the CBD from NTHi ATCC49766 and those of other NTHi strains, the CBD protein expressed here appears, in principle, to be a suitable universal candidate for vaccine studies against NTHi strains from various geographical areas. Further investigations to corroborate the potency of this protein as a vaccine candidate are under way. PMID:23919121
Structural features of diverse Pin-II proteinase inhibitor genes from Capsicum annuum.
Mahajan, Neha S; Dewangan, Veena; Lomate, Purushottam R; Joshi, Rakesh S; Mishra, Manasi; Gupta, Vidya S; Giri, Ashok P
2015-02-01
The proteinase inhibitor (PI) genes from Capsicum annuum were characterized with respect to their UTRs, introns and promoter elements. The occurrence of PIs with circularly permuted domain organization was evident. Several potato inhibitor II (Pin-II) type PI genes have been analyzed from Capsicum annuum (L.) with respect to their differential expression during the plant defense response. However, complete gene characterization of any of these C. annuum PIs (CanPIs) has not been carried out so far. Complete gene architectures of a previously identified CanPI-7 (Beads-on-string, Type A) and of CanPI-69, a member of the newly isolated Bracelet type B, are reported in this study. The 5' UTR (untranslated region), 3' UTR, and intronic sequences of both CanPI genes were obtained. The genomic sequence of CanPI-7 comprised exon 1 (49 base pairs, bp) and exon 2 (740 bp), interrupted by a 294-bp type I intron. We noted the occurrence of three multi-domain PIs (CanPI-69, 70, 71) with circularly permuted domain organization. CanPI-69 was found to possess exon 1 (49 bp), exon 2 (551 bp) and a 584-bp type I intron. Upstream sequence analysis of CanPI-7 and CanPI-69 predicted various transcription factor-binding sites, including TATA and CAAT boxes, hormone-responsive elements (ABRELATERD1, DOFCOREZM, ERELEE4), and a defense-responsive element (WRKY71OS). Binding of transcription factors such as zinc finger motif, MADS-box and MYB proteins to the promoter regions was confirmed using an electrophoretic mobility shift assay followed by mass spectrometric identification. 3' UTR analysis of 25 CanPI genes revealed a unique 3' UTR sequence for each gene. Structures of three-domain CanPIs of types A and B were predicted and further analyzed for their attributes. This investigation of CanPI gene architecture will enable a better understanding of the genetic elements present in CanPIs.
Arnold, Roland; Goldenberg, Florian; Mewes, Hans-Werner; Rattei, Thomas
2014-01-01
The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ∼70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith–Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow SIMAP to be connected with any other bioinformatics tool and resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to keep pace with the rapidly growing protein sequence universe. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal; the portal also contains instructions and links for software access and flat file downloads. PMID:24165881
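SIMAP's core product is a stored table of pairwise sequence similarities, and the abstract names the Smith–Waterman algorithm among its methods. As a hedged illustration of what an all-against-all similarity matrix contains, the sketch below computes plain Smith–Waterman local alignment scores with toy scoring (match +2, mismatch -1, linear gap -2). These parameters, the helper names and the example sequences are assumptions; real protein comparisons use substitution matrices such as BLOSUM62 with affine gap penalties, and SIMAP pairs Smith–Waterman with FASTA search heuristics rather than a naive loop like this one.

```python
from itertools import combinations

def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap
            left = curr[j - 1] + gap
            curr[j] = max(0, diag, up, left)  # local alignment: scores floor at zero
            best = max(best, curr[j])
        prev = curr
    return best

def all_against_all(seqs: dict[str, str]) -> dict[tuple[str, str], int]:
    """Score every unordered pair of sequences once (hypothetical helper)."""
    return {(p, q): smith_waterman(seqs[p], seqs[q]) for p, q in combinations(sorted(seqs), 2)}

proteins = {"p1": "MKTAYIAK", "p2": "KTAYI", "p3": "GGGG"}  # hypothetical sequences
for pair, score in all_against_all(proteins).items():
    print(pair, score)
```

Each pair costs time proportional to the product of the two sequence lengths, and the number of pairs grows quadratically with the collection size; that combined cost is what makes pre-calculating and storing such matrices worthwhile.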
Chappell, J D; Gunn, V L; Wetzel, J D; Baer, G S; Dermody, T S
1997-03-01
The reovirus attachment protein, sigma1, determines numerous aspects of reovirus-induced disease, including viral virulence, pathways of spread, and tropism for certain types of cells in the central nervous system. The sigma1 protein projects from the virion surface and consists of two distinct morphologic domains, a virion-distal globular domain known as the head and an elongated fibrous domain, termed the tail, which is anchored into the virion capsid. To better understand structure-function relationships of sigma1 protein, we conducted experiments to identify sequences in sigma1 important for viral binding to sialic acid, a component of the receptor for type 3 reovirus. Three serotype 3 reovirus strains incapable of binding sialylated receptors were adapted to growth in murine erythroleukemia (MEL) cells, in which sialic acid is essential for reovirus infectivity. MEL-adapted (MA) mutant viruses isolated by serial passage in MEL cells acquired the capacity to bind sialic acid-containing receptors and demonstrated a dependence on sialic acid for infection of MEL cells. Analysis of reassortant viruses isolated from crosses of an MA mutant virus and a reovirus strain that does not bind sialic acid indicated that the sigma1 protein is solely responsible for efficient growth of MA mutant viruses in MEL cells. The deduced sigma1 amino acid sequences of the MA mutant viruses revealed that each strain contains a substitution within a short region of sequence in the sigma1 tail predicted to form beta-sheet. These studies identify specific sequences that determine the capacity of reovirus to bind sialylated receptors and suggest a location for a sialic acid-binding domain. Furthermore, the results support a model in which type 3 sigma1 protein contains discrete receptor binding domains, one in the head and another in the tail that binds sialic acid.
In silico structural analysis of group 3, 6 and 9 allergens from Dermatophagoides farinae.
Teng, Feixiang; Yu, Lili; Bian, Yonghua; Sun, Jinxia; Wu, Juansong; Ling, Cunbao; Yang, Li; Wang, Yungang; Cui, Yubao
2015-05-01
Dermatophagoides farinae (Hughes; Acari: Pyroglyphidae) is the predominant source of dust mite allergens, which provoke allergic diseases such as rhinitis, asthma and eczema. Of the 30 allergen groups produced by D. farinae, the Der f 3, Der f 6 and Der f 9 allergens are all trypsin-associated proteins; however, little else is currently known about them. The present study used in silico tools to compare the amino acid sequences, and to predict the secondary and tertiary structures, of the Der f 3, Der f 6 and Der f 9 allergens. Protein sequence alignment detected ~46% identity between Der f 3, Der f 6 and Der f 9. Furthermore, each protein was shown to contain three active sites and two highly conserved trypsin functional domains. Predictions of the secondary and tertiary structure identified α-helices, β-sheets and random coils. The active sites of the three proteins appeared to fold onto each other in a three-dimensional model, constituting the active site of the enzyme. Epitope analysis demonstrated that Der f 3, Der f 6 and Der f 9 have 4-5 potential epitopes located in random coils, and the epitope sequences of Der f 3, Der f 6 and Der f 9 were shown to overlap in two domains (at amino acids 83-87 and 179-180); however, the residues in these two domains were not identical. The present study aimed to conduct a biochemical and genetic analysis of these three allergens, and to potentially contribute to the development of vaccines for allergen-specific immunotherapy.
Pu, L; Zhang, L C; Zhang, J S; Song, X; Wang, L G; Liang, J; Zhang, Y B; Liu, X; Yan, H; Zhang, T; Yue, J W; Li, N; Wu, Q Q; Wang, L X
2016-08-12
Mitogen-activated protein kinase kinase kinase 5 (MAP3K5) is essential for apoptosis, proliferation, differentiation, and immune responses, and is a candidate marker for residual feed intake (RFI) in pigs. We cloned the full-length cDNA sequence of porcine MAP3K5 by rapid amplification of cDNA ends. The 5451-bp sequence contains a 5'-untranslated region (UTR) (718 bp), a coding region (3738 bp), and a 3'-UTR (995 bp), and encodes a peptide of 1245 amino acids, which shares 97, 99, 97, 93, 91, and 84% sequence identity with cattle, sheep, human, mouse, chicken, and zebrafish MAP3K5, respectively. The deduced MAP3K5 protein sequence contains two conserved domains: a DUF4071 domain and a protein kinase domain. Phylogenetic analysis showed that porcine MAP3K5 forms a branch separate from vicugna and camel MAP3K5. Tissue expression analysis using real-time quantitative polymerase chain reaction (qRT-PCR) revealed that MAP3K5 was expressed in the heart, liver, spleen, lung, kidney, muscle, fat, pancreas, ileum, and stomach. Copy number variation was detected for porcine MAP3K5 and validated by qRT-PCR. Furthermore, a significant increase in average copy number was detected in the low-RFI group compared with the high-RFI group in a Duroc pig population. These results provide useful information regarding the influence of MAP3K5 on RFI in pigs.
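The length figures reported for porcine MAP3K5 are internally consistent, which a few lines of arithmetic confirm. The sketch assumes (the abstract does not say so explicitly) that the 3738-bp coding region includes the stop codon, so 1246 codons encode 1245 amino acids.

```python
# Figures from the abstract (porcine MAP3K5 cDNA).
utr5, cds, utr3 = 718, 3738, 995

total = utr5 + cds + utr3          # full sequence length
codons, leftover = divmod(cds, 3)  # a coding region should contain whole codons
amino_acids = codons - 1           # assumption: the last codon is the stop codon

print(total, codons, leftover, amino_acids)  # 5451 1246 0 1245
```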
Sharma, Parichit; Mantri, Shrikant S
2014-01-01
The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. 
Here, we describe the WImpiBLAST web interface features and architecture, explain design decisions, describe workflows and provide a detailed analysis. PMID:24979410
NovelFam3000 – Uncharacterized human protein domains conserved across model organisms
Kemmer, Danielle; Podowski, Raf M; Arenillas, David; Lim, Jonathan; Hodges, Emily; Roth, Peggy; Sonnhammer, Erik LL; Höög, Christer; Wasserman, Wyeth W
2006-01-01
Background Despite significant efforts from the research community, a large portion of the proteins encoded by human genes lacks an assigned cellular function. Most metazoan proteins are composed of structural and/or functional domains, many of which appear in multiple proteins. Once a domain is characterized in one protein, the presence of a similar sequence in an uncharacterized protein serves as a basis for inference of function. Thus, knowledge of a domain's function, or of the protein within which it arises, can facilitate the analysis of an entire set of proteins. Description From the Pfam domain database, we extracted uncharacterized protein domains represented in proteins from humans, worms, and flies. A data centre was created to facilitate the analysis of the uncharacterized domain-containing proteins. The centre both provides researchers with links to dispersed internet resources containing gene-specific experimental data and enables them to post relevant experimental results or comments. For each human gene in the system, a characterization score is posted, allowing users to track the progress of characterization over time or to identify uncharacterized domains in well-characterized genes for study. As a test of the system, a subset of 39 domains was selected for analysis and the experimental results were posted to the NovelFam3000 system. For 25 human protein members of these 39 domain families, detailed sub-cellular localizations were determined. Specific observations are presented based on the analysis of the integrated information provided through the online NovelFam3000 system. Conclusion Consistent experimental results across multiple members of a domain family allow inferences about the domain's functional role. We unite bioinformatics resources and experimental data in order to accelerate the functional characterization of scarcely annotated domain families. PMID:16533400
Near-Complete Genome Sequence of a Novel Single-Stranded RNA Virus Discovered in Indoor Air
2018-01-01
ABSTRACT Viral metagenomic analysis of heating, ventilation, and air conditioning (HVAC) filters recovered the near-complete genome sequence of a novel virus, named HVAC-associated RNA virus 1 (HVAC-RV1). The HVAC-RV1 genome is most similar to those of picorna-like viruses identified in arthropods but encodes a small domain observed only in negative-sense single-stranded RNA viruses. PMID:29567746
Yafremava, Liudmila S; Di Giulio, Massimo; Caetano-Anollés, Gustavo
2013-01-01
Amino acid substitution patterns between the nonbarophilic Pyrococcus furiosus and its barophilic relative P. abyssi confirm that hydrostatic pressure asymmetry indices reflect the extent to which amino acids are preferred by barophilic archaeal organisms. Substitution patterns in entire protein sequences, shared protein domains defined at fold superfamily level, domains in homologous sequence pairs, and domains of very ancient and very recent origin now provide further clues about the environment that led to the genetic code and diversified life. The pyrococcal proteomes are very similar and share a very early ancestor. Relative amino acid abundance analyses showed that biases in the use of amino acids are due to their shared fold superfamilies. Within these repertoires, only two of the five amino acids that are preferentially barophilic, aspartic acid and arginine, displayed this preference significantly and consistently across structure and in domains appearing in the ancestor. The more primordial asparagine, lysine and threonine displayed a consistent preference for nonbarophily across structure and in the ancestor. Since barophilic preferences are already evident in ancient domains that are at least ~3 billion years old, we conclude that barophily is a very ancient trait that unfolded concurrently with genetic idiosyncrasies in convergence towards a universal code.
Genome-Wide Identification and Comparative Analysis of Albumin Family in Vertebrates
Li, Shugang; Cao, Yiping; Geng, Fang
2017-01-01
Albumins are the most well-known globular proteins, and the most typical representatives are the serum albumins. However, less attention has been paid to the albumin family as a whole, apart from human and bovine serum albumin. To characterize the features of the albumin family, we mined all putative albumin proteins from the available genome sequences. The results showed that albumin is widely distributed in vertebrates but absent from bacteria and archaea. Phylogenetic analysis of the vertebrate albumin family implied an evolutionary relationship among serum albumin, α-fetoprotein, vitamin D–binding protein, and afamin. In addition, a new member of the albumin family was found, namely extracellular matrix protein 1. Structural analysis revealed that the motifs forming the internal disulfide bonds are highly conserved in the albumin family, despite low overall sequence identity across the family. The domain arrangement of albumin proteins indicated that most vertebrate albumins contain 3 characteristic domains, arising from 2 evolutionary patterns. A significant trend was also observed: albumin proteins in higher vertebrate species tend to possess more characteristic domains. This study provides the fundamental information required for a better understanding of albumin distribution, phylogenetic relationships, characteristic motifs and structure, and offers new insights into its evolutionary pattern. PMID:28680266
Wang, Renjie; Normand, Christophe; Gadal, Olivier
2016-01-01
Spatial organization of the genome has important impacts on all aspects of chromosome biology, including transcription, replication, and DNA repair. Frequent interactions of some chromosome domains with specific nuclear compartments, such as the nucleolus, are now well documented using genome-scale methods. However, direct measurement of distance and interaction frequency between loci requires microscopic observation of specific genomic domains and the nucleolus, followed by image analysis to allow quantification. The fluorescent repressor operator system (FROS) is an invaluable method to fluorescently tag DNA sequences and investigate chromosome position and dynamics in living cells. This chapter describes a combination of methods to define the motion and region of confinement of a locus relative to the nucleolus in the cell nucleus, from fluorescence acquisition to automated image analysis using two dedicated pipelines.
McNicholas, Paul; Wei, Yi; Whitcomb, Jeannette; Greaves, Wayne; Black, Todd A; Tremblay, Cecile L; Strizki, Julie M
2010-05-15
Vicriviroc is a C-C motif chemokine receptor 5 (CCR5) antagonist that is in clinical development for the treatment of human immunodeficiency virus type 1 (HIV-1) infection. This study explored the molecular basis for the development of phenotypically resistant virus. HIV-1 RNA from treatment-naive subjects who experienced virological failure in a phase 2 dose-finding trial was evaluated for coreceptor usage and susceptibility. For viruses that exhibited reduced susceptibility to vicriviroc, envelope clones were phenotypically and genotypically characterized. Twenty-six vicriviroc-treated subjects experienced virological failure; for 24 the virus remained CCR5-tropic, and 2 had dual/X4 virus. Reduced susceptibility to vicriviroc, manifested as decreases in the maximum percent inhibition value (no increase in median inhibitory concentration), was detected in 4 of the 26 subjects who experienced virological failure. Clonal analysis of envelopes in samples from these 4 subjects revealed multiple sequence changes in gp160, principally within the variable domain 1/variable domain 2, variable domain 3, and variable domain 4 loops. However, no consistent pattern of mutations was observed across subjects. In this study, only a small proportion of treatment failures were associated with tropism changes or reduced susceptibility to vicriviroc. Genotypic analysis of cloned env sequences revealed no specific mutational pattern associated with reduced susceptibility to vicriviroc, although numerous changes were observed in the variable domain 3 loop and in other regions of gp160.
Gonzalez, Patrice; Labarère, Jacques
1998-01-01
A comparative study of variable domains V4, V6, and V9 of the mitochondrial small-subunit (SSU) rRNA was carried out with the genus Agrocybe by PCR amplification of 42 wild isolates belonging to 10 species, Agrocybe aegerita, Agrocybe dura, Agrocybe chaxingu, Agrocybe erebia, Agrocybe firma, Agrocybe praecox, Agrocybe paludosa, Agrocybe pediades, Agrocybe alnetorum, and Agrocybe vervacti. Sequencing of the PCR products showed that the three domains in the isolates belonging to the same species were the same length and had the same sequence, while variations were found among the 10 species. Alignment of the sequences showed that nucleotide motifs encountered in the smallest sequence of each variable domain were also found in the largest sequence, indicating that the sequences evolved by insertion-deletion events. Determination of the secondary structure of each domain revealed that the insertion-deletion events commonly occurred in regions not directly involved in the secondary structure (i.e., the loops). Moreover, conserved sequences ranging from 4 to 25 nucleotides long were found at the beginning and end of each domain and could constitute genus-specific sequences. Comparisons of the V4, V6, and V9 secondary structures resulted in identification of the following four groups: (i) group I, which was characterized by the presence of additional P23-1 and P23-3 helices in the V4 domain and the lack of the P49-1 helix in V9 and included A. aegerita, A. chaxingu, and A. erebia; (ii) group II, which had the P23-3 helix in V4 and the P49-1 helix in V9 and included A. pediades; (iii) group III, which did not have additional helices in V4, had the P49-1 helix in V9 and included A. paludosa, A. firma, A. alnetorum, and A. praecox; and (iv) group IV, which lacked both the V4 additional helices and the P49-1 helix in V9 and included A. vervacti and A. dura. This grouping of species was supported by the structure of a consensus tree based on the variable domain sequences. 
The conservation of the sequences of the V4, V6, and V9 domains of the mitochondrial SSU rRNA within species and the high degree of interspecific variation found in the Agrocybe species studied open the way for these sequences to be used as specific molecular markers of the Basidiomycota. PMID:9797259
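The inference that the variable domains evolved by insertion-deletion events can be illustrated with two simple checks on toy sequences. This is a hypothetical Python sketch, not the authors' alignment procedure: `is_subsequence` tests whether the shorter sequence could arise from the longer one by deletions alone, and `kmer_containment` measures what fraction of the shorter sequence's k-mers (used here as a stand-in for "nucleotide motifs") recur in the longer one.

```python
def is_subsequence(short: str, long: str) -> bool:
    """True if `short` can be obtained from `long` by deleting characters only."""
    it = iter(long)
    # `base in it` advances the iterator, so characters must appear in order.
    return all(base in it for base in short)


def kmer_containment(short: str, long: str, k: int) -> float:
    """Fraction of distinct k-mers of `short` that also occur in `long`."""
    short_kmers = {short[i:i + k] for i in range(len(short) - k + 1)}
    long_kmers = {long[i:i + k] for i in range(len(long) - k + 1)}
    if not short_kmers:
        return 0.0
    return len(short_kmers & long_kmers) / len(short_kmers)
```

A real comparison would align the sequences first; these checks only show the kind of relationship (short motifs preserved inside longer variants) that points to insertion-deletion events.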
Santamaría-Hernando, Saray; Krell, Tino; Ramos-González, María-Isabel
2012-01-01
Proteins of the animal heme peroxidase (ANP) superfamily differ greatly in size, since they have either one or two catalytic domains that match profile PS50292. The orf PP_2561 of Pseudomonas putida KT2440, which we have called PepA, encodes a two-domain ANP. Alignment of these domains with those of PepA homologues revealed a variable number of insertions with the consensus G-x-D-G-x-x-[GN]-[TN]-x-D-D. This motif has also been detected in the structure of pseudopilin (pdb 3G20), where it was found to be involved in Ca(2+) coordination, although sequence analysis did not reveal any known calcium-binding motif in this protein. Isothermal titration calorimetry revealed that a peptide containing this consensus motif specifically bound calcium ions, with affinities ranging from 33 to 79 µM depending on the pH. Microcalorimetric titrations of the purified N-terminal ANP-like domain of PepA revealed Ca(2+) binding with a K(D) of 12 µM and a stoichiometry of 1.25 calcium ions per protein monomer. This domain exhibited peroxidase activity after reconstitution with heme. These data led to the definition of a novel calcium-binding motif, which we have termed PERCAL, that is abundant in the animal peroxidase-like domains of bacterial proteins. Bacterial heme peroxidases thus possess two different types of calcium-binding motifs, namely PERCAL and the related hemolysin-type calcium-binding motif, the latter being located outside the catalytic domains, at the C-terminal end. A phylogenetic tree of ANP-like catalytic domains of bacterial proteins with PERCAL motifs, including single-domain peroxidases, was divided into two major clusters, representing domains with and without insertions containing PERCAL motifs. We have verified that the recently reported classification of bacterial heme peroxidases into two families (cd09819 and cd09821) is unrelated to these insertions. Sequences matching PERCAL were detected in all kingdoms of life.
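The PERCAL consensus above is written in PROSITE-style pattern notation. A minimal Python sketch, assuming only a subset of that syntax (`x`, single residues, `[..]` alternatives, `{..}` exclusions, and `(n)` or `(n,m)` repeats, without the `<`/`>` terminal anchors), converts such a pattern to a regular expression and reports overlapping hits. The function names and the toy sequence are hypothetical.

```python
import re
from typing import List, Tuple

# One pattern element: x, a residue, [alternatives] or {exclusions}, optionally (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern (e.g. 'G-x-D-[GN]') to a Python regex."""
    parts = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if lo and hi:
            core += "{" + lo + "," + hi + "}"
        elif lo:
            core += "{" + lo + "}"
        parts.append(core)
    return "".join(parts)


def find_motif(sequence: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched text) for every match, including overlapping ones."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]
```

For example, `prosite_to_regex("G-x-D-G-x-x-[GN]-[TN]-x-D-D")` yields `G.DG..[GN][TN].DD`. The zero-width lookahead in `find_motif` is what allows overlapping hits to be reported.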
Bradshaw, Charles Richard; Surendranath, Vineeth; Henschel, Robert; Mueller, Matthias Stefan; Habermann, Bianca Hermine
2011-03-10
Conserved domains in proteins are one of the major sources of functional information for experimental design and genome-level annotation. Though search tools for conserved domain databases, such as those based on Hidden Markov Models (HMMs), are sensitive in detecting conserved domains in proteins that share sufficient sequence similarity, they tend to miss more divergent family members, as they lack a reliable statistical framework for the detection of low sequence similarity. We have developed a greatly improved HMMerThread algorithm that can detect remotely conserved domains in highly divergent sequences. HMMerThread combines relaxed conserved domain searches with fold recognition to eliminate false-positive sequence-based identifications. With an accuracy of 90%, our software is able to automatically predict highly divergent members of conserved domain families with an associated 3-dimensional structure. We give additional confidence to our predictions by validation across species. We have run HMMerThread searches on eight proteomes, including human, and present a rich resource of remotely conserved domains, which adds significantly to the functional annotation of entire proteomes. We find ∼4500 cross-species validated, remotely conserved domain predictions in the human proteome alone. As an example, we find a DNA-binding domain in the C-terminal part of the A-kinase anchor protein 10 (AKAP10), a PKA adaptor that has been implicated in cardiac arrhythmias and premature cardiac death and that likely translocates from mitochondria to the nucleus/nucleolus upon stress. Based on our prediction, we propose that, through this HLH domain, AKAP10 is involved in the transcriptional control of the stress response. Further remotely conserved domains that we discuss come from areas such as sporulation, chromosome segregation, and signalling during the immune response.
The HMMerThread algorithm is able to automatically detect the presence of remotely conserved domains in proteins based on weak sequence similarity. Our predictions open up new avenues for biological and medical studies. Genome-wide HMMerThread domains are available at http://vm1-hmmerthread.age.mpg.de. PMID:21423752
USDA-ARS's Scientific Manuscript database
Plant class IV chitinases are composed of a carboxy-terminal chitinase domain that is attached, through a linker sequence, to a small amino-terminal domain that can be thought of as a structured peptide. While both the peptide-like domain and the chitinase domain share sequence homology throughout m...
DOGMA: domain-based transcriptome and proteome quality assessment.
Dohmen, Elias; Kremer, Lukas P M; Bornberg-Bauer, Erich; Kemena, Carsten
2016-09-01
Genome studies have become cheaper and easier than ever before, due to the decreased costs of high-throughput sequencing and the free availability of analysis software. However, the quality of genome or transcriptome assemblies can vary considerably. Therefore, quality assessment of assemblies and annotations is a crucial aspect of genome analysis pipelines. We developed DOGMA, a program for fast and easy quality assessment of transcriptome and proteome data based on conserved protein domains. DOGMA measures the completeness of a given transcriptome or proteome and provides information about domain content for further analysis. DOGMA performs quality assessment within seconds. DOGMA is implemented in Python and published under the GNU GPL v.3 license. The source code is available at https://ebbgit.uni-muenster.de/domainWorld/DOGMA/. Contact: e.dohmen@wwu.de or c.kemena@wwu.de. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
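Domain-based completeness can be illustrated with a small set computation. The Python sketch below only shows the general idea under simplifying assumptions and does not reproduce DOGMA's actual scoring: each protein is reduced to its ordered tuple of domain identifiers (its domain arrangement), and completeness is taken as the fraction of a hypothetical reference "core" set of arrangements that is found in the query data. The domain names are placeholders.

```python
from typing import Dict, Iterable, Set, Tuple

# An arrangement is the ordered tuple of domain IDs along one protein (N- to C-terminus).
Arrangement = Tuple[str, ...]


def domain_arrangements(annotations: Dict[str, Iterable[str]]) -> Set[Arrangement]:
    """Collect the distinct, non-empty domain arrangements from per-protein annotations."""
    arrangements = (tuple(domains) for domains in annotations.values())
    return {arr for arr in arrangements if arr}


def completeness(query: Set[Arrangement], core: Set[Arrangement]) -> float:
    """Fraction of the (hypothetical) core arrangements present in the query set."""
    if not core:
        raise ValueError("core set must not be empty")
    return len(query & core) / len(core)
```

Treating arrangements as ordered tuples means that `("DomB", "DomC")` and `("DomC", "DomB")` count as different arrangements. This is one reasonable design choice for a domain-level measure, since domain order often matters for function.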
Peptide Array X-Linking (PAX): A New Peptide-Protein Identification Approach
Okada, Hirokazu; Uezu, Akiyoshi; Soderblom, Erik J.; Moseley, M. Arthur; Gertler, Frank B.; Soderling, Scott H.
2012-01-01
Many protein interaction domains bind short peptides based on canonical sequence consensus motifs. Here we report the development of a peptide array-based proteomics tool to identify proteins directly interacting with ligand peptides from cell lysates. Array-formatted bait peptides containing an amino acid-derived cross-linker are photo-induced to crosslink with interacting proteins from lysates of interest. Indirect associations are removed by high-stringency washes under denaturing conditions. Covalently trapped proteins are subsequently identified by LC-MS/MS and screened by cluster analysis and domain scanning. We apply this methodology to peptides with different proline-containing consensus sequences and show successful identification, from brain lysates, of known and novel proteins containing polyproline motif-binding domains such as EH, EVH1, SH3, and WW domains. These results suggest that the capacity of arrayed peptide ligands to capture proteins for subsequent identification by mass spectrometry is relatively broad and robust. Additionally, the approach is rapid and applicable to cell or tissue fractions from any source, making it a flexible tool for initial protein-protein interaction discovery. PMID:22606326
Dong, Chongmei; Vincent, Kate; Sharp, Peter
2009-12-04
TILLING (Targeting Induced Local Lesions IN Genomes) is a powerful tool for reverse genetics, combining traditional chemical mutagenesis with high-throughput PCR-based mutation detection to discover induced mutations that alter protein function. The most popular mutation detection method for TILLING is a mismatch cleavage assay using the endonuclease CelI. For this method, locus-specific PCR is essential. Most wheat genes are present as three similar sequences with high homology in exons and low homology in introns. Locus-specific primers can usually be designed in introns. However, it is sometimes difficult to design locus-specific PCR primers in a conserved region with high homology among the three homoeologous genes, in a gene lacking introns, or when information on introns is not available. Here we describe a mutation detection method that combines High Resolution Melting (HRM) analysis of mixed PCR amplicons containing three homoeologous gene fragments with sequence analysis using Mutation Surveyor software, aimed at the simultaneous detection of mutations in three homoeologous genes. We demonstrate that HRM analysis can be used for mutation scanning in mixed PCR amplicons containing three homoeologous gene fragments. Combining HRM scanning with sequence analysis using Mutation Surveyor is sensitive enough to detect a single nucleotide mutation in the heterozygous state in a mixed PCR amplicon containing three homoeoloci. The method was tested and validated in an EMS (ethyl methanesulfonate)-treated wheat TILLING population, screening for mutations in the carboxyl-terminal domain of the Starch Synthase II (SSII) gene. Selected mutations of interest can be further analysed by cloning to confirm the mutation and determine its genomic origin. Polyploidy is common in plants. Conserved regions of a gene often represent functional domains and have high sequence similarity between homoeologous loci. 
The method described here is a useful alternative to locus-specific based methods for screening mutations in conserved functional domains of homoeologous genes. This method can also be used for SNP (single nucleotide polymorphism) marker development and eco-TILLING in polyploid species.
Mutation of domain III and domain VI in L gene conserved domain of Nipah virus
NASA Astrophysics Data System (ADS)
Jalani, Siti Aishah; Ibrahim, Nazlina
2016-11-01
Nipah virus (NiV) is the etiologic agent of a respiratory illness and causes fatal encephalitis in humans. The NiV L protein subunit is thought to be responsible for the majority of enzymatic activities involved in viral transcription and replication. The L protein, which is the viral RNA-dependent RNA polymerase, has high sequence homology among negative-sense RNA viruses. In negative-stranded RNA viruses, six conserved domains (domains I-VI) have been identified based on sequence alignment. The domains are separated by variable regions, suggesting that the structure consists of concatenated functional domains. To directly address the roles of domains III and VI, site-directed mutations were constructed by base substitution at positions 2497, 2500, 5528, and 5532. Each mutated L gene can be used in future studies to test its expression by in vitro translation.
Puranik, Swati; Bahadur, Ranjit Prasad; Srivastava, Prem S; Prasad, Manoj
2011-10-01
The plant-specific NAC (NAM, ATAF, and CUC) transcription factors have diverse roles in development and stress regulation. A transcript encoding a NAC protein, termed SiNAC, was identified from a salt stress subtractive cDNA library of S. italica seedlings (Puranik et al., J Plant Physiol 168:280-287, 2011). This single/low copy gene, containing four exons and four introns within the genomic sequence, encoded a protein of 462 amino acids. Structural analysis revealed that the highly divergent C terminus contains a transmembrane domain. The NAC domain consisted of a twisted antiparallel beta-sheet packing against an N-terminal alpha helix on one side and a shorter helix on the other side. The domain was predicted to homodimerize and control DNA-binding specificity. The physicochemical features of the SiNAC homodimer interface justified the dimeric form of the predicted model. A 1539 bp fragment upstream of the start codon of the SiNAC gene was cloned, and in silico analysis revealed several putative cis-acting regulatory elements within the promoter sequence. Transactivation analysis indicated that SiNAC activated expression of a reporter gene and that the activation domain lies at the C terminus. SiNAC:GFP was detected in the nucleus and cytoplasm, whereas SiNAC ΔC(1-158):GFP was localized to the nucleus in onion epidermal cells. SiNAC transcripts accumulated mostly in young spikes and were strongly induced by dehydration, salinity, ethephon, and methyl jasmonate. These results suggest that SiNAC encodes a membrane-associated NAC-domain protein that may function as a transcriptional activator in response to stress and in developmental regulation in plants.
Molecular beacon sequence design algorithm.
Monroe, W Todd; Haselton, Frederick R
2003-01-01
A method using Web-based tools is presented to design optimally functioning molecular beacons. Molecular beacons, fluorogenic hybridization probes, are a powerful tool for the rapid and specific detection of a particular nucleic acid sequence. However, their synthesis costs can be considerable. Since molecular beacon performance depends on sequence, it is imperative to rationally design an optimal sequence before synthesis. The algorithm presented here uses simple Microsoft Excel formulas and macros to rank candidate sequences. This analysis is carried out using mfold structural predictions along with other free Web-based tools. For smaller laboratories where molecular beacons are not the focus of research, the public domain algorithm described here may be usefully employed to aid in molecular beacon design.
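The original tool ranks candidates with Excel formulas and mfold predictions. The Python sketch below swaps in much simpler, hypothetical criteria to show the basic structure of a beacon (a probe loop flanked by complementary stem arms) and a ranking step. It assumes the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, which is only a rough estimate for short oligonucleotides, and a user-chosen target Tm. It does not model secondary structure.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def build_beacon(probe: str, stem_arm: str) -> str:
    """5' stem arm + probe loop + 3' arm that is the reverse complement of the 5' arm."""
    return stem_arm.upper() + probe.upper() + reverse_complement(stem_arm)


def gc_content(seq: str) -> float:
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) of a short oligo by the Wallace rule."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def rank_probes(candidates: List[str], target_tm: int) -> List[str]:
    """Order candidate probe loops by closeness of their estimated Tm to a target (hypothetical criterion)."""
    return sorted(candidates, key=lambda p: abs(wallace_tm(p) - target_tm))
```

For example, `build_beacon("AAAA", "CGCTC")` gives `CGCTCAAAAGAGCG`, whose two arms can pair to form the stem. Because Python's sort is stable, candidates that tie on the score keep their input order.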
Jin, Qijiang; Hu, Xin; Li, Xin; Wang, Bei; Wang, Yanjie; Jiang, Hongwei; Mattson, Neil; Xu, Yingchun
2016-01-01
Trehalose-6-phosphate synthase (TPS) plays a key role in plant carbohydrate metabolism and the perception of carbohydrate availability. In the present work, the publicly available Nelumbo nucifera (lotus) genome sequence database was analyzed, which led to the identification of nine lotus TPS genes (NnTPS). At least two introns were found in the coding sequences of the NnTPS genes. Motif composition analysis showed that the NnTPS proteins generally share similar motifs, implying that they have similar functions. The dN/dS ratios were always less than 1 for different domains and for regions outside domains, suggesting purifying selection on the lotus TPS gene family. The regions outside the TPS domains evolved relatively faster than the domains themselves. A phylogenetic tree was constructed using all predicted coding sequences of lotus TPS genes, together with those from Arabidopsis, poplar, soybean, and rice. The result indicated that these TPS genes could be clearly divided into two main subfamilies (I-II), which could be further divided into 2 (I) and 5 (II) subgroups, respectively. Analyses of divergence and adaptive evolution show that purifying selection may have been the main force driving the evolution of plant TPS genes. Some of the critical sites that contributed to divergence may have been under positive selection. Transcriptome data analysis revealed that most NnTPS genes were predominantly expressed in sink tissues. The expression patterns of NnTPS genes under copper and submergence stress indicated that NNU_014679 and NNU_022788 might play important roles in lotus energy metabolism and participate in the stress response. Our results can facilitate further functional studies of TPS genes in lotus. PMID:27746792
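The selection inference above rests on the ratio ω of the nonsynonymous substitution rate (dN) to the synonymous rate (dS). The standard textbook reading of that ratio, which is a general convention rather than a result of this study, is:

```latex
\omega = \frac{d_N}{d_S}, \qquad
\begin{cases}
\omega < 1 & \text{purifying (negative) selection}\\
\omega = 1 & \text{neutral evolution}\\
\omega > 1 & \text{positive (diversifying) selection}
\end{cases}
```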
Davison, Michelle; Treangen, Todd J; Koren, Sergey; Pop, Mihai; Bhaya, Devaki
2016-01-01
The polymicrobial biofilm communities in Mushroom and Octopus Springs in Yellowstone National Park (YNP) are well characterized, yet little is known about their phage populations. The dominant species, Synechococcus sp. JA-2-3B'a(2-13), Synechococcus sp. JA-3-3Ab, Chloroflexus sp. Y-400-fl, and Roseiflexus sp. RS-1, contain multiple CRISPR-Cas arrays, suggesting complex interactions with phage predators. To analyze phage populations from Octopus Spring biofilms, we sequenced a viral-enriched fraction. To assemble and analyze the phage metagenomic data, we developed a custom module, VIRITAS, implemented within the MetAMOS framework. This module bins contigs into groups based on tetranucleotide frequencies and CRISPR spacer-protospacer matching, and also performs ORF calling. Using this pipeline, we were able to assemble phage sequences into contigs and bin them into three clusters that were consistent with their potential host range. The virome contained 52,348 predicted ORFs, some of which were clearly phage-like; 9319 ORFs had a recognizable Pfam domain, while the rest were hypothetical. Among the recognized domains with CRISPR spacer matches was the phage endolysin, which lytic phages use to disrupt host cells. Analysis of the endolysins present in the thermophilic cyanophage contigs revealed a subset of characterized endolysins as well as a Glyco_hydro_108 (PF05838) domain not previously associated with sequenced cyanophages. A search for CRISPR spacer matches to all identified phage endolysins demonstrated that a majority of endolysin domains were targets. This strategy provides a general way to link hosts and phages, as endolysins are known to be widely distributed in bacteriophages. Endolysins can also provide information about host cell wall composition and have the additional potential to be used as targets for novel therapeutics.
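Tetranucleotide-frequency binning, one of the signals VIRITAS uses, rests on representing each contig as a 256-dimensional 4-mer profile. The Python sketch below computes such profiles and a Euclidean distance between them. It is an assumption-laden illustration, not VIRITAS code. Real binners often also merge reverse-complement k-mers and then apply a clustering algorithm to these vectors.

```python
from itertools import product
from math import sqrt
from typing import Dict, List

# All 256 tetranucleotides in a fixed order, so profiles are comparable vectors.
TETRAMERS: List[str] = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_profile(seq: str) -> List[float]:
    """Relative frequency of each tetranucleotide in a contig (sliding window of 4)."""
    s = seq.upper()
    counts: Dict[str, int] = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(s) - 3):
        kmer = s[i:i + 4]
        if kmer in counts:  # skip windows containing N or other ambiguity codes
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[k] / total for k in TETRAMERS]


def profile_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two tetranucleotide profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Contigs from the same genome tend to have similar profiles, so small distances suggest candidates for the same bin. Short contigs give noisy profiles, which is one reason additional signals such as CRISPR spacer matches help.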
Prediction of Protein Structure by Template-Based Modeling Combined with the UNRES Force Field.
Krupa, Paweł; Mozolewska, Magdalena A; Joo, Keehyoung; Lee, Jooyoung; Czaplewski, Cezary; Liwo, Adam
2015-06-22
A new approach to the prediction of protein structures that uses distance and backbone virtual-bond dihedral angle restraints derived from template-based models and simulations with the united residue (UNRES) force field is proposed. The approach combines the accuracy and reliability of template-based methods for the segments of the target sequence with high similarity to those having known structures with the ability of UNRES to pack the domains correctly. Multiplexed replica-exchange molecular dynamics with restraints derived from template-based models of a given target, in which each restraint is weighted according to the accuracy of the prediction of the corresponding section of the molecule, is used to search the conformational space, and the weighted histogram analysis method and cluster analysis are applied to determine the families of the most probable conformations, from which candidate predictions are selected. To test the capability of the method to recover template-based models from restraints, five single-domain proteins with structures that have been well-predicted by template-based methods were used; it was found that the resulting structures were of the same quality as the best of the original models. To assess whether the new approach can improve template-based predictions with incorrectly predicted domain packing, four such targets were selected from the CASP10 targets; for three of them the new approach resulted in significantly better predictions compared with the original template-based models. The new approach can be used to predict the structures of proteins for which good templates can be found for sections of the sequence or an overall good template can be found for the entire sequence but the prediction quality is remarkably weaker in putative domain-linker regions.
Rsp5 WW domains interact directly with the carboxyl-terminal domain of RNA polymerase II.
Chang, A; Cheang, S; Espanel, X; Sudol, M
2000-07-07
RSP5 is an essential gene in Saccharomyces cerevisiae and was recently shown to form a physical and functional complex with RNA polymerase II (RNA pol II). The amino-terminal half of Rsp5 consists of four domains: a C2 domain, which binds membrane phospholipids; and three WW domains, which are protein interaction modules that bind proline-rich ligands. The carboxyl-terminal half of Rsp5 contains a HECT (homologous to E6-AP carboxyl terminus) domain that catalytically ligates ubiquitin to proteins and functionally classifies Rsp5 as an E3 ubiquitin-protein ligase. The C2 and WW domains are presumed to act as membrane localization and substrate recognition modules, respectively. We report that the second (and possibly third) Rsp5 WW domain mediates binding to the carboxyl-terminal domain (CTD) of the RNA pol II large subunit. The CTD comprises a heptamer (YSPTSPS) repeated 26 times and a PXY core that is critical for interaction with a specific group of WW domains. An analysis of synthetic peptides revealed a minimal CTD sequence that is sufficient to bind to the second Rsp5 WW domain (Rsp5 WW2) in vitro and in yeast two-hybrid assays. Furthermore, we found that specific "imperfect" CTD repeats can form a complex with Rsp5 WW2. In addition, we have shown that phosphorylation of this minimal CTD sequence on serine, threonine and tyrosine residues acts as a negative regulator of the Rsp5 WW2-CTD interaction. In view of the recent data pertaining to phosphorylation-driven interactions between the RNA pol II CTD and the WW domain of Ess1/Pin1, we suggest that CTD dephosphorylation may be a prerequisite for targeted RNA pol II degradation.
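The CTD repeat structure described above lends itself to a simple sequence scan. The hypothetical Python sketch below splits a CTD sequence into consecutive heptads, assuming that the sequence starts on a repeat boundary. It labels each heptad as a perfect, imperfect, or other repeat by Hamming distance to the YSPTSPS consensus; the one-mismatch cutoff for "imperfect" is an arbitrary assumption. It also lists the positions of the P-x-Y core, which in tandem repeats spans the boundary between consecutive heptads.

```python
import re
from typing import Dict, List

CONSENSUS = "YSPTSPS"  # CTD heptad consensus


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def classify_heptads(ctd: str, max_mismatches: int = 1) -> Dict[str, int]:
    """Count perfect, imperfect (<= max_mismatches) and other heptads; a trailing partial heptad is ignored."""
    counts = {"perfect": 0, "imperfect": 0, "other": 0}
    for i in range(0, len(ctd) - 6, 7):
        d = hamming(ctd[i:i + 7], CONSENSUS)
        if d == 0:
            counts["perfect"] += 1
        elif d <= max_mismatches:
            counts["imperfect"] += 1
        else:
            counts["other"] += 1
    return counts


def pxy_positions(seq: str) -> List[int]:
    """0-based start positions of every (possibly overlapping) P-x-Y match."""
    return [m.start() for m in re.finditer(r"(?=P.Y)", seq)]
```

For example, in two tandem heptads `YSPTSPSYSPTSPS`, the only P-x-Y core starts at position 5 (the `PSY` joining the two repeats).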
SH2-catalytic domain linker heterogeneity influences allosteric coupling across the SFK family.
Register, A C; Leonard, Stephen E; Maly, Dustin J
2014-11-11
Src-family kinases (SFKs) make up a family of nine homologous multidomain tyrosine kinases whose misregulation is responsible for human disease (cancer, diabetes, inflammation, etc.). Despite overall sequence homology and identical domain architecture, differences in SH3 and SH2 regulatory domain accessibility and ability to allosterically autoinhibit the ATP-binding site have been observed for the prototypical SFKs Src and Hck. Biochemical and structural studies indicate that the SH2-catalytic domain (SH2-CD) linker, the intramolecular binding epitope for SFK SH3 domains, is responsible for allosterically coupling SH3 domain engagement to autoinhibition of the ATP-binding site through the conformation of the αC helix. As a relatively unconserved region between SFK family members, SH2-CD linker sequence variability across the SFK family is likely a source of nonredundant cellular functions between individual SFKs via its effect on the availability of SH3 and SH2 domains for intermolecular interactions and post-translational modification. Using a combination of SFKs engineered with enhanced or weakened regulatory domain intramolecular interactions and conformation-selective inhibitors that report αC helix conformation, this study explores how SH2-CD sequence heterogeneity affects allosteric coupling across the SFK family by examining Lyn, Fyn1, and Fyn2. Analyses of Fyn1 and Fyn2, isoforms that are identical but for a 50-residue sequence spanning the SH2-CD linker, demonstrate that SH2-CD linker sequence differences can have profound effects on allosteric coupling between otherwise identical kinases. Most notably, a dampened allosteric connection between the SH3 domain and αC helix leads to greater autoinhibitory phosphorylation by Csk, illustrating the complex effects of SH2-CD linker sequence on cellular function.
Zhang, Zhen; Liu, Qun; Hendrickson, Wayne A.
2014-01-01
The adult human gut presents a complicated ecosystem in which host-bacterium symbiosis plays an important role. Bacteroides thetaiotaomicron is a predominant member of the gut microflora, providing the human digestive tract with a large number of glycolytic enzymes. Expression of many of these enzymes appears to be controlled by histidine kinase receptors that are fused into unusual hybrid two-component systems sharing homologous periplasmic sensor domains. These sensor domains belong to the third most populated (HK3) family, based on a previous bioinformatics analysis of predicted histidine kinase sensors. Here, we present crystal structures of two sensor domains representative of the HK3 family. Each sensor is folded into three domains: two seven-bladed β-propeller domains and one β-sandwich domain. Both sensors form dimers in crystals, and one of these dimers appears to be physiologically relevant. The folding characteristics of the individual domains, the domain organization, and the oligomeric architecture are all unique to the HK3 sensors. Sequence analysis of the HK3 sensors indicates that these sensor domains are shared with other signaling molecules, implying a combinatorial molecular evolution. PMID:24995510
Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius
Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.
2010-01-01
Despite its economic, cultural, and biological importance, there has not been a large-scale sequencing project to date for Camelus dromedarius. With the goal of sequencing the complete genome of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera checking, repeat masking, clustering, and assembly, we obtained 23,602 putative gene sequences, of which over 4,500 potentially novel or fast-evolving gene sequences do not carry any homology to other available genomes. Sequences with similarities in nucleotide and protein databases were functionally annotated using Gene Ontology classification. Comparison to available full-length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show that more than 80% of the contigs have an ORF > 300 bp and that ∼40% of hits extend to the start codons of full-length cDNAs, suggesting successful characterization of camel genes. Similarity analyses were done separately for different organisms, including human, mouse, bovine, and rat. The accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools, with the option to add sequences from the public domain. We anticipate that our results will provide a home base for genomic studies of camel and other comparative studies, enabling a starting point for whole-genome sequencing of the organism. PMID:20502665
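The ORF criterion above (ORF > 300 bp) can be illustrated with a minimal longest-ORF scan. This sketch assumes an ORF runs from an ATG to the first in-frame stop codon of the standard genetic code, scans all three frames on both strands, and counts the stop codon in the length; the function names are illustrative and this is not the CAGBASE pipeline.

```python
# Stop codons of the standard genetic code (assumed; other codes differ).
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq, min_len=0):
    """Longest ATG...stop ORF (stop codon included) on either strand.

    Returns "" when no ORF reaches min_len nucleotides.
    """
    best = ""
    seq = seq.upper()
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    orf = strand[start:i + 3]
                    if len(orf) > len(best):
                        best = orf
                    start = None  # look for the next start in this frame
    return best if len(best) >= min_len else ""


print(longest_orf("CCATGAAATAGGG"))               # ATGAAATAG
print(longest_orf("CCATGAAATAGGG", min_len=300))  # prints an empty line
```

With `min_len=300`, a contig passes the "> 300 bp" style filter only if some ORF is long enough; whether the boundary is strict and whether the stop codon is counted are assumptions here.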
Estrada-Gómez, Sebastian; Vargas-Muñoz, Leidy Johana; Saldarriaga-Córdoba, Mónica; Cifuentes, Yeimy; Perafan, Carlos
2017-04-01
Theraphosidae spider venoms are well known to possess a complex mixture of protein and non-protein compounds. The objective of this study was to report and identify different proteins translated from the venom gland DNA information of the recently described Theraphosidae spider Pamphobeteus verdolaga. Using a venom gland transcriptomic analysis, we report the first complete sequences of seven different proteins of P. verdolaga. Protein analysis indicates the presence of different proteins in the venom of this new spider, some of them uncommon in the Theraphosidae family. MS/MS analysis of P. verdolaga showed different fragments matching sphingomyelinases (sicaritoxin), barytoxins, hexatoxins, latroinsectotoxins, and linear (zadotoxins) peptides. Only four of the MS/MS fragments showed 100% sequence similarity with one of the transcribed proteins. Transcriptomic analysis showed the presence of different groups of proteins, such as phospholipases, hyaluronidases, and inhibitory cysteine knot (ICK) peptides, among others. The three protein domain databases used in this study (Pfam, SMART and CDD) agreed on a unique conserved protein domain for only four of the translated proteins. Those proteins matched EF-hand proteins, cysteine-rich secretory proteins, jingzhaotoxins, theraphotoxins and hexatoxins from different Mygalomorphae spiders belonging to the families Theraphosidae, Barychelidae and Hexathelidae. None of the analyzed sequences showed 100% similarity. Copyright © 2017 Elsevier Ltd. All rights reserved.
Evidence for an uncommon alpha-actinin protein in Trichomonas vaginalis.
Bricheux, G; Coffe, G; Pradel, N; Brugerolle, G
1998-09-15
As part of our ongoing project of identification of actin-binding proteins implicated in the cell transition (flagellate to amoeboid/adherent) of Trichomonas vaginalis, we have characterized an alpha-actinin-related protein in this parasite. The protein (P100) has a molecular mass of 100 kDa and an isoelectric point of 5.5. A monoclonal antibody raised against this protein co-localizes with the actin network. P100 gene transcripts are co-expressed with actin throughout the cell cycle. Analysis of the deduced protein sequence reveals three domains: an N-terminal actin-binding region; a central region rich in alpha-helix; and a C-terminal domain with Ca(2+)-binding capacity. Whereas the N- and C-terminal regions are well-conserved as compared to other alpha-actinins, we observe in the central region an atypical distribution of residues in five repeats. The sequence of the repeats does not show any homology with the rod domain of the other alpha-actinins, except for the first repeat which shows some similarity. The four other repeats of T. vaginalis P100 appear to result from a duplication event which is not detectable in the other sequences.
Do pattern recognition skills transfer across sports? A preliminary analysis.
Smeeton, Nicholas J; Ward, Paul; Williams, A Mark
2004-02-01
The ability to recognize patterns of play is fundamental to performance in team sports. While typically assumed to be domain-specific, pattern recognition skills may transfer from one sport to another if similarities exist in the perceptual features and their relations and/or the strategies used to encode and retrieve relevant information. A transfer paradigm was employed to compare skilled and less skilled soccer, field hockey and volleyball players' pattern recognition skills. Participants viewed structured and unstructured action sequences from each sport, half of which were randomly re-presented along with clips not previously seen. The task was to identify previously viewed action sequences quickly and accurately. Transfer of pattern recognition skill depended on the participant's skill, the sport practised, the nature of the task and the degree of structure. The skilled soccer and hockey players were quicker than the skilled volleyball players at recognizing structured soccer and hockey action sequences. Performance differences were not observed on the structured volleyball trials between the skilled soccer, field hockey and volleyball players. The skilled field hockey and soccer players were able to transfer perceptual information or strategies between their respective sports. The less skilled participants' results were less clear. Implications for domain-specific expertise, transfer and diversity across domains are discussed.
Quantitative analysis of the anti-noise performance of an m-sequence in an electromagnetic method
NASA Astrophysics Data System (ADS)
Yuan, Zhe; Zhang, Yiming; Zheng, Qijia
2018-02-01
An electromagnetic method whose transmitted waveform is coded by an m-sequence achieves better anti-noise performance than the conventional square-wave approach. Because the anti-noise performance of the m-sequence varies with multiple coding parameters, optimizing those parameters requires a quantitative analysis of the anti-noise performance of m-sequences with different coding parameters. This paper proposes the concept of an identification system, in which the identified Earth impulse response is obtained by measuring the system output with the voltage response as the input. A quantitative analysis of the anti-noise performance of the m-sequence was achieved by analyzing the amplitude-frequency response of the corresponding identification system. The effects of the coding parameters on the anti-noise performance are summarized through numerical simulation, their optimization is discussed in our conclusions, and the validity of the conclusions is verified by a field experiment. The quantitative analysis method proposed in this paper provides new insight into the anti-noise mechanism of the m-sequence, and could be used to evaluate the anti-noise performance of artificial sources in other time-domain exploration methods, such as the seismic method.
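For readers unfamiliar with m-sequences, a minimal sketch of how one is generated with a Fibonacci linear-feedback shift register (LFSR), and of the two-valued periodic autocorrelation (N at zero lag, -1 at every other lag) commonly cited as the reason m-sequence codes suit correlation-based recovery of a response. The register length, tap set (4, 3) and seed are illustrative assumptions; the paper's coding parameters and its identification-system analysis are not reproduced here.

```python
def m_sequence(taps, n_bits, seed=1):
    """One period (2**n_bits - 1 chips) of a binary m-sequence from a Fibonacci LFSR.

    taps: 1-based register positions XORed into the feedback; they must
    correspond to a primitive feedback polynomial, otherwise the period is
    shorter than 2**n_bits - 1.
    """
    state = [(seed >> i) & 1 for i in range(n_bits)]
    if not any(state):
        raise ValueError("seed must set at least one of the low n_bits bits")
    chips = []
    for _ in range(2 ** n_bits - 1):
        chips.append(state[-1])          # output the last stage
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]  # shift by one stage, insert feedback
    return chips


def periodic_autocorrelation(chips):
    """Circular autocorrelation of the bipolar (+1/-1) version of the chips."""
    x = [1 - 2 * c for c in chips]       # 0 -> +1, 1 -> -1
    n = len(x)
    return [sum(x[i] * x[(i + k) % n] for i in range(n)) for k in range(n)]


seq = m_sequence(taps=(4, 3), n_bits=4)
print(len(seq), sum(seq))                 # 15 8
print(periodic_autocorrelation(seq)[:4])  # [15, -1, -1, -1]
```

The balance (8 ones, 7 zeros) and the flat off-peak autocorrelation hold for any nonzero seed as long as the taps define a primitive polynomial; a non-primitive tap set silently yields a shorter cycle, which the sketch does not check.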
Proteomics analysis of immunoprecipitated proteins associated with the oncogenic kinase cot.
Wu, Binhui; Wilmouth, R C
2008-02-29
Cancer Osaka thyroid (Cot), also known as Tpl-2, is a member of the MAP3K family and plays a key role in the regulation of the immune response to pro-inflammatory stimuli such as lipopolysaccharide (LPS) and tumour necrosis factor-alpha (TNF-alpha). A series of Cot constructs with an N-terminal 6xHis tag were transiently expressed in HEK293 cells: Cot(130-399) (kinase domain), Cot(1-388) (N-terminal and kinase domains), Cot(1-413), Cot(1-438) (containing a putative PEST sequence), Cot(1-457) (containing both PEST and degron sequences) and Cot(1-467) (full-length protein). These Cot proteins were pulled down using an anti-6xHis antibody and separated by 2D electrophoresis. The gels were silver-stained and 21 proteins were detected that did not appear, or had substantially reduced intensity, in the control sample. Three of these were identified by MS and MS/MS analysis as Hsp90, Hsp70 and Grp78. Hsp90 appeared to bind to the kinase domain of Cot, and this interaction was further investigated using co-immunoprecipitation with both overexpressed Cot in HEK293 cells and endogenous Cot in HeLa cells.
Bimolata, Waikhom; Kumar, Anirudh; Sundaram, Raman Meenakshi; Laha, Gouri Shankar; Qureshi, Insaf Ahmed; Reddy, Gajjala Ashok; Ghazi, Irfan Ahmad
2013-08-01
Xa27 is one of the important R-genes effective against bacterial blight disease of rice, which is caused by Xanthomonas oryzae pv. oryzae (Xoo). Using a natural population of Oryza, we analyzed the sequence variation in the functionally important domains of Xa27 across Oryza species. DNA sequences of Xa27 alleles from 27 rice accessions revealed higher nucleotide diversity among the reported R-genes of rice. Sequence polymorphism analysis revealed synonymous and non-synonymous mutations in addition to a number of InDels in non-coding regions of the gene. High sequence variation was observed in the promoter region, including the 5'UTR (π = 0.00916; θw = 0.01785). Comparative analysis of the identified Xa27 alleles with those of IRBB27 and IR24 indicated the operation of both positive selection (Ka/Ks > 1) and neutral selection (Ka/Ks ≈ 0). The genetic distances of the Xa27 alleles from Oryza nivara were closer to IRBB27 than to IR24. We also found the presence of conserved and null UPT (upregulated by transcriptional activator) boxes in the isolated alleles. Considerable amino acid polymorphism was localized in the transmembrane domain, whose functional significance is yet to be elucidated. However, the absence of a functional UPT box in all the alleles except IRBB27 suggests the maintenance of a single resistant allele throughout the natural population.
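The π and θw values quoted above are per-site diversity statistics. A minimal sketch of the standard definitions (Nei's π as mean pairwise differences per site; Watterson's θ as S / (a_n · L) with a_n = Σ 1/i for i = 1..n-1), assuming aligned, equal-length sequences with no gaps or missing data. The toy alignment is hypothetical, and this is not the authors' analysis pipeline.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Nei's pi: average number of pairwise differences per site."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / len(pairs) / length


def watterson_theta(seqs):
    """Watterson's theta per site: S / (a_n * L)."""
    n, length = len(seqs), len(seqs[0])
    segregating = sum(len(set(column)) > 1 for column in zip(*seqs))
    a_n = sum(1 / i for i in range(1, n))
    return segregating / (a_n * length)


# Hypothetical 4-sequence alignment with 2 segregating sites out of 4.
toy = ["AAAA", "AAAA", "AAAA", "AATT"]
print(round(nucleotide_diversity(toy), 4))  # 0.25
print(round(watterson_theta(toy), 4))       # 0.2727
```

A θw noticeably larger than π, as reported for the Xa27 promoter, indicates an excess of rare variants relative to neutral expectations; formal tests such as Tajima's D build on exactly these two quantities.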
Recombinant soluble adenovirus receptor
Freimuth, Paul I.
2002-01-01
Disclosed are isolated polypeptides from human CAR (coxsackievirus and adenovirus receptor) protein which bind adenovirus. Specifically disclosed are amino acid sequences which correspond to adenovirus binding domain D1 and to the entire extracellular domain of human CAR protein, comprising D1 and D2. In other aspects, the disclosure relates to nucleic acid sequences encoding these domains as well as expression vectors which encode the domains and bacterial cells containing such vectors. Also disclosed is an isolated fusion protein composed of the D1 polypeptide sequence fused to a polypeptide sequence which facilitates folding of D1 into a functional, soluble domain when expressed in bacteria. The functional D1 domain finds application, for example, in a therapeutic method for treating a patient infected with a virus which binds to D1, and also in a method for identifying an antiviral compound which interferes with viral attachment. Also included is a method for specifically targeting a cell for infection by a virus which binds to D1.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Q Zhai; M Landesman; H Robinson
2011-12-31
Retroviral Gag proteins contain short late-domain motifs that recruit cellular ESCRT pathway proteins to facilitate virus budding. ALIX-binding late domains often contain the core consensus sequence YPXnL (where Xn can vary in sequence and length). However, some simian immunodeficiency virus (SIV) Gag proteins lack this consensus sequence, yet still bind ALIX. We mapped divergent, ALIX-binding late domains within the p6Gag proteins of SIVmac239 (residues 40-59, SREK[P]YKE[VT]ED[L]LHLNSLF) and SIVagmTan-1 (residues 24-41, AAG[A]YDP[AR]KL[L]EQYAKK). Crystal structures revealed that anchoring tyrosines and nearby hydrophobic residues (in brackets) contact the ALIX V domain, revealing how lentiviruses employ a diverse family of late-domain sequences to bind ALIX and promote virus budding.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhai, Q.; Robinson, H.; Landesman, M. B.
2011-01-01
Retroviral Gag proteins contain short late-domain motifs that recruit cellular ESCRT pathway proteins to facilitate virus budding. ALIX-binding late domains often contain the core consensus sequence YPXnL (where Xn can vary in sequence and length). However, some simian immunodeficiency virus (SIV) Gag proteins lack this consensus sequence, yet still bind ALIX. We mapped divergent, ALIX-binding late domains within the p6Gag proteins of SIVmac239 (residues 40-59, SREK[P]YKE[VT]ED[L]LHLNSLF) and SIVagmTan-1 (residues 24-41, AAG[A]YDP[AR]KL[L]EQYAKK). Crystal structures revealed that anchoring tyrosines and nearby hydrophobic residues (in brackets) contact the ALIX V domain, revealing how lentiviruses employ a diverse family of late-domain sequences to bind ALIX and promote virus budding.
Phosphorylation-regulated Binding of RNA Polymerase II to Fibrous Polymers of Low Complexity Domains
Xiang, Siheng; Wu, Leeju; Theodoropoulos, Pano; Mirzaei, Hamid; Han, Tina; Xie, Shanhai; Corden, Jeffry L.; McKnight, Steven L.
2014-01-01
The low complexity (LC) domains of the products of the fused in sarcoma (FUS), Ewing's sarcoma (EWS) and TAF15 genes are translocated onto a variety of different DNA-binding domains and thereby assist in driving the formation of cancerous cells. In the context of the translocated fusion proteins, these LC sequences function as transcriptional activation domains. Here we show that polymeric fibers formed from these LC domains directly bind the C-terminal domain (CTD) of RNA polymerase II in a manner reversible by phosphorylation of the iterated heptad repeats of the CTD. Mutational analysis indicates that the degree of binding between the CTD and the LC domain polymers correlates with the strength of transcriptional activation. These studies offer a simple means of conceptualizing how RNA polymerase II is recruited to active genes in its unphosphorylated state, and released for elongation following phosphorylation of the CTD. PMID:24267890
Garamszegi, Sara; Franzosa, Eric A.; Xia, Yu
2013-01-01
A central challenge in host-pathogen systems biology is the elucidation of general, systems-level principles that distinguish host-pathogen interactions from within-host interactions. Current analyses of host-pathogen and within-host protein-protein interaction networks are largely limited by their resolution, treating proteins as nodes and interactions as edges. Here, we construct a domain-resolved map of human-virus and within-human protein-protein interaction networks by annotating protein interactions with high-coverage, high-accuracy, domain-centric interaction mechanisms: (1) domain-domain interactions, in which a domain in one protein binds to a domain in a second protein, and (2) domain-motif interactions, in which a domain in one protein binds to a short, linear peptide motif in a second protein. Analysis of these domain-resolved networks reveals, for the first time, significant mechanistic differences between virus-human and within-human interactions at the resolution of single domains. While human proteins tend to compete with each other for domain binding sites by means of sequence similarity, viral proteins tend to compete with human proteins for domain binding sites in the absence of sequence similarity. Independent of their previously established preference for targeting human protein hubs, viral proteins also preferentially target human proteins containing linear motif-binding domains. Compared to human proteins, viral proteins participate in more domain-motif interactions, target more unique linear motif-binding domains per residue, and contain more unique linear motifs per residue. Together, these results suggest that viruses surmount genome size constraints by convergently evolving multiple short linear motifs in order to effectively mimic, hijack, and manipulate complex host processes for their survival. 
Our domain-resolved analyses reveal unique signatures of pleiotropy, economy, and convergent evolution in viral-host interactions that are otherwise hidden in the traditional binary network, highlighting the power and necessity of high-resolution approaches in host-pathogen systems biology. PMID:24339775
Nature of the protein universe
Levitt, Michael
2009-01-01
The protein universe is the set of all proteins of all organisms. Here, all currently known sequences are analyzed in terms of families that have single-domain or multidomain architectures and whether they have a known three-dimensional structure. Growth of new single-domain families is very slow: Almost all growth comes from new multidomain architectures that are combinations of domains characterized by ≈15,000 sequence profiles. Single-domain families are mostly shared by the major groups of organisms, whereas multidomain architectures are specific and account for species diversity. There are known structures for a quarter of the single-domain families, and >70% of all sequences can be partially modeled thanks to their membership in these families. PMID:19541617
Lahm, H; Hoeflich, A; Andre, S; Sordat, B; Kaltner, H; Wolf, E; Gabius, H J
2000-09-01
The family of Ca2+-independent galactoside-binding lectins with the beta-strand topology of the jelly-roll, referred to as galectins, is known to mediate and modulate a variety of cellular activities. Their functional versatility explains the current interest in monitoring their expression in cancer research, so far primarily focused on galectin-1 and -3. Tandem-repeat-type galectin-9 and its (most probably) allelic variant ecalectin, a potent eosinophil chemoattractant, are known to be human leukocyte products. We show by RT-PCR with primers specific for both that their mRNA is expressed in 17 of 21 human colorectal cancer lines. As also indicated by restriction analysis, in addition to the expected transcript of 571 bp, an otherwise identical isoform coding for a 32-amino acid extension of the link peptide was detected. Positive cell lines differentially expressed either one (7 lines) or both transcripts (10 lines). Sequence analysis of RT-PCR products, performed in four cases, allowed the standard transcript to be assigned to ecalectin in SW480 cells and detected two point mutations in the insert of the link peptide-coding sequence in WiDr and Colo205. Furthermore, this analysis identified the insertion of a single nucleotide into the coding sequence, generating a frame-shift mutation, an event which has so far not been reported for any galectin. This alteration, encountered in both transcripts of the WiDr line and in the isoform transcript of Colo205 cells, will most likely truncate the protein within the second (C-terminal) carbohydrate recognition domain. Our results thus reveal the presence of mRNA for a galectin-9 isoform, for a potent eosinophil chemoattractant (ecalectin), or for a truncated version thereof with a preserved N-terminal carbohydrate recognition domain in established human colon cancer cell lines.
Piccoli, Giovanni; Onofri, Franco; Cirnaru, Maria Daniela; Kaiser, Christoph J. O.; Jagtap, Pravinkumar; Kastenmüller, Andreas; Pischedda, Francesca; Marte, Antonella; von Zweydorf, Felix; Vogt, Andreas; Giesert, Florian; Pan, Lifeng; Antonucci, Flavia; Kiel, Christina; Zhang, Mingjie; Weinkauf, Sevil; Sattler, Michael; Sala, Carlo; Matteoli, Michela; Ueffing, Marius
2014-01-01
Mutations in the leucine-rich repeat kinase 2 gene (LRRK2) are associated with familial and sporadic Parkinson's disease (PD). LRRK2 is a complex protein that consists of multiple domains, including predicted C-terminal WD40 repeats. In this study, we analyzed functional and molecular features conferred by the WD40 domain. Electron microscopic analysis of the purified LRRK2 C-terminal domain revealed doughnut-shaped particles, providing experimental evidence for its WD40 fold. We demonstrate that LRRK2 WD40 binds and sequesters synaptic vesicles via interaction with vesicle-associated proteins. In fact, a domain-based pulldown approach combined with mass spectrometric analysis identified LRRK2 as being part of a highly specific protein network involved in synaptic vesicle trafficking. In addition, we found that a C-terminal sequence variant associated with an increased risk of developing PD, G2385R, correlates with a reduced binding affinity of LRRK2 WD40 to synaptic vesicles. Our data demonstrate a critical role of the WD40 domain within LRRK2 function. PMID:24687852
MutationAligner: a resource of recurrent mutation hotspots in protein domains in cancer
Gauthier, Nicholas Paul; Reznik, Ed; Gao, Jianjiong; Sumer, Selcuk Onur; Schultz, Nikolaus; Sander, Chris; Miller, Martin L.
2016-01-01
The MutationAligner web resource, available at http://www.mutationaligner.org, enables discovery and exploration of somatic mutation hotspots identified in protein domains in more than 5000 cancer patient samples (as of mid-2015) across 22 different tumor types. Using multiple sequence alignments of protein domains in the human genome, we extend the principle of recurrence analysis by aggregating mutations in homologous positions across sets of paralogous genes. Protein domain analysis enhances the statistical power to detect cancer-relevant mutations and links mutations to the specific biological functions encoded in domains. We illustrate how the MutationAligner database and interactive web tool can be used to explore, visualize and analyze mutation hotspots in protein domains across genes and tumor types. We believe that MutationAligner will be an important resource for the cancer research community by providing detailed clues to the functional importance of particular mutations, as well as for the design of functional genomics experiments and for decision support in precision medicine. MutationAligner is slated to be periodically updated to incorporate additional analyses and new data from cancer genomics projects. PMID:26590264
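The key idea above, pooling mutations at homologous positions across paralogs, can be sketched as mapping each gene's residue numbers onto shared alignment columns and counting per column. This is a minimal illustration under assumed inputs: a toy alignment format `{gene: (aligned domain sequence, number of its first residue)}`, hypothetical gene names, and a hypothetical fixed count threshold. MutationAligner's actual statistical hotspot scoring is not reproduced.

```python
from collections import Counter


def residue_to_column(aligned_seq, first_residue):
    """Map residue numbers of one domain instance to alignment column indices.

    aligned_seq: the domain as it appears in the alignment, with '-' for gaps.
    first_residue: residue number (in the full protein) of its first residue.
    """
    mapping = {}
    residue = first_residue
    for column, symbol in enumerate(aligned_seq):
        if symbol != "-":
            mapping[residue] = column
            residue += 1
    return mapping


def column_mutation_counts(alignment, mutations):
    """Pool mutations from all paralogs onto shared alignment columns."""
    maps = {gene: residue_to_column(seq, first)
            for gene, (seq, first) in alignment.items()}
    counts = Counter()
    for gene, residue in mutations:
        column = maps.get(gene, {}).get(residue)
        if column is not None:  # ignore mutations outside the aligned domain
            counts[column] += 1
    return counts


def candidate_hotspots(counts, min_count):
    """Columns whose pooled count reaches a simple (hypothetical) threshold."""
    return sorted(col for col, n in counts.items() if n >= min_count)


# Hypothetical genes: GENE_A residue 12 and GENE_B residue 8 share column 3.
alignment = {"GENE_A": ("MK-LV", 10), "GENE_B": ("MKRLV", 5)}
mutations = [("GENE_A", 12), ("GENE_B", 8), ("GENE_B", 8), ("GENE_A", 99)]
counts = column_mutation_counts(alignment, mutations)
print(candidate_hotspots(counts, min_count=2))  # [3]
```

Neither gene alone has more than two mutations at the shared position, but pooling gives three; this is the gain in statistical power the abstract describes, although a real analysis would compare counts against a background model rather than a fixed cutoff.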
Nedelcu, Aurora M
2009-03-01
Programmed cell death (PCD) represents a significant component of normal growth and development in multicellular organisms. Recently, PCD-like processes have been reported in single-celled eukaryotes, implying that some components of the PCD machinery existed early in eukaryotic evolution. This study provides a comparative analysis of PCD-related sequences across more than 50 unicellular genera from four eukaryotic supergroups: Unikonts, Excavata, Chromalveolata, and Plantae. A complex set of PCD-related sequences that correspond to domains or proteins associated with all main functional classes, from ligands and receptors to executors of PCD, was found in many unicellular lineages. Several PCD domains and proteins previously thought to be restricted to animals or land plants are also present in unicellular species. Notably, the yeast Saccharomyces cerevisiae, used as an experimental model system for PCD research, has a rather reduced set of PCD-related sequences relative to other unicellular species. The phylogenetic distribution of the PCD-related sequences identified in unicellular lineages suggests that the genetic basis for the evolution of the complex PCD machinery present in extant multicellular lineages was established early in the evolution of eukaryotes. The shaping of the PCD machinery in multicellular lineages involved the duplication, co-option, recruitment, and shuffling of domains already present in their unicellular ancestors.
Ma, G X; Zhou, R Q; Hu, L; Luo, Y L; Luo, Y F; Zhu, H H
2018-03-01
Toxocara canis is an important but neglected zoonotic parasite, and is the causative agent of human toxocariasis. Chondroitin proteoglycans are biological macromolecules, widely distributed in extracellular matrices, with a great diversity of functions in mammals. However, there is limited information regarding chondroitin proteoglycans in nematode parasites. In the present study, a female-enriched chondroitin proteoglycan 2 gene of T. canis (Tc-cpg-2) was cloned and characterized. Quantitative real-time polymerase chain reaction (qRT-PCR) was employed to measure the transcription levels of Tc-cpg-2 among tissues of male and female adult worms. A 485-amino-acid (aa) polypeptide was predicted from a continuous 1458-nucleotide open reading frame and designated as TcCPG2, which contains a 21-aa signal peptide. A conserved domain search identified three chitin-binding peritrophin-A (CBM_14) domains in the amino acid sequence of TcCPG2. Multiple alignment with the inferred amino acid sequences of Caenorhabditis elegans and Ascaris suum showed that CBM_14 domains were well conserved among these species. Phylogenetic analysis suggested that TcCPG2 was closely related to chondroitin proteoglycan 2 of A. suum. Interestingly, a high level of Tc-cpg-2 was detected in female germline tissues, particularly in the oviduct, suggesting potential roles of this gene in reproduction (e.g. oogenesis and embryogenesis) of adult T. canis. The functional roles of Tc-cpg-2 in reproduction and development in this parasite and related parasitic nematodes warrant further study.
Immunoglobulin from Antarctic fish species of Rajidae family.
Coscia, Maria Rosaria; Cocca, Ennio; Giacomelli, Stefano; Cuccaro, Fausta; Oreste, Umberto
2012-03-01
Immunoglobulins (Ig) of Chondrichthyes have been extensively studied in sharks; in contrast, investigations of Ig in skates remain scarce and fragmentary despite the high occurrence of skates in all of the major oceans of the world. To focus on Rajidae Igμ, the most abundant heavy chain isotype, we have chosen the Antarctic species Bathyraja eatonii, Bathyraja albomaculata, Bathyraja brachyurops, and Amblyraja georgiana, which live at high latitudes in the Southern Ocean and at very low temperatures. We prepared mRNA from the spleen of individuals of each species and performed RT-PCR experiments using two oligonucleotides designed on the alignment of various elasmobranch Igμ heavy chain sequences available in GenBank. The PCR products, about 1400 nt long, were cloned and sequenced. Nucleotide sequence identities calculated for the constant region domains ranged from 88.5% to 97.5% between species, and from 91.1% to 99.7% within species. In a distance tree that also included Raja erinacea sequences, two major branches were obtained, one containing Arhynchobatinae sequences, the other Rajinae sequences. Four presumptive D gene segments were identified in the region of the VH/D/JH recombination; two different D segments were often found in the same sequence. Moreover, 5-15 genomic fragments of different lengths, carrying the gene locus encoding the Igμ chain, were revealed by Southern blotting analysis. B. eatonii amino acid sequences were analyzed for positional diversity by Shannon entropy analysis, showing CH4 as the most conserved domain and CH3 as the most variable one. B. eatonii CDR3 region length varied between 11 and 15 amino acid residues; the mean length (13.4 aa) was greater than that of Leucoraja eglanteria sequences (7.7 aa). An alignment of representative sequences of Antarctic species and R. erinacea showed that more cysteine residues not involved in the intradomain disulfide bridges were present in the Antarctic species. Copyright © 2011 Elsevier B.V. All rights reserved.
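Positional diversity by Shannon entropy, as used for the B. eatonii sequences above, assigns each alignment column H = Σ p_a log2(1/p_a) over the residue frequencies p_a in that column: 0 for a fully conserved position, higher for more variable ones. A minimal sketch, assuming log base 2 and that gap characters are simply excluded from the counts; the abstract does not state either choice, and the toy alignment is hypothetical.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of one alignment column, ignoring gaps."""
    residues = [r for r in column if r != gap]
    if not residues:
        return 0.0
    total = len(residues)
    return sum((c / total) * math.log2(total / c)
               for c in Counter(residues).values())


def positional_entropy(aligned_seqs):
    """Entropy for every column of equal-length aligned sequences."""
    return [column_entropy(column) for column in zip(*aligned_seqs)]


# Hypothetical alignment: column 0 conserved, column 1 variable (C, C, G, T).
print(positional_entropy(["AC", "AC", "AG", "AT"]))  # [0.0, 1.5]
```

Averaging these per-column values over the columns belonging to each constant domain is one simple way to rank domains from most conserved to most variable, as the abstract does for CH4 and CH3.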
Brown, D P; Idler, K B; Katz, L
1990-01-01
The 18.1-kilobase plasmid pSE211 integrates into the chromosome of Saccharopolyspora erythraea at a specific attB site. Restriction analysis of the integrated plasmid, pSE211int, and adjacent chromosomal sequences allowed identification of attP, the plasmid attachment site. Nucleotide sequencing of attP, attB, attL, and attR revealed a 57-base-pair sequence common to all sites with no duplications of adjacent plasmid or chromosomal sequences in the integrated state, indicating that integration takes place through conservative, reciprocal strand exchange. An analysis of the sequences indicated the presence of a putative gene for Phe-tRNA at attB which is preserved at attL after integration has occurred. A comparison of the attB site for a number of actinomycete plasmids is presented. Integration at attB was also observed when a 2.4-kilobase segment of pSE211 containing attP and the adjacent plasmid sequence was used to transform a pSE211- host. Nucleotide sequencing of this segment revealed the presence of two complete open reading frames (ORFs) and a segment of a third ORF. The ORF adjacent to attP encodes a putative polypeptide 437 amino acids in length that shows similarity, at its C-terminal domain, to sequences of site-specific recombinases of the integrase family. The adjacent ORF encodes a putative 98-amino-acid basic polypeptide that contains a helix-turn-helix motif at its N terminus which corresponds to domains in the Xis proteins of a number of bacteriophages. A proposal for the function of this polypeptide is presented. The deduced amino acid sequence of the third ORF did not reveal similarities to polypeptide sequences in the current data banks. PMID:2180909
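Locating a core sequence shared by attP, attB, attL and attR, like the 57-bp common sequence above, is an instance of the longest common substring problem over several sequences. A brute-force sketch that tries substrings of the shortest input from longest to shortest is adequate for short att-site fragments; the function name and toy sequences are illustrative, the real att sequences are not reproduced, and this is not claimed to be how the authors identified the core.

```python
def longest_common_core(seqs):
    """Longest substring present in every sequence (brute force).

    Candidates are taken from the shortest sequence, longest first, so the
    first hit is a longest common substring. Roughly cubic time, fine for
    sequences of a few hundred bases; suffix-based methods scale better.
    """
    shortest = min(seqs, key=len)
    for size in range(len(shortest), 0, -1):
        for start in range(len(shortest) - size + 1):
            candidate = shortest[start:start + size]
            if all(candidate in s for s in seqs):
                return candidate
    return ""


# Toy stand-ins for four attachment-site fragments sharing a core "ACGT".
print(longest_common_core(["xxACGTyy", "zACGTq", "ACGTA", "ggACGTgg"]))  # ACGT
```

For real att sites one would also check the reverse complement of each fragment, since the shared core may be read on opposite strands in the plasmid and chromosomal sequences.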
Alawad, Abdullah; Alharbi, Sultan; Alhazzaa, Othman; Alagrafi, Faisal; Alkhrayef, Mohammed; Alhamdan, Ziyad; Alenazi, Abdullah; Al-Johi, Hasan; Alanazi, Ibrahim O; Hammad, Mohamed
2016-01-01
Although Sox2 cDNA sequences are available for many mammals, the Sox2 cDNA of Camelus dromedarius has not yet been characterized. The objective of this study was to sequence and characterize Sox2 cDNA from the brain of C. dromedarius (also known as the Arabian camel). The full coding sequence of the Sox2 gene from the brain of C. dromedarius was amplified by reverse transcription PCR and then sequenced, for the first time, using the 3730XL series platform sequencer (Applied Biosystems). The cDNA sequence displayed an open reading frame of 822 nucleotides, encoding a protein of 273 amino acids. The molecular weight and isoelectric point of the translated protein were calculated by bioinformatics analysis as 29.825 kDa and 10.11, respectively. The predicted cSox2 protein sequence exhibited high identity: 99% with Homo sapiens, Mus musculus, Bos taurus, and Vicugna pacos; 98% with Sus scrofa; and 93% with Camelus ferus. A 3D structure was built from the available 81-residue crystal structure of the HMG-box domain of the human stem cell transcription factor Sox2 (PDB: 2LE4), using structure-prediction software for the full 273-residue sequence. The comparison confirms the presence of the HMG-box domain in the cSox2 protein. Orthologous phylogenetic analysis grouped the C. dromedarius Sox2 isoform with those of humans, alpacas, cattle, and pigs. We believe that this genetic and structural information will be a helpful resource for annotation. Furthermore, Sox2 is one of the transcription factors that contribute to the generation of induced pluripotent stem cells (iPSCs), so this information may help in generating camel induced pluripotent stem cells (CiPSCs).
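The link between the 822-nucleotide ORF and the 273-residue protein is codon arithmetic: 822 / 3 = 274 codons, one of which is the stop codon. The sketch below scans the three forward reading frames for the longest ATG-to-stop ORF. It assumes the standard genetic code and does not scan the reverse complement; the test sequence is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def longest_orf(seq: str) -> str:
    """Longest ATG-initiated ORF on the forward strand, including its stop codon."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens an ORF
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ATG after this stop
    return best


def protein_length(orf_nt: int) -> int:
    """Residues encoded by an ORF whose length includes the terminal stop codon."""
    if orf_nt % 3:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_nt // 3 - 1


if __name__ == "__main__":
    print(protein_length(822))  # 273
    print(longest_orf("CCATGAAATTTTAAGG"))  # ATGAAATTTTAA
```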
Application of viromics: a new approach to the understanding of viral infections in humans.
Ramamurthy, Mageshbabu; Sankar, Sathish; Kannangai, Rajesh; Nandagopal, Balaji; Sridharan, Gopalan
2017-12-01
This review explores the strengths of modern technology-driven data compiled in the areas of virus gene sequencing and virus protein structures, and their implications for viral diagnosis and therapy. The information for virome analysis (viromics) is generated by the study of viral genomes (entire nucleotide sequences) and viral genes (coding for proteins). The study of viral infectious diseases, in terms of both etiopathogenesis and the development of newer therapeutics, is currently undergoing rapid change. Viromics relies on deep sequencing, next-generation sequencing (NGS) data, and public domain databases such as GenBank, as well as unique virus-specific databases. Two commonly used NGS platforms, Illumina and Ion Torrent, recommend maximum fragment lengths of about 300 and 400 nucleotides, respectively, for analysis. Direct detection of viruses in clinical samples using these methods is now evolving. There are currently a considerable number of good treatment options for HBV, HIV, and HCV; however, these viruses can develop drug resistance. The drug susceptibility regions of their genomes are sequenced, and drug resistance can now be predicted using three public-domain resources available on the web. This has been made possible by advances in technology: the advent of high-throughput sequencing, meta-analysis through sophisticated and easy-to-use software, and the use of high-speed computers for bioinformatics. More recently, NGS technology has been improved with single-molecule real-time sequencing, in which complete long reads can be obtained with fewer errors. This overcomes a limitation of NGS, which is inherently prone to software anomalies that arise in the hands of personnel without adequate training. Viromics comprises the developing understanding of viruses in terms of their genomes, pathobiology, transcriptomics, and molecular epidemiology.
These developments may bring about radical changes and advances, especially in the fields of antiviral therapy and diagnostic virology.
Molecular Characterization of Epiphytic Bacterial Communities on Charophycean Green Algae
Fisher, Madeline M.; Wilcox, Lee W.; Graham, Linda E.
1998-01-01
Epiphytic bacterial communities within the sheath material of three filamentous green algae, Desmidium grevillii, Hyalotheca dissiliens, and Spondylosium pulchrum (class Charophyceae, order Zygnematales), collected from a Sphagnum bog were characterized by PCR amplification, cloning, and sequencing of 16S ribosomal DNA. A total of 20 partial sequences and nine different sequence types were obtained, and one sequence type was recovered from the bacterial communities on all three algae. By phylogenetic analysis, the cloned sequences were placed into several major lineages of the Bacteria domain: the Flexibacter/Cytophaga/Bacteroides phylum and the α, β, and γ subdivisions of the phylum Proteobacteria. Analysis at the subphylum level revealed that the majority of our sequences were not closely affiliated with those of known, cultured taxa, although the estimated evolutionary distances between our sequences and their nearest neighbors were always less than 0.1 (i.e., greater than 90% similar). This result suggests that the majority of sequences obtained in this study represent as yet phenotypically undescribed bacterial species and that the range of bacterial-algal interactions that occur in nature has not yet been fully described. PMID:9797295
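The equivalence between a distance below 0.1 and a similarity above 90% holds exactly for the uncorrected p-distance, the fraction of differing aligned sites. The abstract does not state which distance model was used; model-corrected distances such as Jukes-Cantor are always at least as large as p, so a corrected distance below 0.1 still implies more than 90% identity. A minimal sketch, using hypothetical aligned fragments and skipping gapped columns (one of several conventions):

```python
import math


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences, skipping gapped columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differences / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance; undefined for p >= 0.75."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    p = p_distance("ACGTACGTAC", "ACGTACGTTC")  # hypothetical 16S fragments
    print(p, 1 - p, jukes_cantor(p))  # distance, similarity, corrected distance
```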
In silico analysis of subtilisin from Glaciozyma antarctica PI12
NASA Astrophysics Data System (ADS)
Mustafha, Siti Mardhiah; Murad, Abdul Munir Abdul; Mahadi, Nor Muhammad; Kamaruddin, Shazilah; Bakar, Farah Diba Abu
2015-09-01
Subtilisins are major industrial enzymes with a wide range of applications, especially in the detergent industry. In this study, a cDNA encoding a subtilisin (GaSUBT) was isolated from the psychrophilic yeast Glaciozyma antarctica PI12, PCR-amplified, and sequenced. Various bioinformatics tools were used to characterize GaSUBT. The 1587-bp GaSUBT sequence encodes 529 amino acids. The predicted molecular weight of the deduced protein is 55.34 kDa, with an isoelectric point of 6.25. GaSUBT was predicted to possess a signal peptide and a pro-peptide consisting of a peptidase inhibitor I9 sequence. Alignment of the deduced amino acid sequence with other subtilisins in the NCBI database showed that the sequences surrounding the catalytic triad that forms the catalytic domain are well conserved.
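Predicted molecular weights and isoelectric points such as those above are computed from the deduced sequence. The sketch below sums approximate average residue masses plus one water molecule, and finds the pI by bisection on the Henderson-Hasselbalch net charge. The pKa set shown (EMBOSS-style values) is an assumption: tools differ in their pKa sets, so predicted pI values can differ by several tenths of a unit between programs.

```python
# Approximate average residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the free N- and C-termini

# Assumed pKa values (EMBOSS-style set); other tools use different values.
PKA_BASIC = {"N_TERM": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"C_TERM": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(seq: str) -> float:
    """Average molecular weight (Da) of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of the peptide at a given pH."""
    seq = seq.upper()
    positive = 1 / (1 + 10 ** (ph - PKA_BASIC["N_TERM"]))
    positive += sum(seq.count(aa) / (1 + 10 ** (ph - PKA_BASIC[aa])) for aa in "KRH")
    negative = 1 / (1 + 10 ** (PKA_ACIDIC["C_TERM"] - ph))
    negative += sum(seq.count(aa) / (1 + 10 ** (PKA_ACIDIC[aa] - ph)) for aa in "DECY")
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """pH of zero net charge, found by bisection (net charge falls as pH rises)."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2
```

Unknown residue letters raise a `KeyError` in `molecular_weight`, which is deliberate: ambiguous residues should be resolved before estimating the mass.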
Chronodes: Interactive Multifocus Exploration of Event Sequences
Polack, Peter J.; Chen, Shang-Tse; Kahng, Minsuk; de Barbaro, Kaya; Basole, Rahul; Sharmin, Moushumi; Chau, Duen Horng
2018-01-01
The advent of mobile health (mHealth) technologies challenges the capabilities of current visualizations, interactive tools, and algorithms. We present Chronodes, an interactive system that unifies data mining and human-centric visualization techniques to support explorative analysis of longitudinal mHealth data. Chronodes extracts and visualizes frequent event sequences that reveal chronological patterns across multiple participant timelines of mHealth data. It then combines novel interaction and visualization techniques to enable multifocus event sequence analysis, which allows health researchers to interactively define, explore, and compare groups of participant behaviors using event sequence combinations. Summarizing insights gained from a pilot study with 20 behavioral and biomedical health experts, we discuss Chronodes’s efficacy and potential impact in the mHealth domain. Finally, we outline important open challenges in mHealth and offer recommendations and design guidelines for future research. PMID:29515937
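Chronodes' own mining algorithm is not described in this abstract. As a minimal stand-in, assuming each participant's timeline is an ordered list of event labels, frequent event sequences can be approximated by counting contiguous runs of a fixed length and keeping those seen in at least `min_support` participants. The function name, parameters, and event labels below are hypothetical.

```python
from collections import Counter


def frequent_sequences(timelines: dict[str, list[str]], length: int,
                       min_support: int) -> dict[tuple[str, ...], int]:
    """Contiguous event runs of `length` found in at least `min_support` timelines."""
    support = Counter()
    for events in timelines.values():
        # A set, so each participant contributes at most once per sequence.
        seen = {tuple(events[i:i + length]) for i in range(len(events) - length + 1)}
        support.update(seen)
    return {seq: n for seq, n in support.items() if n >= min_support}


if __name__ == "__main__":
    timelines = {  # hypothetical participant timelines
        "p1": ["wake", "meal", "walk", "sleep"],
        "p2": ["wake", "meal", "sleep"],
        "p3": ["meal", "walk", "sleep"],
    }
    print(frequent_sequences(timelines, length=2, min_support=2))
```

Support is counted per participant rather than per occurrence, so one very active participant cannot make a pattern look common across the cohort.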
Wan, Cen; Lees, Jonathan G; Minneci, Federico; Orengo, Christine A; Jones, David T
2017-10-01
Accurate gene or protein function prediction is a key challenge in the post-genome era. Most current methods perform well on molecular function prediction, but struggle to provide useful annotations relating to biological process functions due to the limited power of sequence-based features in that functional domain. In this work, we systematically evaluate the predictive power of temporal transcription expression profiles for protein function prediction in Drosophila melanogaster. Our results show significantly better performance on predicting protein function when transcription expression profile-based features are integrated with sequence-derived features, compared with the sequence-derived features alone. We also observe that the combination of expression-based and sequence-based features leads to further improvement of accuracy on predicting all three domains of gene function. Based on the optimal feature combinations, we then propose a novel multi-classifier-based function prediction method for Drosophila melanogaster proteins, FFPred-fly+. Interpreting our machine learning models also allows us to identify some of the underlying links between biological processes and developmental stages of Drosophila melanogaster.
Pons, T; Hernández, L; Batista, F R; Chinea, G
2000-11-01
The three-dimensional (3D) structure of fructan biosynthetic enzymes is still unknown. Here, we have explored folding similarities between reported microbial and plant enzymes that catalyze transfructosylation reactions. A sequence-structure compatibility search using TOPITS, SDP, 3D-PSSM, and SAM-T98 programs identified a beta-propeller fold with scores above the confidence threshold that indicate a structurally conserved catalytic domain in fructosyltransferases (FTFs) of diverse origin and substrate specificity. The predicted fold appeared related to that of neuraminidase and sialidase, of glycoside hydrolase families 33 and 34, respectively. The most reliable structural model was obtained using the crystal structure of neuraminidase (Protein Data Bank file: 5nn9) as template, and it is consistent with the location of previously identified functional residues of bacterial levansucrases (Batista et al., 1999; Song & Jacques, 1999). The sequence-sequence analysis presented here reinforces the recent inclusion of fungal and plant FTFs into glycoside hydrolase family 32, and suggests a modified sequence pattern H-x(2)-[PTV]-x(4)-[LIVMA]-[NSCAYG]-[DE]-P-[NDSC][GA]3 for this family. PMID:11305239
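A sequence pattern of this kind can be applied to protein sequences by translating it into a regular expression. The sketch below handles the common PROSITE elements (`x`, `x(n)`, `x(n,m)`, `[...]`, `{...}`, and `<`/`>` anchors). It uses the family 32 pattern above normalized to `H-x(2)-[PTV]-x(4)-[LIVMA]-[NSCAYG]-[DE]-P-[NDSC]-[GA]`; the trailing "3" in the record is left uninterpreted because it is not a PROSITE element. The test sequence is hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.replace(" ", "").strip(".").split("-"):
        anchor_start = element.startswith("<")
        anchor_end = element.endswith(">")
        element = element.strip("<>")
        m = re.fullmatch(r"(x|\[[A-Z]+\]|\{[A-Z]+\}|[A-Z])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if anchor_start else "") + core + ("$" if anchor_end else ""))
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("H-x(2)-[PTV]-x(4)-[LIVMA]-[NSCAYG]-[DE]-P-[NDSC]-[GA]")
    print(regex)  # H.{2}[PTV].{4}[LIVMA][NSCAYG][DE]P[NDSC][GA]
    hit = re.search(regex, "MKHAAPWWWWLNDPNGQ")  # hypothetical sequence
    print(hit.start(), hit.group())
```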
NASA Astrophysics Data System (ADS)
Shao, Xupeng
2017-04-01
Glutenite bodies are widely developed in the northern Minfeng zone of the Dongying Sag, but their litho-electric relationship is not clear. In addition, conventional sequence stratigraphic research methods involve too many subjective human factors, which has limited further regional sequence stratigraphic research. Compared with conventional methods, the wavelet transform technique based on logging data and the time-frequency analysis technique based on seismic data have the advantage of dividing sequence stratigraphy quantitatively. Building on the conventional sequence research method, this paper used these techniques to divide the fourth-order sequences of the upper Es4 in the northern Minfeng zone of the Dongying Sag. The research shows that the wavelet transform technique based on logging data and the time-frequency analysis technique based on seismic data are essentially consistent, both dividing sequence stratigraphy quantitatively in the frequency domain. The wavelet transform technique has high resolution and is suitable for areas with wells; the seismic time-frequency analysis technique has wide applicability but low resolution, so the two techniques should be combined. The upper Es4 in the northern Minfeng zone of the Dongying Sag is a complete third-order sequence, which can be further subdivided into five fourth-order sequences with fining-upward depositional characteristics in grain size. Key words: Dongying Sag, northern Minfeng zone, wavelet transform technique, time-frequency analysis technique, the upper Es4, sequence stratigraphy
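The abstract does not specify the wavelet or workflow used. To illustrate how a wavelet transform separates a well-log curve into scales, the sketch below applies a multi-level Haar discrete wavelet transform to a hypothetical log sampled at even depth steps. Coarse-level detail coefficients respond to longer cycles, and large values can flag candidate sequence boundaries. Other wavelets, such as the continuous Morlet wavelet, are also widely used for this purpose.

```python
import math


def haar_step(signal: list[float]) -> tuple[list[float], list[float]]:
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even at every level")
    s = 1 / math.sqrt(2)
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def haar_decompose(signal: list[float], levels: int) -> tuple[list[float], list[list[float]]]:
    """Multi-level decomposition; details[0] is the finest (shortest-cycle) scale."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


if __name__ == "__main__":
    log = [60, 62, 61, 63, 90, 92, 91, 89]  # hypothetical log readings, top to bottom
    approx, details = haar_decompose(log, 3)
    for level, d in enumerate(details, start=1):
        print(f"level {level} detail:", [round(v, 2) for v in d])
```

Because the transform is orthonormal, the total energy of the log is preserved across the approximation and detail coefficients, which makes the coefficient magnitudes at different levels directly comparable.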
Sequence-structure mapping errors in the PDB: OB-fold domains
Venclovas, Česlovas; Ginalski, Krzysztof; Kang, Chulhee
2004-01-01
The Protein Data Bank (PDB) is the single most important repository of structural data for proteins and other biologically relevant molecules. Therefore, it is critically important to keep the PDB data as error-free as possible. In this study, we analyzed PDB crystal structures possessing the oligonucleotide/oligosaccharide binding (OB)-fold, one of the most highly populated folds, for the presence of sequence-structure mapping errors. Using energy-based structure quality assessment coupled with sequence analyses, we found that at least five OB-fold structures in the PDB have regions where sequences have been incorrectly mapped onto the structure. We demonstrated that the combination of these computational techniques is effective not only in detecting sequence-structure mapping errors, but also in providing guidance to correct them. Specifically, we used the results of computational analysis to direct a revision of the X-ray data for one of the PDB entries containing a fairly inconspicuous sequence-structure mapping error. The revised structure has been deposited with the PDB. We suggest the use of computational energy assessment and sequence analysis techniques to facilitate structure determination when homologs of known structure are available as a reference. Such computational analysis may be useful either in guiding the sequence-structure assignment process or in verifying the sequence mapping within poorly defined regions. PMID:15133161
Vlahovicek, K; Munteanu, M G; Pongor, S
1999-01-01
Bending is a local conformational micropolymorphism of DNA in which the original B-DNA structure is distorted but not extensively modified. Bending can be predicted by simple static geometry models as well as by a recently developed elastic model that incorporates sequence-dependent anisotropic bendability (SDAB). The SDAB model qualitatively explains phenomena including the affinity of protein binding and kinking, as well as sequence-dependent vibrational properties of DNA. The vibrational properties of DNA segments can be studied by finite element analysis of a model subjected to an initial bending moment. The frequency spectrum is obtained by applying Fourier analysis to the displacement values in the time domain. This analysis shows that the spectrum of the bending vibrations depends quite sensitively on the sequence; for example, the spectrum of a curved sequence is characteristically different from that of straight sequence motifs of identical base-pair composition. Curvature distributions are genome-specific, and pronounced differences are found between protein-coding and regulatory regions: sites of extreme curvature and/or bendability are less frequent in protein-coding regions. A WWW server has been set up for the prediction of curvature and the generation of 3D models from DNA sequences (http://www.icgeb.trieste.it/dna).
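The Fourier step described above can be sketched with NumPy. Given displacement values sampled at a fixed time step (here a synthetic two-tone signal standing in for finite-element output, which is an assumption), the one-sided amplitude spectrum shows which vibrational frequencies dominate. The mean is removed first so that the zero-frequency term does not swamp the spectrum.

```python
import numpy as np


def frequency_spectrum(displacement, dt: float):
    """One-sided amplitude spectrum of a uniformly sampled displacement series."""
    x = np.asarray(displacement, dtype=float)
    x = x - x.mean()  # drop the static offset (zero-frequency component)
    amplitudes = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=dt)
    return freqs, amplitudes


def dominant_frequency(displacement, dt: float) -> float:
    """Frequency of the largest non-zero-frequency peak."""
    freqs, amplitudes = frequency_spectrum(displacement, dt)
    return float(freqs[np.argmax(amplitudes[1:]) + 1])


if __name__ == "__main__":
    t = np.arange(100) * 0.01  # 100 samples, 0.01 time units apart (synthetic)
    x = np.sin(2 * np.pi * 5 * t) + 0.3 * np.sin(2 * np.pi * 12 * t)
    print(dominant_frequency(x, 0.01))
```

Comparing the full `amplitudes` arrays for two simulated segments, rather than only the dominant peak, is the more faithful analogue of contrasting curved and straight sequence spectra.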
van Koningsbruggen, Silvana; Gierliński, Marek; Schofield, Pietá; Martin, David; Barton, Geoffrey J.; Ariyurek, Yavuz; den Dunnen, Johan T.; Lamond, Angus I.
2010-11-01
The nuclear space is mostly occupied by chromosome territories and nuclear bodies. Although this organization of chromosomes affects gene function, relatively little is known about the role of nuclear bodies in the organization of chromosomal regions. The nucleolus is the best-studied subnuclear structure and forms around the rRNA repeat gene clusters on the acrocentric chromosomes. In addition to rDNA, other chromatin sequences also surround the nucleolar surface and may even loop into the nucleolus. These additional nucleolar-associated domains (NADs) have not been well characterized. We present here a whole-genome, high-resolution analysis of chromatin endogenously associated with nucleoli. We have used a combination of three complementary approaches, namely fluorescence comparative genome hybridization, high-throughput deep DNA sequencing and photoactivation combined with time-lapse fluorescence microscopy. The data show that specific sequences from most human chromosomes, in addition to the rDNA repeat units, associate with nucleoli in a reproducible and heritable manner. NADs have in common a high density of AT-rich sequence elements, low gene density and a statistically significant enrichment in transcriptionally repressed genes. Unexpectedly, both the direct DNA sequencing and fluorescence photoactivation data show that certain chromatin loci can specifically associate with either the nucleolus or the nuclear envelope. PMID:20826608
Lücke, S; Xu, G L; Palfi, Z; Cross, M; Bellofatto, V; Bindereif, A
1996-01-01
In trypanosomes mRNAs are generated through trans splicing. The spliced leader (SL) RNA, which donates the 5'-terminal mini-exon to each of the protein coding exons, plays a central role in the trans splicing process. We have established in vivo assays to study in detail trans splicing, cap4 modification, and RNP assembly of the SL RNA in the trypanosomatid species Leptomonas seymouri. First, we found that extensive sequences within the mini-exon are required for SL RNA function in vivo, although a conserved length of 39 nt is not essential. In contrast, the intron sequence appears to be surprisingly tolerant to mutation; only the stem-loop II structure is indispensable. The asymmetry of the sequence requirements in the stem I region suggests that this domain may exist in different functional conformations. Second, distinct mini-exon sequences outside the modification site are important for efficient cap4 formation. Third, all SL RNA mutations tested allowed core RNP assembly, suggesting flexible requirements for core protein binding. In sum, the results of our mutational analysis provide evidence for a discrete domain structure of the SL RNA and help to explain the strong phylogenetic conservation of the mini-exon sequence and of the overall SL RNA secondary structure; they also suggest that there may be certain differences between trans splicing in nematodes and trypanosomes. This approach provides a basis for studying RNA-RNA interactions in the trans spliceosome. PMID:8861965
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gaul, Alexander; Holzinger, Dennis; Müglich, Nicolas David
A magnetic domain texture has been deterministically engineered in a topographically flat exchange-biased (EB) thin film system. The texture consists of long-range periodically arranged unit cells of four individual domains, characterized by individual anisotropies, individual geometry, and with non-collinear remanent magnetizations. The texture has been engineered by a sequence of light-ion bombardment induced magnetic patterning of the EB layer system. The magnetic texture's in-plane spatial magnetization distribution and the corresponding domain walls have been characterized by scanning electron microscopy with polarization analysis (SEMPA). The influence of magnetic stray fields emerging from neighboring domain walls and the influence of the different anisotropies of the adjacent domains on the Néel type domain wall core's magnetization rotation sense and widths were investigated. It is shown that the usual energy degeneracy of clockwise and counterclockwise rotating magnetization through the walls is revoked, suppressing Bloch lines along the domain wall. Estimates of the domain wall widths for different domain configurations based on material parameters determined by vibrating sample magnetometry were quantitatively compared to the SEMPA data.
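The abstract does not state which wall model underlies its width estimates. A common order-of-magnitude estimate is the classical Bloch-wall width δ = π√(A/K), with exchange stiffness A (J/m) and effective anisotropy constant K (J/m³); Néel walls in thin films follow different expressions, so this is shown only as an assumed illustration with hypothetical parameter values. It does capture the qualitative point that adjacent domains with different anisotropies give walls of different widths.

```python
import math


def wall_width(exchange_stiffness: float, anisotropy: float) -> float:
    """Classical Bloch-wall width pi * sqrt(A / K), in metres (A in J/m, K in J/m^3)."""
    if exchange_stiffness <= 0 or anisotropy <= 0:
        raise ValueError("A and K must both be positive")
    return math.pi * math.sqrt(exchange_stiffness / anisotropy)


if __name__ == "__main__":
    A = 1e-11  # hypothetical exchange stiffness, J/m
    for K in (1e4, 4e4):  # hypothetical effective anisotropies of two adjacent domains
        print(f"K = {K:.0e} J/m^3 -> width ~ {wall_width(A, K) * 1e9:.0f} nm")
```

Because the width scales as K^(-1/2), quadrupling the anisotropy halves the estimated wall width.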
Dixit, Radhika; Arakane, Yasuyuki; Specht, Charles A; Richard, Chad; Kramer, Karl J; Beeman, Richard W; Muthukrishnan, Subbaratnam
2008-04-01
A bioinformatics investigation of four insect species with annotated genome sequences identified a family of genes encoding chitin deacetylase (CDA)-like proteins, with five to nine members depending on the species. CDAs (EC 3.5.1.41) are chitin-modifying enzymes that deacetylate the beta-1,4-linked N-acetylglucosamine homopolymer. Partial deacetylation forms a heteropolysaccharide that also contains some glucosamine residues, while complete deacetylation produces the homopolymer chitosan, consisting exclusively of glucosamine. The genomes of the red flour beetle, Tribolium castaneum, the fruit fly, Drosophila melanogaster, the malaria mosquito, Anopheles gambiae, and the honey bee, Apis mellifera, contain 9, 6, 5 and 5 genes, respectively, that encode proteins with a chitin deacetylase motif. The presence of alternative exons in two of the genes, TcCDA2 and TcCDA5, further increases protein diversity. Insect CDA-like proteins were classified into five orthologous groups based on phylogenetic analysis and the presence of additional motifs. Group I enzymes include CDA1 and isoforms of CDA2, each containing, in addition to a polysaccharide deacetylase-like catalytic domain, a chitin-binding peritrophin-A domain (ChBD) and a low-density lipoprotein receptor class A domain (LDLa). Group II is composed of CDA3 orthologs from each insect species with the same domain organization as group I CDAs, but differing substantially in sequence. Group III includes CDA4s, which have the ChBD domain but lack the LDLa domain. Group IV comprises CDA5s, which are the largest CDAs because of a very long intervening region separating the ChBD and catalytic domains. Among the four insect species, Tribolium is unique in having four CDA genes in group V, whereas the other insect genomes have either one or none.
Most of the CDA-like proteins have a putative signal peptide consistent with their role in modifying extracellular chitin in both cuticle and peritrophic membrane during morphogenesis and molting.
Morintides: cargo-free chitin-binding peptides from Moringa oleifera.
Kini, Shruthi G; Wong, Ka H; Tan, Wei Liang; Xiao, Tianshu; Tam, James P
2017-03-31
Hevein-like peptides are a family of cysteine-rich and chitin-binding peptides consisting of 29-45 amino acids. Their chitin-binding property is essential for plant defense against fungi. Based on the number of cysteine residues in their sequences, they are divided into three subfamilies: 6C-, 8C- and 10C-hevein-like peptides. All three subfamilies contain a three-domain precursor comprising a signal peptide, a mature hevein-like peptide and a C-terminal domain, which in 8C- and 10C-hevein-like peptides comprises a hinge region with protein cargo. Here we report the isolation and characterization of two novel 8C-hevein-like peptides, designated morintides (mO1 and mO2), from the drumstick tree Moringa oleifera, a drought-resistant tree belonging to the Moringaceae family. Proteomic analysis revealed that morintides comprise 44 amino acid residues and are rich in cysteine, glycine and hydrophilic amino acid residues such as asparagine and glutamine. Morintides are resistant to thermal and enzymatic degradation, and are able to bind to chitin and inhibit the growth of phytopathogenic fungi. Transcriptomic analysis showed that they contain a three-domain precursor comprising an endoplasmic reticulum (ER) signal sequence, a mature peptide domain and a C-terminal domain. A striking feature distinguishing morintides from other 8C-hevein-like peptides is a short, protein-cargo-free C-terminal domain. Previously, a similar protein-cargo-free C-terminal domain had been observed only in ginkgotides, the 8C-hevein-like peptides from the gymnosperm Ginkgo biloba. Thus, morintides, with a cargo-free C-terminal domain, are a stand-alone class of 8C-hevein-like peptides from angiosperms. Our results expand the existing library of hevein-like peptides and shed light on molecular diversity within the hevein-like peptide family. Our work also provides insight into the antifungal activity and stability of 8C-hevein-like peptides.
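The subfamily scheme above (6C, 8C, 10C by cysteine count, with mature peptides of 29-45 residues) can be expressed as a simple rule. This sketch applies only those two criteria to a mature-peptide sequence; real assignments also rely on the hevein chitin-binding motif and precursor architecture, which are not checked here. The function name and test sequences are hypothetical.

```python
HEVEIN_LENGTHS = range(29, 46)  # 29-45 residues, per the description above
SUBFAMILIES = (6, 8, 10)        # cysteine counts defining 6C-, 8C- and 10C-peptides


def hevein_subfamily(mature_peptide: str) -> str:
    """Assign a hevein-like subfamily from length and cysteine count alone."""
    seq = mature_peptide.upper()
    if len(seq) not in HEVEIN_LENGTHS:
        return "outside hevein-like length range"
    n_cys = seq.count("C")
    return f"{n_cys}C" if n_cys in SUBFAMILIES else "unclassified"


if __name__ == "__main__":
    toy_44mer = "C" * 8 + "G" * 36  # hypothetical 44-residue peptide with 8 Cys
    print(hevein_subfamily(toy_44mer))  # 8C
```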
Iyer, Lakshminarayan M; Burroughs, A Maxwell; Aravind, L
2006-01-01
Background Ubiquitin (Ub)-mediated signaling is one of the hallmarks of all eukaryotes. Prokaryotic homologs of Ub (ThiS and MoaD) and E1 ligases have been studied in relation to sulfur incorporation reactions in thiamine and molybdenum/tungsten cofactor biosynthesis. However, there is no evidence for entire protein modification systems with Ub-like proteins and deconjugation by deubiquitinating enzymes in prokaryotes. Hence, the evolutionary assembly of the eukaryotic Ub-signaling apparatus remains unclear. Results We systematically analyzed prokaryotic Ub-related β-grasp fold proteins using sensitive sequence profile searches and structural analysis. Consequently, we identified novel Ub-related proteins beyond the characterized ThiS, MoaD, TGS, and YukD domains. To understand their functional associations, we sought and recovered several conserved gene neighborhoods and domain architectures. These included novel associations involving diverse sulfur metabolism proteins, siderophore biosynthesis and the gene encoding the transfer-messenger RNA (tmRNA)-binding protein SmpB, as well as domain fusions between Ub-like domains and PIN-domain-related RNases. Most strikingly, we found conserved gene neighborhoods in phylogenetically diverse bacteria combining genes for JAB domains (the primary de-ubiquitinating isopeptidases of the proteasomal complex) with genes for E1-like adenylating enzymes and different Ub-related proteins. Further sequence analysis of other conserved genes in these neighborhoods revealed several Ub-conjugating enzyme/E2-ligase related proteins. Genes for an Ub-like protein and a JAB domain peptidase were also found in the tail assembly gene cluster of certain caudate bacteriophages. Conclusion These observations imply that members of the Ub family had already formed strong functional associations with E1-like proteins, UBC/E2-related proteins, and JAB peptidases in the bacteria.
Several of these Ub-like proteins and the associated protein families are likely to function together in signaling systems just as in eukaryotes. PMID:16859499
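Conserved gene neighborhoods of the kind described above can be screened for once genes carry domain annotations. This minimal sketch assumes each genome is reduced to an ordered list of per-gene domain labels on a single contig, ignores strand and operon boundaries, and treats one label per gene; the labels, genomes, and window size are hypothetical, and this is not the profile-search pipeline used in the study.

```python
def neighborhood_hits(genes: list[str], label_a: str, label_b: str,
                      max_gap: int) -> list[tuple[int, int]]:
    """Positions (i, j) of genes labelled label_a and label_b within max_gap genes."""
    pos_a = [i for i, label in enumerate(genes) if label == label_a]
    pos_b = [j for j, label in enumerate(genes) if label == label_b]
    return [(i, j) for i in pos_a for j in pos_b if abs(i - j) <= max_gap]


def genomes_with_neighborhood(genomes: dict[str, list[str]], label_a: str,
                              label_b: str, max_gap: int) -> list[str]:
    """Names of genomes in which the two labels co-occur within the gene window."""
    return sorted(name for name, genes in genomes.items()
                  if neighborhood_hits(genes, label_a, label_b, max_gap))


if __name__ == "__main__":
    genomes = {  # hypothetical gene orders, one domain label per gene
        "bacA": ["E1", "UbL", "JAB", "tRNA"],
        "bacB": ["JAB", "other", "other", "other", "E1"],
        "bacC": ["other", "E1", "other", "JAB"],
    }
    print(genomes_with_neighborhood(genomes, "JAB", "E1", max_gap=2))  # ['bacA', 'bacC']
```

A neighborhood found in many phylogenetically distant genomes is stronger evidence of functional linkage than one found only in close relatives, which may share gene order simply by descent.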
Manríquez, René A; Vera, Tamara; Villalba, Melina V; Mancilla, Alejandra; Vakharia, Vikram N; Yañez, Alejandro J; Cárcamo, Juan G
2017-01-31
The infectious pancreatic necrosis virus (IPNV) causes significant economic losses in Chilean salmon farming. For effective sanitary management, the IPNV strains present in Chile need to be fully studied, characterized, and constantly updated at the molecular level. In this study, 36 Chilean IPNV isolates collected over 6 years (2006-2011) from Salmo salar, Oncorhynchus mykiss, and Oncorhynchus kisutch were genotypically characterized. Salmonid samples were obtained from freshwater, estuary, and seawater sources from central, southern, and extreme-southern Chile (35° to 53°S). Sequence analysis of the VP2 gene classified 10 IPNV isolates as genogroup 1 and 26 as genogroup 5. Analyses indicated a preferential, but not obligate, relationship between genogroup 5 isolates and S. salar infection. Fifteen genogroup 5 and nine genogroup 1 isolates presented VP2 gene residues associated with high virulence (i.e. Thr, Ala, and Thr at positions 217, 221, and 247, respectively). Four genogroup 5 isolates presented an unusually long VP5 deduced amino acid sequence (29.6 kDa). Analysis of the VP2 amino acid motifs associated with clinical and subclinical infections identified the clinical fingerprint only in genogroup 5 isolates; in contrast, the genogroup 1 isolates presented sequences predominantly associated with the subclinical fingerprint. Predictive analysis of VP5 showed an absence of transmembrane domains and plasma membrane tropism signals. WebLogo analysis of the VP5 BH domains revealed high identities with the marine birnavirus Y-6 and Japanese IPNV strain E1-S. Sequence analysis of putative 25-kDa proteins, coded by the ORF between VP2 and VP4, revealed three putative nuclear localization sequences and signals of mitochondrial tropism in two isolates. This study provides important advances in updating the characterizations of IPNV strains present in Chile.
The results from this study will help in identifying epidemiological links and generating specific biotechnological tools for controlling IPNV outbreaks in Chilean salmon farming.
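The high-virulence residues quoted above (Thr, Ala, and Thr at VP2 positions 217, 221, and 247) can be checked directly in a deduced VP2 sequence. This sketch assumes the sequence is already numbered from VP2 residue 1 with no insertions relative to the reference numbering; in practice one would align to a reference first. The test sequence is synthetic.

```python
# 1-based VP2 positions and the residues associated with high virulence in the text above.
HIGH_VIRULENCE = {217: "T", 221: "A", 247: "T"}


def marker_residues(vp2: str) -> dict[int, str]:
    """Residues observed at the marker positions of a VP2 sequence numbered from residue 1."""
    seq = vp2.upper()
    if len(seq) < max(HIGH_VIRULENCE):
        raise ValueError("sequence does not cover all marker positions")
    return {pos: seq[pos - 1] for pos in HIGH_VIRULENCE}


def has_high_virulence_motif(vp2: str) -> bool:
    """True if every marker position carries the high-virulence residue."""
    observed = marker_residues(vp2)
    return all(observed[pos] == aa for pos, aa in HIGH_VIRULENCE.items())
```

Reporting `marker_residues` alongside the boolean keeps partial matches visible, which is useful when isolates carry only some of the three residues.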
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kwon, Gihan; Kokhan, Oleksandr; Han, Ali; ...
2015-12-01
Amorphous thin film oxygen evolving catalysts (OECs) of first-row transition metals show promise as self-assembling photoanode materials in solar-driven, photoelectrochemical 'artificial leaf' devices. This report demonstrates the ability to use high-energy X-ray scattering and atomic pair distribution function (PDF) analysis to resolve structure in amorphous metal oxide catalyst films. The analysis is applied here to resolve domain structure differences induced by oxyanion substitution during the electrochemical assembly of amorphous cobalt oxide catalyst films (Co-OEC). PDF patterns for Co-OEC films formed using phosphate (Pi), methylphosphate (MPi), and borate (Bi) electrolyte buffers show that the resulting domains vary in size following the sequence Pi < MPi < Bi. The increases in domain size for CoMPi and CoBi were found to be correlated with increases in the contributions from bilayer and trilayer stacked domains having structures intermediate between those of the LiCoOO and CoO(OH) mineral forms. The lattice structures and offset stacking of adjacent layers in the partially stacked CoMPi and CoBi domains were best matched to those in the LiCoOO layered structure. The results demonstrate the ability of PDF analysis to elucidate features of domain size, structure, defect content and mesoscale organization for amorphous metal oxide catalysts that are not readily accessed by other X-ray techniques. Finally, PDF structure analysis is shown to provide a way to characterize domain structures in different forms of amorphous oxide catalysts, and hence an opportunity to investigate correlations between domain structure and catalytic activity.
Goyal, K; Browne, J A; Burnell, A M; Tunnacliffe, A
2005-06-01
Accumulation of the non-reducing disaccharide trehalose is associated with desiccation tolerance during anhydrobiosis in a number of invertebrates, but there is little information on trehalose biosynthetic genes in these organisms. We have identified two trehalose-6-phosphate synthase (tps) genes in the anhydrobiotic nematode Aphelenchus avenae and determined full-length cDNA sequences for both; for comparison, full-length tps cDNAs from the model nematode Caenorhabditis elegans have also been obtained. The A. avenae genes encode very similar proteins containing the catalytic domain characteristic of the GT-20 family of glycosyltransferases and are most similar to tps-2 of C. elegans; no evidence was found for a gene in A. avenae corresponding to Ce-tps-1. Analysis of A. avenae tps cDNAs revealed several features of interest, including alternative trans-splicing of spliced leader sequences in Aav-tps-1, and four different, novel SL1-related trans-spliced leaders, which differ from the canonical SL1 sequence found in all other nematodes studied. The latter observation suggests that A. avenae does not comply with the strict evolutionary conservation of SL1 sequences observed in other species. Unusual features were also noted in predicted nematode TPS proteins, which distinguish them from homologues in other higher eukaryotes (plants and insects) and in micro-organisms. Phylogenetic analysis confirmed their membership in the GT-20 glycosyltransferase family, but indicated an accelerated rate of molecular evolution. Furthermore, nematode TPS proteins possess N- and C-terminal domains that are unrelated to those of other eukaryotes: nematode C-terminal domains, for example, do not contain trehalose-6-phosphate phosphatase-like sequences, as seen in plant and insect homologues. During the onset of anhydrobiosis, both tps genes in A. avenae are upregulated, but exposure to cold or increased osmolarity also results in gene induction, although to a lesser extent.
Trehalose seems likely therefore to play a role in a number of stress responses in nematodes.
In silico analysis of the polygalacturonase inhibiting protein 1 from apple, Malus domestica.
Matsaunyane, Lerato Bt; Oelofse, Dean; Dubery, Ian A
2015-03-11
The Malus domestica polygalacturonase inhibiting protein 1 (MdPGIP1) gene was isolated from the Granny Smith apple cultivar (GenBank accession no. DQ185063). The gene was used to transform tobacco and potato for enhanced resistance against fungal diseases. Analysis of the MdPGIP1 nucleotide sequence revealed that the gene comprises 993 nucleotides that encode a 330-amino-acid polypeptide. In silico characterization of the MdPGIP1 polypeptide revealed domains typical of PGIP proteins, which include a 24-amino-acid putative signal peptide, a potential cleavage site [Alanine-Leucine-Serine (ALS)] for the signal peptide, a 238-amino-acid leucine-rich repeat (LRR) domain, a 46-amino-acid N-terminal domain and a 22-amino-acid C-terminal domain. The hydropathic evaluation of MdPGIP1 indicated a repetitive hydrophobic motif in the LRR domain and a hydrophilic surface area consistent with a globular protein. The typical consensus glycosylation sequence Asn-X-Ser/Thr was identified in MdPGIP1, indicating potential N-linked glycosylation. The molecular mass of non-glycosylated MdPGIP1 was calculated as 36.615 kDa and the theoretical isoelectric point as 6.98. Furthermore, the secondary and tertiary structures of MdPGIP1 were modelled, revealing that MdPGIP1 is a curved and elongated molecule that contains sheet B1, sheet B2 and 3₁₀-helices in its LRR domain. The overall properties of the MdPGIP1 protein are similar to those of the prototypical Phaseolus vulgaris PGIP2 (PvPGIP2), and the detected differences support its use in biotechnological applications as an inhibitor of targeted fungal polygalacturonases (PGs).
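Two of the checks described above can be reproduced in a few lines. The reported region lengths (24 + 46 + 238 + 22) add up to the 330 residues encoded by a 993-nt ORF, assuming the 993 nt include the stop codon. The Asn-X-Ser/Thr sequon can be scanned with a regular expression; the sketch below follows the usual convention that X is not Pro, which is an assumption here, and the test peptide is hypothetical rather than MdPGIP1 itself.

```python
import re

# Region lengths (aa) reported for MdPGIP1; they should add up to the 330
# residues encoded by the 993-nt ORF (330 sense codons + 1 stop codon, assumed).
REGION_LENGTHS = {"signal peptide": 24, "N-terminal": 46, "LRR": 238, "C-terminal": 22}
assert sum(REGION_LENGTHS.values()) == (993 - 3) // 3 == 330

# N-X-S/T sequon; by the usual convention X may be any residue except Pro.
# The zero-width lookahead lets overlapping sequons (e.g. "NNST") both match.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 1-based start positions of candidate N-glycosylation sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


print(find_sequons("MKNGSAANPT"))  # [3]  (the N-P-T site is excluded)
```

A sequon only marks a potential site; whether it is actually glycosylated depends on accessibility in the folded, secreted protein.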
Matsuoka, Masanari; Sugita, Masatake; Kikuchi, Takeshi
2014-09-18
Proteins that share high sequence homology while exhibiting drastically different 3D structures are investigated in this study. Recently, artificial proteins related to the sequences of the human serum albumin-binding GA domain and the IgG-binding GB domain have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but exhibit different 3D structures, namely, a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to using bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem. Ordinary analyses such as BLAST searches and conservation analyses alone could not distinguish which structure a given sequence would adopt. However, by combining these analyses with the analysis based on inter-residue average distance statistics and our sequence tendency analysis, we could infer which regions may play an important role in structure formation. The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.
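To make the "98% identity" figure concrete: for two aligned sequences, percent identity is the share of alignment columns with the same residue. The sketch below uses one common convention (columns where both sequences have a gap are ignored; every other column counts); other tools normalize differently, and the 56-residue toy pair is hypothetical, not the actual GA/GB designs.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Columns with a gap ('-') in both sequences are skipped; any other column
    counts toward the denominator. This is one convention among several.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Hypothetical 56-residue toy sequences differing at a single position.
base = ("ACDEFGHIKLMNPQRSTVWY" * 3)[:56]
variant = base[:20] + "W" + base[21:]
print(round(percent_identity(base, variant), 1))  # 98.2
```

A single substitution in 56 positions already gives about 98% identity, which illustrates why identity scores alone say little about which fold a sequence adopts.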
Kontogiannatos, Dimitrios; Gkouvitsas, Theodoros; Kourti, Anna
2017-06-01
To obtain clues to the link between the molecular mechanisms of circadian and photoperiodic clocks, we have cloned the circadian clock gene cycle (Sncyc) in the corn stalk borer, Sesamia nonagrioides, which undergoes facultative diapause controlled by photoperiod. Sequence analysis revealed a high degree of conservation of this gene among insects. SnCYC consists of 667 amino acids, and structural analysis showed that it contains a BCTR domain at its C-terminus in addition to the domains common to Drosophila CYC, i.e. the bHLH, PAS-A, and PAS-B domains. The Sncyc sequence also showed similarity to that of its mammalian orthologue, Bmal1. We also investigated the expression patterns of Sncyc in the brain of larvae reared under long-day 16L:8D (LD), constant darkness (DD), and short-day 10L:14D (SD) conditions using qRT-PCR assays. Sncyc mRNA expression was rhythmic under LD, DD, and SD conditions. Notably, photoperiodic conditions affected the expression patterns and/or amplitudes of Sncyc. This gene appears to be associated with diapause in S. nonagrioides, because under SD (diapause-inducing conditions) the photoperiodic signal altered its mRNA accumulation. Sequence and expression analysis of cyc in S. nonagrioides shows interesting differences compared to Drosophila, where this gene does not oscillate or change its expression pattern in response to photoperiod, suggesting that this species is an interesting new model for studying the molecular control of insect circadian and photoperiodic clocks. Copyright © 2017 Elsevier Inc. All rights reserved.
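The abstract does not say how the qRT-PCR data were normalized. One widely used option is the 2^(-ΔΔCt) method, sketched below under its usual assumptions: roughly 100% amplification efficiency for both genes and a single reference gene. The Ct values are hypothetical, and this is an illustration of the calculation, not the authors' pipeline.

```python
def relative_expression(ct_target: float, ct_reference: float,
                        ct_target_cal: float, ct_reference_cal: float) -> float:
    """Fold change of a target gene relative to a calibrator sample, 2^(-ddCt).

    Assumes ~100% amplification efficiency for target and reference genes,
    i.e. each PCR cycle doubles the amount of product.
    """
    delta_ct_sample = ct_target - ct_reference            # normalize to reference gene
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: one larval brain sample versus a calibrator sample.
print(relative_expression(24.0, 18.0, 26.0, 18.0))  # 4.0
```

For a time course such as the LD, DD, and SD series above, the same calculation would be repeated for each sampling time against a fixed calibrator time point.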
Dynamics of domain coverage of the protein sequence universe.
Rekapalli, Bhanu; Wuichet, Kristin; Peterson, Gregory D; Zhulin, Igor B
2012-11-16
The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its "dark matter". Here we suggest that the true size of "dark matter" is much larger than current definitions state. We propose an approach to reducing the size of "dark matter" by identifying and subtracting regions in protein sequences that are not likely to contain any domain. Recent improvements in computational domain modeling have led to a slow decrease in the relative size of "dark matter"; however, its absolute size increases substantially with the growth of sequence data.
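At the level of a single sequence, the "dark matter" idea comes down to counting residues that no domain model covers. The sketch below is not the authors' procedure; it only illustrates that bookkeeping, assuming domain hits are given as 1-based inclusive intervals that may overlap (for example, hits from several domain databases).

```python
def domain_coverage(seq_length: int, domains: list[tuple[int, int]]) -> float:
    """Fraction of residues in a sequence covered by at least one domain hit.

    `domains` holds 1-based, inclusive (start, end) intervals that may overlap
    or extend past the sequence ends; overlapping hits are merged so each
    residue is counted once. 1 - coverage is the uncovered ("dark") fraction.
    """
    if seq_length <= 0:
        raise ValueError("seq_length must be positive")
    covered = 0
    run_start = run_end = None  # current merged interval
    for start, end in sorted(domains):
        start, end = max(start, 1), min(end, seq_length)  # clip to the sequence
        if start > end:
            continue  # interval lies entirely outside the sequence
        if run_end is not None and start <= run_end + 1:
            run_end = max(run_end, end)  # overlapping or adjacent: extend the run
        else:
            if run_end is not None:
                covered += run_end - run_start + 1
            run_start, run_end = start, end
    if run_end is not None:
        covered += run_end - run_start + 1
    return covered / seq_length


print(domain_coverage(100, [(1, 10), (5, 20), (50, 59)]))  # 0.3
```

Summed over a database, covered and uncovered residue counts give the relative and absolute sizes of "dark matter" that the abstract contrasts.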
Van Holle, Sofie; Rougé, Pierre; Van Damme, Els J M
2017-03-01
The Nictaba family groups all proteins that show homology to Nictaba, the tobacco lectin. So far, Nictaba and an Arabidopsis thaliana homologue have been shown to be implicated in the plant stress response. The availability of more than 50 sequenced plant genomes provided the opportunity for a genome-wide identification of Nictaba-like genes in 15 species, representing members of the Fabaceae, Poaceae, Solanaceae, Musaceae, Arecaceae, Malvaceae and Rubiaceae. Additionally, phylogenetic relationships between the different species were explored. Furthermore, this study included an analysis of domain organization, a search for orthologous genes in the legume family, and transcript profiling of the Nictaba-like lectin genes in soybean. Using a combination of BLASTp, InterPro analysis and hidden Markov models, the genomes of Medicago truncatula, Cicer arietinum, Lotus japonicus, Glycine max, Cajanus cajan, Phaseolus vulgaris, Theobroma cacao, Solanum lycopersicum, Solanum tuberosum, Coffea canephora, Oryza sativa, Zea mays, Sorghum bicolor, Musa acuminata and Elaeis guineensis were searched for Nictaba-like genes. Phylogenetic analysis was performed using RAxML, and additional protein domains in the Nictaba-like sequences were identified using InterPro. Expression of the soybean Nictaba-like genes was analysed using microarray data. Nictaba-like genes were identified in all studied species, and analysis of the duplication events demonstrated that both tandem and segmental duplication contributed to the expansion of the Nictaba gene family in angiosperms. The single-domain Nictaba protein and the multi-domain F-box Nictaba architectures are ubiquitous among all analysed species, and microarray analysis revealed differential expression patterns for all soybean Nictaba-like genes.
Taken together, the comparative genomics data contribute to our understanding of the Nictaba-like gene family in species for which the occurrence of Nictaba domains had not yet been investigated. Given the ubiquitous nature of these genes, they have probably acquired new functions over time and are expected to take on various roles in plant development and defence. © The Author 2017. Published by Oxford University Press on behalf of the Annals of Botany Company. All rights reserved.
Tjhung, Katrina F; Deiss, Frédérique; Tran, Jessica; Chou, Ying; Derda, Ratmir
2015-01-01
In this paper, we describe multivalent display of peptide and protein sequences typically censored from traditional N-terminal display on protein pIII of filamentous bacteriophage M13. Using site-directed mutagenesis of the commercially available M13KE phage cloning vector, we introduced sites that permit efficient cloning using restriction enzymes between domains N1 and N2 of the pIII protein. As phage infectivity is directly linked to the integrity of the connection between the N1 and N2 domains, intra-domain phage display (ID-PhD) allows simple quality control of the display and of natural variations in the displayed sequences. Additionally, direct linkage to phage propagation allows efficient monitoring of sequence cleavage, providing a convenient system for selection and evolution of protease-susceptible or protease-resistant sequences. As an example of the benefits of such an ID-PhD system, we displayed a negatively charged FLAG sequence, which is known to be post-translationally excised from pIII when displayed on the N-terminus, as well as positively charged sequences that suppress production of phage when displayed on the N-terminus. ID-PhD of FLAG exhibited a sub-nanomolar apparent Kd, suggesting the multivalent nature of the display. A TEV-protease recognition sequence (TEVrs) co-expressed in tandem with FLAG allowed us to demonstrate that 99.9997% of the phage displayed the FLAG-TEVrs tandem, which can be recognized and cleaved by TEV protease. The residual 0.0003% consisted of phage clones that had excised the insert from their genome. ID-PhD is also amenable to display of protein mini-domains, such as the 33-residue minimized Z-domain of protein A. We thus show that it is possible to use ID-PhD for multivalent display and selection of mini-domain proteins (Affibodies, scFv, etc.).
A novel class of dual-family immunophilins.
Adams, Brian; Musiyenko, Alla; Kumar, Rajinder; Barik, Sailen
2005-07-01
Immunophilins are protein chaperones with peptidylprolyl isomerase activity that belong to one of two large families, the cyclosporin-binding cyclophilins (CyPs) and the FK506-binding proteins (FKBPs). Each family displays characteristic and conserved sequence features that differ between the two families. We report a novel group of dual-family immunophilins that contain both CyP and FKBP domains for which we propose the name FCBP (FK506- and cyclosporin-binding protein). The FCBP of Toxoplasma gondii, a protozoan parasite, contained N-terminal FKBP and C-terminal CyP domains joined by tetratricopeptide repeats. Structure-function analysis revealed that both domains were functional and exhibited family-specific drug sensitivity. The individual domains of FCBP inhibited calcineurin (protein phosphatase 2B) in the presence of the appropriate drugs. In binding studies, FCBP recruited calcineurin in the presence of FK506 and a putative target of rapamycin homolog in the presence of rapamycin. Two additional FCBP sequences in Flavobacterium and one in Treponema (spirochete) were also identified in which the CyP and FKBP domains were in the reverse order. T. gondii growth was inhibited by cyclosporin and FK506 in a moderately synergistic manner. The knockdown of FCBP by RNA interference revealed its essentiality for T. gondii growth. Clearly, the FCBPs are novel chaperones and potential targets of multiple immunosuppressant drugs.
Peterson, Thomas A; Nehrt, Nathan L; Park, DoHwan
2012-01-01
Background and objective With recent breakthroughs in high-throughput sequencing, identifying deleterious mutations is one of the key challenges for personalized medicine. At the gene and protein level, it has proven difficult to determine the impact of previously unknown variants. A statistical method has been developed to assess the significance of disease mutation clusters on protein domains by incorporating domain functional annotations to assist in the functional characterization of novel variants. Methods Disease mutations aggregated from multiple databases were mapped to domains and classified as either cancer- or non-cancer-related. The statistical method for identifying significantly disease-associated domain positions was applied to both sets of mutations and to randomly generated mutation sets for comparison. To leverage the known function of protein domain regions, the method optionally distributes significant scores to associated functional feature positions. Results Most disease mutations are localized within protein domains and display a tendency to cluster at individual domain positions. The method identified significant disease mutation hotspots in both the cancer and non-cancer datasets. The domain significance scores (DS-scores) for cancer form a bimodal distribution: hotspots in oncogenes form a second peak at higher DS-scores than non-cancer hotspots, whereas hotspots in tumor suppressors have scores more similar to those of non-cancer hotspots. In addition, on an independent mutation benchmarking set, the DS-score method identified mutations known to alter protein function with very high precision. Conclusion By aggregating mutations with known disease association at the domain level, the method was able to discover domain positions enriched with multiple occurrences of deleterious mutations while incorporating relevant functional annotations.
The method can be incorporated into translational bioinformatics tools to characterize rare and novel variants within large-scale sequencing studies. PMID:22319177
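The abstract does not define the DS-score itself. As a simpler stand-in, and explicitly not the authors' method, the sketch below asks how surprising it is to see k or more of n domain-mapped mutations at one position of a domain of length L if mutations fell uniformly at random: a binomial upper tail with p = 1/L, Bonferroni-corrected over the L positions. Function names, the uniform null, and the example numbers are assumptions.

```python
from math import comb


def position_tail_probability(k: int, n: int, domain_length: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 1 / domain_length).

    Models n mutations mapped to a domain, each landing on any of the
    domain_length positions with equal probability (a uniform null).
    """
    if domain_length <= 0:
        raise ValueError("domain_length must be positive")
    if k <= 0:
        return 1.0
    p = 1.0 / domain_length
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def is_hotspot(k: int, n: int, domain_length: int, alpha: float = 0.05) -> bool:
    """Flag a position whose Bonferroni-corrected tail probability is below alpha."""
    return position_tail_probability(k, n, domain_length) * domain_length < alpha


# Hypothetical example: 20 mutations mapped onto a 100-residue domain.
print(is_hotspot(10, 20, 100))  # True: 10 hits at one position is very unlikely
print(is_hotspot(1, 20, 100))   # False: a single hit is unremarkable
```

A uniform null ignores differences in mutability between positions, which is one reason dedicated methods add background models and functional annotations.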
Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon
2011-01-01
Background Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family, which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. Result We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public databases, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, providing a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends.
Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but longer than many other dicot plants. Codon usages of melon full-length transcripts were largely similar to those of Arabidopsis coding sequences. Conclusion The collection of melon ESTs generated from full-length enriched and standard cDNA libraries is expected to play significant roles in annotating the melon genome. The ESTs and associated analysis results will be useful resources for gene discovery, functional analysis, marker-assisted breeding of melon and closely related species, comparative genomic studies and for gaining insights into gene expression patterns. PMID:21599934
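The abstract reports SSRs mined from the EST collection but not the search criteria. The sketch below is a minimal perfect-repeat finder under hypothetical thresholds (motif length 1 to 6 nt, at least 5 tandem copies); dedicated SSR tools typically use motif-specific minimum copy numbers and also report compound or imperfect repeats, which this sketch does not.

```python
import re


def find_ssrs(seq: str, min_unit: int = 1, max_unit: int = 6,
              min_copies: int = 5) -> list[tuple[int, str, int]]:
    """Find perfect SSRs; return sorted (1-based start, motif, copy number).

    A regex backreference matches a unit followed by >= min_copies - 1
    further copies. Shorter units are tried first, and any repeat that
    overlaps an already reported one is skipped, so e.g. (AT)n is not
    reported again as (ATAT)n. Only A/C/G/T count as repeat bases.
    """
    seq = seq.upper()
    found: list[tuple[int, str, int]] = []
    occupied = [False] * len(seq)
    for unit in range(min_unit, max_unit + 1):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit ("ATAT").
            if any(motif == motif[:d] * (unit // d)
                   for d in range(1, unit) if unit % d == 0):
                continue
            if any(occupied[m.start():m.end()]):
                continue
            for i in range(m.start(), m.end()):
                occupied[i] = True
            found.append((m.start() + 1, motif, (m.end() - m.start()) // unit))
    return sorted(found)


print(find_ssrs("GGATATATATATCC"))  # [(3, 'AT', 5)]
```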
Near-Complete Genome Sequence of a Novel Single-Stranded RNA Virus Discovered in Indoor Air.
Rosario, Karyna; Fierer, Noah; Breitbart, Mya
2018-03-22
Viral metagenomic analysis of heating, ventilation, and air conditioning (HVAC) filters recovered the near-complete genome sequence of a novel virus, named HVAC-associated RNA virus 1 (HVAC-RV1). The HVAC-RV1 genome is most similar to those of picorna-like viruses identified in arthropods but encodes a small domain observed only in negative-sense single-stranded RNA viruses. Copyright © 2018 Rosario et al.
Rand, Tim A.; Ginalski, Krzysztof; Grishin, Nick V.; Wang, Xiaodong
2004-01-01
RNA interference is carried out by the small double-stranded RNA-induced silencing complex (RISC). The RISC-bound small RNA guides the RISC complex to identify and cleave mRNAs with complementary sequences. The proteins that make up the RISC complex and cleave mRNA have not been unequivocally defined. Here, we report the biochemical purification of RISC activity to homogeneity from Drosophila Schneider 2 cell extracts. Argonaute 2 (Ago-2) is the sole protein component present in the purified, functional RISC. By using a bioinformatics method that combines sequence-profile analysis with predicted protein secondary structure, we found homology between the PIWI domain of Ago-2 and endonuclease V and identified potential active-site amino acid residues within the PIWI domain of Ago-2. PMID:15452342
Rand, Tim A; Ginalski, Krzysztof; Grishin, Nick V; Wang, Xiaodong
2004-10-05
RNA interference is carried out by the small double-stranded RNA-induced silencing complex (RISC). The RISC-bound small RNA guides the RISC complex to identify and cleave mRNAs with complementary sequences. The proteins that make up the RISC complex and cleave mRNA have not been unequivocally defined. Here, we report the biochemical purification of RISC activity to homogeneity from Drosophila Schneider 2 cell extracts. Argonaute 2 (Ago-2) is the sole protein component present in the purified, functional RISC. By using a bioinformatics method that combines sequence-profile analysis with predicted protein secondary structure, we found homology between the PIWI domain of Ago-2 and endonuclease V and identified potential active-site amino acid residues within the PIWI domain of Ago-2.
Goad, David M; Zhu, Chuanmei; Kellogg, Elizabeth A
2017-10-01
CLV3/ESR (CLE) proteins are important signaling peptides in plants. The short CLE peptide (12-13 amino acids) is cleaved from a larger pre-propeptide and functions as an extracellular ligand. The CLE family is large and has resisted attempts at classification because the CLE domain is too short for reliable phylogenetic analysis and the pre-propeptide is too variable. We used a model-based search for CLE domains in 57 plant genomes and used the entire pre-propeptide for comprehensive clustering analysis. In total, 1628 CLE genes were identified in land plants, with none recognizable in green algae. These CLEs form 12 groups within which CLE domains are largely conserved and pre-propeptides can be aligned. Most clusters contain sequences from monocots, eudicots and Amborella trichopoda, with sequences from Picea abies, Selaginella moellendorffii and Physcomitrella patens scattered in some clusters. We easily identified previously known clusters involved in vascular differentiation and nodulation. In addition, we found a number of discrete groups whose function remains poorly characterized. Available data indicate that CLE proteins within a cluster are likely to share function, whereas those from different clusters play at least partially different roles. Our analysis provides a foundation for future evolutionary and functional studies. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.
Frazier, Taylor P.; Palmer, Nathan A.; Xie, Fuliang; ...
2016-11-08
Switchgrass (Panicum virgatum L.) is a warm-season perennial grass that can be used as a second-generation bioenergy crop. However, foliar fungal pathogens, such as switchgrass rust, have the potential to significantly reduce switchgrass biomass yield. Despite its importance as a prominent bioenergy crop, a comprehensive genome-wide analysis of NB-LRR disease resistance genes has yet to be performed in switchgrass. In this study, we used a homology-based computational approach to identify 1011 potential NB-LRR resistance gene homologs (RGHs) in the switchgrass genome (v 1.1). In addition, we identified 40 RGHs that potentially contain unique domains, including a major sperm protein domain, a jacalin-like binding domain, calmodulin-like binding, and thioredoxin. RNA-sequencing analysis of leaf tissue from ‘Alamo’, a rust-resistant switchgrass cultivar, and ‘Dacotah’, a rust-susceptible switchgrass cultivar, identified 2634 high-quality variants in the RGHs between the two cultivars. RNA-sequencing data from field-grown plants of the cultivar ‘Summer’ indicated that the expression of some of these RGHs was developmentally regulated. Our results provide useful insight into the molecular structure, distribution, and expression patterns of members of the NB-LRR gene family in switchgrass. These results also provide a foundation for future work aimed at elucidating the molecular mechanisms underlying disease resistance in this important bioenergy crop.
Structure and stability of the ankyrin domain of the Drosophila Notch receptor.
Zweifel, Mark E; Leahy, Daniel J; Hughson, Frederick M; Barrick, Doug
2003-11-01
The Notch receptor contains a conserved ankyrin repeat domain that is required for Notch-mediated signal transduction. The ankyrin domain of Drosophila Notch contains six ankyrin sequence repeats previously identified as closely matching the ankyrin repeat consensus sequence, and a putative seventh C-terminal sequence repeat that exhibits lower similarity to the consensus sequence. To better understand the role of the Notch ankyrin domain in Notch-mediated signaling and to examine how structure is distributed among the seven ankyrin sequence repeats, we have determined the crystal structure of this domain to 2.0 Å resolution. The seventh, C-terminal, ankyrin sequence repeat adopts a regular ankyrin fold, but the first, N-terminal ankyrin repeat, which contains a 15-residue insertion, appears to be largely disordered. The structure reveals a substantial interface between ankyrin polypeptides, showing a high degree of shape and charge complementarity, which may be related to homotypic interactions suggested by indirect studies. However, the Notch ankyrin domain remains largely monomeric in solution, demonstrating that this interface alone is not sufficient to promote tight association. Using the structure, we have classified reported mutations within the Notch ankyrin domain that are known to disrupt signaling into those that affect buried residues and those restricted to surface residues. We show that the buried substitutions greatly decrease protein stability, whereas the surface substitutions have only a marginal effect on stability. The surface substitutions are thus likely to interfere with Notch signaling by disrupting specific Notch-effector interactions, and they map the sites of these interactions.
Mittal, Anuradha; Holehouse, Alex S; Cohan, Megan C; Pappu, Rohit V
2018-05-12
Intrinsically disordered proteins and regions (IDPs / IDRs) are characterized by well-defined sequence-to-conformation relationships (SCRs). These relationships refer to the sequence-specific preferences for average sizes, shapes, residue-specific secondary structure propensities, and amplitudes of multiscale conformational fluctuations. SCRs are discerned from the sequence-specific conformational ensembles of IDPs. A vast majority of IDPs are actually tethered to folded domains (FDs). This raises the question of whether or not SCRs inferred for IDPs are applicable to IDRs tethered to folded domains. Here, we use atomistic simulations based on a well-established forcefield paradigm and an enhanced sampling method to obtain comparative assessments of SCRs for thirteen archetypal IDRs modeled as autonomous units, as C-terminal tails connected to folded domains, and as linkers between pairs of folded domains. Our studies uncover a set of general observations regarding context-independent versus context-dependent SCRs of IDRs. SCRs are minimally perturbed upon tethering to folded domains if the IDRs are deficient in charged residues and for polyampholytic IDRs where the oppositely charged residues within the sequence of the IDR are separated into distinct blocks. In contrast, the interplay between IDRs and tethered folded domains has a significant modulatory effect on SCRs if the IDRs have intermediate fractions of charged residues or if they have sequence-intrinsic conformational preferences for canonical random coils. Our findings suggest that IDRs with context-independent SCRs might be independent evolutionary modules whereas IDRs with context-dependent intrinsic SCRs might co-evolve with the FDs to which they are tethered. Copyright © 2018. Published by Elsevier Ltd.
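The abstract groups IDRs by their fraction of charged residues and by how oppositely charged residues are distributed. The sketch below computes two standard composition parameters, the fraction of charged residues (FCR) and the net charge per residue (NCPR), counting Lys/Arg as +1 and Asp/Glu as -1 and treating His and the chain termini as uncharged (a simplifying assumption). It is an illustration, not the simulation-based analysis of the study; the blocky-versus-mixed distinction also needs a patterning parameter such as κ, which is not computed here.

```python
POSITIVE = set("KR")  # counted as +1 (His ignored, an assumption)
NEGATIVE = set("DE")  # counted as -1


def charge_fractions(seq: str) -> tuple[float, float]:
    """Return (FCR, NCPR) for a protein sequence.

    FCR  = (n_pos + n_neg) / N   fraction of charged residues
    NCPR = (n_pos - n_neg) / N   net charge per residue
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    n_pos = sum(aa in POSITIVE for aa in seq)
    n_neg = sum(aa in NEGATIVE for aa in seq)
    n = len(seq)
    return (n_pos + n_neg) / n, (n_pos - n_neg) / n


# A blocky and a well-mixed polyampholyte (hypothetical) share FCR and NCPR.
print(charge_fractions("EEEEEKKKKK"))  # (1.0, 0.0)
print(charge_fractions("EKEKEKEKEK"))  # (1.0, 0.0)
```

The identical output for the two toy sequences shows why composition alone cannot separate the blocky polyampholytic IDRs from well-mixed ones described in the abstract.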
Aravind, Penmatsa; Wistow, Graeme; Sharma, Yogendra; Sankaranarayanan, Rajan
2008-01-01
βγ-Crystallins belong to a superfamily of proteins in prokaryotes and eukaryotes that are based on duplications of a characteristic, highly conserved Greek key motif. Most members of the superfamily in vertebrates are structural proteins of the eye lens that contain four motifs arranged as two structural domains. Absent in melanoma-1 (AIM1), an unusual member of the superfamily whose expression is associated with suppression of malignancy in melanoma, contains 12 βγ-crystallin motifs in six domains. Some of these motifs diverge considerably from the canonical motif sequence. AIM1g1, the first βγ-crystallin domain of AIM1, is the most variant of the βγ-crystallin domains currently known. In order to understand the limits of sequence variation on the structure, we report the crystal structure of AIM1g1 at 1.9 Å resolution. Despite changes in key residues, the domain retains the overall βγ-crystallin fold. The domain also contains an unusual extended surface loop that significantly alters the shape of the domain and its charge profile. This structure illustrates the resilience of the βγ fold to considerable sequence changes and its remarkable ability to adapt for novel functions. PMID:18582473
Moll, R; Schmidtke, S; Schäfer, G
1999-01-01
In this study we provide, for the first time, experimental evidence that a protein homologous to bacterial Ffh is part of an SRP-like ribonucleoprotein complex in hyperthermophilic archaea. The gene encoding the Ffh homologue in the hyperthermophilic archaeote Acidianus ambivalens has been cloned and sequenced. Recombinant Ffh protein was expressed in E. coli and subjected to biochemical and functional studies. The A. ambivalens ffh gene encodes a 50.4-kDa protein that is organized into three distinct regions: the N-terminal hydrophilic N-region (N), the GTP/GDP-binding domain (G) and a C-terminal C-domain (C). The A. ambivalens Ffh sequence shares 44-46% sequence similarity with Ffh of methanogenic archaea, 34-36% similarity with eukaryal SRP54 and 30-34% similarity with bacterial Ffh. A polyclonal antiserum raised against the first two domains of A. ambivalens Ffh reacts specifically with a single protein (apparent molecular mass: 46 kDa, termed p46) present in cytosolic and plasma membrane cell fractions of A. ambivalens. Recombinant Ffh has a melting temperature (Tm) of 89 °C. Its intrinsic GTPase activity depends on neutral pH and low ionic strength, with a preference for chloride and acetate salts. The highest rates of GTP hydrolysis were achieved at 81 °C in the presence of 0.1-1 mM Mg2+. GTP hydrolysis is significantly inhibited by high glycerol concentrations, and the GTP hydrolysis rate also decreases markedly upon addition of detergents. The Km for GTP is 13.7 µM at 70 °C, and GTP hydrolysis is strongly inhibited by GDP (Ki = 8 µM). A. ambivalens Ffh, which includes an RNA-binding motif in the C-terminal domain, is shown to bind specifically to 7S RNA of the related crenarchaeote Sulfolobus solfataricus. 
Comparative sequence analysis reveals typical signal sequences in plasma membrane as well as extracellular proteins of hyperthermophilic crenarchaea, which strongly suggests recognition events by an Ffh-containing SRP-like particle in these organisms.
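The kinetic constants above (Km = 13.7 µM for GTP, Ki = 8 µM for GDP) can be combined into a rate law to estimate how strongly GDP suppresses hydrolysis. The abstract does not state the inhibition mechanism. This sketch therefore assumes Michaelis-Menten kinetics with simple competitive inhibition by GDP. Km was measured at 70 °C, and the abstract gives no temperature for Ki, so the numbers are illustrative only.

```python
KM_GTP_UM = 13.7   # Km for GTP at 70 °C, from the abstract
KI_GDP_UM = 8.0    # Ki for GDP, from the abstract

def relative_rate(gtp_um: float, gdp_um: float = 0.0,
                  km: float = KM_GTP_UM, ki: float = KI_GDP_UM) -> float:
    """v / Vmax under Michaelis-Menten kinetics with an ASSUMED
    competitive inhibitor: v/Vmax = S / (Km * (1 + I/Ki) + S)."""
    apparent_km = km * (1.0 + gdp_um / ki)
    return gtp_um / (apparent_km + gtp_um)

for gdp in (0.0, 8.0, 40.0):
    print(f"GDP {gdp:5.1f} uM -> v/Vmax at 13.7 uM GTP = {relative_rate(13.7, gdp):.3f}")
```

Under this assumption, a GDP concentration equal to Ki doubles the apparent Km. Half-saturating GTP then gives one third of Vmax instead of one half.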
NASA Technical Reports Server (NTRS)
Hsieh, H. L.; Tong, C. G.; Thomas, C.; Roux, S. J.
1996-01-01
A cDNA encoding a 47 kDa nucleoside triphosphatase (NTPase) that is associated with the chromatin of pea nuclei has been cloned and sequenced. The translated sequence of the cDNA includes several domains predicted by known biochemical properties of the enzyme, including five motifs characteristic of the ATP-binding domain of many proteins, several potential casein kinase II phosphorylation sites, a helix-turn-helix region characteristic of DNA-binding proteins, and a potential calmodulin-binding domain. The deduced primary structure also includes an N-terminal sequence that is a predicted signal peptide and an internal sequence that could serve as a bipartite-type nuclear localization signal. Both in situ immunocytochemistry of pea plumules and immunoblots of purified cell fractions indicate that most of the immunodetectable NTPase is within the nucleus, a compartment proteins typically reach through nuclear pores rather than through the endoplasmic reticulum pathway. The translated sequence has some similarity to that of human lamin C, but not high enough to account for the earlier observation that IgG against human lamin C binds to the NTPase in immunoblots. Northern blot analysis shows that the NTPase mRNA is strongly expressed in etiolated plumules, but only poorly or not at all in the leaf and stem tissues of light-grown plants. Accumulation of NTPase mRNA in etiolated seedlings is stimulated by brief treatments with both red and far-red light, as is characteristic of very low-fluence phytochrome responses. Southern blotting with pea genomic DNA indicates the NTPase is likely to be encoded by a single gene.
Ahmed, Md Atique; Fauzi, Muh; Han, Eun-Taek
2018-03-14
Human infections due to the monkey malaria parasite Plasmodium knowlesi are on the rise in most Southeast Asian countries, particularly Malaysia. The C-terminal 19 kDa domain of PvMSP1P is a potential vaccine candidate; however, no study has been conducted on the orthologous gene of P. knowlesi. This study investigates the level of polymorphism, haplotypes and natural selection of full-length pkmsp1p in clinical samples from Malaysia. A total of 36 full-length pkmsp1p sequences, along with the reference H-strain, and 40 C-terminal pkmsp1p sequences from clinical isolates from Malaysia were downloaded from published genomes. Genetic diversity, polymorphism, haplotypes and natural selection were determined using DnaSP 5.10 and MEGA 5.0 software. Genealogical relationships were determined using a haplotype network tree in NETWORK software v5.0. The population genetic differentiation index (FST) and the population structure of the parasite were determined using Arlequin v3.5 and STRUCTURE v2.3.4 software. Comparison of 36 full-length pkmsp1p sequences along with the H-strain identified 339 SNPs (175 non-synonymous and 164 synonymous substitutions). The nucleotide diversity across the full-length gene was low compared to that of its ortholog pvmsp1p. The nucleotide diversity was higher toward the N-terminal domains (pkmsp1p-83 and 30) than in the C-terminal domains (pkmsp1p-38, 33 and 19). Phylogenetic analysis of full-length genes identified 2 distinct clusters of P. knowlesi from Malaysian Borneo. The 40 pkmsp1p-19 sequences showed low polymorphism, with 16 polymorphic sites defining 18 haplotypes. In total there were 10 synonymous and 6 non-synonymous substitutions, and the 12 cysteine residues within the two EGF domains were intact. Evidence of strong purifying selection was observed within the full-length sequences as well as in all the domains. Shared haplotypes of the 40 pkmsp1p-19 sequences were identified within Malaysian Borneo haplotypes. 
This study is the first to report on the genetic diversity and natural selection of pkmsp1p. A low level of genetic diversity and strong evidence of negative selection were detected in all the domains of pkmsp1p of P. knowlesi, indicating functional constraints. Shared haplotypes were identified within pkmsp1p-19, warranting further evaluation using a larger number of clinical samples from Malaysia.
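The diversity statistics above were computed with DnaSP. To show what nucleotide diversity (π) and haplotype counts measure, here is a minimal sketch. It is not a reimplementation of DnaSP, which also handles gaps, sampling corrections and sliding windows. It assumes equal-length, gap-free aligned sequences and reports π as the mean pairwise proportion of differing sites. The toy alignment is hypothetical.

```python
from itertools import combinations

def nucleotide_diversity(seqs: list[str]) -> float:
    """pi = mean over all sequence pairs of (differing sites / alignment length).
    Assumes equal-length, gap-free aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / (len(pairs) * length)

def segregating_sites(seqs: list[str]) -> int:
    """Number of alignment columns with more than one nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def haplotype_count(seqs: list[str]) -> int:
    """Number of distinct sequences (haplotypes)."""
    return len(set(seqs))

toy = ["ATGCCA", "ATGCCA", "ATGTCA", "TTGTCA"]  # hypothetical alignment
print(nucleotide_diversity(toy), segregating_sites(toy), haplotype_count(toy))
```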
Müller, M; Schnitzler, P; Koonin, E V; Darai, G
1995-05-01
Cytoplasmic DNA viruses encode a DNA-dependent RNA polymerase (DdRP) that is essential for transcription of viral genes. The amino acid sequences of the known largest subunits of DdRPs from different species contain highly conserved regions. Oligonucleotide primers, deduced from two conserved domains (RQP[T/S]LH and NADFDGDE), were used to detect the corresponding gene of fish lymphocystis disease virus (FLCDV), a member of the family Iridoviridae, which replicates in the cytoplasm of infected cells of flatfish. The gene coding for the largest subunit of the DdRP was identified using a PCR-derived probe. Screening of the complete EcoRI gene library of the viral genome led to the identification of the gene locus of the largest subunit of the DdRP within the EcoRI DNA fragment B (12.4 kbp, 0.034 to 0.165 map units). The nucleotide sequence of a part (8334 bp) of the EcoRI DNA fragment B was determined, and a large ORF on the lower strand (ATG = 5787; TAA = 2190) was detected, which encodes a protein of 1199 amino acids. Comparison of the amino acid sequences of the largest subunits of the DdRP (RPO1) of FLCDV and Chilo iridescent virus (CIV) revealed a dramatic difference in their domain organization. Unlike the 1051 aa RPO1 of CIV, which lacks the C-terminal domain conserved in eukaryotic, eubacterial and other viral RNA polymerases, the 1199 aa RPO1 of FLCDV is fully collinear with its cellular and viral homologues. Despite this difference, comparative analysis of the amino acid sequences of viral and cellular RNA polymerases suggests a common origin for the largest RNA polymerase subunits of FLCDV and CIV.
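The RPO1 ORF was found on the lower (reverse) strand, with start coordinate larger than stop coordinate. This minimal sketch locates such ORFs by reverse-complementing the sequence and scanning all three frames for ATG...stop stretches. Coordinates are reported as 1-based upper-strand positions, start first, as in the abstract. The minimum-length threshold and the toy sequence are hypothetical, and the code is not the authors' software.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def lower_strand_orfs(seq: str, min_codons: int = 100):
    """Yield (start, end, n_aa) for ATG..stop ORFs on the reverse strand.

    start = upper-strand 1-based position of the A in ATG,
    end   = upper-strand 1-based position of the last stop-codon base,
    n_aa  = number of sense codons (protein length, stop excluded)."""
    rc = reverse_complement(seq.upper())
    n = len(rc)
    for frame in range(3):
        i = frame
        while i + 3 <= n:
            if rc[i:i + 3] == "ATG":
                j = i
                while j + 3 <= n and rc[j:j + 3] not in STOPS:
                    j += 3
                if j + 3 <= n:  # an in-frame stop codon was found
                    n_aa = (j - i) // 3
                    if n_aa >= min_codons:
                        # reverse-strand index k maps to upper-strand position n - k
                        yield n - i, n - (j + 2), n_aa
                    i = j + 3  # resume scanning after the stop codon
                    continue
            i += 3
```

A 1199-aa product needs 3 × 1199 = 3597 coding nucleotides plus a 3-nt stop codon, which is consistent with the roughly 3.6 kb span between the quoted ATG and TAA coordinates.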
Dolichol phosphate mannose synthase: a Glycosyltransferase with Unity in molecular diversities.
Banerjee, Dipak K; Zhang, Zhenbo; Baksi, Krishna; Serrano-Negrón, Jesús E
2017-08-01
N-glycans provide structural and functional stability to asparagine-linked (N-linked) glycoproteins, and add flexibility. Glycan biosynthesis is elaborate, multi-compartmental and involves many glycosyltransferases. Failure to assemble N-glycans leads to phenotypic changes associated with infection, cancer and congenital disorders of glycosylation (CDGs), among others. Biosynthesis of N-glycans begins at the endoplasmic reticulum (ER) with the assembly of the dolichol-linked tetradecasaccharide (Glc3Man9GlcNAc2-PP-Dol), in which dolichol phosphate mannose synthase (DPMS) plays a central role. DPMS is also essential for GPI anchor biosynthesis as well as for O- and C-mannosylation of proteins in yeast and in mammalian cells. DPMS has been purified from several sources, and its gene has been cloned from 39 species (ranging from protozoan parasites to humans). It is an inverting GT-A folded enzyme and is classified as GT2 by CAZy (carbohydrate active enZyme; http://www.cazy.org). The sequence alignment detects a metal-binding DAD signature in DPMS from all 39 species but finds the cAMP-dependent protein phosphorylation motif (PKA motif) in only 38 species. DPMS also has hydrophobic region(s). Hydropathy analysis of the amino acid sequences of bovine, human, S. cerevisiae and A. thaliana DPMS shows that the PKA motif lies between the hydrophobic domains. The location of the PKA motif, as well as of the hydrophobic domain(s), in the DPMS sequence varies from species to species; for example, the domain(s) could be located at the center or more towards the C-terminus. Irrespective of catalytic similarity, DNA sequence, amino acid identity, and the lack of a stretch of hydrophobic amino acid residues at the C-terminus, DPMS is still classified into Type I and Type II enzymes. Owing to an apparent bio-sensing ability, DPMS catalytic activity is regulated by extracellular signaling and the microenvironment. 
In this review, we highlight some important features and the molecular diversities of DPMS.
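The review locates the PKA motif relative to hydrophobic domains identified by hydropathy analysis. This minimal sketch combines two pieces: a Kyte-Doolittle sliding-window hydropathy profile, and a regular-expression search for the PROSITE cAMP/cGMP-dependent protein kinase site pattern [RK](2)-x-[ST]. The window size (19) and threshold (1.6) are the classic Kyte-Doolittle choices for flagging candidate membrane-spanning segments. They are assumed here rather than taken from the review, and the test sequence is made up.

```python
import re

KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

# Lookahead so that overlapping [RK][RK]x[ST] sites are all reported.
PKA_SITE = re.compile(r"(?=([RK]{2}.[ST]))")

def hydropathy_profile(seq, window=19):
    """Mean Kyte-Doolittle score; entry i covers seq[i:i + window]."""
    scores = [KD[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

def hydrophobic_segments(seq, window=19, threshold=1.6):
    """Runs of consecutive windows above threshold, as 0-based
    half-open (start, end) residue spans."""
    segments, run_start = [], None
    profile = hydropathy_profile(seq, window)
    for i, h in enumerate(profile + [float("-inf")]):  # sentinel closes the last run
        if h > threshold and run_start is None:
            run_start = i
        elif h <= threshold and run_start is not None:
            segments.append((run_start, i - 1 + window))
            run_start = None
    return segments

def pka_sites(seq):
    """0-based start positions of [RK][RK]x[ST] matches."""
    return [m.start() for m in PKA_SITE.finditer(seq.upper())]
```

Because each window averages 19 residues, segment borders are smeared by up to a window length. A motif close to a hydrophobic stretch can therefore fall inside a reported span.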
Cvicek, Vaclav; Goddard, William A.; Abrol, Ravinder
2016-01-01
The understanding of G-protein coupled receptors (GPCRs) is undergoing a revolution due to increased information about their signaling and the experimental determination of structures for more than 25 receptors. The availability of at least one receptor structure for each of the GPCR classes, well separated in sequence space, enables an integrated superfamily-wide analysis to identify signatures involving the role of conserved residues, conserved contacts, and downstream signaling in the context of receptor structures. In this study, we align the transmembrane (TM) domains of all experimental GPCR structures to maximize the conserved inter-helical contacts. The resulting superfamily-wide GpcR Sequence-Structure (GRoSS) alignment of the TM domains for all human GPCR sequences is sufficient to generate a phylogenetic tree that correctly distinguishes all different GPCR classes, suggesting that the class-level differences in the GPCR superfamily are encoded at least partly in the TM domains. The inter-helical contacts conserved across all GPCR classes describe the evolutionarily conserved GPCR structural fold. The corresponding structural alignment of the inactive and active conformations, available for a few GPCRs, identifies activation hot-spot residues in the TM domains that get rewired upon activation. Many GPCR mutations, known to alter receptor signaling and cause disease, are located at these conserved contact and activation hot-spot residue positions. The GRoSS alignment places the chemosensory receptor subfamilies for bitter taste (TAS2R) and pheromones (Vomeronasal, VN1R) in the rhodopsin family, known to contain the chemosensory olfactory receptor subfamily. The GRoSS alignment also enables the quantification of the structural variability in the TM regions of experimental structures, useful for homology modeling and structure prediction of receptors. Furthermore, this alignment identifies structurally and functionally important residues in all human GPCRs. 
These residues can be used to make testable hypotheses about the structural basis of receptor function and about the molecular basis of disease-associated single nucleotide polymorphisms. PMID:27028541
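The GRoSS alignment is built to maximize inter-helical contacts that are conserved across structures. This sketch covers only the bookkeeping step and is not the authors' procedure. It maps residue-residue contacts from each structure onto alignment columns and keeps column pairs shared by a chosen fraction of structures. It assumes one representative atom per residue (for example C-alpha) and an 8 Å distance cutoff, both illustrative choices. The coordinates are invented.

```python
import math
from collections import Counter
from itertools import combinations

def contacts(residues, cutoff=8.0):
    """Inter-helical contacts as sorted (column_i, column_j) tuples.

    residues: list of (alignment_column, helix_id, (x, y, z)) for one structure."""
    pairs = set()
    for (ci, hi, xi), (cj, hj, xj) in combinations(residues, 2):
        if hi != hj and math.dist(xi, xj) <= cutoff:
            pairs.add((min(ci, cj), max(ci, cj)))
    return pairs

def conserved_contacts(structures, min_fraction=1.0, cutoff=8.0):
    """Column pairs in contact in at least min_fraction of the structures."""
    counts = Counter()
    for residues in structures:
        counts.update(contacts(residues, cutoff))
    needed = min_fraction * len(structures)
    return {pair for pair, n in counts.items() if n >= needed}

# Two hypothetical structures, three aligned TM positions each.
s1 = [(1, "TM1", (0, 0, 0)), (2, "TM2", (5, 0, 0)), (3, "TM3", (20, 0, 0))]
s2 = [(1, "TM1", (0, 0, 0)), (2, "TM2", (6, 0, 0)), (3, "TM3", (0, 7, 0))]
print(conserved_contacts([s1, s2]))
```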
Xu, Qifang; Dunbrack, Roland L
2012-11-01
Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are shorter than the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often erroneously result in two copies of the domain being assigned to the query. Divergent repeat sequences in proteins are often missed. We have developed a multilevel procedure to produce nearly complete assignments of protein families of an existing classification system to a large set of sequences. We apply this to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, thus leading to erroneous assignment at the Pfam family level. A greedy algorithm allowing for partial overlaps was therefore applied first to sequence/HMM alignments, then to HMM-HMM alignments and then to structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignment of repeat Pfams with weaker E-values was allowed after stronger assignments of the repeat HMM. Our assignments, presented in a database called PDBfam, contain Pfams for 99.4% of chains >50 residues. The Pfam assignment data in PDBfam are available at http://dunbrack2.fccc.edu/ProtCid/PDBfam, which can be searched by PDB codes and Pfam identifiers. They will be updated regularly.
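The greedy step described above can be sketched as follows. Candidate hits are ranked by E-value, and each hit is accepted unless it overlaps an already accepted hit by more than a tolerated fraction. The overlap rule (overlap divided by the shorter hit), the 0.2 tolerance and the domain names are hypothetical choices, not PDBfam's published parameters. Joining split alignments and repeat handling are omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    family: str
    start: int     # 1-based inclusive residue range on the query
    end: int
    evalue: float

def overlap_fraction(a: Hit, b: Hit) -> float:
    """Overlap length divided by the length of the shorter hit."""
    overlap = min(a.end, b.end) - max(a.start, b.start) + 1
    if overlap <= 0:
        return 0.0
    shorter = min(a.end - a.start + 1, b.end - b.start + 1)
    return overlap / shorter

def greedy_assign(hits, max_overlap=0.2):
    """Accept hits in order of increasing E-value, allowing partial overlaps."""
    accepted = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if all(overlap_fraction(hit, kept) <= max_overlap for kept in accepted):
            accepted.append(hit)
    return sorted(accepted, key=lambda h: h.start)

hits = [Hit("DomainA", 10, 120, 1e-30),
        Hit("DomainB", 100, 220, 1e-12),   # small overlap with DomainA: kept
        Hit("DomainC", 20, 110, 1e-5)]     # nested inside DomainA: rejected
print([h.family for h in greedy_assign(hits)])
```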
Odronitz, Florian; Kollmar, Martin
2006-11-29
Annotation of protein sequences of eukaryotic organisms is crucial for the understanding of their function in the cell. Manual annotation is still by far the most accurate way to correctly predict genes. The classification of protein sequences, their phylogenetic relation and the assignment of function involves information from various sources. This often leads to a collection of heterogeneous data, which is hard to track. Cytoskeletal and motor proteins consist of large and diverse superfamilies comprising up to several dozen members per organism. To date, there is no integrated tool available to assist in the manual large-scale comparative genomic analysis of protein families. Pfarao (Protein Family Application for Retrieval, Analysis and Organisation) is a database-driven online working environment for the analysis of manually annotated protein sequences and their relationships. Currently, the system can store and interrelate a wide range of information about protein sequences, species, phylogenetic relations and sequencing projects, as well as links to literature and domain predictions. Sequences can be imported from multiple sequence alignments that are generated during the annotation process. A web interface allows users to conveniently browse the database and to compile tabular and graphical summaries of its content. We implemented a protein sequence-centric web application to store, organize, interrelate, and present heterogeneous data that is generated in manual genome annotation and comparative genomics. The application has been developed for the analysis of cytoskeletal and motor proteins (CyMoBase) but can easily be adapted for any protein.
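Pfarao is described as a database-driven environment that interrelates sequences, species and domain predictions. This minimal sketch shows such a relational core using Python's built-in sqlite3 module. The table and column names are hypothetical and do not reflect the actual schema of Pfarao or CyMoBase.

```python
import sqlite3

SCHEMA = """
CREATE TABLE species  (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE sequence (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                       family TEXT NOT NULL, residues TEXT NOT NULL,
                       species_id INTEGER NOT NULL REFERENCES species(id));
CREATE TABLE domain_prediction (sequence_id INTEGER NOT NULL REFERENCES sequence(id),
                                domain TEXT NOT NULL, start INTEGER, stop INTEGER);
"""

def build(conn):
    conn.executescript(SCHEMA)

def add_sequence(conn, species, name, family, residues, domains=()):
    """Insert one annotated sequence; domains are (name, start, stop) tuples."""
    conn.execute("INSERT OR IGNORE INTO species(name) VALUES (?)", (species,))
    (species_id,) = conn.execute(
        "SELECT id FROM species WHERE name = ?", (species,)).fetchone()
    cur = conn.execute(
        "INSERT INTO sequence(name, family, residues, species_id) VALUES (?, ?, ?, ?)",
        (name, family, residues, species_id))
    conn.executemany("INSERT INTO domain_prediction VALUES (?, ?, ?, ?)",
                     [(cur.lastrowid, d, s, e) for d, s, e in domains])

def family_counts(conn, family):
    """Number of sequences per species for one protein family."""
    return conn.execute(
        """SELECT sp.name, COUNT(*) FROM sequence s
           JOIN species sp ON sp.id = s.species_id
           WHERE s.family = ? GROUP BY sp.name ORDER BY sp.name""",
        (family,)).fetchall()
```

Keeping species, sequences and predictions in separate tables lets tabular summaries, such as family size per species, be produced with plain SQL joins.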
A generalized analysis of hydrophobic and loop clusters within globular protein sequences
Eudes, Richard; Le Tuan, Khanh; Delettré, Jean; Mornon, Jean-Paul; Callebaut, Isabelle
2007-01-01
Background Hydrophobic Cluster Analysis (HCA) is an efficient way to compare highly divergent sequences through the implicit secondary structure information directly derived from hydrophobic clusters. However, its efficiency and application are currently limited by the need for user expertise. To help the analysis of HCA plots, we report here the structural preferences of hydrophobic cluster species, which are frequently encountered in globular domains of proteins. These species are characterized only by their hydrophobic/non-hydrophobic dichotomy. This analysis has been extended to loop-forming clusters, using an appropriate loop alphabet. Results The structural behavior of hydrophobic cluster species, which are typical of protein globular domains, was investigated within banks of experimental structures, considered at different levels of sequence redundancy. The 294 most frequent hydrophobic cluster species were analyzed with regard to their association with the different secondary structures (frequencies of association with secondary structures and secondary structure propensities). Hydrophobic cluster species are predominantly associated with regular secondary structures, and a large part (60%) reveals preferences for α-helices or β-strands. Moreover, the analysis of the hydrophobic cluster amino acid composition generally allows for finer prediction of the regular secondary structure associated with the considered cluster within a cluster species. We also investigated the behavior of loop-forming clusters, using a "PGDNS" alphabet. These loop clusters do not overlap with hydrophobic clusters and are highly associated with coils. Finally, the structural information contained in the hydrophobic structural words, as deduced from experimental structures, was compared to the PSI-PRED predictions, revealing that β-strands and especially α-helices are generally over-predicted within the limits of typical β and α hydrophobic clusters. 
Conclusion The dictionary of hydrophobic clusters described here can help the HCA user to interpret and compare the HCA plots of globular protein sequences, and provides original, fundamental insight into the structural bricks of protein folds. Moreover, the novel loop cluster analysis brings additional information for secondary structure prediction on the whole sequence through a generalized cluster analysis (GCA), and not only on regular secondary structures. Such information lays the foundations for developing a new and original tool for secondary structure prediction. PMID:17210072
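HCA cluster species are binary words (1 = hydrophobic, 0 = not). HCA proper reads clusters on a two-dimensional helical net. This sketch is a simplified one-dimensional approximation. It assumes the usual HCA hydrophobic set V, I, L, F, M, Y, W, and it treats a proline, or a run of at least four non-hydrophobic residues, as a cluster breaker. The same function with the "PGDNS" alphabet and no proline breaker gives loop-cluster words.

```python
HCA_HYDROPHOBIC = frozenset("VILFMYW")
LOOP_ALPHABET = frozenset("PGDNS")

def cluster_words(seq, alphabet=HCA_HYDROPHOBIC, min_gap=4, breakers=frozenset("P")):
    """Binary words (e.g. '1101') for clusters of residues in `alphabet`.

    Simplified 1D rule: a cluster ends at a breaker residue or at a run of
    >= min_gap residues outside `alphabet`."""
    words, current, gap = [], "", 0
    for aa in seq.upper():
        if aa in breakers and aa not in alphabet:
            # a breaker residue (proline for HCA) always closes the open cluster
            if current:
                words.append(current)
            current, gap = "", 0
        elif aa in alphabet:
            # short internal gaps are kept as 0s inside the word
            current += "0" * gap + "1"
            gap = 0
        elif current:
            gap += 1
            if gap >= min_gap:
                words.append(current)
                current, gap = "", 0
    if current:
        words.append(current)
    return words

print(cluster_words("AALIKVAAAAFWPLL"))
print(cluster_words("PGAADNSLLLLG", LOOP_ALPHABET, breakers=frozenset()))
```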
Phylogenetic relationship of Ornithobacterium rhinotracheale strains.
DE Oca-Jimenez, Roberto Montes; Vega-Sanchez, Vicente; Morales-Erasto, Vladimir; Salgado-Miranda, Celene; Blackall, Patrick J; Soriano-Vargas, Edgardo
2018-04-10
The bacterium Ornithobacterium rhinotracheale is associated with respiratory disease in wild birds and poultry. In this study, the phylogenetic analysis of nine reference strains of O. rhinotracheale belonging to serovars A to I, and eight Mexican isolates belonging to serovar A, was performed. The analysis was extended to include sequences from another 23 strains available in the public domain. The analysis showed that the 40 sequences formed six clusters, I to VI. All eight Mexican field isolates were placed in cluster I. One of the reference strains appears to present genetic diversity not previously recognized and was placed in a new genetic cluster. In conclusion, phylogenetic analysis of O. rhinotracheale strains based on the 16S rRNA gene is a suitable tool for epidemiologic studies.
Chang, Elizabeth; Pourmal, Sergei; Zhou, Chun; Kumar, Rupesh; Teplova, Marianna; Pavletich, Nikola P; Marians, Kenneth J; Erdjument-Bromage, Hediye
2016-07-01
In recent history, alternative approaches to Edman sequencing have been investigated, and to this end, the Association of Biomolecular Resource Facilities (ABRF) Protein Sequencing Research Group (PSRG) initiated studies in 2014 and 2015 on bottom-up and top-down N-terminal (Nt) dimethyl derivatization of standard quantities of intact proteins, with the aim of determining Nt sequence information. We have expanded this initiative and used low picomole amounts of myoglobin to determine the efficiency of Nt-dimethylation. Application of this approach to protein domains, generated by limited proteolysis of overexpressed proteins, confirms that it is a universal labeling technique and is very sensitive when compared with Edman sequencing. Finally, we compared Edman sequencing and Nt-dimethylation of the same polypeptide fragments; the results confirm that the two methods agree on the identity of the Nt amino acid sequence.
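Reductive dimethylation adds two methyl groups (net +C2H4, about +28.0313 Da monoisotopic) to each free primary amine. Under the same conditions this includes lysine ε-amines as well as the N-terminus. This sketch computes expected monoisotopic [M+H]+ values with and without labelling. It assumes complete labelling, light reagents and no other modifications, and the peptide string is only an example.

```python
# Monoisotopic residue masses (Da)
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
           "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
           "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
           "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
           "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931}
WATER = 18.01056
PROTON = 1.00728
DIMETHYL = 28.0313  # +C2H4 per labelled primary amine (light reagents assumed)

def mh_plus(peptide: str, dimethylated: bool = False) -> float:
    """Monoisotopic [M+H]+ of an unmodified or fully dimethylated peptide."""
    mass = sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON
    if dimethylated:
        labelled_amines = 1 + peptide.count("K")  # N-terminus + Lys side chains
        mass += labelled_amines * DIMETHYL
    return mass

pep = "GLSDGEWQQVLNVWGK"  # example peptide
print(f"{mh_plus(pep):.4f} -> {mh_plus(pep, dimethylated=True):.4f}")
```

Labelling efficiency can then be judged by comparing signal at the labelled and unlabelled masses. Distinguishing the N-terminal label from lysine labels requires fragment-ion information.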
Patel, Rajesh; Mevada, Vishal; Prajapati, Dhaval; Dudhagara, Pravin; Koringa, Prakash; Joshi, C G
2015-03-01
We report the metagenome of a saline desert soil sample from the Little Rann of Kutch, Gujarat State, India. The metagenome consisted of 633,760 sequences totaling 141,307,202 bp, with 56% G + C content. Metagenome sequence data are available at EBI in the EBI Metagenomics database under accession no. ERP005612. Community metagenomics revealed a total of 1802 species belonging to 43 different phyla, dominated by the genus Marinobacter (48.7%) in the bacterial domain and the genus Halobacterium (4.6%) in the archaeal domain. Remarkably, functional metagenome analysis assigned 18.2% of sequences to a poorly characterized group and 4% to genes for various stress responses, and revealed genes for a diverse range of commercially important enzymes.
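Summary figures like those above (sequence count, total bases, G+C content) can be computed directly from FASTA reads. Here is a minimal, dependency-free sketch. It assumes plain FASTA input, and it computes G+C over unambiguous A/C/G/T bases only, ignoring N and other ambiguity codes. That convention is an assumption, and pipelines differ.

```python
def fasta_records(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def summarize(lines):
    """Sequence count, total bases, G+C % (over A/C/G/T) and mean length."""
    n_seqs = total = gc = acgt = 0
    for _, seq in fasta_records(lines):
        seq = seq.upper()
        n_seqs += 1
        total += len(seq)
        gc += seq.count("G") + seq.count("C")
        acgt += sum(seq.count(b) for b in "ACGT")
    return {"sequences": n_seqs, "bases": total,
            "gc_percent": 100.0 * gc / acgt if acgt else 0.0,
            "mean_length": total / n_seqs if n_seqs else 0.0}
```

Usage: `with open("reads.fasta") as fh: print(summarize(fh))`, where the file name is hypothetical. For the figures above, 141,307,202 bp over 633,760 sequences corresponds to a mean length of roughly 223 bp.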
Gasek, Nathan S; Nyland, Lori R; Vigoreaux, Jim O
2016-04-27
Flightin is a myosin binding protein present in Pancrustacea. In Drosophila, flightin is expressed in the indirect flight muscles (IFM), where it is required for the flexural rigidity, structural integrity, and length determination of thick filaments. Comparison of flightin sequences from multiple Drosophila species revealed a tripartite organization indicative of three functional domains subject to different evolutionary constraints. We use atomic force microscopy to investigate the functional roles of the N-terminal domain and the C-terminal domain that show different patterns of sequence conservation. Thick filaments containing a C-terminal domain truncated flightin (fln(ΔC44)) are significantly shorter (2.68 ± 0.06 μm; p < 0.005) than thick filaments containing a full length flightin (fln⁺; 3.21 ± 0.05 μm) and thick filaments containing an N-terminal domain truncated flightin (fln(ΔN62); 3.21 ± 0.06 μm). Persistence length was significantly reduced in fln(ΔN62) (418 ± 72 μm; p < 0.005) compared to fln⁺ (1386 ± 196 μm) and fln(ΔC44) (1128 ± 193 μm). Statistical polymer chain analysis revealed that the C-terminal domain fulfills a secondary role in thick filament bending propensity. Our results indicate that the flightin amino and carboxy terminal domains make distinct contributions to thick filament biomechanics. We propose these distinct roles arise from the interplay between natural selection and sexual selection given IFM's dual role in flight and courtship behaviors.
Gasek, Nathan S.; Nyland, Lori R.; Vigoreaux, Jim O.
2016-01-01
Flightin is a myosin binding protein present in Pancrustacea. In Drosophila, flightin is expressed in the indirect flight muscles (IFM), where it is required for the flexural rigidity, structural integrity, and length determination of thick filaments. Comparison of flightin sequences from multiple Drosophila species revealed a tripartite organization indicative of three functional domains subject to different evolutionary constraints. We use atomic force microscopy to investigate the functional roles of the N-terminal domain and the C-terminal domain that show different patterns of sequence conservation. Thick filaments containing a C-terminal domain truncated flightin (flnΔC44) are significantly shorter (2.68 ± 0.06 μm; p < 0.005) than thick filaments containing a full length flightin (fln+; 3.21 ± 0.05 μm) and thick filaments containing an N-terminal domain truncated flightin (flnΔN62; 3.21 ± 0.06 μm). Persistence length was significantly reduced in flnΔN62 (418 ± 72 μm; p < 0.005) compared to fln+ (1386 ± 196 μm) and flnΔC44 (1128 ± 193 μm). Statistical polymer chain analysis revealed that the C-terminal domain fulfills a secondary role in thick filament bending propensity. Our results indicate that the flightin amino and carboxy terminal domains make distinct contributions to thick filament biomechanics. We propose these distinct roles arise from the interplay between natural selection and sexual selection given IFM’s dual role in flight and courtship behaviors. PMID:27128952
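Persistence lengths like those above come from statistical polymer chain analysis of AFM images. The abstract does not state the exact fitting procedure, so this is only a sketch of one common estimator. It uses the worm-like-chain tangent correlation for chains equilibrated on a 2D surface, ⟨cos θ(s)⟩ = exp(−s / 2P), and fits ln⟨cos θ⟩ against contour separation s by least squares through the origin. The data below are synthetic and noise-free.

```python
import math

def persistence_length_2d(separations, mean_cosines):
    """Fit <cos theta(s)> = exp(-s / (2 P)) (2D worm-like chain) and return P.

    separations:  contour distances s (P is returned in the same unit)
    mean_cosines: measured <cos theta> at each s (each must be > 0)."""
    num = sum(s * math.log(c) for s, c in zip(separations, mean_cosines))
    den = sum(s * s for s in separations)
    slope = num / den  # least-squares slope of ln<cos> vs s through the origin
    return -1.0 / (2.0 * slope)

# Synthetic check: P = 1386 um (the fln+ value) sampled over 0.1-3.0 um.
s_values = [0.1 * k for k in range(1, 31)]
cosines = [math.exp(-s / (2 * 1386.0)) for s in s_values]
print(round(persistence_length_2d(s_values, cosines)))
```

When P far exceeds filament length (about 3 μm here), ⟨cos θ⟩ stays very close to 1. Real estimates are therefore sensitive to tracing noise, which the noise-free example does not capture.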
Sri, Tanu; Mayee, Pratiksha; Singh, Anandita
2015-09-01
Whole genome sequence analyses allow the unravelling of evolutionary consequences of the meso-triplication event in Brassicaceae (∼14-20 million years ago (MYA)), such as differential gene fractionation and diversification in homeologous sub-genomes. This study presents a simple gene-centric approach involving microsynteny and natural genetic variation analysis for understanding SUPPRESSOR of OVEREXPRESSION of CONSTANS 1 (SOC1) homeolog evolution in Brassica. Analysis of microsynteny in Brassica rapa homeologous regions containing SOC1 revealed differential gene fractionation correlating with the reported fractionation status of the sub-genomes of origin, viz. least fractionated (LF), moderately fractionated 1 (MF1) and most fractionated (MF2), respectively. Screening 18 cultivars of 6 Brassica species led to the identification of 8 genomic and 27 transcript variants of SOC1, including splice-forms. Co-occurrence of both interrupted and intronless SOC1 genes was detected in a few Brassica species. In silico analysis characterised Brassica SOC1 as a MADS intervening, K-box, C-terminal (MIKC(C)) transcription factor, with highly conserved MADS and I domains relative to the K-box and C-terminal domain. Phylogenetic analyses and multiple sequence alignments depicting shared patterns of silent/non-silent mutations assigned Brassica SOC1 homologs into groups based on shared diploid base genome. In addition, a sub-genome structure in uncharacterised Brassica genomes was inferred. Expression analysis of putative MF2 and LF (Brassica diploid base genome A (AA)) sub-genome-specific SOC1 homeologs of Brassica juncea revealed near identical expression patterns. However, the MF2-specific homeolog exhibited significantly higher expression, implying regulatory diversification. In conclusion, we present evidence for polyploidy-induced sequence and regulatory evolution in Brassica SOC1, in which differential homeolog expression is implicated in functional diversification.
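Grouping homologs by shared silent (synonymous) and non-silent (non-synonymous) mutations requires classifying codon differences between aligned coding sequences. This simplified sketch assumes in-frame, gap-free sequences and the standard genetic code. It classifies each differing codon as a whole, unlike per-site methods such as Nei-Gojobori, which weight synonymous and non-synonymous sites separately.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the compact TCAG-ordered table.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def classify_codon_changes(ref: str, alt: str) -> dict:
    """Count differing codons that keep (silent) or change (non_silent)
    the encoded amino acid; inputs must be aligned, in-frame CDS."""
    if len(ref) != len(alt) or len(ref) % 3:
        raise ValueError("sequences must be equal-length and in frame")
    ref, alt = ref.upper(), alt.upper()
    counts = {"silent": 0, "non_silent": 0}
    for i in range(0, len(ref), 3):
        c1, c2 = ref[i:i + 3], alt[i:i + 3]
        if c1 != c2:
            key = "silent" if CODON_TABLE[c1] == CODON_TABLE[c2] else "non_silent"
            counts[key] += 1
    return counts

print(classify_codon_changes("ATGGCTAAA", "ATAGCCAAA"))
```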
2012-01-01
Background The NCBI Conserved Domain Database (CDD) consists of a collection of multiple sequence alignments of protein domains that are at various stages of being manually curated into evolutionary hierarchies based on conserved and divergent sequence and structural features. These domain models are annotated to provide insights into the relationships between sequence, structure and function via web-based BLAST searches. Results Here we automate the generation of conserved domain (CD) hierarchies using a combination of heuristic and Markov chain Monte Carlo (MCMC) sampling procedures and starting from a (typically very large) multiple sequence alignment. This procedure relies on statistical criteria to define each hierarchy based on the conserved and divergent sequence patterns associated with protein functional-specialization. At the same time this facilitates the sequence and structural annotation of residues that are functionally important. These statistical criteria also provide a means to objectively assess the quality of CD hierarchies, a non-trivial task considering that the protein subgroups are often very distantly related—a situation in which standard phylogenetic methods can be unreliable. Our aim here is to automatically generate (typically sub-optimal) hierarchies that, based on statistical criteria and visual comparisons, are comparable to manually curated hierarchies; this serves as the first step toward the ultimate goal of obtaining optimal hierarchical classifications. A plot of runtimes for the most time-intensive (non-parallelizable) part of the algorithm indicates a nearly linear time complexity so that, even for the extremely large Rossmann fold protein class, results were obtained in about a day. Conclusions This approach automates the rapid creation of protein domain hierarchies and thus will eliminate one of the most time consuming aspects of conserved domain database curation. 
At the same time, it also facilitates protein domain annotation by identifying those pattern residues that most distinguish each protein domain subgroup from other related subgroups. PMID:22726767
New Stopping Criteria for Segmenting DNA Sequences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Wentian
2001-06-18
We propose a solution to the problem of choosing a stopping criterion when segmenting inhomogeneous DNA sequences with complex statistical patterns. This new stopping criterion is based on the Bayesian information criterion (BIC) in the model selection framework. When this criterion is applied to the telomere of S. cerevisiae and the complete sequence of E. coli, borders of biologically meaningful units were identified, and a more reasonable number of domains was obtained. We also introduce a measure called segmentation strength, which can be used to control the delineation of large domains. The relationship between the average domain size and the threshold of segmentation strength is determined for several genome sequences.
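Recursive segmentation needs a rule for when to stop splitting. This sketch shows the general idea only. At each step it chooses the cut that maximizes the gain in multinomial log-likelihood of base composition, and it accepts the cut only if 2ΔlnL exceeds a BIC-style penalty k·ln N. The value k = 4 (three extra base-frequency parameters plus one for the border) is an assumption for illustration. The paper's exact penalty and its segmentation-strength threshold are not reproduced, and input is assumed to contain only A, C, G and T.

```python
import math

def log_likelihood(counts):
    """Maximized multinomial log-likelihood: sum of n_a * ln(n_a / N)."""
    n = sum(counts)
    return sum(c * math.log(c / n) for c in counts if c)

def best_cut(seq, start, end, alphabet="ACGT"):
    """Return (gain, cut) maximizing lnL(left) + lnL(right) - lnL(whole)."""
    index = {b: i for i, b in enumerate(alphabet)}
    total = [seq.count(b, start, end) for b in alphabet]
    whole = log_likelihood(total)
    left = [0] * len(alphabet)
    best = (0.0, None)
    for cut in range(start + 1, end):
        left[index[seq[cut - 1]]] += 1
        right = [t - l for t, l in zip(total, left)]
        gain = log_likelihood(left) + log_likelihood(right) - whole
        if gain > best[0]:
            best = (gain, cut)
    return best

def segment(seq, k=4):
    """Recursively split seq; return sorted half-open (start, end) segments."""
    seq = seq.upper()
    stack, done = [(0, len(seq))], []
    while stack:
        start, end = stack.pop()
        gain, cut = best_cut(seq, start, end)
        if cut is not None and 2 * gain > k * math.log(end - start):
            stack.extend([(cut, end), (start, cut)])
        else:
            done.append((start, end))
    return sorted(done)

print(segment("A" * 200 + "C" * 200))
```

Each accepted cut trades extra parameters for likelihood, which is why a BIC-type criterion stops sooner on long, nearly homogeneous stretches than a fixed significance threshold would.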
The complete mitochondrial genome of Lota lota (Gadiformes: Gadidae) from the Burqin River in China.
Lu, Zhichuang; Zhang, Nan; Song, Na; Gao, Tianxiang
2016-05-01
In this study, the complete mitochondrial genome (mitogenome) sequence of Lota lota was determined by long polymerase chain reaction and primer walking methods. The mitogenome is a circular molecule 16,519 bp in length and, as in other bony fishes, contains 37 mitochondrial genes (13 protein-coding genes, 2 ribosomal RNA (rRNA) genes and 22 transfer RNA (tRNA) genes) and a control region. Within the control region, we identified the termination-associated sequence domain (TAS), the central conserved sequence block domains (CSB-F and CSB-D), and the conserved sequence block domains (CSB-1, CSB-2 and CSB-3).
Jothi, Raja; Cherukuri, Praveen F.; Tasneem, Asba; Przytycka, Teresa M.
2006-01-01
Recent advances in functional genomics have helped generate large-scale high-throughput protein interaction data. Such networks, though extremely valuable towards molecular level understanding of cells, do not provide any direct information about the regions (domains) in the proteins that mediate the interaction. Here, we performed co-evolutionary analysis of domains in interacting proteins in order to understand the degree of co-evolution of interacting and non-interacting domains. Using a combination of sequence and structural analysis, we analyzed protein–protein interactions in F1-ATPase, Sec23p/Sec24p, DNA-directed RNA polymerase and nuclear pore complexes, and found that the interacting domain pair(s) for a given interaction exhibit a higher level of co-evolution than the non-interacting domain pairs. Motivated by this finding, we developed a computational method to test the generality of the observed trend, and to predict large-scale domain–domain interactions. Given a protein–protein interaction, the proposed method predicts the domain pair(s) most likely to mediate the protein interaction. We applied this method to the yeast interactome to predict domain–domain interactions, and used known domain–domain interactions found in PDB crystal structures to validate our predictions. Our results show that the prediction accuracy of the proposed method is statistically significant. Comparison of our prediction results with those from two other methods reveals that only a fraction of predictions are shared by all three methods, indicating that the proposed method can detect known interactions missed by the other methods. We believe that the proposed method can be used with other methods to help identify previously unrecognized domain–domain interactions on a genome scale, and could potentially help reduce the search space for identifying interaction sites. PMID:16949097
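Co-evolution between domains is often quantified in the mirror-tree style: build a distance matrix for each domain across the same species and correlate the matrices' upper triangles. The abstract does not specify the paper's measure, so this sketch is a stand-in for it. It also picks the domain pair with the highest correlation as the predicted mediator of an interaction. The domain names and distance matrices are invented.

```python
import math
from itertools import combinations

def upper_triangle(matrix):
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def coevolution(dist_a, dist_b):
    """Pearson correlation of two inter-species distance matrices
    (rows and columns in the same species order)."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))

def predict_mediating_pair(domains_p, domains_q):
    """Given {domain: distance_matrix} for two interacting proteins,
    return the (domain_p, domain_q) pair with the highest correlation."""
    return max(((a, b) for a in domains_p for b in domains_q),
               key=lambda pair: coevolution(domains_p[pair[0]], domains_q[pair[1]]))

# Invented distance matrices over four species.
D1 = [[0, 1, 2, 3], [1, 0, 4, 5], [2, 4, 0, 6], [3, 5, 6, 0]]
D2 = [[2 * v for v in row] for row in D1]             # tracks D1 exactly
D3 = [[0, 6, 5, 4], [6, 0, 3, 2], [5, 3, 0, 1], [4, 2, 1, 0]]  # anti-correlated
print(predict_mediating_pair({"P_dom1": D1, "P_dom2": D3}, {"Q_dom1": D2}))
```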
Dynamics of domain coverage of the protein sequence universe
2012-01-01
Background The currently known protein sequence space consists of millions of sequences in public databases and is rapidly expanding. Assigning sequences to families leads to a better understanding of protein function and of the nature of the protein universe. However, a large portion of the current protein space remains unassigned and is referred to as its “dark matter”. Results Here we suggest that the true size of the “dark matter” is much larger than current definitions state. We propose an approach to reducing the size of the “dark matter” by identifying and subtracting regions in protein sequences that are unlikely to contain any domain. Conclusions Recent improvements in computational domain modeling have slowly decreased the relative size of the “dark matter”; however, its absolute size increases substantially with the growth of sequence data. PMID:23157439
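To illustrate the accounting, here is a minimal sketch assuming that domain hits and "domain-unlikely" regions (for example disordered or low-complexity segments) are already available as 1-based inclusive intervals; how such regions are identified is the substance of the method and is not reproduced here. The helper names are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_residues(intervals):
    """Number of residues covered by at least one interval."""
    return sum(end - start + 1 for start, end in merge_intervals(intervals))


def dark_matter(length, domain_hits, unlikely_regions):
    """Unassigned residues before and after subtracting domain-unlikely regions."""
    raw = length - covered_residues(domain_hits)
    refined = length - covered_residues(list(domain_hits) + list(unlikely_regions))
    return raw, refined


print(dark_matter(100, [(1, 30), (25, 40)], [(41, 50), (90, 100)]))  # (60, 39)
```

Merging before counting keeps overlapping hits from being counted twice.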
Sicot, F X; Mesnage, M; Masselot, M; Exposito, J Y; Garrone, R; Deutsch, J; Gaill, F
2000-09-29
The annelid Alvinella pompejana is probably the most heat-tolerant metazoan organism known. Previous results have shown that its interstitial collagen is significantly more thermostable than that of coastal annelids and of vent organisms, such as the vestimentiferan Riftia pachyptila, that live in colder parts of the deep-sea hydrothermal environment. To investigate the molecular basis of this thermal behavior, we cloned and sequenced a large cDNA encoding the fibrillar collagen of Alvinella, including one half of the helical domain and the entire C-propeptide domain. For comparison, we also cloned the 3' part of the homologous cDNA from Riftia. Comparison of the corresponding helical domains of these two species, together with the previously sequenced domain of the coastal lugworm Arenicola marina, showed that increases in proline content and in the number of stabilizing triplets correlate with the outstanding thermostability of the interstitial collagen of A. pompejana. Phylogenetic analysis showed that the triple-helical and C-propeptide parts of the same collagen molecule evolve at different rates, in favor of an adaptive mechanism at the molecular level. Copyright 2000 Academic Press.
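A minimal sketch of the two sequence features compared above. It assumes the helical sequence is given in Gly-X-Y register and treats Gly-Pro-Pro as the stabilizing triplet (in vivo the Y-position proline is hydroxylated, giving Gly-Pro-Hyp); that definition is an assumption for illustration, not necessarily the one used by the authors. The function name is hypothetical.

```python
def collagen_helix_stats(helix: str):
    """Proline fraction, number of Gly-X-Y triplets, and number of Gly-Pro-Pro triplets.

    Assumes the sequence starts in register with a Gly in the first position.
    """
    helix = helix.upper()
    usable = len(helix) - len(helix) % 3          # ignore a trailing partial triplet
    triplets = [helix[i:i + 3] for i in range(0, usable, 3)]
    pro_fraction = helix.count("P") / len(helix)
    gxy = sum(1 for t in triplets if t[0] == "G")
    gpp = sum(1 for t in triplets if t == "GPP")  # Y-position Pro becomes Hyp in vivo
    return pro_fraction, gxy, gpp


print(collagen_helix_stats("GPPGAPGPPGEK"))
```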
Visual Sequence Learning in Infancy: Domain-General and Domain-Specific Associations with Language
ERIC Educational Resources Information Center
Shafto, Carissa L.; Conway, Christopher M.; Field, Suzanne L.; Houston, Derek M.
2012-01-01
Research suggests that nonlinguistic sequence learning abilities are an important contributor to language development (Conway, Bauernschmidt, Huang, & Pisoni, 2010). The current study investigated visual sequence learning (VSL) as a possible predictor of vocabulary development in infants. Fifty-eight 8.5-month-old infants were presented with a…
Povinelli, C M
1992-01-01
To detect sequence-based information predictive of the location of eukaryotic transcriptional regulatory domains, the frequencies and distributions of the 36 possible purine/pyrimidine reverse-complement hexamer pairs were determined for test sets of real and random sequences. The distribution of one of the hexamer pairs (RRYYRR/YYRRYY, referred to as M1) was further examined in a larger set of sequences (> 32 genes, 230 kb). Predominant clusters of M1 and the locations of eukaryotic transcriptional regulatory domains were found to be associated and non-randomly distributed along the DNA, consistent with a periodicity of approximately 1.2 kb. In the context of higher-order chromatin, this would align promoters, enhancers and the predominant clusters of M1 longitudinally along one face of a 30 nm fiber. Using only information about the distribution of the M1 motif, 50-70% of a sequence could be eliminated as unlikely to contain transcriptional regulatory domains, while 87% of the regulatory domains present were recovered.
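The count of 36 follows from a short enumeration: of the 2⁶ = 64 purine/pyrimidine hexamers, 8 are their own reverse complement and the remaining 56 form 28 pairs. The sketch below checks this and scans a DNA sequence for M1 (RRYYRR/YYRRYY); the function names are illustrative, and bases other than A, C, G and T are mapped to N so they never match.

```python
from itertools import product

RY_COMPLEMENT = {"R": "Y", "Y": "R"}


def ry_reverse_complement(word: str) -> str:
    return "".join(RY_COMPLEMENT[c] for c in reversed(word))


def hexamer_pairs():
    """Every R/Y hexamer grouped with its reverse complement (self-complementary ones pair with themselves)."""
    pairs = set()
    for letters in product("RY", repeat=6):
        word = "".join(letters)
        pairs.add(tuple(sorted((word, ry_reverse_complement(word)))))
    return pairs


def to_ry(dna: str) -> str:
    return "".join("R" if b in "AG" else "Y" if b in "CT" else "N" for b in dna.upper())


def m1_positions(dna: str):
    """0-based start positions of M1 (RRYYRR or YYRRYY) in a DNA sequence."""
    ry = to_ry(dna)
    return [i for i in range(len(ry) - 5) if ry[i:i + 6] in ("RRYYRR", "YYRRYY")]


print(len(hexamer_pairs()))        # 36
print(m1_positions("AAGACTGAT"))   # [2]
```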
Afrache, Hassnae; Pontarotti, Pierre; Abi-Rached, Laurent; Olive, Daniel
2017-06-01
The butyrophilin 3 (BTN3) receptors are implicated in the regulation of T lymphocytes and show wide plasticity in mammals. To understand how these genes have diversified, we studied their evolution and show that the three human BTN3 genes result from two successive duplications in Primates and that the three genes are present in Hominoids and the Old World Monkey groups. A thorough phylogenetic analysis reveals a concerted evolution of BTN3 characterized by a strong and recurrent homogenization of the region encoding the signal peptide and the immunoglobulin variable (IgV) domain in Hominoids, where the sequences of BTN3A1 or BTN3A3 are replaced by the BTN3A2 sequence. In humans, analysis of the diversity of these genes in 1683 individuals representing 26 worldwide populations shows that the three genes are polymorphic, with more than 46 alleles for each gene, and marked by extreme homogenization of the IgV sequences. The same analysis performed for the BTN2 genes also shows concerted evolution; however, it is not as strong and recurrent as for BTN3. This study shows that BTN3 receptors are marked by extreme concerted evolution at the IgV domain and that BTN3A2 plays a central role in this evolution.
Somaraju Chalasani, Madhavi Latha; Muppirala, Madhavi; G Ponnam, Surya Prakash; Kannabiran, Chitra; Swarup, Ghanshyam
2013-01-01
Mutations in the eye lens gap junction protein connexin 50 cause cataract. Earlier we identified a frameshift mutant of connexin 50 (c.670insA; p.Thr203AsnfsX47) in a family with autosomal recessive cataract. The mutant protein is shorter and contains 46 aberrant amino acids at the C-terminus after amino acid 202. Here, we analysed this frameshift mutant and observed that it localized to the endoplasmic reticulum (ER) but not to the plasma membrane. Moreover, overexpression of the mutant resulted in disintegration of the ER-Golgi intermediate compartment (ERGIC), a reduction in the level of ERGIC-53 protein and breakdown of the Golgi in many cells. Overexpression of the frameshift mutant partially inhibited the transport of wild-type connexin 50 to the plasma membrane. A deletion mutant lacking the aberrant sequence also localized predominantly in the ER and inhibited anterograde protein transport, suggesting that the aberrant sequence is not responsible for the improper localization of the frameshift mutant. Further deletion analysis showed that the fourth transmembrane domain and a membrane-proximal region of the cytoplasmic domain (amino acids 231-294) are needed for transport from the ER and localization to the plasma membrane. Our results show that a frameshift mutant of connexin 50 mislocalizes to the ER and causes disintegration of the ERGIC and Golgi. We have also identified a sequence of connexin 50 crucial for transport from the ER and localization to the plasma membrane.
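The protein-length figures follow from the frameshift notation. Under the HGVS convention, assumed here, the number after fs/X counts codons from the first altered residue (position 1) to the new stop codon, so p.Thr203AsnfsX47 gives 47 - 1 = 46 aberrant residues (positions 203-248) and a 248-residue product. A small parser for this legacy form; the function name is hypothetical.

```python
import re

FRAMESHIFT = re.compile(r"p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})fs(?:X|\*|Ter)(\d+)")


def frameshift_summary(notation: str) -> dict:
    """Summarize a protein frameshift such as p.Thr203AsnfsX47 (HGVS-style counting assumed)."""
    match = FRAMESHIFT.fullmatch(notation)
    if match is None:
        raise ValueError(f"unrecognized frameshift notation: {notation}")
    ref_aa, new_aa = match.group(1), match.group(3)
    position, stop = int(match.group(2)), int(match.group(4))
    aberrant = stop - 1  # residues from the first changed one up to the stop codon
    return {
        "change": f"{ref_aa}{position}{new_aa}",
        "aberrant_residues": aberrant,
        "protein_length": position - 1 + aberrant,  # unchanged N-terminus plus aberrant tail
    }


print(frameshift_summary("p.Thr203AsnfsX47"))
```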
Journalism as Model for Civic and Information Literacies
ERIC Educational Resources Information Center
Smirnov, Natalia; Saiyed, Gulnaz; Easterday, Matthew W.; Lam, Wan Shun Eva
2018-01-01
Journalism can serve as a generative disciplinary context for developing civic and information literacies needed to meaningfully participate in an increasingly networked and mediated public sphere. Using interviews with journalists, we developed a cognitive task analysis model, identifying an iterative sequence of production and domain-specific…
Bernardes, Juliana; Zaverucha, Gerson; Vaquero, Catherine; Carbone, Alessandra
2016-01-01
Traditional protein annotation methods describe known domains with probabilistic models representing consensus among homologous domain sequences. However, when relevant signals become too weak to be identified by a global consensus, annotation attempts fail. Here we address the fundamental question of domain identification for highly divergent proteins. By using high-performance computing, we demonstrate that the limits of state-of-the-art annotation methods can be bypassed. We design a new strategy based on the observation that many structural and functional protein constraints are not globally conserved across all species but might be locally conserved in separate clades. We propose a novel way of exploiting the large amount of available data: (1) for each known protein domain, several probabilistic clade-centered models are constructed from a large and differentiated panel of homologous sequences; (2) a decision-making protocol combines the outcomes obtained from multiple models; (3) a multi-criteria optimization algorithm finds the most likely protein architecture. The method is evaluated for domain and architecture prediction using several datasets and statistical hypothesis tests. Its performance is compared against HMMScan and HHblits, two widely used search methods based on sequence-profile and profile-profile comparison. Because they are closer to actual protein sequences, clade-centered models are shown to be more specific and functionally predictive than the broadly used consensus models. Using them, we improved the annotation of Plasmodium falciparum protein sequences on a scale not previously possible. We successfully predict at least one domain for 72% of P. falciparum proteins, against 63% achieved previously, corresponding to a 30% improvement in the total number of Pfam domain predictions on the whole genome.
The method is applicable to any genome and opens new avenues to tackle evolutionary questions such as the reconstruction of ancient domain duplications, the reconstruction of the history of protein architectures, and the estimation of protein domain age. Website and software: http://www.lcqb.upmc.fr/CLADE. PMID:27472895
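The third step above selects a protein architecture from overlapping domain hits. As a simplified stand-in for CLADE's multi-criteria optimization (which is not reproduced here), the sketch keeps the highest-total-score set of non-overlapping hits using weighted interval scheduling; the hit tuple layout and the single additive score are assumptions.

```python
from bisect import bisect_right
from typing import List, Tuple

Hit = Tuple[int, int, float, str]  # (start, end, score, domain); 1-based inclusive coordinates


def best_architecture(hits: List[Hit]) -> List[Hit]:
    """Highest-total-score set of non-overlapping hits (weighted interval scheduling)."""
    hits = sorted(hits, key=lambda h: h[1])
    ends = [h[1] for h in hits]

    def last_compatible(k: int) -> int:
        # Number of hits among the first k-1 (sorted by end) that end before hit k starts.
        return bisect_right(ends, hits[k - 1][0] - 1, 0, k - 1)

    best = [0.0] * (len(hits) + 1)  # best[k]: optimum using only the first k hits
    for k in range(1, len(hits) + 1):
        best[k] = max(best[k - 1], best[last_compatible(k)] + hits[k - 1][2])

    chosen, k = [], len(hits)
    while k > 0:  # trace back which hits produced the optimum
        j = last_compatible(k)
        if best[j] + hits[k - 1][2] >= best[k - 1]:
            chosen.append(hits[k - 1])
            k = j
        else:
            k -= 1
    return chosen[::-1]


hits = [(1, 50, 10.0, "A"), (40, 90, 15.0, "B"), (60, 120, 8.0, "C")]
print([h[3] for h in best_architecture(hits)])  # ['A', 'C']
```

Here two weaker but compatible hits (A and C, total 18) beat the single strongest hit B (15), which is the kind of trade-off an architecture search has to make.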
Nilojan, Jehanathan; Bathige, S D N K; Thulasitha, W S; Kwon, Hyukjae; Jung, Sumi; Kim, Myoung-Jin; Nam, Bo-Hye; Lee, Jehee
2018-04-01
C1-inhibitor (C1inh) plays a crucial role in maintaining homeostasis and is the central regulator of complement activation involved in immunity and inflammation. A C1-inhibitor gene from Sebastes schlegelii was identified and designated SsC1inh. The identified genomic DNA and cDNA sequences were 6837 bp and 2161 bp long, respectively. The genomic DNA contained 11 exons interrupted by 10 introns. The amino acid sequence contained two immunoglobulin-like domains and a serpin domain. Multiple sequence alignment revealed that the serpin domain of SsC1inh was highly conserved among the analyzed species, whereas the two immunoglobulin-like domains were divergent. The distinctiveness of teleost C1inh from other homologs was indicated by phylogenetic analysis, genomic DNA organization, and its extended N-terminal amino acid sequence. Under normal physiological conditions, SsC1inh mRNA was most highly expressed in the liver, followed by the gills. The involvement of SsC1inh in homeostasis was demonstrated by modulated transcription profiles in the liver and spleen under pathogenic stress induced by different immune stimulants. The protease-inhibitory potential of recombinant SsC1inh (rSsC1inh) and the potentiating effect of heparin on rSsC1inh were demonstrated against C1 esterase and thrombin. For the first time, the anti-protease activity of a teleost C1inh against its natural substrates C1r and C1s was demonstrated in this study. Protease assays conducted with recombinant black rockfish C1r and C1s proteins in the presence or absence of rSsC1inh showed that the activities of both proteases were significantly diminished by rSsC1inh. Taken together, these results indicate that SsC1inh plays an active and significant role in maintaining homeostasis in the immune system of the black rockfish. Copyright © 2018. Published by Elsevier Ltd.
Monteiro, Rose A; Souza, Emanuel M; Geoffrey Yates, M; Steffens, M Berenice R; Pedrosa, Fábio O; Chubatsu, Leda S
2003-02-01
The Herbaspirillum seropedicae NifA protein is responsible for nif gene expression. The C-terminal domain of the H. seropedicae NifA protein, fused to a His-Tag sequence (His-Tag-C-terminal), was overexpressed and purified by metal-affinity chromatography to yield a highly purified and active protein. Band-shift assays showed that the NifA His-Tag-C-terminal protein bound specifically to the H. seropedicae nifB promoter region in vitro. In vivo analysis in Escherichia coli showed that this protein inhibited activation of the K. pneumoniae nifH promoter by the central + C-terminal domains of NifA, indicating that the NifA protein must be bound to the NifA-binding site (UAS site) at the nifH promoter region to activate transcription. Copyright 2002 Elsevier Science (USA)
Evaluating, Comparing, and Interpreting Protein Domain Hierarchies
2014-01-01
Arranging protein domain sequences hierarchically into evolutionarily divergent subgroups is important for investigating evolutionary history, for speeding up web-based similarity searches, for identifying sequence determinants of protein function, and for genome annotation. However, whether or not a particular hierarchy is optimal is often unclear, and independently constructed hierarchies for the same domain can often differ significantly. This article describes methods for statistically evaluating specific aspects of a hierarchy, for probing the criteria underlying its construction and for direct comparisons between hierarchies. Information theoretical notions are used to quantify the contributions of specific hierarchical features to the underlying statistical model. Such features include subhierarchies, sequence subgroups, individual sequences, and subgroup-associated signature patterns. Underlying properties are graphically displayed in plots of each specific feature's contributions, in heat maps of pattern residue conservation, in “contrast alignments,” and through cross-mapping of subgroups between hierarchies. Together, these approaches provide a deeper understanding of protein domain functional divergence, reveal uncertainties caused by inconsistent patterns of sequence conservation, and help resolve conflicts between competing hierarchies. PMID:24559108
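As one concrete example of an information-theoretic contribution, the sketch below scores a single alignment column by the relative entropy (in bits) of a subgroup's residue distribution against the rest of the family. This is an assumed, simplified statistic with a uniform pseudocount, not necessarily the measure used in the article; the function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(residues: str, pseudocount: float = 1.0) -> dict:
    """Residue frequencies in one alignment column, smoothed with a uniform pseudocount."""
    counts = Counter(r for r in residues.upper() if r in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def column_contrast(subgroup_column: str, background_column: str) -> float:
    """Relative entropy (bits) of the subgroup's residues against the rest of the family."""
    p = column_frequencies(subgroup_column)
    q = column_frequencies(background_column)
    return sum(p[aa] * math.log2(p[aa] / q[aa]) for aa in AMINO_ACIDS)


print(round(column_contrast("WWWWWWWW", "AAAAAAAA"), 3))  # 0.906
print(column_contrast("WWWW", "WWWW"))                     # 0.0
```

The pseudocount keeps every frequency positive, so the logarithm is always defined even for residues absent from a column.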
Functional domains of the poliovirus receptor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Koike, Satoshi; Ise, Iku; Nomoto, Akio
1991-05-15
A number of mutant cDNAs of the human poliovirus receptor were constructed to identify the regions of the molecule essential for its receptor function. All mutant cDNAs carrying the sequence coding for the entire N-terminal immunoglobulin-like domain (domain I) confer poliovirus permissiveness on mouse L cells, but a mutant cDNA lacking the sequence for domain I does not. The transformants permissive for poliovirus were able to bind the virus and were also recognized by monoclonal antibody D171, which competes with poliovirus for the cellular receptor. These results strongly suggest that the poliovirus-binding site resides in domain I of the receptor. Mutant cDNAs affecting the sequence encoding the intracellular peptide were also constructed and expressed in mouse L cells. The susceptibility of these cells to poliovirus revealed that the putative cytoplasmic domain as a whole is not essential for virus infection. Thus, the cytoplasmic domain of the molecule appears not to play a role in the penetration of poliovirus.
Functional display of platelet-binding VWF fragments on filamentous bacteriophage.
Yee, Andrew; Tan, Fen-Lai; Ginsburg, David
2013-01-01
von Willebrand factor (VWF) tethers platelets to sites of vascular injury via interaction with the platelet surface receptor GPIb. To further define the VWF sequences required for VWF-platelet interaction, a phage library displaying random VWF protein fragments was screened against formalin-fixed platelets. After 3 rounds of affinity selection, DNA sequencing of platelet-bound clones identified VWF peptides mapping exclusively to the A1 domain. Aligning these sequences defined a minimal overlapping segment spanning P1254-A1461, which encompasses the C1272-C1458 cystine loop. Analysis of phage carrying a mutated A1 segment (C1272/1458A) confirmed that the cystine loop is required for optimal binding. Four rounds of affinity maturation of a randomly mutagenized A1 phage library identified 10 and 14 unique mutants associated with enhanced platelet binding in the presence and absence of botrocetin, respectively, with 2 mutants (S1370G and I1372V) common to both conditions. These results demonstrate the utility of filamentous phage for studying VWF protein structure-function and identify a minimal, contiguous peptide that binds to formalin-fixed platelets, confirming the importance of the VWF A1 domain with no evidence of any other independent platelet-binding segment within VWF. These findings also point to key structural elements within the A1 domain that regulate VWF-platelet adhesion.
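Finding the segment shared by all platelet-binding clones amounts to intersecting their residue ranges. A minimal sketch; the three clone coordinates in the example are hypothetical, chosen only so that the shared segment reproduces P1254-A1461.

```python
def shared_segment(fragments):
    """Residue range common to every fragment (intersection of inclusive ranges), or None."""
    start = max(s for s, _ in fragments)
    end = min(e for _, e in fragments)
    return (start, end) if start <= end else None


# Hypothetical clone coordinates (VWF residue numbers).
clones = [(1200, 1480), (1254, 1500), (1230, 1461)]
print(shared_segment(clones))  # (1254, 1461)
```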
Huang, Hua; Yury, Patskovsky; Toro, Rafael; Farelli, Jeremiah D.; Pandya, Chetanya; Almo, Steven C.; Allen, Karen N.; Dunaway-Mariano, Debra
2012-01-01
The explosion of protein sequence information requires that current strategies for function assignment evolve to complement experimental approaches with computationally based function prediction. This necessitates the development of strategies based on the identification of sequence markers in the form of specificity determinants and a more informed definition of orthologues. Herein, we have undertaken the function assignment of the unknown Haloalkanoate Dehalogenase superfamily member BT2127 (Uniprot accession # Q8A5V9) from Bacteroides thetaiotaomicron using an integrated bioinformatics/structure/mechanism approach. The substrate specificity profile and steady-state rate constants of BT2127 (with a kcat/Km value for pyrophosphate of ∼1 × 10⁵ M⁻¹ s⁻¹), together with the gene context, support the assigned in vivo function as an inorganic pyrophosphatase. X-ray structural analysis of wild-type BT2127 and several variants generated by site-directed mutagenesis shows that substrate discrimination is based, in part, on active-site space restrictions imposed by the cap domain (specifically by residues Tyr76 and Glu47). Structure-guided site-directed mutagenesis coupled with kinetic analysis of the mutant enzymes identified the residues required for catalysis, substrate binding, and domain-domain association. Based on this structure-function analysis, the catalytic residues Asp11, Asp13, Thr113, and Lys147, as well as the metal-binding residues Asp171, Asn172 and Glu47, were used as markers to confirm BT2127 orthologues identified via sequence searches. This bioinformatic analysis demonstrated that the biological range of BT2127 orthologues is restricted to the phylum Bacteroidetes/Chlorobi. The key structural determinants in the divergence of BT2127 from its closest homologue, β-phosphoglucomutase, control the leaving-group size (phosphate vs. glucose-phosphate) and the position of the Asp acid/base in the open vs. closed conformations.
HADSF pyrophosphatases represent a third mechanistic and fold type for bacterial pyrophosphatases. PMID:21894910
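Marker-based confirmation of orthologues can be sketched as checking which residues a candidate aligns at the BT2127 marker positions. The sketch assumes a pairwise alignment to BT2127 is already available (gaps written as '-') and uses the marker residues listed above; the function names are hypothetical, and the toy alignment uses made-up markers.

```python
# Marker residues from the abstract, in BT2127 (Q8A5V9) numbering.
BT2127_MARKERS = {11: "D", 13: "D", 47: "E", 113: "T", 147: "K", 171: "D", 172: "N"}


def residues_at_query_positions(aligned_query: str, aligned_hit: str) -> dict:
    """Map 1-based query residue numbers to the aligned hit residue ('-' for a gap)."""
    mapping, position = {}, 0
    for q, h in zip(aligned_query, aligned_hit):
        if q != "-":
            position += 1
            mapping[position] = h
    return mapping


def missing_markers(aligned_query: str, aligned_hit: str, markers=BT2127_MARKERS):
    """Marker positions where the candidate lacks the required residue."""
    aligned = residues_at_query_positions(aligned_query, aligned_hit)
    return sorted(pos for pos, aa in markers.items() if aligned.get(pos) != aa)


# Toy alignment with two made-up markers: position 4 carries R instead of K.
print(missing_markers("AD-GK", "ADSGR", markers={2: "D", 4: "K"}))  # [4]
```

A candidate with an empty result retains every marker; positions beyond the aligned region count as missing.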
Shamoo, Yousif; Sun, Siyang
2014-06-10
Chimeric proteins comprising a sequence-nonspecific single-stranded nucleic-acid-binding domain joined to a catalytic nucleic-acid-modifying domain are provided. Methods comprising contacting a nucleic acid molecule with a chimeric protein, as well as systems comprising a nucleic acid molecule, a chimeric protein, and an aqueous solution, are also provided. Joining a sequence-nonspecific single-stranded nucleic-acid-binding domain to a catalytic nucleic-acid-modifying domain in chimeric proteins may, among other things, prevent the two domains from separating because of their weak association, thereby enhancing processivity while maintaining fidelity.
Churchill, M E; Jones, D N; Glaser, T; Hefner, H; Searles, M A; Travers, A A
1995-01-01
The high mobility group (HMG) protein HMG-D from Drosophila melanogaster is a highly abundant chromosomal protein that is closely related to the vertebrate HMG domain proteins HMG1 and HMG2. In general, chromosomal HMG domain proteins lack sequence specificity. However, using both NMR spectroscopy and standard biochemical techniques, we show that binding of HMG-D to a single DNA site is sequence selective. The preferred duplex DNA binding site comprises at least 5 bp and contains the deformable dinucleotide TG embedded in A/T-rich sequences. The TG motif constitutes a common core element in the binding sites of the well-characterized sequence-specific HMG domain proteins. We show that a conserved aromatic residue in helix 1 of the HMG domain may be involved in recognition of this core sequence. In common with other HMG domain proteins, HMG-D binds preferentially to DNA sites that are stably bent and underwound; therefore, HMG-D can be considered an architecture-specific protein. Finally, we show that HMG-D bends DNA and may confer a superhelical DNA conformation at a natural DNA binding site in the Drosophila fushi tarazu scaffold-associated region. PMID:7720717
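A minimal scanner for candidate sites of the kind described (a TG step inside a short A/T-rich stretch). It assumes a 5-bp window in which every base other than the TG step is A or T; this is a crude operational definition for illustration, not a model of HMG-D binding affinity, and the function name is hypothetical.

```python
def tg_in_at_context(dna: str, window: int = 5):
    """0-based starts of windows holding a TG step whose other bases are all A or T."""
    dna = dna.upper()
    starts = []
    for i in range(len(dna) - window + 1):
        w = dna[i:i + window]
        for j in range(window - 1):
            flanks = w[:j] + w[j + 2:]  # every base in the window except the TG step
            if w[j:j + 2] == "TG" and all(b in "AT" for b in flanks):
                starts.append(i)
                break
    return starts


print(tg_in_at_context("GCATTGAAGC"))  # [2, 3]
```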
Liu, Li Na; Cui, Jing; Zhang, Xi; Wei, Tong; Jiang, Peng; Wang, Zhong Quan
2013-01-01
Spirometra erinaceieuropaei cysteine protease (SeCP) in sparganum ES proteins recognized by early infection sera was identified by MALDI-TOF/TOF-MS. The aim of this study was to predict the structure and functions of the SeCP protein from the full-length cDNA sequence of the SeCP gene, using online servers and software programs. The SeCP gene sequence was 1,053 bp long, with a longest ORF of 1,011 bp encoding a 336-amino-acid protein that contains a complete cathepsin propeptide inhibitor domain and a peptidase C1A conserved domain. The predicted molecular weight and isoelectric point of SeCP were 37.87 kDa and 6.47, respectively. SeCP has a signal peptide and no transmembrane domain, and is predicted to be located outside the membrane. The secondary structure of SeCP contained 8 α-helices, 7 β-strands, and 20 coils. SeCP had 15 potential antigenic epitopes and 19 HLA-I-restricted epitopes. Phylogenetic analysis of SeCP showed that S. erinaceieuropaei is most closely related to S. mansonoides. SeCP is a proteolytic enzyme with a variety of biological functions, and its antigenic epitopes could provide important insights into diagnostic antigens and target molecules for antisparganum drugs. PMID:24392448
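The ORF arithmetic can be checked directly: a 1,011-bp ORF includes the stop codon, so it encodes 1011 / 3 - 1 = 336 residues. Below is a minimal forward-strand longest-ORF finder (ATG to the first in-frame stop); ignoring the reverse strand and alternative start codons are simplifying assumptions, and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(dna: str):
    """Longest forward-strand ORF from ATG to the first in-frame stop.

    Returns (start, end_exclusive, protein_length); the stop codon is not counted as a residue.
    """
    dna = dna.upper()
    best = (0, 0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3, (i + 3 - start) // 3 - 1)
                start = None
    return best


print(longest_orf("CCATGAAATTTTAGCC"))  # (2, 14, 3)
print(1011 // 3 - 1)                     # 336
```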
Putative Monofunctional Type I Polyketide Synthase Units: A Dinoflagellate-Specific Feature?
Eichholz, Karsten; Beszteri, Bánk; John, Uwe
2012-01-01
Marine dinoflagellates (alveolata) are microalgae, some of which cause harmful algal blooms and produce a broad variety of phycotoxins that are most likely derived from polyketide synthesis. Recently, novel polyketide synthase (PKS) transcripts have been described from the Florida red tide dinoflagellate Karenia brevis (gymnodiniales); these are evolutionarily related to Type I PKS but are apparently expressed as monofunctional proteins, a feature typical of Type II PKS. Here, we investigated expression units of PKS I-like sequences in Alexandrium ostenfeldii (gonyaulacales) and Heterocapsa triquetra (peridiniales) at the transcript and protein level. The five full-length transcripts we obtained were all characterized by polyadenylation, a 3′ UTR and the dinoflagellate-specific spliced leader sequence at the 5′ end. Each of the five transcripts encoded a single ketoacyl synthase (KS) domain showing high similarity to K. brevis KS sequences. The monofunctional structure was also confirmed using dinoflagellate-specific KS antibodies in Western blots. In a maximum likelihood phylogenetic analysis of KS domains from diverse PKSs, dinoflagellate KSs formed a clade placed well within the protist Type I PKS clade, between apicomplexa, haptophytes and chlorophytes. These findings indicate that the atypical PKS I structure, i.e., expression as putative monofunctional units, might be a dinoflagellate-specific feature. In addition, the sequenced transcripts harbored a previously unknown, apparently dinoflagellate-specific conserved N-terminal domain. We discuss the implications of this novel region with regard to the putative monofunctional organization of Type I PKS in dinoflagellates. PMID:23139807
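Two of the transcript features above, the 5′ spliced leader and the poly(A) tail, can be checked with simple pattern matching. The sketch assumes the commonly reported 22-nt dinoflagellate spliced leader (DCCGTAGCCATTTTGGCTCAAG in DNA form, D = A, G or T) and a minimum poly(A) run of 10; both values are assumptions to verify against the actual data, and transcripts whose 5′ end is truncated would be missed. The function name and the toy transcript are hypothetical.

```python
import re

# Commonly reported dinoflagellate spliced leader (DNA form); D = A, G or T. Assumed, verify before use.
DINO_SL = "DCCGTAGCCATTTTGGCTCAAG"
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "D": "[AGT]"}


def transcript_features(seq: str, leader: str = DINO_SL, min_polya: int = 10) -> dict:
    """Check for the spliced leader at the 5' end and a poly(A) tail at the 3' end."""
    seq = seq.upper()
    leader_pattern = "".join(IUPAC[base] for base in leader)
    return {
        "spliced_leader_5prime": re.match(leader_pattern, seq) is not None,
        "polyadenylated": re.search(f"A{{{min_polya},}}$", seq) is not None,
    }


toy = "ACCGTAGCCATTTTGGCTCAAG" + "ATGGCC" * 5 + "A" * 15  # made-up transcript
print(transcript_features(toy))
```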
LenVarDB: database of length-variant protein domains.
Mutt, Eshita; Mathew, Oommen K; Sowdhamini, Ramanathan
2014-01-01
Protein domains are functionally and structurally independent modules that add to the functional variety of proteins. This functional diversity has been enabled by evolutionary changes, such as amino acid substitutions, insertions or deletions, occurring in these protein domains. Length variations (indels) can introduce changes at the structural, functional and interaction levels. LenVarDB (freely available at http://caps.ncbs.res.in/lenvardb/) traces these length variations, starting from structure-based sequence alignments in our Protein Alignments organized as Structural Superfamilies (PASS2) database, across 731 structural classification of proteins (SCOP)-based protein domain superfamilies connected to 2,730,625 sequence homologues. An alignment of the sequence homologues corresponding to a structural domain is available, starting from a structure-based sequence alignment of the superfamily. The location of the length-variant (indel) regions in protein domains can be visualized by mapping them on the structure and on the alignment. Knowledge of the location of length variations within protein domains, and their visual representation, will be useful in predicting changes within structurally or functionally relevant sites, which may ultimately regulate protein function. Non-technical summary: Evolution introduces changes into proteins, and these changes may be found across many organisms. They can appear as amino acid substitutions or as insertions-deletions (indels) in protein sequences. LenVarDB is a database that provides an overview of the length variations observed across 731 protein families, compiled by examining more than 2 million sequences. Indels are followed up to see whether they lie close to the active site, where they could affect protein activity. Such information can aid the design of bioengineering experiments.
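Locating indels starts from an alignment. A minimal sketch, assuming a single pairwise alignment of a homologue to the structural reference (LenVarDB itself works from PASS2 superfamily alignments); insertions are anchored at the preceding reference residue and deletions at the first missing reference residue. The function name is hypothetical.

```python
def indel_regions(reference: str, homologue: str):
    """Indels of an aligned homologue relative to the reference, as (kind, anchor, length).

    For an insertion, anchor is the 1-based reference residue it follows (0 = before the first);
    for a deletion, anchor is the first reference residue missing from the homologue.
    """
    if len(reference) != len(homologue):
        raise ValueError("aligned sequences must have equal length")
    regions, ref_pos, previous = [], 0, None
    for r, h in zip(reference, homologue):
        if r == "-" and h == "-":
            continue                      # column gapped in both sequences
        if r != "-":
            ref_pos += 1
        if r != "-" and h != "-":
            previous = None               # an aligned residue pair ends any open indel
            continue
        kind = "insertion" if r == "-" else "deletion"
        if kind == previous:
            regions[-1][2] += 1           # extend the current run
        else:
            regions.append([kind, ref_pos, 1])
        previous = kind
    return [tuple(region) for region in regions]


print(indel_regions("ACD--EFGH", "ACDKLE--H"))  # [('insertion', 3, 2), ('deletion', 5, 2)]
```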
Stoops, Janelle; Byrd, Samantha; Hasegawa, Haruki
2012-10-01
Russell bodies are intracellular aggregates of immunoglobulins. Although the mechanism of Russell body biogenesis has been extensively studied using truncated mutant heavy chains, the importance of the variable domain sequences in this process and in immunoglobulin biosynthesis remains largely unknown. Using a panel of structurally and functionally normal human immunoglobulin Gs, we show that individual immunoglobulin G clones possess distinctive Russell body-inducing propensities that can surface differently under normal and abnormal cellular conditions. The Russell body-inducing predisposition unique to each immunoglobulin G clone was corroborated by the intrinsic physicochemical properties encoded in the heavy chain variable domain/light chain variable domain sequence combinations that define each clone. While sequence-based intrinsic factors predispose certain immunoglobulin G clones to induce Russell bodies, extrinsic factors such as stressful cell culture conditions also play a role in unmasking Russell body propensity in immunoglobulin G clones that are normally refractory to developing Russell bodies. Taking advantage of heterologous expression systems, we dissected the roles of individual subunit chains in Russell body formation and examined the effect of co-expressing non-cognate subunit chain pairs on Russell body-forming propensity. The results suggest that the properties embedded in the variable domain of individual light chain clones, and their compatibility with the partnering heavy chain variable domain sequences, underlie the efficiency of immunoglobulin G biosynthesis, the threshold for Russell body induction, and the level of immunoglobulin G secretion. We propose that an interplay between the unique properties encoded in variable domain sequences and the state of protein homeostasis determines whether an immunoglobulin G-expressing cell will develop the Russell body phenotype in a dynamic cellular setting.
Copyright © 2012 Elsevier B.V. All rights reserved.
Pastar, Irena; Tonic, Ivana; Golic, Natasa; Kojic, Milan; van Kranenburg, Richard; Kleerebezem, Michiel; Topisirovic, Ljubisa; Jovanovic, Goran
2003-01-01
A novel proteinase, PrtR, produced by the human vaginal isolate Lactobacillus rhamnosus strain BGT10 was identified and genetically characterized. The prtR gene and flanking regions were cloned and sequenced. The deduced amino acid sequence of PrtR shares characteristics common to other cell envelope proteinases (CEPs) characterized to date, but in contrast to the other cell surface subtilisin-like serine proteinases, it has a smaller and somewhat different B domain, lacks the helix domain, and its anchor domain has a rare sorting signal sequence. Furthermore, PrtR lacks the insert domain, which in all other CEPs is situated inside the catalytic serine protease domain, and has a different cell wall spacer (W) domain similar to that of the cell surface antigen I and II polypeptides expressed by oral and vaginal streptococci. Moreover, the PrtR W domain exhibits significant sequence homology to the consensus sequence that has been shown to be the hallmark of human intestinal mucin protein. Judging by its αS1- and β-casein cleavage efficiency, PrtR is an efficient proteinase at pH 6.5, and it is present in all L. rhamnosus strains tested. Proteinase extracts of the BGT10 strain obtained with Ca2+-free buffer at pH 6.5 were proteolytically active. The prtR promoter-like sequence was determined, and the minimal promoter region was defined using prtR-gusA operon fusions. prtR expression is Casitone-dependent, indicating that nitrogen depletion elevates its transcription. This correlates with the catalytic activity of the PrtR proteinase. PMID:14532028
Smith, Adam C.; Suzuki, Masako; Thompson, Reid; Choufani, Sanaa; Higgins, Michael J.; Chiu, Idy W.; Squire, Jeremy A.; Greally, John M.; Weksberg, Rosanna
2015-01-01
Beckwith-Wiedemann syndrome (BWS) is an overgrowth syndrome associated with genetic or epigenetic alterations in one of two imprinted domains on chromosome 11p15.5. Rarely, chromosomal translocations or inversions of chromosome 11p15.5 are associated with BWS, but the molecular pathophysiology in such cases is not understood. In our series of 3 translocation and 2 inversion patients with BWS, the chromosome 11p15.5 breakpoints map within the centromeric imprinted domain (domain 2). We hypothesized that microdeletions/microduplications adjacent to the breakpoints could disrupt genomic sequences important for imprinted gene regulation. An alternative hypothesis was that epigenetic alterations of as yet unknown regulatory DNA sequences result in the BWS phenotype. A high-resolution NimbleGen custom microarray was designed representing all non-repetitive sequences in the telomeric 33 Mb of the short arm of human chromosome 11. For the BWS-associated chromosome 11p15.5 translocations and inversions, we found no evidence of microdeletions/microduplications. DNA methylation was also tested on this microarray using the HpaII tiny fragment enrichment by ligation-mediated PCR (HELP) assay. This high-resolution DNA methylation microarray analysis revealed a gain of DNA methylation in the translocation/inversion patients affecting the p-ter segment of chromosome 11p15, including both imprinted domains. BWS patients who inherited a maternal translocation or inversion also demonstrated reduced expression of the growth-suppressing imprinted gene CDKN1C in domain 2. In summary, our data demonstrate that translocations and inversions involving imprinted domain 2 on chromosome 11p15.5 alter regional DNA methylation patterns and imprinted gene expression in cis, suggesting that these epigenetic alterations are generated by an alteration in “chromatin context”. PMID:22079941
Ito, T M; Polido, P B; Rampim, M C; Kaschuk, G; Souza, S G H
2014-09-26
Sweet orange (Citrus sinensis) plays an important role in the economy of more than 140 countries, but it is grown in areas with intermittently stressful soil and climatic conditions. Stress tolerance could be addressed by manipulating the ethylene response factor (ERF) transcription factors, because they orchestrate plant responses to environmental stress. We performed an in silico study of the ERFs in the expressed sequence tag (EST) database of C. sinensis to identify potential genes that regulate plant responses to stress. We identified 108 putative genes encoding protein sequences of the AP2/ERF superfamily, distributed among 10 groups of amino acid sequences. Of the assembled genes, 91 belonged to the ERF family (one AP2/ERF domain), 13 to the AP2 family (two AP2/ERF domains), and four to the RAV family (one AP2/ERF domain and a B3 domain). Some conserved domains of the ERF family genes were split into a few segments by introns. This irregular distribution of AP2/ERF superfamily genes across plant species could be the result of genomic losses or duplication events in a common ancestor. In silico gene expression analysis revealed that 67% of AP2/ERF genes are expressed in tissues under normal developmental conditions, and 14% are expressed in stressed tissues. Because the AP2/ERF superfamily is expressed in an orchestrated way, manipulating a single gene may change the function of the whole plant, which could result in more tolerant crops.
Camicia, Federico; Paredes, Rodolfo; Chalar, Cora; Galanti, Norbel; Kamenetzky, Laura; Gutierrez, Ariana; Rosenzvit, Mara C
2008-03-31
We have sequenced and partially characterized an Echinococcus granulosus cDNA, termed egat1, from a protoscolex signal sequence trap (SST) cDNA library. The isolated 1627-bp cDNA contains an ORF encoding 489 amino acids and shows 30% amino acid identity with neutral and excitatory amino acid transporters, members of the Dicarboxylate/Amino Acid Na+ and/or H+ Cation Symporter family (DAACS) (TC 2.A.23). Additional bioinformatic analysis of EgAT1 confirmed the results obtained by similarity searches and showed the presence of 9 to 10 transmembrane domains, consensus sequences for N-glycosylation between the third and fourth transmembrane domains, a hydropathy profile highly similar to that of ASCT1 (a known member of the DAACS family), a high score with SDF (Sodium Dicarboxylate Family), and motifs similar to EDTRANSPORT, a fingerprint of excitatory amino acid transporters. The localization of the putative amino acid transporter was analyzed by in situ hybridization and immunofluorescence in protoscoleces and the associated germinal layer. In situ hybridization labelling indicates that egat1 mRNA is distributed throughout the tegument. EgAT1 protein, which showed a molecular mass of approximately 60 kDa in Western blots, is localized in the subtegumental region of the metacestode, particularly around the suckers and rostellum of protoscoleces and in layers of brood capsules. The sequence and expression analyses of EgAT1 pave the way for functional analysis of amino acid transporters of E. granulosus and for their evaluation as new drug targets against cystic echinococcosis.
Motevalli Haghi, A; Nateghpour, M; Edrissian, Ghh; Sepehrizadeh, Z; Mohebali, M; Khoramizade, Mr; Shahrbabak, S Sabouri; Moghimi, H
2012-01-01
Plasmodium vivax is responsible for approximately 80 million malaria cases in the world. Apical membrane antigen 1 (AMA-1) is a type I integral membrane protein present in all Plasmodium species. AMA-1 is involved in critical steps of the invasion of human hepatocytes by sporozoites and of red blood cells by merozoites, and it is one of the most immunodominant antigens for eliciting a protective immune response in humans. It is considered a promising antigen for inclusion in a vaccine against P. vivax. Because more knowledge of this antigen is needed, we compared genetic variation in P. vivax AMA-1 from an Iranian isolate with that reported so far from other malarious countries. P. vivax genomic DNA was extracted from the whole blood of an Iranian patient with patent P. vivax infection. The nucleotide sequence for 446 amino acid (AA) residues (42-488 of PvAMA-1) was amplified by PCR and cloned into the pUC19 vector for sequencing. Sequence analysis of the antigen showed a high degree of identity (99%) with the PvAMA-1 gene of the P. vivax S3 and SKO814 isolates from India and Korea (Asian isolates), respectively, and 96% similarity with the P. vivax Sal-1 AMA-1 gene from El Salvador. We cloned and characterized three domains of the PvAMA-1 gene from an Iranian patient. The predicted protein sequence of this gene showed some discrepancies compared with the corresponding proteins reported from other malarious countries.
MutationAligner: a resource of recurrent mutation hotspots in protein domains in cancer.
Gauthier, Nicholas Paul; Reznik, Ed; Gao, Jianjiong; Sumer, Selcuk Onur; Schultz, Nikolaus; Sander, Chris; Miller, Martin L
2016-01-04
The MutationAligner web resource, available at http://www.mutationaligner.org, enables discovery and exploration of somatic mutation hotspots identified in protein domains in more than 5000 cancer patient samples (as of mid-2015) across 22 different tumor types. Using multiple sequence alignments of protein domains in the human genome, we extend the principle of recurrence analysis by aggregating mutations in homologous positions across sets of paralogous genes. Protein domain analysis enhances the statistical power to detect cancer-relevant mutations and links mutations to the specific biological functions encoded in domains. We illustrate how the MutationAligner database and interactive web tool can be used to explore, visualize and analyze mutation hotspots in protein domains across genes and tumor types. We believe that MutationAligner will be an important resource for the cancer research community, as it provides detailed clues to the functional importance of particular mutations and supports the design of functional genomics experiments and decision-making in precision medicine. MutationAligner is slated to be periodically updated to incorporate additional analyses and new data from cancer genomics projects. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
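The core aggregation step, pooling mutations that fall at homologous positions across paralogous genes, can be sketched as below. This is a minimal illustration, not MutationAligner's code: it assumes each gene's domain sequence is already placed in a shared multiple sequence alignment (gaps written as '-'), maps 1-based residue numbers to 1-based alignment columns, and simply counts mutations per column. The gene names are hypothetical, and no statistical test for hotspot significance is included.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def residue_to_column(aligned_seq: str) -> Dict[int, int]:
    """Map 1-based residue numbers to 1-based alignment columns, skipping gap columns."""
    mapping: Dict[int, int] = {}
    residue = 0
    for column, char in enumerate(aligned_seq, start=1):
        if char != "-":
            residue += 1
            mapping[residue] = column
    return mapping


def column_hotspot_counts(
    alignment: Dict[str, str], mutations: Iterable[Tuple[str, int]]
) -> Counter:
    """Count mutations per alignment column, pooling homologous positions across genes."""
    maps = {gene: residue_to_column(seq) for gene, seq in alignment.items()}
    counts: Counter = Counter()
    for gene, residue in mutations:
        column = maps[gene].get(residue)
        if column is not None:  # ignore residues that are not part of the aligned domain
            counts[column] += 1
    return counts


# Hypothetical paralogs GENE1 and GENE2 aligned over a toy domain.
alignment = {"GENE1": "MK-LV", "GENE2": "MKALV"}
mutations = [("GENE1", 3), ("GENE2", 4), ("GENE2", 3)]
print(column_hotspot_counts(alignment, mutations))  # Counter({4: 2, 3: 1})
```

Residue 3 of GENE1 and residue 4 of GENE2 fall in the same column, so their mutations are counted at one pooled position; this pooling is what increases the power to detect recurrent sites.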
Bailey, Paul C; Schudoma, Christian; Jackson, William; Baggs, Erin; Dagdas, Gulay; Haerty, Wilfried; Moscou, Matthew; Krasileva, Ksenia V
2018-02-19
The plant immune system is innate and encoded in the germline. Using it efficiently, plants are capable of recognizing a diverse range of rapidly evolving pathogens. A recently described phenomenon shows that plant immune receptors are able to recognize pathogen effectors through the acquisition of exogenous protein domains from other plant genes. We show that plant immune receptors with integrated domains are distributed unevenly across their phylogeny in grasses. Using phylogenetic analysis, we uncover a major integration clade whose members underwent repeated independent integration events, producing diverse fusions. This clade is ancestral in grasses, with members often found on syntenic chromosomes. Analyses of these fusion events reveal that homologous receptors can be fused to diverse domains. Furthermore, we discover a 43-amino-acid motif associated with this dominant integration clade, located immediately upstream of the fusion site. Sequence analysis reveals that DNA transposition and/or ectopic recombination are the most likely mechanisms of formation for nucleotide-binding leucine-rich repeat proteins with integrated domains. The identification of this subclass of plant immune receptors that is naturally adapted to new domain integration will inform biotechnological approaches for generating synthetic receptors with novel pathogen "baits."
Dunbrack, Roland L.
2012-01-01
Motivation: Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are shorter than the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often erroneously result in two copies of the domain being assigned to the query. Divergent repeat sequences in proteins are often missed. Results: We have developed a multilevel procedure to produce nearly complete assignments of protein families of an existing classification system to a large set of sequences. We apply this to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, thus leading to erroneous assignment at the Pfam family level. A greedy algorithm allowing for partial overlaps was therefore applied first to sequence/HMM alignments, then to HMM–HMM alignments and then to structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignment of repeat Pfams with weaker E-values was allowed after stronger assignments of the repeat HMM. Our database of assignments, called PDBfam, contains Pfams for 99.4% of chains >50 residues. Availability: The Pfam assignment data in PDBfam are available at http://dunbrack2.fccc.edu/ProtCid/PDBfam, which can be searched by PDB codes and Pfam identifiers. They will be updated regularly. Contact: Roland.Dunbracks@fccc.edu PMID:22942020
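A greedy assignment that tolerates partial overlaps can be sketched as follows. This is an illustrative simplification, not the PDBfam procedure: it assumes hits are given with 1-based inclusive coordinates and an E-value, processes them from strongest to weakest E-value, and accepts a hit only if it overlaps every already-accepted hit by at most a hypothetical `max_overlap` number of residues. Joining alignments split by insertions and the separate passes over sequence/HMM, HMM-HMM and structure alignments are not modelled, and the family names are placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Hit:
    family: str   # placeholder family name
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    evalue: float


def overlap(a: Hit, b: Hit) -> int:
    """Number of residues shared by two hits (0 if they are disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def greedy_assign(hits: Iterable[Hit], max_overlap: int = 10) -> List[Hit]:
    """Accept hits from best to worst E-value, tolerating small overlaps."""
    accepted: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if all(overlap(hit, kept) <= max_overlap for kept in accepted):
            accepted.append(hit)
    return sorted(accepted, key=lambda h: h.start)  # report in sequence order


hits = [
    Hit("FamilyA", 1, 100, 1e-30),
    Hit("FamilyB", 95, 200, 1e-20),  # overlaps FamilyA by 6 residues: tolerated
    Hit("FamilyC", 50, 150, 1e-5),   # overlaps FamilyA by 51 residues: rejected
]
print([h.family for h in greedy_assign(hits)])  # ['FamilyA', 'FamilyB']
```

Allowing a few overlapping residues keeps adjacent domains whose alignments run slightly into each other, which a strict no-overlap rule would discard.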
Saito, T; Ochiai, H
1999-10-01
cDNA fragments putatively encoding amino acid sequences characteristic of fatty acid desaturases were obtained using expressed sequence tag (EST) information from the Dictyostelium cDNA project. Using this sequence, we determined the cDNA sequence and genomic sequence of a desaturase. The cloned cDNA is 1489 nucleotides long, and the deduced amino acid sequence comprises 464 amino acid residues, including an N-terminal cytochrome b5 domain. The whole sequence was 38.6% identical to the initially identified Delta5-desaturase of Mortierella alpina. We confirmed its function as a Delta5-desaturase by overexpression in D. discoideum and by gain-of-function expression in the yeast Saccharomyces cerevisiae. Analysis of the lipids from transformed D. discoideum and yeast demonstrated the accumulation of Delta5-desaturated products. This is the first report concerning a fatty acid desaturase in cellular slime molds.
Wieczorek, Anna; McHenry, Charles S
2006-05-05
The alpha subunit of the replicase of all bacteria contains a php domain, initially identified by its similarity to histidinol phosphatase but of otherwise unknown function (Aravind, L., and Koonin, E. V. (1998) Nucleic Acids Res. 26, 3746-3752). Deletion of 60 residues from the NH2 terminus of the alpha php domain destroys epsilon binding. The minimal 255-residue php domain, estimated by sequence alignment with the homolog YcdX, is insufficient for epsilon binding. However, a 320-residue segment including sequences that immediately precede the polymerase domain binds epsilon with the same affinity as the 1160-residue full-length alpha subunit. A subset of mutations of a conserved acidic residue (Asp43 in Escherichia coli alpha) present in the php domain of all bacterial replicases resulted in defects in epsilon binding. Using sequence alignments, we show that the prototypical Gram-positive Pol C, which contains the polymerase and proofreading activities within the same polypeptide chain, has an epsilon-like sequence inserted in a surface loop near the center of the homologous YcdX protein. These findings suggest that the php domain serves as a platform to enable coordination of proofreading and polymerase activities during chromosomal replication.
Evolutionary analysis of the TPP-dependent enzyme family.
Costelloe, Seán J; Ward, John M; Dalby, Paul A
2008-01-01
The evolutionary relationships of the thiamine pyrophosphate (TPP)-dependent family of enzymes were investigated by generating a neighbor-joining phylogenetic tree using sequences from the conserved pyrophosphate (PP) and pyrimidine (Pyr) binding domains of 17 TPP-dependent enzymes. This represents the most comprehensive analysis of TPP-dependent enzyme evolution to date. The phylogeny was shown to be robust by comparison with maximum likelihood trees generated for each individual enzyme, and it broadly confirms the evolutionary history proposed recently from structural comparisons alone (Duggleby 2006). The phylogeny is most parsimoniously explained by the TPP enzymes having arisen from a homotetramer that subsequently diverged into an alpha(2)beta(2) heterotetramer. The relationship between the PP- and Pyr-domains and the recruitment of additional protein domains was examined using the transketolase C-terminal (TKC)-domain as an example. This domain has been recruited by several members of the family, yet it forms no part of the active site and has unknown function. Removal of the TKC-domain was found to increase activity toward beta-hydroxypyruvate and glycolaldehyde. Further truncations of the Pyr-domain yielded several variants that retained activity. This suggests that the influence of TKC-domain recruitment on the evolution of the mechanism and specificity of transketolase (TK) has been minor, and that the smallest functioning unit of TK comprises the PP- and Pyr-domains, whose evolutionary histories extend to all TPP-dependent enzymes.
Deineko, Viktor
2006-01-01
Human multisynthetase complex auxiliary component protein p43 is an endothelial monocyte-activating polypeptide II precursor. In this study, a comprehensive sequence analysis of the N-terminus was performed to identify structural domains, motifs, sites of post-translational modification and other functionally important features. A spatial structure model of the full-length p43 protein was obtained.
USDA-ARS's Scientific Manuscript database
Independent surveys of yeasts associated with lignocellulosic-related materials led to the discovery of a novel yeast species belonging to the Cyberlindnera clade (Saccharomycotina, Ascomycota). Analysis of the sequences of the internal transcribed spacer (ITS) region and the D1/D2 domains of the la...
Molecular analysis of two cDNA clones encoding acidic class I chitinase in maize.
Wu, S; Kriz, A L; Widholm, J M
1994-01-01
The cloning and analysis of two different cDNA clones encoding putative maize (Zea mays L.) chitinases obtained by polymerase chain reaction (PCR) and cDNA library screening is described. The cDNA library was made from poly(A)+ RNA from leaves challenged with mercuric chloride for 2 d. The two clones, pCh2 and pCh11, appear to encode class I chitinase isoforms with cysteine-rich domains (not found in pCh11 due to the incomplete sequence) and proline-/glycine-rich or proline-rich hinge domains, respectively. The pCh11 clone resembles a previously reported maize seed chitinase; however, the deduced proteins were found to have acidic isoelectric points. Analysis of all monocot chitinase sequences available to date shows that not all class I chitinases possess the basic isoelectric points usually found in dicotyledonous plants and that monocot class II chitinases do not necessarily exhibit acidic isoelectric points. Based on sequence analysis, the pCh2 protein is apparently synthesized as a precursor polypeptide with a signal peptide. Although these two clones belong to class I chitinases, they share only about 70% amino acid homology in the catalytic domain region. Southern blot analysis showed that pCh2 may be encoded by a small gene family, whereas pCh11 was single copy. Northern blot analysis demonstrated that these genes are differentially regulated by mercuric chloride treatment. Mercuric chloride treatment caused rapid induction of pCh2 from 6 to 48 h, whereas pCh11 responded only slightly to the same treatment. During seed germination, embryos constitutively expressed both chitinase genes, and the phytohormone abscisic acid had no effect on their expression. The fungus Aspergillus flavus was able to induce both genes to comparable levels in aleurone layers and embryos but not in endosperm tissue. Maize callus grown on the same plate as A. flavus for 1 week showed induction of the transcripts corresponding to pCh2 but not to pCh11. 
These studies indicate that the different chitinase isoforms in maize might have different functions in the plant, since they show differential expression patterns under different conditions. PMID:7972490
Hu, Guobin; Yin, Xiangyan; Lou, Huimin; Xia, Jun; Dong, Xianzhi; Zhang, Jianyie; Liu, Qiuming
2011-02-01
Two cDNAs with different 3'-untranslated regions (UTRs) encoding an interferon regulatory factor 3 (IRF-3) were cloned from the head kidney of Japanese flounder, Paralichthys olivaceus, by reverse transcription polymerase chain reaction (RT-PCR) and rapid amplification of cDNA ends (RACE). Sequence analysis reveals that they were generated by alternative polyadenylation. The predicted protein consists of 467 amino acid residues, shares its highest identity (50.7-57.6%) with fish IRF-3, and possesses the DNA-binding domain (DBD), IRF association domain (IAD) and serine-rich domain (SRD) of vertebrate IRF-3. The presence of these domains, together with phylogenetic analysis, places it in the IRF-3 group of the IRF-3 subfamily. RT-PCR analysis revealed that flounder IRF-3 was constitutively expressed in a limited range of tissues, including head kidney, spleen, kidney, heart, gill, intestine and liver. A quantitative real-time PCR assay was employed to monitor expression of IRF-3, type I interferon (IFN) and Mx in flounder head kidney and gill. All three genes were up-regulated by polyinosinic:polycytidylic acid (polyI:C) and lymphocystis disease virus (LCDV), with an earlier but slighter and less persistent increase in transcription levels for IRF-3. Finally, flounder IRF-3 was shown to induce the fish type I IFN promoter in FG9307 cells, a flounder gill cell line, in a luciferase assay. These results provide insights into the roles of fish IRF-3 in antiviral immunity. Copyright © 2010 Elsevier Ltd. All rights reserved.
Anomaly Detection in Large Sets of High-Dimensional Symbol Sequences
NASA Technical Reports Server (NTRS)
Budalakoti, Suratna; Srivastava, Ashok N.; Akella, Ram; Turkov, Eugene
2006-01-01
This paper addresses the problem of detecting and describing anomalies in large sets of high-dimensional symbol sequences. The approach taken uses unsupervised clustering of sequences using the normalized longest common subsequence (LCS) as a similarity measure, followed by detailed analysis of outliers to detect anomalies. As the LCS measure is expensive to compute, the first part of the paper discusses existing algorithms, such as the Hunt-Szymanski algorithm, that have low time-complexity. We then discuss why these algorithms often do not work well in practice and present a new hybrid algorithm for computing the LCS that, in our tests, outperforms the Hunt-Szymanski algorithm by a factor of five. The second part of the paper presents new algorithms for outlier analysis that provide comprehensible indicators as to why a particular sequence was deemed to be an outlier. The algorithms provide a coherent description to an analyst of the anomalies in the sequence, compared to more normal sequences. The algorithms we present are general and domain-independent, so we discuss applications in related areas such as anomaly detection.
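The normalized LCS similarity at the heart of this approach can be sketched as below. The sketch assumes the normalization |LCS(a, b)| / sqrt(|a| |b|) and uses the textbook O(mn) dynamic program rather than the Hunt-Szymanski or hybrid algorithms discussed in the paper. As a stand-in for the clustering step, it scores each sequence by one minus its mean similarity to all others; this outlier score is a hypothetical simplification, not the paper's outlier analysis.

```python
from math import sqrt
from typing import Hashable, List, Sequence


def lcs_length(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence via dynamic programming (two rows)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row sized by the shorter sequence
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def normalized_lcs(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Similarity in [0, 1]; 1.0 means the sequences are identical."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / sqrt(len(a) * len(b))


def outlier_scores(seqs: List[Sequence[Hashable]]) -> List[float]:
    """Score each sequence by 1 - mean similarity to the others (higher = more anomalous)."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    scores = []
    for i, s in enumerate(seqs):
        sims = [normalized_lcs(s, t) for j, t in enumerate(seqs) if j != i]
        scores.append(1.0 - sum(sims) / len(sims))
    return scores


seqs = ["abcd", "abcd", "abce", "wxyz"]
scores = outlier_scores(seqs)
print(max(range(len(seqs)), key=scores.__getitem__))  # 3, i.e. "wxyz"
```

Because the symbols only need to be hashable, the same functions work on lists of discrete event codes as well as on strings.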
Miller, Thomas F.
2017-01-01
We present a coarse-grained simulation model that is capable of simulating the minute-timescale dynamics of protein translocation and membrane integration via the Sec translocon, while retaining sufficient chemical and structural detail to capture many of the sequence-specific interactions that drive these processes. The model includes accurate geometric representations of the ribosome and Sec translocon, obtained directly from experimental structures, and interactions parameterized from nearly 200 μs of residue-based coarse-grained molecular dynamics simulations. A protocol for mapping amino-acid sequences to coarse-grained beads enables the direct simulation of trajectories for the co-translational insertion of arbitrary polypeptide sequences into the Sec translocon. The model reproduces experimentally observed features of membrane protein integration, including the efficiency with which polypeptide domains integrate into the membrane, the variation in integration efficiency upon single amino-acid mutations, and the orientation of transmembrane domains. The central advantage of the model is that it connects sequence-level protein features to biological observables and timescales, enabling direct simulation for the mechanistic analysis of co-translational integration and for the engineering of membrane proteins with enhanced membrane integration efficiency. PMID:28328943
Coiled-coil length: Size does matter.
Surkont, Jaroslaw; Diekmann, Yoan; Ryder, Pearl V; Pereira-Leal, Jose B
2015-12-01
Protein evolution is governed by processes that alter primary sequence but also the length of proteins. Protein length may change in different ways, but insertions, deletions and duplications are the most common. An optimal protein size is a trade-off between sequence extension, which may change protein stability or lead to acquisition of a new function, and shrinkage, which decreases the metabolic cost of protein synthesis. Despite the general tendency for length conservation across orthologous proteins, the propensity to accept insertions and deletions is heterogeneous along the sequence. For example, protein regions rich in repetitive peptide motifs are well known to vary extensively in length across species. Here, we analyze length conservation of coiled-coils, domains formed by a ubiquitous, repetitive peptide motif present in all domains of life that frequently plays a structural role in the cell. We observed that, despite their repetitive nature, the length of coiled-coil domains is generally highly conserved throughout the tree of life, even when the remaining parts of the protein change, including globular domains. Length conservation is independent of primary amino acid sequence variation and represents a conservation of domain physical size. This suggests that the conservation of domain size is due to functional constraints. © 2015 Wiley Periodicals, Inc.
Effect of the SH3-SH2 domain linker sequence on the structure of Hck kinase.
Meiselbach, Heike; Sticht, Heinrich
2011-08-01
The coordination of activity in biological systems requires the existence of different signal transduction pathways that interact with one another and must be precisely regulated. The Src-family tyrosine kinases, which are found in many signaling pathways, differ in their physiological function despite their high overall structural similarity. In this context, the differences in the SH3-SH2 domain linkers might play a role for differential regulation, but the structural consequences of linker sequence remain poorly understood. We have therefore performed comparative molecular dynamics simulations of wildtype Hck and of a mutant Hck in which the SH3-SH2 domain linker is replaced by the corresponding sequence from the homologous kinase Lck. These simulations reveal that linker replacement not only affects the orientation of the SH3 domain itself, but also leads to an alternative conformation of the activation segment in the Hck kinase domain. The sequence of the SH3-SH2 domain linker thus exerts a remote effect on the active site geometry and might therefore play a role in modulating the structure of the inactive kinase or in fine-tuning the activation process itself.
Yang, Chi; Ma, Lu; Ying, Zhenghe; Jiang, Xiaoling; Lin, Yanquan
2017-04-01
Light is a necessary environmental factor for fruit body formation and development of the cauliflower mushroom Sparassis latifolia, a well-known edible and medicinal fungus. In this study, we first characterized the SP-C strain as belonging to S. latifolia and then cloned and sequenced a photoreceptor gene (Slwc-1) from S. latifolia. The product of Slwc-1, SlWC-1 (872 aa residues), contains a coiled-coil region, a LOV domain, and two PAS domains. Phylogenetic analysis showed that, among edible and medicinal fungi, SlWC-1 is most closely related to GfWC-1 from Grifola frondosa. Expression of the Slwc-1 gene was found to be enhanced by light. This report will help to open the still-unexplored field of fruit body development for this fungus.
Ape parasite origins of human malaria virulence genes
Larremore, Daniel B.; Sundararaman, Sesh A.; Liu, Weimin; Proto, William R.; Clauset, Aaron; Loy, Dorothy E.; Speede, Sheri; Plenderleith, Lindsey J.; Sharp, Paul M.; Hahn, Beatrice H.; Rayner, Julian C.; Buckee, Caroline O.
2015-01-01
Antigens encoded by the var gene family are major virulence factors of the human malaria parasite Plasmodium falciparum, exhibiting enormous intra- and interstrain diversity. Here we use network analysis to show that var architecture and mosaicism are conserved at multiple levels across the Laverania subgenus, based on var-like sequences from eight single-species and three multi-species Plasmodium infections of wild-living or sanctuary African apes. Using select whole-genome amplification, we also find evidence of multi-domain var structure and synteny in Plasmodium gaboni, one of the ape Laverania species most distantly related to P. falciparum, as well as a new class of Duffy-binding-like domains. These findings indicate that the modular genetic architecture and sequence diversity underlying var-mediated host-parasite interactions evolved before the radiation of the Laverania subgenus, long before the emergence of P. falciparum. PMID:26456841
The LANL hemorrhagic fever virus database, a new platform for analyzing biothreat viruses.
Kuiken, Carla; Thurmond, Jim; Dimitrijevic, Mira; Yoon, Hyejin
2012-01-01
Hemorrhagic fever viruses (HFVs) are a diverse set of over 80 viral species, found in 10 different genera comprising five different families: arena-, bunya-, flavi-, filo- and togaviridae. All these viruses are highly variable and evolve rapidly, making them elusive targets for the immune system and for vaccine and drug design. About 55,000 HFV sequences exist in the public domain today. A central website that provides annotated sequences and analysis tools will be helpful to HFV researchers worldwide. The HFV sequence database collects and stores sequence data and provides a user-friendly search interface and a large number of sequence analysis tools, following the model of the highly regarded and widely used Los Alamos HIV database [Kuiken, C., B. Korber, and R.W. Shafer, HIV sequence databases. AIDS Rev, 2003. 5: p. 52-61]. The database uses an algorithm that aligns each sequence to a species-wide reference sequence. The NCBI RefSeq database [Sayers et al. (2011) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 39, D38-D51.] is used for this; if a reference sequence is not available, a Blast search finds the best candidate. Using this method, sequences in each genus can be retrieved pre-aligned. The HFV website can be accessed via http://hfv.lanl.gov.
Identification of species by multiplex analysis of variable-length sequences
Pereira, Filipe; Carneiro, João; Matthiesen, Rune; van Asch, Barbara; Pinto, Nádia; Gusmão, Leonor; Amorim, António
2010-01-01
The quest for a universal and efficient method of identifying species has been a longstanding challenge in biology. Here, we show that accurate identification of species in all domains of life can be accomplished by multiplex analysis of variable-length sequences containing multiple insertion/deletion variants. The new method, called SPInDel, is able to discriminate 93.3% of eukaryotic species from 18 taxonomic groups. We also demonstrate that the identification of prokaryotic and viral species with numeric profiles of fragment lengths is generally straightforward. A computational platform is presented to facilitate the planning of projects and includes a large data set with nearly 1800 numeric profiles for species in all domains of life (1556 for eukaryotes, 105 for prokaryotes and 130 for viruses). Finally, a SPInDel profiling kit for discrimination of 10 mammalian species was successfully validated on highly processed food products with species mixtures and proved to be easily adaptable to multiple screening procedures routinely used in molecular biology laboratories. These results suggest that SPInDel is a reliable and cost-effective method for broad-spectrum species identification that is appropriate for use in suboptimal samples and is amenable to different high-throughput genotyping platforms without the need for DNA sequencing. PMID:20923781
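The idea of identifying species from numeric profiles of fragment lengths can be illustrated with a small lookup. This sketch assumes each species is represented by a tuple of fragment lengths for a fixed set of target regions and treats a species as discriminated when no other species shares its exact profile. It is not the SPInDel software: real profiles would also need a sizing tolerance, and the species names and lengths below are invented toy values.

```python
from collections import Counter
from typing import Dict, List, Tuple

Profile = Tuple[int, ...]  # fragment lengths (bp) for a fixed, ordered set of regions


def discriminated_fraction(profiles: Dict[str, Profile]) -> float:
    """Fraction of species whose profile is shared by no other species."""
    counts = Counter(profiles.values())
    unique = sum(1 for p in profiles.values() if counts[p] == 1)
    return unique / len(profiles)


def identify(observed: Profile, profiles: Dict[str, Profile]) -> List[str]:
    """All species whose reference profile exactly matches the observed one."""
    return sorted(name for name, p in profiles.items() if p == observed)


profiles = {
    "species_A": (120, 245, 310),
    "species_B": (118, 245, 310),
    "species_C": (120, 245, 310),
}
print(discriminated_fraction(profiles))     # 0.3333333333333333
print(identify((120, 245, 310), profiles))  # ['species_A', 'species_C']
```

An observed profile that matches more than one reference, as species_A and species_C do here, signals that additional variable-length regions would be needed to separate those species.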
Identification of the chitinase genes from the diamondback moth, Plutella xylostella.
Liao, Z H; Kuo, T C; Kao, C H; Chou, T M; Kao, Y H; Huang, R N
2016-12-01
Chitinases have an indispensable function in chitin metabolism and are well characterized in numerous insect species. Although the diamondback moth (DBM) Plutella xylostella, which has a high reproductive potential, short generation time, and characteristic adaptation to adverse environments, has become one of the most serious pests of cruciferous plants worldwide, information on the chitinases of this moth is presently limited. In the present study, using degenerate polymerase chain reaction (PCR) and rapid amplification of cDNA ends (RACE)-PCR strategies, four chitinase genes of P. xylostella were cloned, and an exhaustive search was conducted for chitinase-like sequences in the P. xylostella genome and transcriptomic database. Based on domain analysis of the deduced amino acid sequences and phylogenetic analysis of the catalytic domain sequences, we identified 15 chitinase genes from P. xylostella. Two of the gut-specific chitinases did not cluster with any of the known phylogenetic groups of chitinases and might belong to a new group of the chitinase family. Moreover, no group VIII chitinase was identified in our study. The structures, classifications and expression patterns of the chitinases of P. xylostella were further delineated, and this information should facilitate further investigation of the functions of chitinase genes in DBM.
Polanco, Carlos; Samaniego Mendoza, José Lino; Buhse, Thomas; Uversky, Vladimir N; Bañuelos Chao, Ingrid Paola; Bañuelos Cedano, Marcela Angola; Tavera, Fernando Michel; Tavera, Daniel Michel; Falconi, Manuel; Ponce de León, Abelardo Vela
2018-03-06
The fatalities and economic losses caused by Ebola virus infection worldwide culminated in the havoc that occurred between August and November 2014. However, little is known about the molecular protein profile of this devastating virus. This work presents a thorough bioinformatics analysis of the regularities of charge distribution (polar profiles) in two groups of proteins associated with Ebola virus disease, and in their functional domains: Ebola virus proteins and human proteins interacting with Ebola virus. Our analysis reveals that each of these proteins contains a fragment, named the "functional domain", whose polar profile is similar to that of the protein containing it. Each protein is formed by a group of short sub-sequences, each with a different and distinctive polar profile, and the polar profile changes orderly and gradually between adjacent short sub-sequences until it coincides with the polar profile of the whole protein. When charge distribution was used as a metric, it effectively discriminated the proteins from their functional domains. As a counterexample, the same test was applied to a set of synthetic proteins built for that purpose, revealing that none of the regularities reported here for Ebola virus proteins and human proteins interacting with Ebola virus were present in the synthetic proteins. Our results indicate that the polar profile of each protein studied is similar to that of its corresponding functional domain. Thus, when each protein was built up from its functional domain by adding one amino acid at a time and plotting its polar profile at each step, the resulting graphs could be divided into groups with similar polar profiles.
Recognition of p63 by the E3 ligase ITCH: Effect of an ectodermal dysplasia mutant.
Bellomaria, A; Barbato, Gaetano; Melino, G; Paci, M; Melino, Sonia
2010-09-15
The E3 ubiquitin ligase Itch mediates the degradation of the p63 protein. Itch contains four WW domains, which are pivotal for substrate recognition. The WW domain is also implicated in several signalling complexes crucially involved in human diseases, including muscular dystrophy, Alzheimer's disease and Huntington's disease. WW domains are highly compact protein-protein binding modules that interact with short proline-rich sequences. The four WW domains of Itch belong to the Group I type, which binds polypeptides containing a PY motif with the PPxY consensus sequence, where x can be any residue. Accordingly, the Itch-p63 interaction results from direct binding of the Itch-WW2 domain to the PY motif of p63. Here, we report a structural analysis of the Itch-p63 interaction by fluorescence, CD and NMR spectroscopy. Specifically, we studied the in vitro interaction between the Itch-WW2 domain and p63(534-551), an 18-mer peptide encompassing the fragment of p63 that includes the PY motif. In addition, we evaluated the conformation of I549T, a site-specific p63 mutant reported in both Hay-Wells syndrome and Rapp-Hodgkin syndrome, and its interaction with Itch-WW2. Based on our results, we propose an extended Itch recognition motif (P-P-P-Y-x(4)-[ST]-[ILV]), which extends the PPxY motif with additional C-terminal residues.
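A PROSITE-style pattern such as P-P-P-Y-x(4)-[ST]-[ILV] translates directly into a regular expression, which makes it easy to scan sequences for candidate Itch-WW2 binding sites. The converter below is a minimal sketch that handles only single residues, `x`, `[..]`, `{..}` and repeat counts (not the `<`/`>` anchors). The two peptides are hypothetical examples, not the real p63 sequence; the second carries an I to T substitution at the final [ILV] position, loosely mirroring the I549T mutant.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern (e.g. 'P-P-P-Y-x(4)-[ST]-[ILV]') to a regex.

    Handles single residues, 'x' wildcards, [..] allowed sets, {..} excluded sets
    and repeat counts such as (4) or (2,3). Anchors '<' and '>' are not handled.
    """
    tokens = []
    for element in pattern.rstrip(".").split("-"):
        # Separate the core element from an optional "(n)" or "(n,m)" repeat count.
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        if match is None:
            raise ValueError(f"cannot parse element {element!r}")
        core, count = match.group(1), match.group(2)
        if core.lower() == "x":
            token = "."
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"
        else:
            token = core  # a residue letter or a [..] set is already valid regex
        if count:
            token += "{" + count + "}"
        tokens.append(token)
    return "".join(tokens)


ITCH_MOTIF = re.compile(prosite_to_regex("P-P-P-Y-x(4)-[ST]-[ILV]"))

# Hypothetical peptides for illustration only.
wild_type = "MKPPPYNGQRSIE"
mutant = "MKPPPYNGQRSTE"
print(ITCH_MOTIF.search(wild_type).group())  # PPPYNGQRSI
print(ITCH_MOTIF.search(mutant))             # None: T does not satisfy [ILV]
```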
cyclostratigraphy, sequence stratigraphy and organic matter accumulation mechanism
NASA Astrophysics Data System (ADS)
Cong, F.; Li, J.
2016-12-01
The first member of the Maokou Formation in the Sichuan Basin is composed of well-preserved carbonate-ramp couplets of limestone and marlstone/shale. It is a potential shale-gas source rock and is suitable for time-series analysis. We conducted time-series analysis to identify high-frequency sequences, reconstruct a high-resolution sedimentation rate, estimate detailed primary productivity for the first time in the study intervals, and discuss the organic matter accumulation mechanism of the source rock within a sequence stratigraphic framework. Using cyclostratigraphy and sequence stratigraphy, we identified the high-frequency sequences of one outcrop profile and one drilled well. Two third-order sequences and eight fourth-order sequences are distinguished on the outcrop profile based on cycle stacking patterns. For the well, a sequence boundary and four systems tracts are distinguished by "integrated prediction error filter analysis" (INPEFA) of gamma-ray logging data, and eight fourth-order sequences are identified from the 405 ka long-eccentricity curve in the depth domain, which is quantified and filtered through an integrated analysis combining MTM spectral analysis, evolutive harmonic analysis (EHA), evolutive average spectral misfit (eASM) and band-pass filtering. The results suggest that high-frequency sequences correlate well with Milankovitch orbital signals recorded in the sediments, and that cyclostratigraphy can be applied to divide high-frequency (fourth- to sixth-order) sequences. A high-resolution sedimentation rate is reconstructed through the study interval by tracking the highly statistically significant short-eccentricity component (123 ka) revealed by EHA. Based on the sedimentation rate and measured TOC and density data, the burial flux, delivery flux and primary productivity of organic carbon were estimated.
By integrating redox proxies, we discuss the controls of primary production and preservation on organic matter accumulation within the high-resolution sequence stratigraphic framework. The results show that the high average organic carbon content of the study interval is mainly attributable to high primary production. They also show a good correlation between high organic carbon accumulation and transgressive intervals.
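The step from an orbitally tuned cycle to an organic-carbon burial flux is simple arithmetic, sketched below under stated assumptions: the sedimentation rate is the thickness of one short-eccentricity cycle divided by its assumed 123 kyr duration, and the burial flux is taken as TOC (as a mass fraction) times dry bulk density times sedimentation rate, a common mass-accumulation-rate definition. The abstract does not give the authors' exact formulas, and the delivery flux and primary productivity require further empirical relations that are not shown here. All input numbers are hypothetical.

```python
def sedimentation_rate_cm_per_kyr(cycle_thickness_m: float, period_kyr: float = 123.0) -> float:
    """Sedimentation rate implied by one orbital cycle of known duration."""
    return cycle_thickness_m * 100.0 / period_kyr


def organic_carbon_burial_flux(toc_wt_percent: float, dry_density_g_cm3: float,
                               sed_rate_cm_kyr: float) -> float:
    """Organic-carbon mass accumulation rate in g C per cm^2 per kyr."""
    return (toc_wt_percent / 100.0) * dry_density_g_cm3 * sed_rate_cm_kyr


# Hypothetical inputs: a 1.5 m short-eccentricity cycle, 2 wt% TOC, 2.5 g/cm^3.
sr = sedimentation_rate_cm_per_kyr(1.5)
flux = organic_carbon_burial_flux(2.0, 2.5, sr)
print(f"SR = {sr:.3f} cm/kyr, burial flux = {flux:.4f} g C/cm^2/kyr")
```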
Li, Chibo; Ding, Xi-Qin; O’Brien, John; Al-Ubaidi, Muayyad R.
2010-01-01
PURPOSE A great deal of information about functionally significant domains of a protein may be obtained by comparing the primary sequences of gene homologues over a broad phylogenetic base. This study was designed to identify evolutionarily conserved domains of the photoreceptor disc membrane protein peripherin/rds by analysis of its homologue in a primitive vertebrate, the skate. METHODS A skate retinal cDNA library was screened using a mouse peripherin/rds clone. The 5′ and 3′ untranslated regions of the skate peripherin/rds (srds) cDNA were isolated by rapid amplification of cDNA ends (RACE). The gene structure was characterized by PCR amplification and sequencing of genomic fragments. Northern and Western blot analyses were used to identify the srds transcript and protein, respectively. RESULTS A new homologue of peripherin/rds was identified from the skate retinal cDNA library. SRDS is a glycoprotein with a predicted molecular mass of 40.2 kDa. The srds gene consists of two exons and one small intron and is transcribed into a single 6-kb message. Phylogenetic analysis places SRDS at the base of the peripherin/rds family, near the division between that group and the branch leading to the rds-like and rom-1 genes. SRDS protein is 54.5% identical to peripherin/rds across species. Identity is significantly higher (73%) in the intradiscal domains. Sequence comparison revealed conservation of all residues whose mutation has been associated with retinitis pigmentosa, and of most residues associated with macular dystrophies. Comparison with ROM-1 and other rds-like proteins revealed a highly conserved domain in the large intradiscal loop. CONCLUSIONS Srds represents the skate orthologue of mammalian peripherin/rds genes. Conservation of most of the residues associated with human retinal diseases indicates that these residues serve important functional roles.
The high degree of conservation of a short stretch within the large intradiscal loop also suggests an important function for this domain. PMID:12766040
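Identity figures such as 54.5% overall and 73% in the intradiscal domains depend on how percent identity is computed from an alignment. The sketch below assumes one common convention (identical residues divided by the columns where neither sequence has a gap); the abstract does not state which convention was used, and dividing by alignment length or by the shorter sequence length gives different numbers. The sequences are toy examples.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences, ignoring gap columns.

    Convention assumed here: identical residues / columns where neither
    sequence has a gap ('-').
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap in either sequence
        compared += 1
        identical += x == y
    return 100.0 * identical / compared if compared else 0.0


print(percent_identity("ACDE-G", "ACDF-G"))  # 80.0: 4 of 5 compared columns match
```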
Odronitz, Florian; Kollmar, Martin
2006-01-01
Background Annotation of protein sequences of eukaryotic organisms is crucial for understanding their function in the cell. Manual annotation is still by far the most accurate way to correctly predict genes. The classification of protein sequences, their phylogenetic relationships and the assignment of function involve information from various sources. This often leads to a collection of heterogeneous data that is hard to track. Cytoskeletal and motor proteins consist of large and diverse superfamilies comprising up to several dozen members per organism. To date, there is no integrated tool available to assist in the manual large-scale comparative genomic analysis of protein families. Description Pfarao (Protein Family Application for Retrieval, Analysis and Organisation) is a database-driven online working environment for the analysis of manually annotated protein sequences and their relationships. Currently, the system can store and interrelate a wide range of information about protein sequences, species, phylogenetic relations and sequencing projects, as well as links to literature and domain predictions. Sequences can be imported from multiple sequence alignments generated during the annotation process. A web interface allows users to conveniently browse the database and to compile tabular and graphical summaries of its content. Conclusion We implemented a protein-sequence-centric web application to store, organize, interrelate and present heterogeneous data generated in manual genome annotation and comparative genomics. The application has been developed for the analysis of cytoskeletal and motor proteins (CyMoBase) but can easily be adapted for any protein. PMID:17134497
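To make the idea of a database-driven environment that interrelates sequences, species and domain predictions concrete, here is a minimal relational sketch using Python's built-in `sqlite3`. Pfarao's actual schema is not described in the abstract; every table name, column name and data row below is hypothetical. The final query produces the kind of tabular summary (family members per species) that such a web interface might compile.

```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE species (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE protein (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    family     TEXT NOT NULL,
    species_id INTEGER NOT NULL REFERENCES species(id),
    sequence   TEXT NOT NULL
);
CREATE TABLE domain_prediction (
    protein_id INTEGER NOT NULL REFERENCES protein(id),
    domain     TEXT NOT NULL,
    seq_start  INTEGER NOT NULL,
    seq_end    INTEGER NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO species VALUES (?, ?)",
                    [(1, "Drosophila melanogaster"), (2, "Homo sapiens")])
    # Placeholder proteins and sequences.
    con.executemany("INSERT INTO protein VALUES (?, ?, ?, ?, ?)", [
        (1, "myoA", "myosin", 2, "MAAA"),
        (2, "myoB", "myosin", 2, "MCCC"),
        (3, "kinA", "kinesin", 2, "MDDD"),
        (4, "myoC", "myosin", 1, "MEEE"),
    ])
    con.execute("INSERT INTO domain_prediction VALUES (1, 'motor', 1, 4)")
    return con


def family_counts(con: sqlite3.Connection) -> list:
    """Number of family members per species."""
    return con.execute("""
        SELECT s.name, p.family, COUNT(*)
        FROM protein AS p JOIN species AS s ON s.id = p.species_id
        GROUP BY s.name, p.family
        ORDER BY s.name, p.family
    """).fetchall()


print(family_counts(build_demo_db()))
```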
Jiang, Yi-Fan; Chou, Chung-Hsi; Lin, En-Chung; Chiu, Chih-Hsien
2011-02-01
Hypoxia-inducible factor 1 (HIF-1) is a transcription factor that senses hypoxic environmental conditions and adapts cells to them. HIF-1 is composed of an oxygen-regulated α subunit (HIF-1α) and a constitutively expressed β subunit (HIF-1β). Taiwan voles (Microtus kikuchii) are endemic to Taiwan and are found only in mountainous areas more than 2000 m above sea level. In this study, the full-length HIF-1α cDNA was cloned and sequenced from liver tissues of Taiwan voles. HIF-1α of Taiwan voles showed high sequence similarity to HIF-1α of other species. Sequence alignment of HIF-1α functional domains indicated that the basic helix-loop-helix (bHLH), PER-ARNT-SIM (PAS) and C-terminal transactivation (TAD-C) domains were conserved among species, whereas sequence variations were found in the oxygen-dependent degradation domains (ODDD). To measure Taiwan vole HIF-1α responses to hypoxia, animals were challenged with cobalt chloride, and HIF-1α mRNA and protein expression in brain, lung, heart, liver, kidney and muscle was assessed by quantitative RT-PCR and Western blot analysis. Upon induction of hypoxic stress with cobalt chloride, HIF-1α mRNA levels increased in lung, heart, kidney and muscle tissue. In contrast, protein expression levels varied more between individual animals. These results suggest that the regulation of HIF-1α may be important to the Taiwan vole under cobalt chloride treatment. However, the evolutionary effect of environmental pressure on the HIF-1α primary sequence, and the details of HIF-1α function and regulation in Taiwan voles, remain to be determined. Copyright © 2010 Elsevier Inc. All rights reserved.
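Relative mRNA changes from quantitative RT-PCR are often expressed with the 2^-ΔΔCt (Livak) method. The abstract does not say which quantification model was used, so the sketch below is only an illustration of that common approach; it assumes near-100% amplification efficiency for both target and reference genes, and the Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_treated: float, ct_ref_treated: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression by the 2^-ddCt (Livak) method.

    Assumes target and reference genes amplify with ~100% efficiency.
    """
    d_ct_treated = ct_target_treated - ct_ref_treated  # normalise to reference gene
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control                # treated relative to control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values for a CoCl2-treated vs untreated tissue sample.
print(fold_change_ddct(22.0, 18.0, 24.0, 18.0))  # 4.0
```

A ΔΔCt of -2 corresponds to a four-fold increase; one cycle earlier means roughly twice as much starting template only under the efficiency assumption above.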
Linke, Christian; Siemens, Nikolai; Middleditch, Martin J.; Kreikemeyer, Bernd; Baker, Edward N.
2012-01-01
The extracellular protein Epf from Streptococcus pyogenes is important for streptococcal adhesion to human epithelial cells. However, Epf has no sequence identity to any protein of known structure or function. Thus, several predicted domains of the 205 kDa protein Epf were cloned separately and expressed in Escherichia coli. The N-terminal domain of Epf was crystallized in space groups P2₁ and P2₁2₁2₁ in the presence of the protease chymotrypsin. Mass spectrometry showed that the species crystallized corresponded to a fragment comprising residues 52–357 of Epf. Complete data sets were collected to 2.0 and 1.6 Å resolution, respectively, at the Australian Synchrotron. PMID:22750867
Genome-wide analysis of putative peroxiredoxin in unicellular and filamentous cyanobacteria.
Cui, Hongli; Wang, Yipeng; Wang, Yinchu; Qin, Song
2012-11-16
Cyanobacteria are photoautotrophic prokaryotes with wide variations in genome size and ecological habitat. Peroxiredoxin (PRX) is an important protein that plays essential roles in protecting cells against reactive oxygen species (ROS). PRXs have been identified in mammals, fungi and higher plants. However, knowledge of cyanobacterial PRXs remains limited. With the availability of 37 sequenced cyanobacterial genomes, we performed a comprehensive comparative analysis of PRXs and explored their diversity, distribution, domain structure and evolution. Overall, 244 putative prx genes were identified; they were abundant in filamentous diazotrophic cyanobacteria, Acaryochloris marina MBIC 11017, and unicellular cyanobacteria inhabiting freshwater and hot springs, but scarce in all Prochlorococcus and marine Synechococcus strains. Among these putative genes, 25 open reading frames (ORFs) encoding hypothetical proteins were identified as prx gene family members, and the others were already annotated as prx genes. All 244 putative PRXs were classified into five major subfamilies (1-Cys, 2-Cys, BCP, PRX5_like and PRX-like) according to their domain structures. The catalytic motifs of the cyanobacterial PRXs were similar to those of eukaryotic PRXs and highly conserved in all but the PRX-like subfamily. The classical thioredoxin motif (CXXC) was detected in protein sequences of the PRX-like subfamily. A phylogenetic tree constructed from the catalytic domains agreed well with the domain structures of the PRXs and with phylogenies based on 16S rRNA. The distribution of PRX-encoding genes in different unicellular and filamentous cyanobacteria, especially of the PRX-like and 1-Cys PRX subfamilies, correlates with the genome size, eco-physiology and physiological properties of the organisms. Cyanobacterial and eukaryotic PRXs share similar conserved motifs, indicating that cyanobacteria adopt catalytic mechanisms similar to those of eukaryotes.
All cyanobacterial PRX proteins share highly similar structures, implying that these genes may originate from a common ancestor. This study reveals a general framework of the sequence-structure-function connections of PRXs, which may facilitate functional investigations of PRXs in various organisms. PMID:23157370
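Detecting a CXXC motif, as reported for the PRX-like subfamily, amounts to a pattern scan. The minimal sketch below uses a zero-width lookahead so that overlapping occurrences are all reported with 1-based positions. A pattern match alone does not establish a thioredoxin-like fold or catalytic function, and the sequence shown is a toy example.

```python
import re


def find_cxxc(seq: str) -> list:
    """Return (1-based position, motif) for every C-x-x-C occurrence, including overlaps."""
    # The lookahead (?=...) consumes no characters, so overlapping motifs are found.
    return [(m.start() + 1, m.group(1)) for m in re.finditer(r"(?=(C..C))", seq.upper())]


print(find_cxxc("MKCGPCAACDEC"))  # [(3, 'CGPC'), (6, 'CAAC'), (9, 'CDEC')]
```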
TIR-NBS-LRR genes are rare in monocots: evidence from diverse monocot orders
Tarr, D Ellen K; Alexander, Helen M
2009-01-01
Background Plant resistance (R) gene products recognize pathogen effector molecules. Many R genes code for proteins containing nucleotide binding site (NBS) and C-terminal leucine-rich repeat (LRR) domains. NBS-LRR proteins can be divided into two groups, TIR-NBS-LRR and non-TIR-NBS-LRR, based on the structure of the N-terminal domain. Although both classes are clearly present in gymnosperms and eudicots, only non-TIR sequences have been found consistently in monocots. Since most studies in monocots have been limited to agriculturally important grasses, it is difficult to draw conclusions. The purpose of our study was to look for evidence of these sequences in additional monocot orders. Findings Using degenerate PCR, we amplified NBS sequences from four monocot species (C. blanda, D. marginata, S. trifasciata, and Spathiphyllum sp.), a gymnosperm (C. revoluta) and a eudicot (C. canephora). We successfully amplified TIR-NBS-LRR sequences from dicot and gymnosperm DNA, but not from monocot DNA. Using databases, we obtained NBS sequences from additional monocots, magnoliids and basal angiosperms. TIR-type sequences were not present in monocot or magnoliid sequences, but were present in the basal angiosperms. Phylogenetic analysis supported a single TIR clade and multiple non-TIR clades. Conclusion We were unable to find monocot TIR-NBS-LRR sequences by PCR amplification or database searches. In contrast to previous studies, our results represent five monocot orders (Poales, Zingiberales, Arecales, Asparagales, and Alismatales). Our results establish the presence of TIR-NBS-LRR sequences in basal angiosperms and suggest that although these sequences were present in early land plants, they have been reduced significantly in monocots and magnoliids. PMID:19785756
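Degenerate PCR primers encode several possible bases at some positions using IUPAC ambiguity codes; the primer's degeneracy is the number of distinct sequences in the mixture. The primers used in this study are not given in the abstract, so the primer strings below are hypothetical. The sketch computes the degeneracy and enumerates the concrete sequences.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer: str) -> list:
    """All concrete sequences encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer.upper()))]


print(degeneracy("GGNRY"))  # 16 = 4 * 2 * 2
print(expand("ARY"))        # ['AAC', 'AAT', 'AGC', 'AGT']
```

Because degeneracy grows multiplicatively, each additional ambiguous position dilutes the concentration of any single matching primer in the mixture.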
Emergence of novel domains in proteins
2013-01-01
Background Proteins are composed of combinations of discrete, well-defined sequence domains associated with specific functions, which have arisen at different times during evolutionary history. The emergence of novel domains is related to protein functional diversification and adaptation, but little is currently known about how novel domains arise and how they subsequently evolve. Results To gain insights into the impact of recently emerged domains on protein evolution, we identified all young human protein domains, i.e. those that emerged within approximately the past 550 million years. We classified them into vertebrate-specific and mammalian-specific groups and compared them with older domains. We found 426 different annotated young domains, totalling 995 domain occurrences, which represent about 12.3% of all human domains. Of these, 61.3% arose in newly formed genes, while the remaining 38.7% are found combined with older domains and very likely emerged in the context of a previously existing protein. Young domains are preferentially located at the N-terminus of the protein, indicating that, at least in vertebrates, novel functional sequences often emerge there. Furthermore, in comparisons of human and mouse orthologous sequences, young domains show significantly higher ratios of non-synonymous to synonymous substitution rates than older domains. This also holds when young and old domains located in the same protein are compared, suggesting that recently arisen domains tend to evolve under weaker constraints than older domains. Conclusions We conclude that proteins tend to gain domains over time, becoming progressively longer. We show that many proteins are made of domains of different ages, and that the fastest-evolving parts correspond to the most recently acquired domains. PMID:23425224
Taylor, William R; Stoye, Jonathan P; Taylor, Ian A
2017-04-04
The Spumaretrovirinae (foamy viruses) and the Orthoretrovirinae (e.g. HIV) share many similarities in both genome structure and the sequences of the core virus-encoded proteins, such as the aspartyl protease and reverse transcriptase. Similarity in the gag region of the genome is less obvious at the sequence level but has been illuminated by the recent solution of the foamy virus capsid (CA) structure. This revealed a clear structural similarity to the orthoretrovirus capsids, but with marked differences that left the relationship between the two domains that make up the structure uncertain. We applied protein structure comparison methods to try to resolve this ambiguous relationship. These included both the DALI method and the SAP method, with rigorous statistical tests applied to the results of both. For this, we employed collections of artificial fold 'decoys' (generated from the pair of native structures being compared) to provide a customised background distribution for each comparison, allowing significance levels to be estimated. We show that the relationship between the two domains conforms to a simple linear correspondence rather than a domain transposition. These similarities suggest that both viral capsids originated from a common ancestor with a double-domain structure. In addition, we show that there is also a significant structural similarity between the amino- and carboxy-terminal domains in both the foamy and ortho viruses. These results indicate that, in addition to the duplication of the double-domain capsid, there may have been an even more ancient gene duplication that preceded the double-domain structure. Our structure comparison methodology also demonstrates a general approach to problems where the components have a high intrinsic level of similarity.
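Using decoys as a customised background distribution means asking how extreme the native comparison score is relative to the scores of decoy comparisons. The sketch below shows two generic ways to express that, a z-score and an add-one empirical p-value; it is not the authors' exact statistical procedure, which the abstract does not detail. It assumes higher scores mean greater similarity, and the scores are hypothetical.

```python
import statistics


def decoy_significance(native_score: float, decoy_scores: list) -> tuple:
    """Z-score and empirical one-sided p-value of a native score against decoys.

    Higher scores are assumed to mean greater structural similarity.
    """
    if len(decoy_scores) < 2:
        raise ValueError("need at least two decoy scores")
    mean = statistics.fmean(decoy_scores)
    sd = statistics.stdev(decoy_scores)
    if sd == 0:
        raise ValueError("decoy scores have zero variance")
    z = (native_score - mean) / sd
    # Add-one correction so the empirical p-value is never exactly zero.
    exceed = sum(1 for s in decoy_scores if s >= native_score)
    p = (exceed + 1) / (len(decoy_scores) + 1)
    return z, p


print(decoy_significance(8.0, [1.0, 2.0, 3.0, 4.0, 5.0]))
```

With few decoys the empirical p-value is bounded below by 1/(n+1), which is one reason a fitted or z-score-based estimate is often reported alongside it.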
Klippel, Stefan; Wieczorek, Marek; Schümann, Michael; Krause, Eberhard; Marg, Berenice; Seidel, Thorsten; Meyer, Tim; Knapp, Ernst-Walter; Freund, Christian
2011-11-04
The high abundance of repetitive but nonidentical proline-rich sequences in spliceosomal proteins raises the question of how these known interaction motifs recruit their interacting protein domains. Whereas complex formation of these adaptors with individual motifs has been studied in great detail, little is known about the binding mode of domains arranged in tandem repeats and of long proline-rich sequences containing multiple motifs. Here we studied the interaction of the two adjacent WW domains of the spliceosomal protein FBP21 with several ligands of different lengths and compositions to elucidate the hallmarks of multivalent binding for this class of recognition domains. First, we show that many of the cellular proteins that interact with FBP21-WW1-WW2 contain multiple proline-rich motifs. Among these is the newly identified binding partner SF3B4. Fluorescence resonance energy transfer (FRET) analysis reveals that the tandem WW domains of FBP21 interact with splicing factor 3B4 (SF3B4) in nuclear speckles, where splicing takes place. Isothermal titration calorimetry and NMR show that the tandem arrangement of WW domains and the multivalency of the proline-rich ligands both contribute to affinity enhancement. However, ligand exchange remains fast on the NMR time scale. Surprisingly, an N-terminal spin label attached to a bivalent ligand induces NMR line broadening of signals corresponding to both WW domains of the FBP21-WW1-WW2 protein. This suggests that distinct orientations of the ligand contribute to a delocalized and semispecific binding mode that should facilitate search processes within the spliceosome. PMID:21917930
The origin and evolution of Basigin(BSG) gene: A comparative genomic and phylogenetic analysis.
Zhu, Xinyan; Wang, Shenglan; Shao, Mingjie; Yan, Jie; Liu, Fei
2017-07-01
Basigin (BSG), also known as extracellular matrix metalloproteinase inducer (EMMPRIN) or cluster of differentiation 147 (CD147), plays various fundamental roles in intercellular recognition involved in immunologic phenomena, differentiation and development. In this study, we aimed to compare the similarities and differences of BSG among organisms and to explore possible evolutionary relationships based on this comparison. We used extensive BLAST searches of metazoan genomes and examined N-glycosylation sites, the transmembrane region and other functional sites. We then identified BSG homologs from genomic sequences and analyzed their phylogenetic relationships. We found that BSG genes exist not only in vertebrate metazoans but also in invertebrate metazoans such as amphioxus (B. floridae), D. melanogaster, A. mellifera, S. japonicum, C. gigas and T. patagoniensis. Sequence analysis confirmed that only vertebrate metazoans and the cephalochordate amphioxus B. floridae have the classic structure (a signal peptide, two Ig-like domains (IgC2 and IgI), a transmembrane region and an intracellular domain). The invertebrate metazoans (excluding amphioxus B. floridae) lack the N-terminal signal peptide and the IgC2 domain. We then performed phylogenetic tree construction, genome organization comparison and chromosomal disposition analysis based on information obtained from the NCBI and Ensembl databases. Finally, we established a possible evolutionary scenario for the BSG gene, which indicates that restricted exon rearrangement occurred during evolution to form the present-day BSG gene. Copyright © 2017 Elsevier Ltd. All rights reserved.
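N-glycosylation sites of the kind examined here are commonly predicted from the N-X-S/T sequon; the PROSITE entry PS00001 writes it as N-{P}-[ST]-{P}. The abstract does not name the prediction tool used, so the scan below is only a minimal sketch of that pattern. A match marks a potential site, not confirmed glycosylation, and the sequence is a toy example.

```python
import re

# PROSITE PS00001 pattern N-{P}-[ST]-{P}, written as an overlapping regex lookahead.
SEQUON = re.compile(r"(?=(N[^P][ST][^P]))")


def n_glyc_sites(seq: str) -> list:
    """Return (1-based position of N, matched tetrapeptide) for each potential site."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq.upper())]


print(n_glyc_sites("MNGSANPTANKTPNQTL"))  # [(2, 'NGSA'), (14, 'NQTL')]
```

In the toy sequence, NPTA is rejected because P follows N, and NKTP is rejected because P follows the S/T position.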
Prevalence of the F-type lectin domain.
Bishnoi, Ritika; Khatri, Indu; Subramanian, Srikrishna; Ramya, T N C
2015-08-01
F-type lectins are fucolectins with characteristic fucose- and calcium-binding sequence motifs and a unique lectin fold (the "F-type" fold). F-type lectins are phylogenetically widespread but selectively distributed. Several eukaryotic F-type lectins have been biochemically and structurally characterized, and the F-type lectin domain (FLD) has also been studied in the bacterial proteins Streptococcus mitis lectinolysin and Streptococcus pneumoniae SP2159. However, little is known about the extent of occurrence of FLDs and their domain organization, especially in bacteria. We have now mined the extensive genomic sequence information available in public databases with sensitive sequence search techniques to exhaustively survey prokaryotic and eukaryotic FLDs. We report 437 FLD sequence clusters (clustered at 80% sequence identity) from eukaryotic, eubacterial and viral proteins. Domain architectures are diverse but mostly conserved in closely related organisms, and the domain organizations of bacterial FLD-containing proteins are very different from those of their eukaryotic counterparts, suggesting unique specialization of FLDs to suit different requirements. Several atypical phylogenetic associations hint at lateral transfer. Among eukaryotes, we observe an expansion of FLDs, in terms of both occurrence and diversity of domain organization, in the taxa Mollusca, Hemichordata and Branchiostomi, perhaps coinciding with a greater emphasis on innate immune strategies in these organisms. The naturally occurring FLDs with diverse domain organizations identified here will be useful for future studies aimed at creating designer molecular platforms for directing desired biological activities to fucosylated glycoconjugates in target niches. © The Author 2015. Published by Oxford University Press. All rights reserved.
[Cloning and functional characterization of phytoene desaturase in Andrographis paniculata].
Shen, Qin-qin; Li, Li-xia; Zhan, Peng-lin; Wang, Qiang
2015-10-01
A full-length cDNA of the phytoene desaturase (PDS) gene from Andrographis paniculata was obtained by RACE-PCR. The cDNA sequence consists of 2,224 bp with an intact ORF of 1,752 bp (GenBank: KP982892) encoding a polypeptide of 584 amino acids. Homology analysis showed that the deduced protein has extensive sequence similarity to PDS from other plants and contains a conserved N-terminal NAD(H)-binding domain, a plant dehydrase cofactor-binding domain. Phylogenetic analysis demonstrated that ApPDS is closely related to the PDS of Sesamum indicum and Pogostemon cablin. Semi-quantitative RT-PCR analysis revealed that ApPDS is expressed in all aboveground tissues, with the highest expression in leaves. Virus-induced gene silencing (VIGS) was performed to characterize the function of ApPDS in planta. Although significant photobleaching was not observed in infiltrated leaves, the PDS gene was significantly down-regulated in the yellowish areas. To the best of our knowledge, this is the first report of PDS gene cloning and functional characterization in A. paniculata, and it lays the foundation for further investigation of new genes, especially those related to the andrographolide biosynthetic pathway.
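Locating an intact ORF within a cDNA, as done for ApPDS, can be sketched as a scan for the longest ATG-to-stop stretch. This minimal version searches only the three forward-strand frames and uses the standard stop codons; it is an illustration, not the pipeline used in the study. Conventions differ on whether the stop codon is counted in the ORF length, which affects the nt-to-amino-acid arithmetic, so the sketch keeps the stop codon and reports the protein length without it.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Longest ATG...stop ORF on the forward strand, searching all three frames.

    Reverse-complement frames are not searched in this sketch.
    """
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]  # includes the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


orf = longest_orf("CCATGAAATTTTAAGG")
print(orf, len(orf), "nt;", len(orf) // 3 - 1, "aa")  # ATGAAATTTTAA 12 nt; 3 aa
```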
NASA Astrophysics Data System (ADS)
Yang, Kun; Rong, Wei; Qi, Lin; Li, Jiarui; Wei, Xuening; Zhang, Zengyan
2013-10-01
Cysteine-rich receptor kinases (CRKs) belong to the receptor-like kinase family. Little is known about CRK genes in wheat. We isolated the wheat CRK gene TaCRK1 from the Rhizoctonia cerealis-resistant wheat CI12633, based on a differentially expressed sequence identified by RNA sequencing (RNA-Seq) analysis. TaCRK1 was more highly expressed in CI12633 than in the susceptible Wenmai 6. Transcription of TaCRK1 was induced in CI12633 after R. cerealis infection and exogenous abscisic acid (ABA) treatment. The deduced TaCRK1 protein contains a signal peptide, two DUF26 domains, a transmembrane domain and a serine/threonine protein kinase domain. Transient expression of green fluorescent protein fused to TaCRK1 in wheat and onion cells indicated that TaCRK1 may localize to the plasma membrane. Virus-induced silencing of TaCRK1 in CI12633 showed that downregulation of the TaCRK1 transcript did not obviously impair resistance to R. cerealis. This study paves the way for further CRK research in wheat.
Certain topological properties and duals of the domain of a triangle matrix in a sequence space
NASA Astrophysics Data System (ADS)
Altay, Bilâl; Basar, Feyzi
2007-12-01
The matrix domains of the particular limitation methods of Cesàro, Riesz, difference, summation and Euler have been studied by several authors. In the present paper, certain topological properties and the β- and γ-duals of the domain of a triangle matrix in a sequence space are examined as an application of the characterization of the related matrix classes.
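For reference, the objects named above have the following standard definitions in summability theory, where ω denotes the space of all (real or complex) sequences, cs the space of convergent series and bs the space of bounded series. Since a triangle A is invertible, x ↦ Ax is a linear bijection of λ_A onto λ, which is why properties of λ often carry over to its matrix domain. The Cesàro mean of order one is included as a familiar example of a triangle.

```latex
% Matrix domain of an infinite matrix A = (a_{nk}) in a sequence space \lambda
\lambda_A = \{\, x = (x_k) \in \omega : Ax \in \lambda \,\}, \qquad
(Ax)_n = \sum_{k} a_{nk} x_k \quad (n = 0, 1, 2, \dots)

% A is a triangle if a_{nk} = 0 for k > n and a_{nn} \neq 0 for all n,
% e.g. the Cesaro mean of order one:
(C_1)_{nk} =
\begin{cases}
  \dfrac{1}{n+1}, & 0 \le k \le n, \\[4pt]
  0, & k > n.
\end{cases}

% beta- and gamma-duals of a sequence space X
X^{\beta}  = \{\, a = (a_k) \in \omega : (a_k x_k) \in cs \ \text{for all } x \in X \,\}, \qquad
X^{\gamma} = \{\, a = (a_k) \in \omega : (a_k x_k) \in bs \ \text{for all } x \in X \,\}.
```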
NASA Astrophysics Data System (ADS)
Yamamoto, Tetsuya; Takeda, Kazuki; Adachi, Fumiyuki
Frequency-domain equalization (FDE) based on the minimum mean square error (MMSE) criterion can provide better bit error rate (BER) performance than rake combining. To further improve the BER performance, cyclic delay transmit diversity (CDTD) can be used. CDTD simultaneously transmits the same signal from different antennas after adding different cyclic delays, increasing the number of equivalent propagation paths. Although the joint use of CDTD and MMSE-FDE for direct sequence code division multiple access (DS-CDMA) achieves a larger frequency diversity gain, the BER performance improvement is limited by the residual inter-chip interference (ICI) after FDE. In this paper, we propose joint FDE and despreading for DS-CDMA using CDTD. Equalization and despreading are performed simultaneously in the frequency domain to suppress the residual ICI after FDE. A theoretical conditional BER analysis is presented for a given channel condition. The BER analysis is confirmed by computer simulation.
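The two ingredients of the conventional scheme that the paper improves on can be sketched numerically: the equivalent channel created by CDTD and the single-user MMSE-FDE weight w(k) = H*(k) / (|H(k)|² + (Es/N0)⁻¹). This is a textbook-style illustration under stated assumptions, not the proposed joint FDE and despreading: it assumes equal transmit power per antenna (hence the 1/√N_t factor), omits any DS-CDMA-specific scaling of the weight, and uses hypothetical channel taps, cyclic delays and Es/N0.

```python
import cmath


def freq_response(taps: dict, n_fft: int) -> list:
    """H(k) = sum_l h_l * exp(-j 2 pi k d_l / N) for taps given as {delay_in_samples: gain}."""
    return [sum(g * cmath.exp(-2j * cmath.pi * k * d / n_fft) for d, g in taps.items())
            for k in range(n_fft)]


def cdtd_equivalent_channel(per_antenna_taps: list, cyclic_delays: list, n_fft: int) -> list:
    """Equivalent channel when antenna n adds a cyclic delay of Delta_n samples.

    A cyclic delay multiplies H_n(k) by exp(-j 2 pi k Delta_n / N), which is the same
    as adding Delta_n to every path delay (the exponential is periodic in N).
    """
    n_tx = len(per_antenna_taps)
    h_eq = [0j] * n_fft
    for taps, delta in zip(per_antenna_taps, cyclic_delays):
        shifted = {d + delta: g for d, g in taps.items()}
        for k, h in enumerate(freq_response(shifted, n_fft)):
            h_eq[k] += h / n_tx ** 0.5  # equal power split across antennas
    return h_eq


def mmse_weights(h_eq: list, es_n0: float) -> list:
    """Single-user MMSE-FDE weights w(k) = H*(k) / (|H(k)|^2 + (Es/N0)^-1)."""
    return [h.conjugate() / (abs(h) ** 2 + 1.0 / es_n0) for h in h_eq]


# Two antennas, each with a flat single-path channel; cyclic delays 0 and 2, N = 4.
h = cdtd_equivalent_channel([{0: 1 + 0j}, {0: 1 + 0j}], [0, 2], 4)
print([round(abs(x), 3) for x in h])  # [1.414, 0.0, 1.414, 0.0]
```

The toy output shows how CDTD turns flat per-antenna channels into a frequency-selective equivalent channel, which is the mechanism behind the "increased number of equivalent propagation paths" described above.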
Supplementary motor area as key structure for domain-general sequence processing: A unified account.
Cona, Giorgia; Semenza, Carlo
2017-01-01
The Supplementary Motor Area (SMA) is considered an anatomically and functionally heterogeneous region and is implicated in several functions. We propose that SMA plays a crucial role in domain-general sequence processes, contributing to the integration of sequential elements into higher-order representations regardless of the nature of such elements (e.g., motor, temporal, spatial, numerical, linguistic, etc.). This review emphasizes the domain-general involvement of the SMA, as this region has been found to support sequence operations in a variety of cognitive domains that, albeit different, share an inherent sequence processing component. These include action, time and spatial processing, numerical cognition, music and language processing, and working memory. In this light, we reviewed and synthesized recent neuroimaging, stimulation and electrophysiological studies in order to compare and reconcile the distinct sources of data, and we propose a unifying account of the role of the SMA. We also discuss the differential contribution of the pre-SMA and SMA-proper in sequence operations, and possible neural mechanisms by which such operations are executed. Copyright © 2016 Elsevier Ltd. All rights reserved.
Wld(S) protein requires Nmnat activity and a short N-terminal sequence to protect axons in mice.
Conforti, Laura; Wilbrey, Anna; Morreale, Giacomo; Janeckova, Lucie; Beirowski, Bogdan; Adalbert, Robert; Mazzola, Francesca; Di Stefano, Michele; Hartley, Robert; Babetto, Elisabetta; Smith, Trevor; Gilley, Jonathan; Billington, Richard A; Genazzani, Armando A; Ribchester, Richard R; Magni, Giulio; Coleman, Michael
2009-02-23
The slow Wallerian degeneration (Wld(S)) protein protects injured axons from degeneration. This unusual chimeric protein fuses a 70-amino acid N-terminal sequence from the Ube4b multiubiquitination factor with the nicotinamide adenine dinucleotide-synthesizing enzyme nicotinamide mononucleotide adenylyl transferase 1. The requirement for these components and the mechanism of Wld(S)-mediated neuroprotection remain highly controversial. The Ube4b domain is necessary for the protective phenotype in mice, but precisely which sequence is essential and why are unclear. Binding to the AAA adenosine triphosphatase valosin-containing protein (VCP)/p97 is the only known biochemical property of the Ube4b domain. Using an in vivo approach, we show that removing the VCP-binding sequence abolishes axon protection. Replacing the Wld(S) VCP-binding domain with an alternative ataxin-3-derived VCP-binding sequence restores its protective function. Enzyme-dead Wld(S) is unable to delay Wallerian degeneration in mice. Thus, neither domain is effective without the function of the other. Wld(S) requires both of its components to protect axons from degeneration.
Distribution and cluster analysis of predicted intrinsically disordered protein Pfam domains
Williams, Robert W; Xue, Bin; Uversky, Vladimir N; Dunker, A Keith
2013-01-01
The Pfam database groups regions of proteins by how well hidden Markov models (HMMs) can be trained to recognize similarities among them. Conservation pressure is probably in play here. The Pfam seed training set includes sequence and structure information, being drawn largely from the PDB. A long-standing hypothesis among intrinsically disordered protein (IDP) investigators has held that conservation pressures are also at play in the evolution of different kinds of intrinsic disorder, but we find that predicted intrinsic disorder (PID) is not always conserved across Pfam domains. Here we analyze distributions and clusters of PID regions in 193,024 members of the version 23.0 Pfam seed database. To include the maximum information available for proteins that remain unfolded in solution, we employ the 10 linearly independent Kidera factors [1–3] for the amino acids, combined with PONDR [4] predictions of disorder tendency, to transform the sequences of these Pfam members into an 11-column matrix where the number of rows is the length of each Pfam region. Cluster analyses of the set of all regions, including those that are folded, show 6 groupings of domains. Cluster analyses of domains with mean VSL2b scores greater than 0.5 (half predicted disorder or more) show at least 3 separated groups. It is hypothesized that grouping sets into shorter sequences with more uniform length will reveal more information about intrinsic disorder and lead to more finely structured and perhaps more accurate predictions. HMMs could be trained to include this information. PMID:28516017
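A minimal sketch of the per-region transform described above, assuming the 10 factor values per residue and the per-residue disorder scores (for example from a VSL2b run) are already available. The factor table below contains hypothetical placeholder numbers, not the published Kidera factors, and the function names are illustrative only.

```python
from statistics import mean


def region_matrix(sequence, factor_table, disorder_scores):
    """Return one row per residue: 10 Kidera-style factors followed by a disorder score."""
    if len(sequence) != len(disorder_scores):
        raise ValueError("need exactly one disorder score per residue")
    rows = []
    for residue, score in zip(sequence, disorder_scores):
        factors = factor_table[residue]
        if len(factors) != 10:
            raise ValueError(f"expected 10 factors for {residue!r}")
        rows.append(list(factors) + [score])  # 11 columns per row
    return rows


def is_mostly_disordered(disorder_scores, threshold=0.5):
    """True when the mean per-residue score exceeds the threshold (half or more disordered)."""
    return mean(disorder_scores) > threshold


# Hypothetical placeholder values, NOT the published Kidera factors.
TOY_FACTORS = {"A": [0.1] * 10, "G": [-0.2] * 10, "S": [0.0] * 10}
```

For example, `region_matrix("GAS", TOY_FACTORS, [0.8, 0.6, 0.7])` yields a 3 x 11 matrix, one row per residue, which is the shape the abstract describes (rows = region length).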
The impact of p53 protein core domain structural alteration on ovarian cancer survival.
Rose, Stephen L; Robertson, Andrew D; Goodheart, Michael J; Smith, Brian J; DeYoung, Barry R; Buller, Richard E
2003-09-15
Although survival with a p53 missense mutation is highly variable, p53-null mutation is an independent adverse prognostic factor for advanced-stage ovarian cancer. By evaluating ovarian cancer survival based upon a structure-function analysis of the p53 protein, we tested the hypothesis that not all missense mutations are equivalent. The p53 gene was sequenced from 267 consecutive ovarian cancers. The effect of individual missense mutations on p53 structure was analyzed using the International Agency for Research on Cancer p53 Mutational Database, which specifies the effects of p53 mutations on p53 core domain structure. Mutations in the p53 core domain were classified as either explained or not explained in structural or functional terms by their predicted effects on protein folding, protein-DNA contacts, or mutation in highly conserved residues. Null mutations were classified by their mechanism of origin. Mutations were sequenced from 125 tumors. Effects of 62 of the 82 missense mutations (76%) could be explained by alterations in the p53 protein. Twenty-three (28%) of the explained mutations occurred in highly conserved regions of the p53 core protein. Twenty-two nonsense point mutations and 21 frameshift null mutations were sequenced. Survival was independent of missense mutation type and mechanism of null mutation. The hypothesis that not all missense mutations are equivalent is, therefore, rejected. Furthermore, p53 core domain structural alteration secondary to missense point mutation is not functionally equivalent to a p53-null mutation. The poor prognosis associated with p53-null mutation is independent of the mutation mechanism.
Structural genomics reveals EVE as a new ASCH/PUA-related domain
Bertonati, Claudia; Punta, Marco; Fischer, Markus; Yachdav, Guy; Forouhar, Farhad; Zhou, Weihong; Kuzin, Alexander P.; Seetharaman, Jayaraman; Abashidze, Mariam; Ramelot, Theresa A.; Kennedy, Michael A.; Cort, John R.; Belachew, Adam; Hunt, John F.; Tong, Liang; Montelione, Gaetano T.; Rost, Burkhard
2014-01-01
Summary We report on several proteins recently solved by structural genomics consortia, in particular by the Northeast Structural Genomics consortium (NESG). The proteins considered in this study differ substantially in their sequences but they share a similar structural core, characterized by a pseudobarrel five-stranded beta sheet. This core corresponds to the PUA domain-like architecture in the SCOP database. By connecting sequence information with structural knowledge, we characterize a new subgroup of these proteins that we propose to be distinctly different from previously described PUA domain-like domains such as PUA proper or ASCH. We refer to these newly defined domains as EVE. Although EVE may have retained the ability of PUA domains to bind RNA, the available experimental and computational data suggest that both the details of its molecular function and its cellular function differ from those of other PUA domain-like domains. This study of EVE and its relatives illustrates how the combination of structure and genomics creates new insights by connecting a cornucopia of structures that map to the same evolutionary potential. Primary sequence information alone would not have been sufficient to reveal these evolutionary links. PMID:19191354
Structural Genomics Reveals EVE as a New ASCH/PUA-Related Domain
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bertonati, C.; Punta, M; Fischer, M
2008-01-01
We report on several proteins recently solved by structural genomics consortia, in particular by the Northeast Structural Genomics consortium (NESG). The proteins considered in this study differ substantially in their sequences but they share a similar structural core, characterized by a pseudobarrel five-stranded beta sheet. This core corresponds to the PUA domain-like architecture in the SCOP database. By connecting sequence information with structural knowledge, we characterize a new subgroup of these proteins that we propose to be distinctly different from previously described PUA domain-like domains such as PUA proper or ASCH. We refer to these newly defined domains as EVE. Although EVE may have retained the ability of PUA domains to bind RNA, the available experimental and computational data suggest that both the details of its molecular function and its cellular function differ from those of other PUA domain-like domains. This study of EVE and its relatives illustrates how the combination of structure and genomics creates new insights by connecting a cornucopia of structures that map to the same evolutionary potential. Primary sequence information alone would not have been sufficient to reveal these evolutionary links.
A comparison of serial order short-term memory effects across verbal and musical domains.
Gorin, Simon; Mengal, Pierre; Majerus, Steve
2018-04-01
Recent studies suggest that the mechanisms involved in the short-term retention of serial order information may be shared across short-term memory (STM) domains such as verbal and visuospatial STM. Given the intrinsic sequential organization of musical material, the study of STM for musical information may be particularly informative about serial order retention processes and their domain-generality. The present experiment examined serial order STM for verbal and musical sequences in participants without advanced musical expertise and in experienced musicians. Serial order STM for verbal information was assessed via a serial order reconstruction task for digit sequences. In the musical domain, serial order STM was assessed using a novel melodic sequence reconstruction task maximizing the retention of tone order information. We observed that performance on the verbal and musical tasks was characterized by sequence length effects as well as primacy and recency effects. Serial order errors in both tasks were characterized by similar transposition gradients and ratios of fill-in:infill errors. These effects were observed for both participant groups, although the transposition gradients and ratios of fill-in:infill errors showed additional specificities for musician participants in the musical task. The data support domain-general serial order STM effects but also suggest the existence of additional domain-specific effects. Implications for models of serial order STM in verbal and musical domains are discussed.
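Transposition gradients and fill-in:infill ratios are usually scored from reconstructed sequences roughly as follows. This Python sketch uses common definitions (displacement = output position minus input position; after an anticipation error, a fill-in outputs the skipped item next, an infill outputs the item after it) and classifies only the first error of each trial. These scoring choices are assumptions, not necessarily the authors' exact procedure, and the function names are hypothetical.

```python
from collections import Counter


def transposition_gradient(target, response):
    """Count displacements (output position minus input position) of misplaced items."""
    position = {item: i for i, item in enumerate(target)}
    gradient = Counter()
    for i, item in enumerate(response):
        if item in position and position[item] != i:
            gradient[i - position[item]] += 1
    return gradient


def classify_first_error(target, response):
    """Return 'fill-in', 'infill', or None for the first error in one trial."""
    n = min(len(target), len(response))
    for i in range(n):
        if response[i] == target[i]:
            continue
        # First error: is it an anticipation (item i+1 output at position i)?
        if i + 1 < n and response[i] == target[i + 1]:
            nxt = response[i + 1]
            if nxt == target[i]:
                return "fill-in"  # the skipped item is output next
            if i + 2 < len(target) and nxt == target[i + 2]:
                return "infill"  # recall continues past the skipped item
        return None
    return None


def fill_in_infill_counts(trials):
    """Tally first-error classifications over (target, response) trials."""
    tally = Counter(classify_first_error(t, r) for t, r in trials)
    return tally["fill-in"], tally["infill"]
```

With target `ABCDE`, the response `ACBDE` is scored as a fill-in and `ACDEB` as an infill; the fill-in:infill ratio is then the quotient of the two counts.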
Citrus and Prunus copia-like retrotransposons.
Asíns, M J; Monforte, A J; Mestre, P F; Carbonell, E A
1999-08-01
Many of the world's most important citrus cultivars ("Washington Navel", satsumas, clementines) have arisen through somatic mutation. This phenomenon occurs fairly often in the various species and varieties of the genus. The presence of copia-like retrotransposons has been investigated in fruit trees, especially citrus, by using a PCR assay designed to detect copia-like reverse transcriptase (RT) sequences. Amplification products from one genotype of each of the following species, Citrus sinensis, Citrus grandis, Citrus clementina, Prunus armeniaca and Prunus amygdalus, were cloned and some of them sequenced. Southern-blot hybridization using RT clones as probes showed that multiple copies are integrated throughout the citrus genome, while only 1-3 copies are detected in the P. armeniaca genome, which is in accordance with the Citrus and Prunus genome sizes. Sequence analysis of RT clones allowed a search for homologous sequences within three gene banks. The most similar ones correspond to RT domains of copia-like retrotransposons from unrelated plant species. Cluster analysis of these sequences has shown a great heterogeneity among RT domains cloned from the same genotype. This finding supports the hypothesis that horizontal transmission of retrotransposons has occurred in the past. The species presenting an RT sequence most similar to citrus RT clones is Gnetum montanum, a gymnosperm whose distribution area coincides with two of the main centers of origin of Citrus spp. A new C-methylated restriction DNA fragment containing an RT sequence is present in navel sweet oranges but not in Valencia oranges, from which the former originated, suggesting that retrotransposon activity might be, at least in part, involved in the genetic variability among sweet orange cultivars.
Given that retrotransposons are quite abundant throughout the citrus genome, their activity should be investigated thoroughly before commercializing any transgenic citrus plant where the transgene(s) is part of a viral genome in order to avoid its possible recombination with an active retroelement. Focusing on other strategies to control virus diseases is recommended in citrus.
Newell, Nicholas E
2011-12-15
The extraction of the set of features most relevant to function from classified biological sequence sets is still a challenging problem. A central issue is the determination of expected counts for higher order features so that artifact features may be screened. Cascade detection (CD), a new algorithm for the extraction of localized features from sequence sets, is introduced. CD is a natural extension of the proportional modeling techniques used in contingency table analysis into the domain of feature detection. The algorithm is successfully tested on synthetic data and then applied to feature detection problems from two different domains to demonstrate its broad utility. An analysis of HIV-1 protease specificity reveals patterns of strong first-order features that group hydrophobic residues by side chain geometry and exhibit substantial symmetry about the cleavage site. Higher order results suggest that favorable cooperativity is weak by comparison and broadly distributed, but indicate possible synergies between negative charge and hydrophobicity in the substrate. Structure-function results for the Schellman loop, a helix-capping motif in proteins, contain strong first-order features and also show statistically significant cooperativities that provide new insights into the design of the motif. These include a new 'hydrophobic staple' and multiple amphipathic and electrostatic pair features. CD should prove useful not only for sequence analysis, but also for the detection of multifactor synergies in cross-classified data from clinical studies or other sources. Windows XP/7 application and data files available at: https://sites.google.com/site/cascadedetect/home. Contact: nacnewell@comcast.net. Supplementary information is available at Bioinformatics online.
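The "expected counts" problem can be illustrated for the simplest higher-order feature, a residue pair at two aligned positions. Under the independence model of contingency table analysis, the expected count of pair (a, b) is n_a * n_b / N; the sketch below compares that with the observed count. It shows only this baseline step and is not an implementation of the cascade detection algorithm itself; the function name is hypothetical.

```python
from collections import Counter


def pair_feature_table(sequences, i, j):
    """Observed vs. independence-expected counts for residue pairs at positions i and j.

    sequences: equal-length aligned strings (e.g. substrate sites).
    Returns {(a, b): (observed, expected)} with expected = n_a * n_b / N.
    """
    n = len(sequences)
    at_i = Counter(s[i] for s in sequences)
    at_j = Counter(s[j] for s in sequences)
    pairs = Counter((s[i], s[j]) for s in sequences)
    table = {}
    for a, n_a in at_i.items():
        for b, n_b in at_j.items():
            table[(a, b)] = (pairs[(a, b)], n_a * n_b / n)
    return table
```

A pair whose observed count greatly exceeds its expected count is a candidate cooperative feature; a pair that is frequent only because each residue is individually frequent is the kind of artifact this comparison screens out.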
Lu, Lunhui; Zhang, Jiachao; Chen, Anwei; Chen, Ming; Jiang, Min; Yuan, Yujie; Wu, Haipeng; Lai, Mingyong; He, Yibin
2014-01-01
Traditional three-domain fungal and bacterial laccases have been extensively studied for their significance in various biotechnological applications. Growing molecular evidence points to a wide occurrence of more recently recognized two-domain laccase-like multicopper oxidase (LMCO) genes in Streptomyces spp. However, the current knowledge about their ecological role and distribution in natural or artificial ecosystems is insufficient. The aim of this study was to investigate the diversity and composition of Streptomyces two-domain LMCO genes in agricultural waste composting, which will contribute to the understanding of the ecological function of Streptomyces two-domain LMCOs with potential extracellular activity and ligninolytic capacity. A new specific PCR primer pair was designed to target the two conserved copper binding regions of Streptomyces two-domain LMCO genes. The obtained sequences mainly clustered with Streptomyces coelicolor, Streptomyces violaceusniger, and Streptomyces griseus. Gene libraries retrieved from six composting samples revealed high diversity and a rapid succession of Streptomyces two-domain LMCO genes during composting. The obtained sequence types cluster in 8 distinct clades, most of which are homologous with Streptomyces two-domain LMCO genes, but the sequences of clades III and VIII do not match any reference sequence of known streptomycetes. Both lignocellulose degradation rates and phenol oxidase activity at pH 8.0 in the composting process were found to be positively associated with the abundance of Streptomyces two-domain LMCO genes. These observations provide important clues that Streptomyces two-domain LMCOs are potentially involved in bacterial extracellular phenol oxidase activities and lignocellulose breakdown during agricultural waste composting. PMID:24657870
Loconsole, Giuliana; Onelge, Nuket; Yokomi, Raymond K; Kubaa, Raied Abou; Savino, Vito; Saponari, Maria
2013-01-01
The RNA genomes of pathogenic and non-pathogenic variants of citrus Hop stunt viroid (HSVd) differ by five to six nucleotides located within the variable (V) domain, referred to as the "cachexia expression motif". Sensitive hosts such as mandarin and its hybrids are seriously affected by cachexia disease. Current methods to differentiate HSVd variants rely on lengthy greenhouse biological indexing on Parson's Special mandarin and/or direct nucleotide sequence analysis of amplicons from RT-PCR of HSVd-infected plants. Two independent high-throughput assays to segregate HSVd variants by real-time RT-PCR and High-Resolution Melting Temperature (HRM) analysis were developed: one based on EvaGreen dye, the other based on TaqMan probes. Primers for both assays targeted three differentiating nucleotides in the V domain, which separated HSVd variants into three clusters by distinct melting temperatures with a confidence level higher than 98%. The accuracy of the HRM assays was validated by nucleotide sequencing of representative samples within each HRM cluster and by testing 45 HSVd-infected field trees from California, Italy, Spain, Syria and Turkey. To our knowledge, this is the first report of a rapid and sensitive approach to detect and differentiate HSVd variants associated with different biological behaviors. Although HSVd is found in several crops including citrus, cachexia variants are restricted to some citrus-growing areas, particularly the Mediterranean Region. Rapid diagnosis of cachexia and non-cachexia variants is, thus, important for the management of HSVd in citrus and reduces the need for bioindexing and sequencing analysis. Copyright © 2013 Elsevier Ltd. All rights reserved.
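In HRM analysis a sample's melting temperature (Tm) is commonly read off as the peak of the negative first derivative of the melt curve (-dF/dT), and variants are then separated by their Tm. The sketch below shows that idea on a synthetic, noise-free curve. The cluster reference temperatures and function names are hypothetical, and real HRM workflows usually also normalize curves before comparing them.

```python
import math


def melting_temperature(temperatures, fluorescence):
    """Return the temperature at which -dF/dT (central difference) is largest."""
    best_t, best_rate = None, float("-inf")
    for k in range(1, len(temperatures) - 1):
        rate = -(fluorescence[k + 1] - fluorescence[k - 1]) / (
            temperatures[k + 1] - temperatures[k - 1]
        )
        if rate > best_rate:
            best_t, best_rate = temperatures[k], rate
    return best_t


def assign_cluster(tm, reference_tms):
    """Assign a sample to the cluster whose reference Tm is nearest."""
    return min(reference_tms, key=lambda name: abs(reference_tms[name] - tm))


def synthetic_melt_curve(tm, width=0.5, start=80.0, stop=90.0, step=0.1):
    """Noise-free sigmoidal melt curve: fluorescence falls as the duplex melts."""
    n = int(round((stop - start) / step)) + 1
    temps = [start + i * step for i in range(n)]
    return temps, [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]
```

Usage: `temps, f = synthetic_melt_curve(85.0)` followed by `melting_temperature(temps, f)` recovers a Tm close to 85.0, which `assign_cluster` then maps to the nearest reference cluster.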
da Fonseca, Néli José; Lima Afonso, Marcelo Querino; Pedersolli, Natan Gonçalves; de Oliveira, Lucas Carrijo; Andrade, Dhiego Souto; Bleicher, Lucas
2017-10-28
Flaviviruses are responsible for serious diseases such as dengue, yellow fever, and zika fever. Their genomes encode a polyprotein which, after cleavage, results in three structural and seven non-structural proteins. Homologous proteins can be studied through conservation and coevolution analyses of multiple sequence alignments, which usually report positions that are strictly necessary for the structure and/or function of all members of a protein family or that are involved in a specific subclass feature requiring the coevolution of residue sets. This study provides a complete conservation and coevolution analysis of all flavivirus non-structural proteins, with results mapped on all well-annotated available sequences. A literature review on the residues found in the analysis enabled us to compile available information on their roles and distribution among different flaviviruses. We also provide the mapping of conserved and coevolved residues for all sequences currently in SwissProt as supplementary material, so that particularities of different viruses can be easily analyzed. Copyright © 2017 Elsevier Inc. All rights reserved.
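Conservation and coevolution in a multiple sequence alignment are often quantified per column with Shannon entropy and between column pairs with mutual information. The sketch below shows these two generic measures. The abstract does not specify which statistics the authors used, so treat this as an illustration of the idea rather than their method; gaps are simply treated as an extra symbol here, and the function names are hypothetical.

```python
import math
from collections import Counter


def alignment_columns(msa):
    """Split an MSA (list of equal-length aligned strings) into per-position columns."""
    if len({len(row) for row in msa}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return list(zip(*msa))


def column_entropy(column):
    """Shannon entropy in bits; 0 means the position is fully conserved."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def mutual_information(col_a, col_b):
    """Mutual information in bits between two columns; high values suggest covariation."""
    n = len(col_a)
    p_a, p_b = Counter(col_a), Counter(col_b)
    joint = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * math.log2((c / n) / ((p_a[a] / n) * (p_b[b] / n)))
        for (a, b), c in joint.items()
    )
```

In practice raw mutual information is biased by phylogeny and column entropy, so coevolution tools typically apply corrections; this sketch omits them for clarity.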
de Oliveira Ceita, Geruza; Vilas-Boas, Laurival Antônio; Castilho, Marcelo Santos; Carazzolle, Marcelo Falsarella; Pirovani, Carlos Priminho; Selbach-Schnadelbach, Alessandra; Gramacho, Karina Peres; Ramos, Pablo Ivan Pereira; Barbosa, Luciana Veiga; Pereira, Gonçalo Amarante Guimarães; Góes-Neto, Aristóteles
2014-10-01
The phytopathogenic fungus Moniliophthora perniciosa (Stahel) Aime & Philips-Mora, causal agent of witches' broom disease of cocoa, causes extensive damage to cocoa production in Brazil. Molecular studies have attempted to identify genes that play important roles in fungal survival and virulence. In this study, sequences deposited in the M. perniciosa Genome Sequencing Project database were analyzed to identify potential biological targets. For the first time, the ergosterol biosynthetic pathway in M. perniciosa was studied and the lanosterol 14α-demethylase gene (ERG11), which encodes the main enzyme of this pathway and is a target for fungicides, was cloned and characterized molecularly, and its phylogeny was analyzed. ERG11 genomic DNA and cDNA were characterized and sequence analysis of the ERG11 protein identified highly conserved domains typical of this enzyme, such as SRS1, SRS4, EXXR and the heme-binding region (HBR). Comparison of the protein sequences and phylogenetic analysis revealed that the M. perniciosa enzyme was most closely related to that of Coprinopsis cinerea.
de Oliveira Ceita, Geruza; Vilas-Boas, Laurival Antônio; Castilho, Marcelo Santos; Carazzolle, Marcelo Falsarella; Pirovani, Carlos Priminho; Selbach-Schnadelbach, Alessandra; Gramacho, Karina Peres; Ramos, Pablo Ivan Pereira; Barbosa, Luciana Veiga; Pereira, Gonçalo Amarante Guimarães; Góes-Neto, Aristóteles
2014-01-01
The phytopathogenic fungus Moniliophthora perniciosa (Stahel) Aime & Philips-Mora, causal agent of witches’ broom disease of cocoa, causes extensive damage to cocoa production in Brazil. Molecular studies have attempted to identify genes that play important roles in fungal survival and virulence. In this study, sequences deposited in the M. perniciosa Genome Sequencing Project database were analyzed to identify potential biological targets. For the first time, the ergosterol biosynthetic pathway in M. perniciosa was studied and the lanosterol 14α-demethylase gene (ERG11), which encodes the main enzyme of this pathway and is a target for fungicides, was cloned and characterized molecularly, and its phylogeny was analyzed. ERG11 genomic DNA and cDNA were characterized and sequence analysis of the ERG11 protein identified highly conserved domains typical of this enzyme, such as SRS1, SRS4, EXXR and the heme-binding region (HBR). Comparison of the protein sequences and phylogenetic analysis revealed that the M. perniciosa enzyme was most closely related to that of Coprinopsis cinerea. PMID:25505843
Goonesekere, Nalin Cw
2009-01-01
The large numbers of protein sequences generated by whole genome sequencing projects require rapid and accurate methods of annotation. The detection of homology through computational sequence analysis is a powerful tool in determining the complex evolutionary and functional relationships that exist between proteins. Homology search algorithms employ amino acid substitution matrices to detect similarity between protein sequences. The substitution matrices in common use today are constructed using sequences aligned without reference to protein structure. Here we present amino acid substitution matrices constructed from the alignment of a large number of protein domain structures from the structural classification of proteins (SCOP) database. We show that when incorporated into the homology search algorithms BLAST and PSI-BLAST, the structure-based substitution matrices enhance the efficacy of detecting remote homologs.
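Structure-based matrices are still log-odds matrices; only the source of the aligned residue pairs changes. The sketch below follows the BLOSUM-style recipe (observed pair frequency q_ab divided by the frequency e_ab expected from background residue frequencies, reported in half-bits), applied to aligned residue pairs that in this setting would come from structural alignments of SCOP domains. It is an assumed, simplified procedure (no sequence clustering or weighting, no pseudocounts), not the paper's exact construction, and the function name is hypothetical.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs, scale=2.0):
    """BLOSUM-style log-odds scores from aligned residue pairs.

    aligned_pairs: iterable of (residue_x, residue_y) from aligned columns;
    pairs involving a gap '-' are skipped. Score = round(scale * log2(q_ab / e_ab)).
    """
    counts = Counter()
    for x, y in aligned_pairs:
        if x == "-" or y == "-":
            continue
        counts[tuple(sorted((x, y)))] += 1  # unordered pair: (A, G) == (G, A)
    total = sum(counts.values())

    q = {pair: c / total for pair, c in counts.items()}  # observed pair frequencies
    p = Counter()  # background frequency of each residue among all pairs
    for (a, b), q_ab in q.items():
        p[a] += q_ab / 2
        p[b] += q_ab / 2

    scores = {}
    for (a, b), q_ab in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(scale * math.log2(q_ab / expected))
    return scores
```

Pairs that are aligned more often than chance predicts receive positive scores and rarer pairs negative ones; a search tool such as BLAST then sums these scores along an alignment.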
Philipp, E E R; Kraemer, L; Mountfort, D; Schilhabel, M; Schreiber, S; Rosenstiel, P
2012-03-15
Next generation sequencing (NGS) technologies allow rapid and cost-effective compilation of large RNA sequence datasets in model and non-model organisms. However, the storage and analysis of transcriptome information from different NGS platforms is still a significant bottleneck, delaying data dissemination and subsequent biological understanding. In particular, database interfaces with transcriptome analysis modules that go beyond mere read counts are lacking. Here, we present the Transcriptome Analysis and Comparison Explorer (T-ACE), a tool designed for the organization and analysis of large sequence datasets, and especially suited for transcriptome projects of non-model organisms with little or no a priori sequence information. T-ACE offers a Tcl-based interface, which accesses a PostgreSQL database via a PHP script. Within T-ACE, information belonging to single sequences or contigs, such as annotation or read coverage, is linked to the respective sequence and immediately accessible. Sequences and assigned information can be searched via keyword or BLAST search. Additionally, T-ACE provides within- and between-transcriptome analysis modules at the level of expression, GO terms, KEGG pathways and protein domains. Results are visualized and can be easily exported for external analysis. We developed T-ACE for laboratory environments that have only limited bioinformatics support, and for collaborative projects in which different partners work on the same dataset from different locations or platforms (Windows/Linux/MacOS). For laboratories with some experience in bioinformatics and programming, the low complexity of the database structure and the open-source code provide a framework that can be customized according to the different needs of the user and transcriptome project.
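The core idea of linking every annotation and coverage value to its contig can be sketched with a small relational schema. This example uses Python's built-in sqlite3 rather than PostgreSQL so that it runs without a database server; the table names, column names and helper functions are hypothetical and are not T-ACE's actual schema.

```python
import sqlite3

# Hypothetical minimal schema: one row per contig, annotations and
# per-library read counts linked to it by a foreign key.
SCHEMA = """
CREATE TABLE contig (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    sequence TEXT NOT NULL
);
CREATE TABLE annotation (
    contig_id INTEGER NOT NULL REFERENCES contig(id),
    source TEXT NOT NULL,        -- e.g. 'BLAST', 'GO', 'KEGG', 'domain'
    description TEXT NOT NULL
);
CREATE TABLE coverage (
    contig_id INTEGER NOT NULL REFERENCES contig(id),
    library TEXT NOT NULL,       -- one row per sequencing library or sample
    read_count INTEGER NOT NULL
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_contig(conn, name, sequence, annotations=(), coverage=()):
    """Insert a contig together with its (source, description) and (library, count) rows."""
    cur = conn.execute("INSERT INTO contig (name, sequence) VALUES (?, ?)", (name, sequence))
    contig_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO annotation (contig_id, source, description) VALUES (?, ?, ?)",
        [(contig_id, src, desc) for src, desc in annotations],
    )
    conn.executemany(
        "INSERT INTO coverage (contig_id, library, read_count) VALUES (?, ?, ?)",
        [(contig_id, lib, n) for lib, n in coverage],
    )
    return contig_id


def keyword_search(conn, keyword):
    """Names of contigs with any annotation containing keyword (LIKE is ASCII case-insensitive)."""
    rows = conn.execute(
        """SELECT DISTINCT c.name FROM contig c
           JOIN annotation a ON a.contig_id = c.id
           WHERE a.description LIKE ? ORDER BY c.name""",
        (f"%{keyword}%",),
    ).fetchall()
    return [name for (name,) in rows]


def read_counts(conn, name):
    """Per-library read counts for one contig, as {library: count}."""
    rows = conn.execute(
        """SELECT v.library, v.read_count FROM coverage v
           JOIN contig c ON c.id = v.contig_id
           WHERE c.name = ? ORDER BY v.library""",
        (name,),
    ).fetchall()
    return dict(rows)
```

Keeping annotations and coverage in separate tables keyed by contig is what makes both the keyword search and per-library expression comparisons simple joins.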
Bender, Andrea; Schlimm, Dirk; Beller, Sieghard
2015-10-01
The domain of numbers provides a paradigmatic case for investigating interactions of culture, language, and cognition: Numerical competencies are considered a core domain of knowledge, and yet the development of specifically human abilities presupposes cultural and linguistic input by way of counting sequences. These sequences constitute systems with distinct structural properties, and their cross-linguistic variability has implications for number representation and processing. Such representational effects are scrutinized for two types of verbal numeration systems (general and object-specific) that were in parallel use in several Oceanic languages (English, with its general system, is included for comparison). The analysis indicates that the object-specific systems outperform the general systems with respect to counting and mental arithmetic, largely due to their regular and more compact representation. What these findings reveal about cognitive diversity, how the conjectures involved speak to more general issues in cognitive science, and how the approach taken here might help to bridge the gap between anthropology and other cognitive sciences are discussed in the conclusion. Copyright © 2015 Cognitive Science Society, Inc.
Lai, Chen-Li; van den Ham, René; Mol, Jan; Teske, Erik
2009-09-01
Prostate cancer in the dog (cPC) has many features in common with hormone-refractory human prostate cancer. As cPC is seen more often in castrated dogs, the contribution of the androgen receptor (AR) to the development of prostate cancer remains questionable. The aim of the present study was to evaluate the presence of the AR by immunohistochemistry in cPC. AR staining was observed in most tumors from intact and castrated dogs, but the proportion of positive cells and the staining intensity were much lower than in the prostate of healthy, non-castrated dogs. Most of the positive staining was seen in the cytoplasm rather than in the nuclei of the tumor cells. The predominant cytoplasmic localization was not related to mutations in exon 3 of the DNA-binding domain of the AR, as shown by sequence analysis of microdissected AR-positive tumor cells. Other mechanisms that lead to impaired androgen-AR signaling, or a basal/stem cell-like origin, may explain the low cytoplasmic AR staining in cPC.
Pre-lithification tectonic foliation development in a clastic sedimentary sequence
NASA Astrophysics Data System (ADS)
Meere, Patrick; Mulchrone, Kieran; McCarthy, David; Timmermann, Martin; Dewey, John
2016-04-01
The current view regarding the timing of regionally developed penetrative tectonic fabrics in sedimentary rocks is that their development postdates lithification of those rocks. In this view, fabric development is achieved by a number of deformation mechanisms, including grain rigid-body rotation, crystal-plastic deformation and pressure solution (wet diffusion). The latter is believed to be the primary mechanism responsible for shortening and for the domainal cleavage structure commonly observed in low-grade metamorphic rocks. In this study we combine field observations with strain analysis and modelling to fully characterise considerable (>50%) mid-Devonian Acadian crustal shortening in a Devonian clastic sedimentary sequence from south-west Ireland. Despite these high levels of shortening and the associated penetrative tectonic fabric, there is a marked absence of the domainal cleavage structure and intra-clast deformation that would be expected at this level of deformation. In contrast to the deformation processes associated with conventional cleavage development, fabrics in these rocks are a product of translation, rigid-body rotation and repacking of extra-formational clasts during deformation of an unlithified clastic sedimentary sequence.
Lim, Hyoun-Sub; Park, Sang-Un; Bae, Hyeun-Jong; Natarajan, Savithiry
2014-01-01
Cinnamoyl-CoA reductase (CCR) is an important enzyme for lignin biosynthesis as it catalyzes the first specific committed step in monolignol biosynthesis. We have cloned a full-length coding sequence of CCR from kenaf (Hibiscus cannabinus L.), which contains a 1,020-bp open reading frame (ORF) encoding 339 amino acids (37.37 kDa) with an isoelectric point (pI) of 6.27 (JX524276, HcCCR2). BLAST analysis showed that it has high homology with other plant CCR orthologs. Multiple alignment with other plant CCR sequences showed that it contains two highly conserved motifs: an NAD(P)-binding domain (VTGAGGFIASWMVKLLLEKGY) at the N-terminus and a probable catalytic domain (NWYCYGK). According to phylogenetic analysis, it is closely related to the CCR sequences of Gossypium hirsutum (ACQ59094) and Populus trichocarpa (CAC07424). HcCCR2 showed ubiquitous expression in various kenaf tissues, with the highest expression detected in mature flowers. HcCCR2 was expressed differentially in response to various stresses, with the highest expression observed under drought and NaCl treatments. PMID:24723816
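Two quick consistency checks on the reported numbers can be scripted: a 1,020-bp ORF is 340 codons, i.e. 339 amino acids plus a stop codon, and a conserved motif can be located in a deduced protein by exact string search. The protein used here is a short hypothetical fragment assembled from the two reported motifs, not the HcCCR2 sequence, and the function names are illustrative.

```python
def orf_protein_length(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an ORF of orf_bp nucleotides."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


def find_motif(protein, motif):
    """1-based start positions of every (possibly overlapping) exact match of motif."""
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(start + 1)
        start = protein.find(motif, start + 1)
    return positions


# Hypothetical toy fragment built from the two reported motifs; NOT the HcCCR2 sequence.
TOY_PROTEIN = "M" + "VTGAGGFIASWMVKLLLEKGY" + "AAA" + "NWYCYGK"
```

In the toy fragment the NAD(P)-binding motif starts at residue 2 and the catalytic motif at residue 26; on the real deduced sequence the same call would report where each motif sits relative to the N-terminus.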
Sablok, Gaurav; Pérez-Pulido, Antonio J.; Do, Thac; Seong, Tan Y.; Casimiro-Soriguer, Carlos S.; La Porta, Nicola; Ralph, Peter J.; Squartini, Andrea; Muñoz-Merida, Antonio; Harikrishna, Jennifer A.
2016-01-01
Analysis of repetitive DNA sequence content and divergence among the repetitive functional classes is a well-accepted approach for estimating inter- and intra-generic differences in plant genomes. Among these elements, microsatellites, or Simple Sequence Repeats (SSRs), have been widely demonstrated to be powerful genetic markers for discriminating species and varieties. We present the PlantFuncSSRs platform, covering more than 364 plant species and more than 2 million functional SSRs. The SSRs are provided with detailed annotations for easy functional browsing, together with information on primer pairs and associated functional domains. PlantFuncSSRs can be leveraged to identify function-based genic variability among the species of interest, which may be of particular interest for developing functional markers in plants. This comprehensive online portal unifies the mining of SSRs from first- and next-generation sequencing datasets, the corresponding primer pairs, and associated in-depth functional annotation such as gene ontology annotation, gene interactions and identification from reference protein databases. PlantFuncSSRs is freely accessible at: http://www.bioinfocabd.upo.es/plantssr. PMID:27446111
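Perfect SSRs of the kind catalogued here are commonly mined with a regular-expression scan for each motif length. The sketch below assumes MISA-style minimum repeat numbers (10 for mononucleotides, 6 for dinucleotides, 5 for tri- to hexanucleotides); these thresholds are an assumption rather than the PlantFuncSSRs settings, the function names are hypothetical, and compound or imperfect repeats are ignored.

```python
import re

# Assumed minimum repeat numbers per motif length (MISA-style); not PlantFuncSSRs settings.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(unit):
    """True if unit is not itself a repeat of a shorter string ('AT' yes, 'ATAT' no)."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return (0-based start, repeat unit, repeat count) for perfect SSRs, sorted by start."""
    sequence = sequence.upper()
    hits = []
    for k, min_n in min_repeats.items():
        # A k-mer followed by at least (min_n - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_n - 1))
        for m in pattern.finditer(sequence):
            unit = m.group(1)
            # Skip e.g. 'AA' or 'ATAT', which are reported under their shorter unit.
            if is_primitive(unit):
                hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)
```

For example, in `"GG" + "AT" * 7 + "CC" + "A" * 12 + "G"` the scan reports an (AT)7 repeat at position 2 and an (A)12 repeat at position 18; primer design and functional annotation would then be layered on top of such hits.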
Orpinomyces cellulase celf protein and coding sequences
Li, Xin-Liang; Chen, Huizhong; Ljungdahl, Lars G.
2000-09-05
A cDNA (1,520 bp), designated celF, containing an open reading frame (ORF) that encodes a polypeptide (CelF) of 432 amino acids was isolated from a cDNA library of the anaerobic rumen fungus Orpinomyces PC-2 constructed in Escherichia coli. Analysis of the deduced amino acid sequence showed that, starting from the N-terminus, CelF consists of a signal peptide, a cellulose-binding domain (CBD), and an extremely Asn-rich linker region that separates the CBD from the catalytic domain, which is located at the C-terminus. The catalytic domain of CelF is highly homologous to CelA and CelC of Orpinomyces PC-2, to CelA of Neocallimastix patriciarum, and to cellobiohydrolase IIs (CBHIIs) from aerobic fungi. However, like CelA of Neocallimastix patriciarum, CelF lacks the noncatalytic repeated peptide domain (NCRPD) found in CelA and CelC of Orpinomyces PC-2. Recombinant CelF hydrolyzes cellooligosaccharides in the manner of CBHII, yielding cellobiose as the only product when cellotetraose is the substrate. The genomic celF gene is interrupted by a 111-bp intron located within the region coding for the CBD. This intron has features in common with introns of genes from aerobic filamentous fungi.
[NMR structure and dynamics of the chimeric protein SH3-F2].
Kutyshenko, V P; Gushchina, L V; Khristoforov, V S; Prokhorov, D A; Timchenko, M A; Kudrevatykh, Iu A; Fediukina, D V; Filimonov, V V
2010-01-01
To further elucidate the structural and dynamic principles of protein self-organization and protein-ligand interactions, a new chimeric protein, SH3-F2, was designed and a corresponding genetically engineered construct was created. The SH3-F2 amino acid sequence consists of the polyproline ligand mgAPPLPPYSA, a GG linker, and the sequence of the spectrin SH3 domain circular permutant S19-P20s. The structural and dynamic properties of the protein were studied by high-resolution NMR. According to the NMR data, the tertiary structure of SH3-F2 has the topology typical of SH3 domains in complex with a ligand: the ligand forms a polyproline type II helix located in the conserved binding region in orientation II. The polyproline ligand closely adjoins the protein globule and is stabilized by hydrophobic interactions. However, the interaction between the ligand and the SH3 part of the globule is not very strong, because analysis of the protein's dynamic characteristics points to low-amplitude, high-frequency tumbling of the ligand relative to the slow intramolecular motions of the main globule. The constructed chimera makes it possible to carry out further structural and thermodynamic studies of polyproline helix properties and of its interaction with regulatory domains.
Visual management of large scale data mining projects.
Shah, I; Hunter, L
2000-01-01
This paper describes a unified framework for visualizing the preparations for, and results of, hundreds of machine learning experiments. These experiments were designed to improve the accuracy of enzyme function prediction from sequence, and many of them were successful. Our system provides graphical user interfaces for defining and exploring training datasets and alternative representations, for inspecting the hypotheses induced by various types of learning algorithms, for visualizing global results, and for inspecting in detail the results for specific training sets (functions) and examples (proteins). The visualization tools serve as a navigational aid through a large amount of sequence data and induced knowledge. They helped significantly in understanding both the significance of our successes and failures and their underlying biological explanations. Using these visualizations, we could efficiently identify weaknesses of the modular sequence representations and induction algorithms, which suggested better learning strategies. The toolkit was developed in the context of accurately predicting enzyme function from protein sequence data. Previous work demonstrated that approximately 6% of enzyme protein sequences are likely to be assigned incorrect functions on the basis of sequence similarity alone. To test the hypothesis that more detailed sequence analysis using machine learning techniques and modular domain representations could address many of these failures, we designed a series of more than 250 experiments using information-theoretic decision tree induction and naive Bayesian learning on local sequence domain representations of problematic enzyme function classes. In more than half of these cases, our methods were able to perfectly discriminate among the possible functions of similar sequences. We developed and tested our visualization techniques on this application.
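"Information-theoretic decision tree induction" chooses, at each node, the feature whose split most reduces the entropy of the class labels. The sketch below computes that information gain for binary "domain present / absent" features. The feature names, function labels, and data are hypothetical toy values, not the authors' representations or datasets.

```python
from collections import Counter
from math import log2


def entropy(labels: list) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(examples: list, feature: str) -> float:
    """Gain from splitting (feature_set, label) examples on presence of `feature`."""
    labels = [label for _, label in examples]
    present = [label for feats, label in examples if feature in feats]
    absent = [label for feats, label in examples if feature not in feats]
    n = len(examples)
    remainder = sum(len(part) / n * entropy(part) for part in (present, absent) if part)
    return entropy(labels) - remainder


# Toy data: each protein is a set of hypothetical local domain features
# plus the enzyme function it should be assigned.
examples = [
    ({"D1", "D2"}, "EC1"),
    ({"D1"}, "EC1"),
    ({"D2"}, "EC2"),
    ({"D2", "D3"}, "EC2"),
]
best = max({"D1", "D2", "D3"}, key=lambda f: information_gain(examples, f))
print(best)  # D1: it separates the two functions perfectly
```

A full ID3/C4.5-style learner would apply this choice recursively to each branch. A naive Bayes classifier would instead combine per-feature class-conditional probabilities.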
Brady, J; Radonovich, M; Thoren, M; Das, G; Salzman, N P
1984-01-01
We have previously identified an 11-base DNA sequence, 5'-G-G-T-A-C-C-T-A-A-C-C-3' (simian virus 40 [SV40] map position 294 to 304), which is important in the control of SV40 late RNA expression in vitro and in vivo (Brady et al., Cell 31:625-633, 1982). We report here the identification of another domain of the SV40 late promoter. A series of mutants with deletions extending from SV40 map position 0 to 300 was prepared by nuclease BAL 31 treatment. The cloned templates were then analyzed for efficiency and accuracy of late SV40 RNA expression in the Manley in vitro transcription system. Our studies showed that, in addition to the promoter domain near map position 300, there are essential DNA sequences between nucleotide positions 74 and 95 that are required for efficient expression of late SV40 RNA. Included in this SV40 DNA sequence were two of the six GGGCGG SV40 repeat sequences and an 11-nucleotide segment which showed strong homology with the upstream sequences required for the efficient in vitro and in vivo expression of the histone H2A gene. This upstream promoter sequence supported transcription with the same efficiency even when it was moved 72 nucleotides closer to the major late cap site. In vitro promoter competition analysis demonstrated that the upstream promoter sequence, independent of the 294 to 304 promoter element, is capable of binding polymerase-transcription factors required for SV40 late gene transcription. Finally, we show that DNA sequences which control the specificity of RNA initiation at nucleotide 325 lie downstream of map position 294. PMID:6321950
Concomitant prediction of function and fold at the domain level with GO-based profiles.
Lopez, Daniel; Pazos, Florencio
2013-01-01
Predicting the function of newly sequenced proteins is crucial given the pace at which these raw sequences are being obtained. Almost all resources for predicting protein function assign functional terms to whole chains and do not distinguish which particular domain is responsible for the assigned function. This is not a limitation of the methodologies themselves. Rather, the databases of functional annotations from which these methods transfer terms to new proteins are annotated on a whole-chain basis. Nevertheless, domains are the basic evolutionary, and often functional, units of proteins. In many cases, the domains of a protein chain have distinct molecular functions, independent of each other. For that reason, resources with functional annotations at the domain level are required, as are methodologies adapted to these resources for predicting the function of individual domains. We present a methodology for predicting the molecular function of individual domains, based on a previously developed database of functional annotations at the domain level. The approach, which we show outperforms a standard method based on sequence searches in assigning function, concomitantly predicts the structural fold of the domains and can give hints about the functionally important residues associated with the predicted function.
Suzuki, Akiko; Endo, Takeshi
2002-02-06
We have cloned a cDNA encoding a novel protein, referred to as ermelin, from mouse C2 skeletal muscle cells. This protein contained six hydrophobic amino acid stretches corresponding to transmembrane domains, two histidine-rich sequences, and a sequence homologous to the fusion peptides of certain fusion proteins. Ermelin also contained a novel modular sequence, designated the HELP domain, which is highly conserved among eukaryotes, from yeast to higher plants and animals. All these HELP domain-containing proteins, including mouse KE4, Drosophila Catsup, and Arabidopsis IAR1, possessed multipass transmembrane domains and histidine-rich sequences. Ermelin was predominantly expressed in brain and testis. It was induced during neuronal differentiation of N1E-115 neuroblastoma cells but downregulated during myogenic differentiation of C2 cells. The mRNA accumulated in the hippocampus and cerebellum of the brain and in central areas of the seminiferous tubules in the testis. Epitope-tagging experiments localized ermelin and KE4 to a network structure throughout the cytoplasm. Staining with the fluorescent dye DiOC(6)(3) identified this structure as the endoplasmic reticulum. These results suggest that at least some, if not all, of the HELP domain-containing proteins are multipass endoplasmic reticulum membrane proteins with functions conserved among eukaryotes.
Yang, Zefeng; Gu, Shiliang; Wang, Xuefeng; Li, Wenjuan; Tang, Zaixiang; Xu, Chenwu
2008-09-01
CPP-like genes form a small family whose protein products contain two similar Cys-rich domains, termed CXC domains. The family is widely distributed in plants and animals but absent from yeast. In plants, members of this family play an important role in the development of reproductive tissue and in the control of cell division. To gain insight into how CPP-like genes evolved in plants, we conducted a comparative phylogenetic and molecular evolutionary analysis of the CPP-like gene family in Arabidopsis and rice. Phylogenetic analysis revealed that both gene loss and species-specific expansion contributed to the evolution of this family in Arabidopsis and rice. Analysis of intron/exon structure revealed both intron gain and intron loss in duplicated genes. Our results also suggested that positive selection was a major force during the evolution of CPP-like genes in plants, and that most amino acid residues under positive selection lay disproportionately outside the CXC domains. Further analysis revealed that the two CXC domains and the sequences connecting them might have coevolved over a long evolutionary period.
Ishikawa, Yoshihiro; Bächinger, Hans Peter
2013-11-01
Collagen biosynthesis occurs in the rough endoplasmic reticulum, and many molecular chaperones and folding enzymes are involved in this process. The folding mechanism of type I procollagen has been well characterized, and protein disulfide isomerase (PDI) has been suggested as a key player in the formation of the correct disulfide bonds in the noncollagenous carboxyl-terminal and amino-terminal propeptides. Prolyl 3-hydroxylase 1 (P3H1) forms a heterotrimeric complex with cartilage-associated protein and cyclophilin B (CypB). This multifunctional complex acts as a prolyl 3-hydroxylase, a peptidyl prolyl cis-trans isomerase, and a molecular chaperone. Two major domains are predicted from the primary sequence of P3H1: an amino-terminal domain, and a carboxyl-terminal domain corresponding to the 2-oxoglutarate- and iron-dependent dioxygenase domains similar to those of the α-subunit of prolyl 4-hydroxylase and of lysyl hydroxylases. The amino-terminal domain contains four CXXXC sequence repeats. The primary sequence of cartilage-associated protein is homologous to the amino-terminal domain of P3H1 and also contains four CXXXC sequence repeats. However, the function of the CXXXC sequence repeats is not known. Several publications have reported that short peptides containing a CXC or a CXXC sequence show oxidoreductase activity similar to that of PDI in vitro. We hypothesize that CXXXC motifs have oxidoreductase activity similar to that of the CXXC motif in PDI. We tested enzyme activities on model substrates in vitro using a GCRALCG peptide and the P3H1 complex. Our results suggest that this complex could function as a disulfide isomerase in the rough endoplasmic reticulum.
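The CXC / CXXC / CXXXC notation above means two cysteines separated by one, two, or three arbitrary residues. A minimal sketch for locating such motifs in a protein sequence is shown below, applied to the GCRALCG peptide named in the abstract. `find_cys_motifs` is a hypothetical helper. A motif match alone says nothing about redox activity.

```python
import re


def find_cys_motifs(seq: str, max_gap: int = 3) -> list:
    """Return (start, motif_class, matched_text) for CXC, CXXC, CXXXC, ...

    A lookahead is used so that overlapping motifs (e.g. in CXCXC) are all
    reported; X stands for any residue, including another cysteine.
    """
    seq = seq.upper()
    hits = []
    for gap in range(1, max_gap + 1):
        pattern = re.compile(r"(?=(C[A-Z]{%d}C))" % gap)
        for m in pattern.finditer(seq):
            hits.append((m.start(), "C" + "X" * gap + "C", m.group(1)))
    return sorted(hits)


print(find_cys_motifs("GCRALCG"))  # [(1, 'CXXXC', 'CRALC')]
```

Scanning each spacing separately, instead of using one `C[A-Z]{1,3}C` pattern, ensures a short CXC motif is not hidden by a longer match that starts at the same cysteine.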
Fibronectin tetrapeptide is target for syphilis spirochete cytadherence
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thomas, D.D.; Baseman, J.B.; Alderete, J.F.
1985-11-01
The syphilis bacterium, Treponema pallidum, parasitizes host cells through recognition of fibronectin (Fn) on cell surfaces. The active site of the Fn molecule has been identified as a four-amino acid sequence, arg-gly-asp-ser (RGDS), located on each monomer of the cell-binding domain. The synthetic heptapeptide gly-arg-gly-asp-ser-pro-cys (GRGDSPC), which contains the active-site sequence RGDS, specifically competed with the acquisition of radiolabeled cell-binding domain by T. pallidum. Additionally, the same heptapeptide diminished treponemal attachment to HEp-2 and HT1080 cell monolayers. Related heptapeptides altered in one key amino acid within the RGDS sequence failed to inhibit Fn cell-binding domain acquisition or parasitism of host cells by T. pallidum. The data support the view that T. pallidum cytadherence to host cells occurs through recognition of the RGDS sequence, which is also important for eukaryotic cell-Fn binding.
Gene copy number evolution during tetraploid cotton radiation.
Rong, J; Feltus, F A; Liu, L; Lin, L; Paterson, A H
2010-11-01
After polyploid formation, the retention or loss of duplicated genes is not random. Genes with certain functional domains are convergently restored to a 'singleton' state after many independent genome duplications, and have been referred to as 'duplication-resistant' (DR) genes. To further explore the timeframe of their restoration to the singleton state, 27 cotton homologs of genes found to be DR in Arabidopsis were selected based on diagnostic Pfam domains. Their copy numbers were studied using Southern hybridization and sequence analysis in five tetraploid species and their ancestral A- and D-genome diploids. DR genes had significantly lower copy numbers than gene families hybridizing to randomly selected cotton ESTs. Three DR genes showed complete loss of D-genome-derived homoeologs in some or all tetraploid species. Prior analysis has shown gene loss in polyploid cotton to be rare, and here only one randomly selected gene showed loss of a homoeolog, in only one of the five tetraploid species (Gossypium mustelinum). BAC sequencing confirmed two cases of gene loss in tetraploid cotton. Divergence among 5' sequences of DR genes amplified from G. arboreum, G. raimondii, and Gossypioides kirkii was correlated with gene copy number. These results show that genes containing Pfam domains associated with duplication resistance in Arabidopsis have also been preferentially restored to low copy number after a more recent polyploidization event in cotton. In tetraploid cotton, genes from the progenitor D genome seem to experience more copy number divergence than genes from the A genome. Together with D-subgenome-biased alterations in gene expression, gene loss may contribute to the relatively larger portion of quantitative trait variation attributable to D- than to A-subgenome chromosomes of tetraploid cotton.
Chiu, Chi-Chien; John, Joseph Abraham Christopher; Hseu, Tzong-Hsiung; Chang, Chi-Yao
2002-03-01
The pituitary-specific transcription factor Pit-1 belongs to the family of POU-domain proteins and is known to play an important role in the differentiation of pituitary cells. Here we report the complete nucleotide sequence of the cDNA encoding Pit-1 from the brackish-water fish ayu (Plecoglossus altivelis). Nucleotide sequence analysis of 1910 bp of ayu Pit-1 cDNA revealed an open reading frame of 1074 bp that encodes a protein of 358 amino acids containing a POU-specific domain, a POU homeodomain, and an STA (Ser/Thr-rich activation) transactivation domain. We inserted the coding region of Pit-1 cDNA, obtained by PCR, into a pET-20b(+) plasmid to produce recombinant Pit-1 in Escherichia coli BL21 (DE3) pLysS cells. Upon induction with isopropyl beta-D-thiogalactopyranoside, Pit-1 was expressed and accumulated as inclusion bodies in E. coli. The protein was then purified in one step by affinity chromatography on a nickel-nitrilotriacetic acid agarose column under denaturing conditions. This method yielded 0.7 mg of highly pure and stable protein per 200 ml of bacterial culture. A 40-kDa band, resolved as recombinant ayu Pit-1 by sodium dodecyl sulfate-polyacrylamide gel electrophoresis, agrees well with the molecular mass calculated from the translated cDNA sequence. The identity of the purified recombinant Pit-1 was confirmed in vitro by Western blot analysis using a Pit-1 monoclonal antibody. This monoclonal antibody also detected Pit-1 in the nuclei of the developing ayu pituitary by immunohistochemistry, making it a useful reagent for detecting ayu Pit-1 in situ. Copyright 2002 Elsevier Science (USA).
The low expression of Dmrt7 is associated with spermatogenic arrest in cattle-yak.
Yan, Ping; Xiang, Lin; Guo, Xian; Bao, Peng-Jia; Jin, Shuai; Wu, Xiao-Yun
2014-11-01
Dmrt7 is a member of the DM domain gene family. It is regarded as essential for male spermatogenesis between the pachynema and diplonema stages, which makes Dmrt7 deficiency a strong candidate cause of male cattle-yak infertility. In our study, the coding region sequence of yak and cattle-yak Dmrt7 was cloned, and the sequence, conserved domains, functional sites, and secondary and tertiary structures of the Dmrt7-encoded protein were predicted and analyzed using bioinformatics methods. The coding region sequences of the Dmrt7 gene, encoding 370 amino acids, were consistent between yak and cattle-yak. The protein encoded by yak and cattle-yak Dmrt7 contains a DM domain. Dmrt7 mRNA expression was detected in testis but not in any other tissue. Dmrt7 mRNA and protein expression was significantly higher in the testis of cattle and yak than in that of cattle-yak (p < 0.01). Histological analysis indicated that seminiferous tubules in male cattle-yak were highly vacuolated and contained primarily Sertoli cells and spermatogonia, whereas those of cattle and yak contained abundant primary spermatocytes. As assessed by terminal deoxynucleotidyl transferase dUTP nick end-labeling (TUNEL) analysis, male cattle-yak testis contained significantly more apoptotic cells than that of cattle and yak. The accumulation of SCP3-positive spermatocytes indicated arrest of spermatogenesis at the pachynema stage in the cattle-yak. These results suggest that low levels of Dmrt7 expression lead to male sterility in cattle-yak. The molecular function of Dmrt7 and the regulation of its expression warrant examination in future studies.
Selective Loss of Cysteine Residues and Disulphide Bonds in a Potato Proteinase Inhibitor II Family
Li, Xiu-Qing; Zhang, Tieling; Donnelly, Danielle
2011-01-01
Disulphide bonds between cysteine residues in proteins play a key role in protein folding, stability, and function. Loss of a disulphide bond is often associated with functional differentiation of the protein. The evolution of disulphide bonds is still actively debated, and analysis of naturally occurring variants can improve understanding of the protein evolutionary process. One disulphide bond-containing protein family is the potato proteinase inhibitor II (PI-II, or Pin2 for short) superfamily, which is found in most solanaceous plants and participates in plant development, stress response, and defence. Each PI-II domain contains eight cysteine residues (8C), and two similar PI-II domains form a functional protein that has eight disulphide bonds and two non-identical reaction centres. It is still unclear which patterns and processes affect cysteine residue loss in PI-II. Through cDNA sequencing and data mining, we found six natural variants lacking the cysteine residues involved in one or two disulphide bonds at the first reaction centre. We named these variants Pi7C and Pi6C for the proteins missing one or two pairs of cysteine residues, respectively. This PI-II-7C/6C family was found exclusively in potato. The missing cysteine residues were bonding pairs but were distant from one another at the nucleotide/protein sequence level. Analysis of the nonsynonymous/synonymous substitution ratio (Ka/Ks) suggested positive selection on Pi6C and several Pi7C variants. The selective deletion of first-reaction-centre cysteine residues that are paired at the structural level but distant at the sequence level illustrates the flexibility of PI-II domains and suggests that their transient gene versions were functional during evolution. PMID:21494600
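The abstract does not say how Ka/Ks was estimated. A common approach (Nei-Gojobori style) first counts the proportion of nonsynonymous differences per nonsynonymous site (pN) and synonymous differences per synonymous site (pS), then applies the Jukes-Cantor correction to each. The sketch below assumes pN and pS have already been counted and shows only the correction and the conventional reading of the ratio. The function names are hypothetical.

```python
from math import log


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance for an observed proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("correction undefined for p >= 0.75")
    return -0.75 * log(1.0 - 4.0 * p / 3.0)


def ka_ks(p_n: float, p_s: float) -> float:
    """Ka/Ks from per-site proportions of nonsynonymous (p_n) and synonymous (p_s) differences."""
    ks = jukes_cantor(p_s)
    if ks == 0.0:
        raise ValueError("no synonymous differences: Ka/Ks is undefined")
    return jukes_cantor(p_n) / ks


def interpret(ratio: float) -> str:
    # Conventional reading only; a formal statistical test is still needed
    # before claiming selection.
    if ratio > 1.0:
        return "positive selection"
    if ratio < 1.0:
        return "purifying selection"
    return "neutral"


print(interpret(ka_ks(0.10, 0.05)))  # positive selection
```

The inputs here are illustrative values, not data from the PI-II variants. Counting synonymous and nonsynonymous sites from aligned codons is the more involved step and is not shown.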
Mizejewski, G J
2015-01-01
Recent studies have demonstrated that the carboxyterminal third domain of alpha-fetoprotein (AFP-CD) binds various ligands and receptors. Reports within the last decade have established that AFP-CD contains a large fragment of amino acids that interacts with several different receptor types. Computer searches using software specifically designed to identify protein-to-protein interaction at amino acid sequence docking sites identified several types of scavenger-associated receptors and their amino acid sequence locations on the AFP-CD polypeptide chain. The scavenger receptors (SRs) identified were CD36, CD163, Stabilin, SSC5D, SRB1 and SREC. The SR-associated receptors included the mannose and low-density lipoprotein receptors, the asialoglycoprotein receptor, and the receptor for advanced glycation endproducts (RAGE). Interestingly, some SR interaction sites were localized on the AFP-derived Growth Inhibitory Peptide (GIP) segment at amino acids #480-500. Following these detection studies, a structural subdomain analysis of both the receptors and AFP-CD revealed the presence of epidermal growth factor (EGF) repeats, extracellular matrix-like protein regions, amino acid-rich motifs, and dimerization subdomains. EGF-like sequence repeats were also reported, for the first time, on each of the three domains of AFP. Finally, the localization of the receptors on specific cell types was reviewed and their functions were discussed.
Armijos-Jaramillo, Vinicio; Santander-Gordón, Daniela; Soria, Rosa; Pazmiño-Betancourth, Mauro; Echeverría, María Cristina
2017-09-01
Streptomyces scabies is a common soil bacterium that causes scab symptoms in potatoes. Strong evidence indicates that horizontal gene transfer (HGT) among bacteria has influenced the evolution of this plant pathogen and of other Streptomyces spp. To extend the study of HGT in the Streptomyces genus, we explored the effects of inter-domain HGT on the S. scabies genome. We employed a semi-automatic pipeline based on BLASTp searches and phylogenetic reconstruction. The data show that inter-domain HGT had a low impact on the S. scabies genome. However, we found a putative plant pathogenesis-related 1 (PR1) sequence in the genomes of S. scabies and of other species of the genus. S. scabies could possibly use this gene to outcompete other soil organisms. Copyright © 2016 Elsevier Inc. All rights reserved.
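The abstract does not detail the BLASTp stage of the pipeline. The sketch below shows one plausible first-pass filter: flag bacterial proteins whose best BLASTp hit comes from another domain of life, as candidates for the phylogenetic confirmation the authors describe. The parser reads the 12 default columns of BLAST+ tabular output (`-outfmt 6`). The `subject_domain` mapping is a hypothetical stand-in for a taxonomy lookup. Self hits and same-genus hits are assumed to have been removed beforehand.

```python
def best_hits(blast_tab_lines):
    """Best-scoring subject per query from BLAST+ tabular (-outfmt 6) output.

    Default columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore (bitscore is column 12).
    """
    best = {}
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def flag_interdomain(blast_tab_lines, subject_domain, query_domain="Bacteria"):
    """Queries whose best hit comes from another domain of life.

    subject_domain is a hypothetical subject-id -> domain mapping; subjects
    missing from it are ignored.
    """
    flagged = []
    for query, (subject, _) in sorted(best_hits(blast_tab_lines).items()):
        domain = subject_domain.get(subject)
        if domain is not None and domain != query_domain:
            flagged.append((query, subject, domain))
    return flagged


lines = [
    "q1\tsA\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-40\t150.0",
    "q1\tsB\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-20\t90.0",
    "q2\tsC\t70.0\t100\t30\t0\t1\t100\t1\t100\t1e-30\t120.0",
]
domains = {"sA": "Eukaryota", "sB": "Bacteria", "sC": "Bacteria"}
print(flag_interdomain(lines, domains))  # [('q1', 'sA', 'Eukaryota')]
```

A best-hit filter like this is only a screen. Candidate transfers still need phylogenetic reconstruction to rule out alternatives such as contamination or differential gene loss.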
Luckow, H.G.; Pavlis, T.L.; Serpa, L.F.; Guest, B.; Wagner, D.L.; Snee, L.; Hensley, T.M.; Korjenkov, A.
2005-01-01
New 1:24,000-scale mapping, geochemical analyses of volcanic rocks, and Ar/Ar and tephrochronology analyses of the Wingate Wash, northern Owlshead Mountains, and southern Panamint Mountains region document a complex structural history constrained by syntectonic volcanism and sedimentation. In this study, the region is divided into five structural domains with distinct but related histories: (1) The southern Panamint domain is a structurally intact, gently south-tilted block dominated by a middle Miocene volcanic center, recognized as localized hypabyssal intrusives surrounded by proximal-facies pyroclastic rocks. This Miocene volcanic sequence is an unusual alkaline volcanic assemblage ranging from trachybasalt to rhyolite but dominated by trachyandesite. In the southwestern Panamint Mountains, the volcanic rocks are overlain by a younger (Late Miocene?) fanglomerate sequence. (2) An upper Wingate Wash domain is characterized by large areas of Quaternary cover and complex overprinting of older structures by Quaternary deformation. Quaternary structures record ~N-S shortening concurrent with ~E-W extension, accommodated by systems of strike-slip and thrust faults. (3) A central Wingate Wash domain records a complex structural history that is closely tied to its stratigraphic evolution. In this domain, a middle Miocene volcanic package contains two distinct assemblages: a lower sequence dominated by alkaline pyroclastic rocks similar to the southern Panamint sequence, and an upper basaltic sequence of alkaline basalt and basanites. This volcanic sequence is in turn overlain by a coarse clastic sedimentary sequence that records the unroofing of adjacent ranges and the development of ~N-S-trending, west-tilted fault blocks. We refer to this sedimentary sequence as the Lost Lake assemblage.
(4) The lower Wingate Wash/northern Owlshead domain is characterized by a gently north-dipping stratigraphic sequence with an irregular basal unconformity developed on granitic basement. The unconformity is locally overlain by channelized deposits of older Tertiary(?) red conglomerate, some of which predate the onset of extensive volcanism. Over most of the area, however, it is overlain by a moderately thick package of middle Miocene trachybasalt, trachyandesite, ash flows, lithic tuff, basaltic cinder, basanites, and dacitic pyroclastic, debris, and lahar flows, with localized exposures of sedimentary rocks. The upper part of the Miocene stratigraphic sequence in this domain consists of coarse-grained clastic sediments that are apparently middle Miocene, based on Ar/Ar dating of interbedded volcanic rocks. This sedimentary sequence, however, is lithologically indistinguishable from the structurally adjacent Late Miocene Lost Lake assemblage and from a stratigraphically overlying Plio-Pleistocene alluvial fan, a relationship that makes it difficult to trace structures through this domain. This domain is also structurally complex, deformed by a series of northwest-southeast-striking, east-dipping, high-angle, oblique, sinistral normal faults that are cut by left-lateral strike-slip faults. The contact between the southern Panamint domain and the adjacent domains is a complex fault system that we interpret as a zone of Late Miocene distributed sinistral slip, variably overprinted in different portions of the mapped area. The net sinistral slip across the Wingate Wash fault system is estimated at 7-9 km based on offset of the Proterozoic Crystal Springs Formation beneath the middle Miocene unconformity, and at as much as 15 km based on offset volcanic facies in middle Miocene rocks.
To the south of Wingate Wash, the northern Owlshead Mountains are also cut by a sinistral, northwest-dipping, oblique normal fault with significant slip (referred to as the Filtonny Fault) that separates the lower Wingate Wash and central Owlshead domains. The Filtonny Fault may represent a young conjugate fault to the dextral Southern Death Valley fault system and may be the northwest
Rodriguez-Rivas, Juan; Marsili, Simone; Juan, David; Valencia, Alfonso
2016-01-01
Protein–protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein–protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein–protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein–protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides accurate, sequence-based information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach. PMID:27965389
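Inter-protein coevolution analysis starts from a paired alignment, in which row k holds the two interacting proteins from the same organism. It then scores column pairs (one column from each protein) for correlated substitutions. The abstract does not name the coevolution measure used. Current methods typically fit global models (e.g. direct coupling analysis) that correct for indirect correlations and phylogenetic bias. The sketch below uses plain mutual information, a simpler local score, only to illustrate the pairing of columns across the two proteins. `mutual_information` is a hypothetical helper, and the columns are toy data.

```python
from collections import Counter
from math import log2


def mutual_information(col_a: str, col_b: str) -> float:
    """Mutual information (bits) between two alignment columns.

    col_a[k] and col_b[k] must come from the same k-th row of a paired
    (concatenated) alignment, i.e. from two proteins of the same complex
    in the same organism.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must come from the same paired alignment")
    n = len(col_a)
    freq_a, freq_b = Counter(col_a), Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * log2((c / n) / ((freq_a[x] / n) * (freq_b[y] / n)))
        for (x, y), c in freq_ab.items()
    )


# Column i of protein A and column j of protein B across four paired species.
print(mutual_information("AAGG", "KKEE"))  # perfectly covarying: 1.0
print(mutual_information("AAGG", "KEKE"))  # independent: 0.0
```

High-scoring column pairs are candidate interface contacts. With real alignments, MI is strongly confounded by phylogeny and indirect couplings, which is why global methods are preferred.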
Robakis, Thalia; Bak, Beata; Lin, Shu-huei; Bernard, Daniel J.; Scheiffele, Peter
2008-01-01
Precursor proteolysis is a crucial mechanism for regulating protein structure and function. Signal peptidase (SP) is an enzyme with a well defined role in cleaving N-terminal signal sequences but no demonstrated function in the proteolysis of cellular precursor proteins. We provide evidence that SP mediates intraprotein cleavage of IgSF1, a large cellular Ig domain protein that is processed into two separate Ig domain proteins. In addition, our results suggest the involvement of signal peptide peptidase (SPP), an intramembrane protease, which acts on substrates that have been previously cleaved by SP. We show that IgSF1 is processed through sequential proteolysis by SP and SPP. Cleavage is directed by an internal signal sequence and generates two separate Ig domain proteins from a polytopic precursor. Our findings suggest that SP and SPP function are not restricted to N-terminal signal sequence cleavage but also contribute to the processing of cellular transmembrane proteins. PMID:18981173
Ngcapu, Sinaye; Theys, Kristof; Libin, Pieter; Marconi, Vincent C; Sunpath, Henry; Ndung'u, Thumbi; Gordon, Michelle L
2017-11-08
The South African national treatment programme includes nucleoside reverse transcriptase inhibitors (NRTIs) in both first- and second-line highly active antiretroviral therapy regimens. Mutations in the RNase H domain have been associated with resistance to NRTIs, but primarily in studies of HIV-1 subtype B. Here, we investigated the prevalence of RNase H mutations, and their association with NRTI resistance, in sequences from HIV-1 subtype C infected individuals. RNase H sequences from 112 NRTI-treated but virologically failing individuals and 28 antiretroviral therapy (ART)-naive individuals were generated and analysed. In addition, 359 subtype C sequences from ART-naive individuals were downloaded from the Los Alamos database, giving a total of 387 sequences from ART-naive individuals for the analysis. Fisher's exact test was used to identify mutations whose prevalence differed between groups, and Bayesian network learning was applied to identify novel NRTI resistance mutation pathways in the RNase H domain. The mutations A435L, S468A, T470S, L484I, A508S, Q509L, L517I, Q524E and E529D were more prevalent in sequences from treatment-experienced individuals than in those from ART-naive individuals; however, only the E529D mutation remained significant after correction for multiple comparisons. Our findings suggest a potential interaction between E529D and NRTI treatment; however, site-directed mutagenesis is needed to understand the impact of this RNase H mutation.
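The per-mutation comparison above can be sketched as a two-sided Fisher's exact test on a 2x2 table (treatment-experienced vs ART-naive, mutation present vs absent), followed by a multiple-comparison correction. The abstract does not say which correction was applied; Bonferroni is used below purely as an assumption, and the function names and counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: treatment-experienced, ART-naive; columns: mutation present, absent.
    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are included.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def bonferroni(p_values):
    """Bonferroni-adjusted p-values (an assumed correction, see text)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical counts: mutation in 1/10 treated vs 11/14 naive sequences.
p = fisher_exact_two_sided(1, 9, 11, 3)
print(round(p, 5))  # 0.00276
```

With one test per candidate mutation, each raw p-value would be passed through the correction before deciding which associations remain significant.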
Isolation and cloning of a metalloproteinase from king cobra snake venom.
Guo, Xiao-Xi; Zeng, Lin; Lee, Wen-Hui; Zhang, Yun; Jin, Yang
2007-06-01
A 50 kDa fibrinogenolytic protease, ohagin, was isolated from the venom of Ophiophagus hannah by a combination of gel filtration, ion-exchange and heparin affinity chromatography. Ohagin specifically degraded the alpha-chain of human fibrinogen, and its proteolytic activity was completely abolished by EDTA but not by PMSF, suggesting that it is a metalloproteinase. It dose-dependently inhibited platelet aggregation induced by ADP, TMVA and stejnulxin. The full sequence of ohagin was deduced by cDNA cloning and confirmed by protein sequencing and peptide mass fingerprinting. The full-length cDNA of ohagin encodes an open reading frame of 611 amino acids comprising a signal peptide, a proprotein and a mature protein with metalloproteinase, disintegrin-like and cysteine-rich domains, suggesting that it belongs to the P-III class of snake venom metalloproteinases (SVMPs). In addition, P-III class metalloproteinases from the venom glands of Naja atra, Bungarus multicinctus and Bungarus fasciatus were cloned in this study. Sequence and phylogenetic analyses indicated that metalloproteinases from elapid snake venoms form a new subgroup of P-III SVMPs.
Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring
2012-01-01
Background Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recent work indicates that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families in order to discover associations among groups of sites within the multiple sequence alignment. We then examine what these associations imply within the 3D structure of these two protein families. Results The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams, which revealed four structural sub-domains within the single-domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers. In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appears to play an important role in the molecular structure and/or function.
Conclusions Our results demonstrate that the method we present here using a k-modes site clustering algorithm based on interdependency evaluation among sites obtained from a sequence alignment of homologous proteins can provide significant insights into the complex, hierarchical inter-residue structural relationships within the 3D structure of a protein family. PMID:22793672
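A minimal sketch of site (attribute) clustering in the spirit of the k-modes approach described above: alignment columns are grouped around "mode" sites by normalized mutual information (NMI). The normalization (MI divided by the geometric mean of the two column entropies), the greedy initialization and the mode-update rule are assumptions chosen for illustration; they are not taken from the paper, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(col):
    n = len(col)
    return -sum((c / n) * math.log(c / n) for c in Counter(col).values())


def nmi(col_x, col_y):
    """MI normalized by sqrt(H(X) * H(Y)); 0 when either column is invariant."""
    hx, hy = entropy(col_x), entropy(col_y)
    if hx == 0 or hy == 0:
        return 0.0
    n = len(col_x)
    px, py = Counter(col_x), Counter(col_y)
    mi = sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in Counter(zip(col_x, col_y)).items()
    )
    return mi / math.sqrt(hx * hy)


def k_modes_sites(msa, k, max_iter=20):
    """Cluster alignment sites (columns) around k mode sites by NMI."""
    cols = list(zip(*msa))
    sites = range(len(cols))
    k = min(k, len(cols))
    sim = [[nmi(cols[i], cols[j]) for j in sites] for i in sites]
    # Greedy initialization: each new mode is the site least related to existing modes.
    modes = [0]
    while len(modes) < k:
        candidates = [s for s in sites if s not in modes]
        modes.append(min(candidates, key=lambda s: max(sim[s][m] for m in modes)))
    for _ in range(max_iter):
        # Assignment step: each site joins the mode it is most interdependent with.
        clusters = {m: [] for m in modes}
        for s in sites:
            clusters[max(modes, key=lambda m: sim[s][m])].append(s)
        # Update step: the new mode maximizes total NMI to the rest of its cluster.
        new_modes = [
            max(members, key=lambda c: sum(sim[c][o] for o in members))
            for members in clusters.values()
            if members
        ]
        if sorted(new_modes) == sorted(modes):
            break
        modes = new_modes
    return clusters


# Toy alignment: sites 0/1 covary with each other, as do sites 2/3.
msa = ["AXCP", "AXDQ", "BYCP", "BYDQ", "AXCP", "AXDQ", "BYCP", "BYDQ"]
print(k_modes_sites(msa, k=2))  # {0: [0, 1], 2: [2, 3]}
```

The sketch stops at flat clusters; arranging clusters into the tree diagrams mentioned above would need an additional hierarchical step.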
Statistical discovery of site inter-dependencies in sub-molecular hierarchical protein structuring.
Durston, Kirk K; Chiu, David Ky; Wong, Andrew Kc; Li, Gary Cl
2012-07-13
Sequence-based screening for self-sufficient P450 monooxygenase from a metagenome library.
Kim, B S; Kim, S Y; Park, J; Park, W; Hwang, K Y; Yoon, Y J; Oh, W K; Kim, B Y; Ahn, J S
2007-05-01
Cytochrome P450 monooxygenases (CYPs) are useful catalysts for oxidation reactions. Self-sufficient CYPs harbour a reductive domain covalently connected to a P450 domain and are known for their robust catalytic activity, which gives them great potential as biocatalysts. In an effort to expand the genetic sources of self-sufficient CYPs, we devised a sequence-based screening system to identify them in a soil metagenome. We constructed a soil metagenome library and performed sequence-based screening for self-sufficient CYP genes. A new CYP gene, syk181, was identified from the metagenome library. Phylogenetic analysis revealed that SYK181 formed a distinct phylogenetic lineage with 46% amino acid sequence identity to CYP102A1, which has been extensively studied as a fatty acid hydroxylase. Heterologously expressed SYK181 showed significant hydroxylase activity towards naphthalene and phenanthrene as well as towards fatty acids. Sequence-based screening of metagenome libraries is expected to be a useful approach for finding self-sufficient CYP genes. The translated product of syk181 shows self-sufficient hydroxylase activity towards fatty acids and aromatic compounds. SYK181 is the first self-sufficient CYP obtained directly from a metagenome library. The genetic and biochemical information on SYK181 is expected to be helpful for engineering self-sufficient CYPs with broader catalytic activities towards various substrates, which would be useful for the bioconversion of natural products and the biodegradation of organic chemicals.
Candida ruelliae sp. nov., a novel yeast species isolated from flowers of Ruellia sp. (Acanthaceae).
Saluja, Puja; Prasad, Gandham S
2008-06-01
Two novel yeast strains, designated 16Q1 and 16Q3, were isolated from flowers of a Ruellia species (family Acanthaceae). The D1/D2 domain and ITS sequences of these two strains were identical. Sequence analysis of the D1/D2 domain of the large-subunit rRNA gene indicated their relationship to species of the Candida haemulonii cluster. However, they differ from C. haemulonii by 14% nucleotide sequence divergence, from Candida pseudohaemulonii by 16.1% and from C. haemulonii type II by 16.5%. These strains also differ from the type strain of C. haemulonii in 18 physiological tests, and from C. pseudohaemulonii and C. haemulonii type II in 12 and 16 tests, respectively. They also differ from C. haemulonii and other related species by more than 13% sequence divergence in the internal transcribed spacer region. In the SSU rRNA gene sequence, strain 16Q1 differs from C. haemulonii by 1.7% nucleotide divergence. Sporulation was not observed in pure or mixed cultures on any of the several media examined. All these data support the assignment of these strains to a novel species, for which we propose the name Candida ruelliae sp. nov., with strain 16Q1(T)=MTCC 7739(T)=CBS10815(T) as the type strain.
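Percent sequence divergence between two strains is, in its simplest form, the proportion of differing positions in an alignment of their sequences (the uncorrected p-distance). A minimal sketch follows. It assumes the sequences are already aligned and simply skips columns with a gap in either sequence, which may differ from how gaps were treated in the study; the function name and sequences are hypothetical.

```python
def percent_divergence(aligned_1, aligned_2, gap="-"):
    """Uncorrected p-distance, as a percentage, over ungapped alignment columns."""
    if len(aligned_1) != len(aligned_2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_1.upper(), aligned_2.upper())
        if a != gap and b != gap  # skip columns with a gap in either sequence
    ]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    differences = sum(a != b for a, b in pairs)
    return 100.0 * differences / len(pairs)


print(percent_divergence("ACGTACGTAC", "ACGTTCGTAA"))  # 20.0
```

Whether gaps are skipped, counted as differences, or collapsed per indel event changes the percentage, so divergence figures are only comparable when computed the same way.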
Domain similarity based orthology detection.
Bitard-Feildel, Tristan; Kemena, Carsten; Greenwood, Jenny M; Bornberg-Bauer, Erich
2015-05-13
Orthology detection software mostly uses pairwise comparisons of amino acid sequences to determine whether two proteins are orthologous. Consequently, the number of comparisons to compute grows quadratically with the number of sequences. A current challenge of bioinformatic research, especially given the increasing number of sequenced organisms, is to make this ever-growing number of comparisons computationally feasible in a reasonable amount of time. We propose to speed up the detection of orthologous proteins by using strings of domains to characterize the proteins. We present two new protein similarity measures based on domain content similarity, a cosine score and a maximal weight matching score, and new software named porthoDom. The quality of the cosine and maximal weight matching similarity measures is assessed against curated datasets. These comparisons show that domain content similarity can correctly group proteins into their families. Accordingly, the cosine similarity measure is used inside porthoDom, a wrapper developed for proteinortho. porthoDom uses domain content similarity to group proteins together before searching for orthologs. By using domains instead of amino acid sequences, it reduces the search space and hence the computational complexity of an all-against-all sequence comparison. We demonstrate that representing and comparing proteins as strings of discrete domains, i.e. as concatenations of their unique identifiers, drastically simplifies the search space. porthoDom speeds up orthology detection while maintaining accuracy similar to that of proteinortho. porthoDom is implemented in Python and C++ and is available under the GNU GPL licence 3 at http://www.bornberglab.org/pages/porthoda .
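The cosine measure can be illustrated by treating each protein as a bag (multiset) of domain identifiers. This is a minimal sketch, not porthoDom's code: the identifiers, the 0.7 threshold and the single-linkage grouping with union-find are hypothetical choices. Note that the grouping below still compares all pairs; the saving comes from each comparison being a cheap count-vector operation rather than a sequence alignment.

```python
import math
from collections import Counter


def domain_cosine(domains_a, domains_b):
    """Cosine similarity between two proteins' domain-count vectors."""
    va, vb = Counter(domains_a), Counter(domains_b)
    dot = sum(va[d] * vb[d] for d in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def group_by_domains(proteins, threshold=0.7):
    """Single-linkage groups of proteins whose domain cosine >= threshold."""
    names = list(proteins)
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if domain_cosine(proteins[a], proteins[b]) >= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


# Hypothetical proteins described by hypothetical domain identifiers.
proteins = {
    "p1": ["D1", "D2"],
    "p2": ["D1", "D2", "D2"],
    "p3": ["D9"],
}
print(group_by_domains(proteins))  # [['p1', 'p2'], ['p3']]
```

Ortholog searching would then run only within each group, which is where the reduction in search space comes from.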
The 193-kD Vault Protein, VPARP, Is a Novel Poly(ADP-Ribose) Polymerase
Kickhoefer, Valerie A.; Siva, Amara C.; Kedersha, Nancy L.; Inman, Elisabeth M.; Ruland, Cristina; Streuli, Michel; Rome, Leonard H.
1999-01-01
Mammalian vaults are ribonucleoprotein (RNP) complexes, composed of a small ribonucleic acid and three proteins of 100, 193, and 240 kD in size. The 100-kD major vault protein (MVP) accounts for >70% of the particle mass. We have identified the 193-kD vault protein by its interaction with the MVP in a yeast two-hybrid screen and confirmed its identity by peptide sequence analysis. Analysis of the protein sequence revealed a region of ∼350 amino acids that shares 28% identity with the catalytic domain of poly(ADP-ribose) polymerase (PARP). PARP is a nuclear protein that catalyzes the formation of ADP-ribose polymers in response to DNA damage. The catalytic domain of p193 was expressed and purified from bacterial extracts. Like PARP, this domain is capable of catalyzing a poly(ADP-ribosyl)ation reaction; thus, the 193-kD protein is a new PARP. Purified vaults also contain the poly(ADP-ribosyl)ation activity, indicating that the assembled particle retains enzymatic activity. Furthermore, we show that one substrate for this vault-associated PARP activity is the MVP. Immunofluorescence and biochemical data reveal that p193 protein is not entirely associated with the vault particle, suggesting that it may interact with other protein(s). A portion of p193 is nuclear and localizes to the mitotic spindle. PMID:10477748
Regulation of the Production of Infectious Genotype 1a Hepatitis C Virus by NS5A Domain III
Kim, Seungtaek; Welsch, Christoph; Yi, MinKyung; Lemon, Stanley M.
2011-01-01
Although hepatitis C virus (HCV) assembly remains incompletely understood, recent studies with the genotype 2a JFH-1 strain suggest that it is dependent upon the phosphorylation of Ser residues near the C terminus of NS5A, a multifunctional nonstructural protein. Since genotype 1 viruses account for most HCV disease yet differ substantially in sequence from that of JFH-1, we studied the role of NS5A in the production of the H77S virus. While less efficient than JFH-1, genotype 1a H77S RNA produces infectious virus when transfected into permissive Huh-7 cells. The exchange of complete NS5A sequences between these viruses was highly detrimental to replication, while exchanges of the C-terminal domain III sequence (46% amino acid sequence identity) were well tolerated, with little effect on RNA synthesis. Surprisingly, the placement of the H77S domain III sequence into JFH-1 resulted in increased virus yields; conversely, H77S yields were reduced by the introduction of domain III from JFH-1. These changes in infectious virus yield correlated well with changes in the abundance of NS5A in RNA-transfected cells but not with RNA replication or core protein expression levels. Alanine replacement mutagenesis of selected Ser and Thr residues in the C-terminal domain III sequence revealed no single residue to be essential for infectious H77S virus production. However, virus production was eliminated by Ala substitutions at multiple residues and could be restored by phosphomimetic Asp substitutions at these sites. Thus, despite low overall sequence homology, the production of infectious virus is regulated similarly in JFH-1 and H77S viruses by a conserved function associated with a C-terminal Ser/Thr cluster in domain III of NS5A. PMID:21525356
Recombinant antibody mediated delivery of organelle-specific DNA pH sensors along endocytic pathways
NASA Astrophysics Data System (ADS)
Modi, Souvik; Halder, Saheli; Nizak, Clément; Krishnan, Yamuna
2013-12-01
DNA has been used to build nanomachines with potential in cellulo and in vivo applications. However, their in cellulo applications are limited by the lack of generalizable strategies to deliver them to precise intracellular locations. Here we describe a new molecular design of DNA pH sensors with response times that are nearly 20-fold faster. Further, by changing the sequence of the pH-sensitive domain of the DNA sensor, we have been able to tune their pH-sensitive regimes and create a family of DNA sensors spanning ranges from pH 4 to 7.6. To enable a generalizable targeting methodology, this new sensor design also incorporates a 'handle' domain. Using a phage display screen, we have identified a set of three recombinant antibodies (scFv) that bind sequence-specifically to the handle domain. Sequence analysis of these antibodies revealed several conserved residues that mediate specific interactions with the cognate DNA duplex. We also found that the three scFvs clustered into different branches, indicating that their specificity arises from mutations in key residues. When one of these scFvs is fused to a membrane protein (furin) that traffics via the cell surface, the scFv-furin chimera binds the 'handle' and ferries a family of DNA pH sensors along the furin endocytic pathway. After endocytosis, all DNA nanodevices retain their functionality in cellulo and provide spatiotemporal pH maps of retrogradely trafficking furin inside living cells. This new molecular technology of DNA-scFv-protein chimeras can be used to site-specifically complex DNA nanostructures for bioanalytical applications.
Electronic supplementary information (ESI) available: Detailed description of all oligonucleotide sequences used in this study; list of figures that support claims from the main text. These mainly show sensor sequences, phage display results, scFv purification and binding data, cell images clamped at different pH and co-localization studies with endocytic tracers. See DOI: 10.1039/c3nr03769j
The LANL hemorrhagic fever virus database, a new platform for analyzing biothreat viruses
Kuiken, Carla; Thurmond, Jim; Dimitrijevic, Mira; Yoon, Hyejin
2012-01-01
Hemorrhagic fever viruses (HFVs) are a diverse set of over 80 viral species, found in 10 different genera comprising five different families: arena-, bunya-, flavi-, filo- and togaviridae. All these viruses are highly variable and evolve rapidly, making them elusive targets for the immune system and for vaccine and drug design. About 55 000 HFV sequences exist in the public domain today. A central website that provides annotated sequences and analysis tools will be helpful to HFV researchers worldwide. The HFV sequence database collects and stores sequence data and provides a user-friendly search interface and a large number of sequence analysis tools, following the model of the highly regarded and widely used Los Alamos HIV database [Kuiken, C., B. Korber, and R.W. Shafer, HIV sequence databases. AIDS Rev, 2003. 5: p. 52–61]. The database uses an algorithm that aligns each sequence to a species-wide reference sequence. The NCBI RefSeq database [Sayers et al. (2011) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 39, D38–D51.] is used for this; if a reference sequence is not available, a BLAST search finds the best candidate. Using this method, sequences in each genus can be retrieved pre-aligned. The HFV website can be accessed via http://hfv.lanl.gov. PMID:22064861
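The reference-based alignment idea can be sketched with a basic global (Needleman-Wunsch) alignment against a reference, followed by dropping insertions relative to the reference so that every sequence ends up in the reference's coordinates. The abstract does not name the aligner used by the HFV database; the scoring scheme (match 1, mismatch -1, linear gap -1) and the function names are assumptions for illustration only.

```python
def global_align(query, reference, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(query), len(reference)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if query[i - 1] == reference[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # (mis)match
                score[i - 1][j] + gap,            # gap in reference
                score[i][j - 1] + gap,            # gap in query
            )
    # Traceback from the bottom-right cell.
    aq, ar = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            aq.append(query[i - 1]); ar.append(reference[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aq.append(query[i - 1]); ar.append("-"); i -= 1
        else:
            aq.append("-"); ar.append(reference[j - 1]); j -= 1
    return score[n][m], "".join(reversed(aq)), "".join(reversed(ar))


def to_reference_frame(aligned_query, aligned_ref):
    """Drop insertions relative to the reference; result has the reference's length."""
    return "".join(q for q, r in zip(aligned_query, aligned_ref) if r != "-")


score, aq, ar = global_align("ACGGT", "ACGT")
print(score, aq, ar)               # 3 ACGGT AC-GT
print(to_reference_frame(aq, ar))  # ACGT
```

Projecting every sequence onto the same reference coordinates is what lets a set of sequences be returned "pre-aligned" without a fresh multiple alignment, at the cost of discarding insertions relative to the reference.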
ATP interacts with the CPVT mutation-associated central domain of the cardiac ryanodine receptor.
Blayney, Lynda; Beck, Konrad; MacDonald, Ewan; D'Cruz, Leon; Nomikos, Michail; Griffiths, Julia; Thanassoulas, Angelos; Nounesis, George; Lai, F Anthony
2013-10-01
This study was designed to determine whether the cardiac ryanodine receptor (RyR2) central domain, a region associated with catecholaminergic polymorphic ventricular tachycardia (CPVT) mutations, interacts with the RyR2 regulators ATP and FK506-binding protein 12.6 (FKBP12.6). Wild-type (WT) RyR2 central domain constructs (G2236 to G2491) and constructs containing the CPVT mutations P2328S and N2386I were expressed as recombinant proteins. Folding and stability of the proteins were examined by circular dichroism (CD) spectroscopy and guanidine hydrochloride chemical denaturation. The far-UV CD spectra showed a soluble, stably folded protein, with WT and mutant proteins exhibiting similar secondary structure. Chemical denaturation analysis also confirmed a stable protein for both WT and mutant constructs, with similar two-state unfolding. ATP and caffeine binding was measured by fluorescence spectroscopy. Both ATP and caffeine bound with an EC50 of ~200-400 μM, and the affinity was the same for WT and mutant constructs. Sequence alignment with other ATP-binding proteins indicated that the RyR2 central domain contains the signature of an ATP-binding pocket. Interaction of the central domain with FKBP12.6 was tested by glutaraldehyde cross-linking, and no association was found. The RyR2 central domain, expressed as a 'correctly' folded recombinant protein, bound ATP, in accord with bioinformatic evidence of conserved ATP-binding sequence motifs. An interaction with FKBP12.6 was not evident. CPVT mutations disrupted neither the secondary structure nor ATP binding. Part of the RyR2 central domain CPVT mutation cluster can be expressed independently with retention of ATP binding. Copyright © 2013 Elsevier B.V. All rights reserved.
The Runt domain of AML1 (RUNX1) binds a sequence-conserved RNA motif that mimics a DNA element.
Fukunaga, Junichi; Nomura, Yusuke; Tanaka, Yoichiro; Amano, Ryo; Tanaka, Taku; Nakamura, Yoshikazu; Kawai, Gota; Sakamoto, Taiichi; Kozu, Tomoko
2013-07-01
AML1 (RUNX1) is a key transcription factor for hematopoiesis that binds to the Runt-binding double-stranded DNA element (RDE) of target genes through its N-terminal Runt domain. Aberrations in the AML1 gene are frequently found in human leukemia. To better understand AML1 and its potential utility for diagnosis and therapy, we obtained RNA aptamers that bind specifically to the AML1 Runt domain. Enzymatic probing and NMR analyses revealed that Apt1-S, which is a truncated variant of one of the aptamers, has a CACG tetraloop and two stem regions separated by an internal loop. All the isolated aptamers were found to contain the conserved sequence motif 5'-NNCCAC-3' and 5'-GCGMGN'N'-3' (M:A or C; N and N' form Watson-Crick base pairs). The motif contains one AC mismatch and one base bulged out. Mutational analysis of Apt1-S showed that three guanines of the motif are important for Runt binding as are the three guanines of RDE, which are directly recognized by three arginine residues of the Runt domain. Mutational analyses of the Runt domain revealed that the amino acid residues used for Apt1-S binding were similar to those used for RDE binding. Furthermore, the aptamer competed with RDE for binding to the Runt domain in vitro. These results demonstrated that the Runt domain of the AML1 protein binds to the motif of the aptamer that mimics DNA. Our findings should provide new insights into RNA function and utility in both basic and applied sciences.
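The conserved motifs above use IUPAC ambiguity codes (N = any base, M = A or C), which translate directly into regular expressions. A minimal sketch follows. It treats N′ the same as N and does not check the Watson-Crick pairing between N and N′; the toy sequence is hypothetical and is not Apt1-S.

```python
import re

# Only the IUPAC codes needed for these motifs; T and U are both accepted.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "[UT]", "T": "[UT]",
         "M": "[AC]", "N": "[ACGUT]"}


def iupac_to_regex(motif):
    """Translate an IUPAC motif into a regular-expression pattern."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(seq, motif):
    """0-based start positions of (possibly overlapping) motif matches."""
    pattern = iupac_to_regex(motif)
    # A zero-width lookahead lets overlapping matches be reported too.
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


aptamer = "GGACCACUUGCGAGCU"  # hypothetical toy sequence
print(find_motif(aptamer, "NNCCAC"))   # [1]
print(find_motif(aptamer, "GCGMGNN"))  # [9]
```

A fuller scan would also require the two motif halves to sit on opposite sides of the same stem, which needs secondary-structure information that a plain pattern search does not have.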
The Runt domain of AML1 (RUNX1) binds a sequence-conserved RNA motif that mimics a DNA element
Fukunaga, Junichi; Nomura, Yusuke; Tanaka, Yoichiro; Amano, Ryo; Tanaka, Taku; Nakamura, Yoshikazu; Kawai, Gota; Sakamoto, Taiichi; Kozu, Tomoko
2013-01-01
PMID:23709277
Linke, Christian; Siemens, Nikolai; Middleditch, Martin J; Kreikemeyer, Bernd; Baker, Edward N
2012-07-01
The extracellular protein Epf from Streptococcus pyogenes is important for streptococcal adhesion to human epithelial cells. However, Epf has no sequence identity to any protein of known structure or function. Thus, several predicted domains of the 205 kDa protein Epf were cloned separately and expressed in Escherichia coli. The N-terminal domain of Epf was crystallized in space groups P2₁ and P2₁2₁2₁ in the presence of the protease chymotrypsin. Mass spectrometry showed that the species crystallized corresponded to a fragment comprising residues 52-357 of Epf. Complete data sets were collected to 2.0 and 1.6 Å resolution, respectively, at the Australian Synchrotron.
PrionScan: an online database of predicted prion domains in complete proteomes.
Espinosa Angarica, Vladimir; Angulo, Alfonso; Giner, Arturo; Losilla, Guillermo; Ventura, Salvador; Sancho, Javier
2014-02-05
Prions are a particular type of amyloid involved in a large variety of important cellular processes, but also responsible for serious diseases in mammals, including humans. The number of experimentally characterized prions is still low and corresponds to a handful of examples in microorganisms and mammals. Prion aggregation is mediated by specific protein domains with a remarkable compositional bias towards glutamine/asparagine and against charged residues and prolines. These compositional features have been used to predict new prion proteins in the genomes of different organisms. Despite these efforts, only a few data sources contain prion predictions at a genomic scale. Here we present PrionScan, a new database of predicted prion-like domains in complete proteomes. We have previously developed a predictive methodology to identify and score prionogenic stretches in protein sequences. In the present work, we exploit this approach to scan all the protein sequences in public databases and compile a repository containing relevant information on proteins bearing prion-like domains. The database is updated regularly alongside UniProtKB and in its present version contains approximately 28,000 predictions in proteins from different functional categories in more than 3200 organisms from all taxonomic subdivisions. PrionScan can be used in two ways: database query and analysis of user-submitted protein sequences. In the first mode, simple queries retrieve a detailed description of the properties of a defined protein; queries can also be combined to generate more complex and specific search patterns. In the second mode, users can submit and analyze their own sequences. We expect this database to provide insights into prion function and regulation from a genome-wide perspective, enabling researchers to perform cross-species prion biology studies.
Our database might also be useful for guiding experimentalists in the identification of new candidates for further experimental characterization.
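The compositional bias described above (enrichment in Q/N, depletion of charged residues and prolines) can be illustrated with a simple sliding-window scan. This is not PrionScan's scoring function, which comes from the authors' earlier methodology; the score (Q+N fraction minus the D/E/K/R/P fraction), the window size, the threshold and the function name are all hypothetical.

```python
def prion_like_segments(seq, window=60, min_score=0.3):
    """Merge windows enriched in Q/N and depleted of charged residues and Pro."""
    seq = seq.upper()
    segments = []
    for start in range(len(seq) - window + 1):
        w = seq[start:start + window]
        qn = (w.count("Q") + w.count("N")) / window
        penalty = sum(w.count(aa) for aa in "DEKRP") / window
        if qn - penalty >= min_score:
            end = start + window
            if segments and start <= segments[-1][1]:
                segments[-1][1] = end  # overlapping or adjacent window: extend
            else:
                segments.append([start, end])
    return [tuple(s) for s in segments]


toy = "A" * 20 + "QN" * 30 + "A" * 20  # hypothetical 100-residue sequence
print(prion_like_segments(toy, window=20, min_score=0.5))  # [(10, 90)]
```

Reported segments overhang the Q/N-rich core by up to half a window on each side, so a real predictor would typically refine boundaries or score with a model trained on known prion domains.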
Atkinson, Gemma C.; Tenson, Tanel; Hauryliuk, Vasili
2011-01-01
RelA/SpoT Homologue (RSH) proteins, named for their sequence similarity to the RelA and SpoT enzymes of Escherichia coli, comprise a superfamily of enzymes that synthesize and/or hydrolyze the alarmone ppGpp, activator of the "stringent" response and regulator of cellular metabolism. The classical "long" RSHs Rel, RelA and SpoT, with the ppGpp hydrolase, synthetase, TGS and ACT domain architecture, have been found across diverse bacteria and plant chloroplasts, while dedicated single-domain ppGpp-synthesizing and -hydrolyzing RSHs have also been discovered in disparate bacteria and animals, respectively. However, there is considerable confusion in nomenclature, and no comprehensive phylogenetic and sequence analyses had previously been carried out to classify RSHs on a genomic scale. We have performed high-throughput sensitive sequence searching of over 1000 genomes from across the tree of life, in combination with phylogenetic analyses, to consolidate previous ad hoc identification of diverse RSHs in different organisms and provide a much-needed unifying terminology for the field. We classify RSHs into 30 subgroups comprising three groups: long RSHs, small alarmone synthetases (SASs), and small alarmone hydrolases (SAHs). Members of nineteen previously unidentified RSH subgroups can now be studied experimentally, including previously unknown RSHs in archaea, expanding the "stringent response" to this domain of life. We have analyzed possible combinations of RSH proteins and their domains in bacterial genomes and compared RSH content with available RSH knock-out data for various organisms to determine the rules by which RSHs are combined. Through comparative sequence analysis of long and small RSHs, we find exposed sites limited in conservation to the long RSHs that we propose are involved in transmitting regulatory signals.
Such signals may be transmitted via NTD-to-CTD intramolecular interactions, or via intermolecular interactions either among individual RSH molecules or between long RSHs and other binding partners such as the ribosome. PMID:21858139
Graham, Kate L.; Halasz, Peter; Tan, Yan; Hewish, Marilyn J.; Takada, Yoshikazu; Mackow, Erich R.; Robinson, Martyn K.; Coulson, Barbara S.
2003-01-01
Integrins α2β1, αXβ2, and αVβ3 have been implicated in rotavirus cell attachment and entry. The virus spike protein VP4 contains the α2β1 ligand sequence DGE at amino acid positions 308 to 310, and the outer capsid protein VP7 contains the αXβ2 ligand sequence GPR. To determine the viral proteins and sequences involved and to define the roles of α2β1, αXβ2, and αVβ3, we analyzed the ability of rotaviruses and their reassortants to use these integrins for cell binding and infection and the effect of peptides DGEA and GPRP on these events. Many laboratory-adapted human, monkey, and bovine viruses used integrins, whereas all porcine viruses were integrin independent. The integrin-using rotavirus strains each interacted with all three integrins. Integrin usage related to VP4 serotype independently of sialic acid usage. Analysis of rotavirus reassortants and assays of virus binding and infectivity in integrin-transfected cells showed that VP4 bound α2β1, and VP7 interacted with αXβ2 and αVβ3 at a postbinding stage. DGEA inhibited rotavirus binding to α2β1 and infectivity, whereas GPRP binding to αXβ2 inhibited infectivity but not binding. The truncated VP5* subunit of VP4, expressed as a glutathione S-transferase fusion protein, bound the expressed α2 I domain. Alanine mutagenesis of D308 and G309 in VP5* eliminated VP5* binding to the α2 I domain. In a novel process, integrin-using viruses bind the α2 I domain of α2β1 via DGE in VP4 and interact with αXβ2 (via GPR) and αVβ3 by using VP7 to facilitate cell entry and infection. PMID:12941907
Zehner, R; Zimmermann, S; Mebs, D
1998-01-01
To identify common animal species by analysis of the cytochrome b gene, a method has been developed to obtain PCR products covering a large region of the cytochrome b gene (981 bp of 1140 bp) in humans, selected mammals and birds, using the same specifically designed primers. Species-specific RFLP patterns are generated by co-restriction with the restriction endonucleases AluI and NcoI. The RFLP patterns obtained are conclusive even in mixtures of two or more species. The results were confirmed by sequence analysis, which also explained intraspecies variations in the RFLP patterns. The method has been applied to forensic casework in which the origin of roasted meat, stomach contents and a bone sample was successfully identified.
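How co-restriction turns a sequence into a fragment-size pattern can be illustrated with a minimal in-silico sketch. This is not the authors' procedure: it assumes exact recognition sites AGCT for AluI (cut AG^CT) and CCATGG for NcoI (cut C^CATGG), a linear amplicon, and top-strand cut positions only (both sites are palindromic, so one strand suffices). The function names are hypothetical.

```python
# Hypothetical in-silico AluI + NcoI double digest of a linear amplicon.
# Assumptions: exact-match sites, top-strand cut positions, no star activity.
import re

# Enzyme -> (recognition site, offset of the top-strand cut within the site).
ENZYMES = {
    "AluI": ("AGCT", 2),    # AG^CT, blunt end
    "NcoI": ("CCATGG", 1),  # C^CATGG, 5' overhang
}


def cut_positions(seq: str, enzymes=ENZYMES) -> list[int]:
    """Return sorted, unique top-strand cut positions for all enzymes."""
    seq = seq.upper()
    cuts = set()
    for site, offset in enzymes.values():
        # Zero-width lookahead so overlapping occurrences are all found.
        for m in re.finditer(f"(?={site})", seq):
            cuts.add(m.start() + offset)
    return sorted(cuts)


def fragment_lengths(seq: str, enzymes=ENZYMES) -> list[int]:
    """Fragment sizes (bp) after co-restriction, largest first (the RFLP pattern)."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


if __name__ == "__main__":
    # Toy sequence with one AluI site and one NcoI site.
    print(fragment_lengths("AAAGCTAAACCATGGAAA"))  # [8, 6, 4]
```

In a mixture, the observed pattern would be the union of the fragment sets of each species present, which is why distinct per-species patterns can remain interpretable when two or more species are combined.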
Takeshita, S; Kikuno, R; Tezuka, K; Amann, E
1993-01-01
A cDNA library prepared from the mouse osteoblastic cell line MC3T3-E1 was screened for the presence of specifically expressed genes by employing a combined subtraction hybridization/differential screening approach. A cDNA was identified and sequenced which encodes a protein designated osteoblast-specific factor 2 (OSF-2) comprising 811 amino acids. OSF-2 has a typical signal sequence, followed by a cysteine-rich domain, a fourfold repeated domain and a C-terminal domain. The protein lacks a typical transmembrane region. The fourfold repeated domain of OSF-2 shows homology with the insect protein fasciclin I. RNA analyses revealed that OSF-2 is expressed in bone and to a lesser extent in lung, but not in other tissues. Mouse OSF-2 cDNA was subsequently used as a probe to clone the human counterpart. Mouse and human OSF-2 show a high amino acid sequence conservation except for the signal sequence and two regions in the C-terminal domain in which 'in-frame' insertions or deletions are observed, implying alternative splicing events. On the basis of the amino acid sequence homology with fasciclin I, we suggest that OSF-2 functions as a homophilic adhesion molecule in bone formation. PMID:8363580
Tsuchiya, Karen D.; Greally, John M.; Yi, Yajun; Noel, Kevin P.; Truong, Jean-Pierre; Disteche, Christine M.
2004-01-01
We have performed X-inactivation and sequence analyses on 350 kb of sequence from human Xp11.2, a region shown previously to contain a cluster of genes that escape X inactivation, and we compared this region with the region of conserved synteny in mouse. We identified several new transcripts from this region in human and in mouse, which defined the full extent of the domain escaping X inactivation in both species. In human, escape from X inactivation involves an uninterrupted 235-kb domain of multiple genes. Despite highly conserved gene content and order between the two species, Smcx is the only mouse gene from the conserved segment that escapes inactivation. As repetitive sequences are believed to facilitate spreading of X inactivation along the chromosome, we compared the repetitive sequence composition of this region between the two species. We found that long terminal repeats (LTRs) were decreased in the human domain of escape, but not in the majority of the conserved mouse region adjacent to Smcx in which genes were subject to X inactivation, suggesting that these repeats might be excluded from escape domains to prevent spreading of silencing. Our findings indicate that genomic context, as well as gene-specific regulatory elements, interact to determine expression of a gene from the inactive X-chromosome. PMID:15197169
Bouchard, P; Chomilier, J; Ravet, V; Mornon, J P; Viguès, B
2001-01-01
Epiplasmin C is the major protein component of the membrane skeleton in the ciliate Tetrahymena pyriformis. Cloning and analysis of the gene encoding epiplasmin C showed it to be a previously unrecognized protein; in particular, it lacks the canonical features of known epiplasmic proteins in ciliates and flagellates. Hydrophobic cluster analysis (HCA) showed that epiplasmin C consists of a repeat of 25 domains of 40 residues each. These domains are related and can be grouped into two families, type I and type II. The junctions between type I and type II domains follow rules that are evident in the sequence itself, which supports the proposed domain boundaries. Using these repeated domains as queries, significant structural similarities were demonstrated with an extra six heptads that are shared by nuclear lamins and invertebrate cytoplasmic intermediate filament proteins and were deleted in the cytoplasmic intermediate filament protein lineage at the protostome-deuterostome branching of the eukaryotic phylogenetic tree.
The Leptospiral Antigen Lp49 is a Two-Domain Protein with Putative Protein Binding Function
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oliveira Giuseppe,P.; Oliveira Neves, F.; Nascimento, A.
2008-01-01
Pathogenic Leptospira is the etiological agent of leptospirosis, a life-threatening disease that affects populations worldwide. Currently available vaccines have limited effectiveness and therapeutic interventions are complicated by the difficulty in making an early diagnosis of leptospirosis. The genome of Leptospira interrogans was recently sequenced and comparative genomic analysis contributed to the identification of surface antigens, potential candidates for development of new vaccines and serodiagnosis. Lp49 is a membrane-associated protein recognized by antibodies present in sera from early and convalescent phases of leptospirosis patients. Its crystal structure was determined by single-wavelength anomalous diffraction using selenomethionine-labelled crystals and refined at 2.0 Å resolution. Lp49 is composed of two domains and belongs to the all-beta-proteins class. The N-terminal domain folds in an immunoglobulin-like beta-sandwich structure, whereas the C-terminal domain presents a seven-bladed beta-propeller fold. Structural analysis of Lp49 indicates putative protein-protein binding sites, suggesting a role in Leptospira-host interaction. This is the first crystal structure of a leptospiral antigen described to date.
Experimentation in machine discovery
NASA Technical Reports Server (NTRS)
Kulkarni, Deepak; Simon, Herbert A.
1990-01-01
KEKADA, a system that is capable of carrying out a complex series of experiments on problems from the history of science, is described. The system incorporates a set of experimentation strategies that were extracted from the traces of the scientists' behavior. It focuses on surprises to constrain its search, and uses its strategies to generate hypotheses and to carry out experiments. Some strategies are domain independent, whereas others incorporate knowledge of a specific domain. The domain-independent strategies include magnification, determining scope, divide and conquer, factor analysis, and relating different anomalous phenomena. KEKADA represents an experiment as a set of independent and dependent entities, with apparatus variables and a goal. It represents a theory either as a sequence of processes or as abstract hypotheses. KEKADA's response to a particular problem in biochemistry is described. On this and other problems, the system is capable of carrying out a complex series of experiments to refine domain theories. Analysis of the system and its behavior on a number of different problems has established its generality, but it has also revealed the reasons why the system would not be a good experimental scientist.
Hybrid and Rogue Kinases Encoded in the Genomes of Model Eukaryotes
Rakshambikai, Ramaswamy; Gnanavel, Mutharasu; Srinivasan, Narayanaswamy
2014-01-01
The highly modular nature of protein kinases generates diverse functional roles mediated by evolutionary events such as domain recombination, insertion and deletion of domains. Usually, the domain architecture of a kinase is related to the subfamily to which its catalytic domain belongs. However, outlier kinases with unusual domain architectures serve to expand the functional space of the protein kinase family. For example, Src kinases consist of SH2 and SH3 domains in addition to the kinase catalytic domain. A kinase that lacks these two domains but retains the sequence characteristics of the Src catalytic domain is an outlier that is likely to have modes of regulation different from those of classical Src kinases. This study defines two types of outlier kinases, hybrids and rogues, depending on the nature of domain recombination. Hybrid kinases are those in which the catalytic kinase domain belongs to one kinase subfamily but the domain architecture is typical of another kinase subfamily. Rogue kinases are those with a kinase catalytic domain characteristic of one kinase subfamily but a domain architecture typical of neither that subfamily nor any other kinase subfamily. This report provides a consolidated set of such hybrid and rogue kinases gleaned from six eukaryotic genomes (S. cerevisiae, D. melanogaster, C. elegans, M. musculus, T. rubripes and H. sapiens) and discusses their functions. The presence of such kinases necessitates revisiting the classification scheme of the protein kinase family using full-length sequences, in addition to the classical classification based solely on the sequences of kinase catalytic domains. The study of these kinases provides useful insight into engineering signalling pathways for a desired output. Lastly, identification of hybrids and rogues in pathogenic protozoa such as P. falciparum sheds light on possible strategies in host-pathogen interactions. PMID:25255313
Comparative sequence analysis suggests a conserved gating mechanism for TRP channels
Palovcak, Eugene; Delemotte, Lucie; Klein, Michael L.
2015-01-01
The transient receptor potential (TRP) channel superfamily plays a central role in transducing diverse sensory stimuli in eukaryotes. Although dissimilar in sequence and domain organization, all known TRP channels act as polymodal cellular sensors and form tetrameric assemblies similar to those of their distant relatives, the voltage-gated potassium (Kv) channels. Here, we investigated the related questions of whether the allosteric mechanism underlying polymodal gating is common to all TRP channels, and how this mechanism differs from that underpinning Kv channel voltage sensitivity. To provide insight into these questions, we performed comparative sequence analysis on large, comprehensive ensembles of TRP and Kv channel sequences, contextualizing the patterns of conservation and correlation observed in the TRP channel sequences in light of the well-studied Kv channels. We report sequence features that are specific to TRP channels and, based on insight from recent TRPV1 structures, we suggest a model of TRP channel gating that differs substantially from the one mediating voltage sensitivity in Kv channels. The common mechanism underlying polymodal gating involves the displacement of a defect in the H-bond network of S6 that changes the orientation of the pore-lining residues at the hydrophobic gate. PMID:26078053
Kobayashi, Michie; Hiraka, Yukie; Abe, Akira; Yaegashi, Hiroki; Natsume, Satoshi; Kikuchi, Hideko; Takagi, Hiroki; Saitoh, Hiromasa; Win, Joe; Kamoun, Sophien; Terauchi, Ryohei
2017-11-22
Downy mildew, caused by the oomycete pathogen Sclerospora graminicola, is an economically important disease of Gramineae crops including foxtail millet (Setaria italica). Plants infected with S. graminicola are generally stunted and often undergo a transformation of flower organs into leaves (phyllody or witches' broom), resulting in serious yield loss. To establish the molecular basis of downy mildew disease in foxtail millet, we carried out whole-genome sequencing and an RNA-seq analysis of S. graminicola. Sequence reads were generated from S. graminicola using an Illumina sequencing platform and assembled de novo into a draft genome sequence comprising approximately 360 Mbp. Of this sequence, 73% comprised repetitive elements, and a total of 16,736 genes were predicted from the RNA-seq data. The predicted genes included those encoding effector-like proteins with high sequence similarity to those previously identified in other oomycete pathogens. Genes encoding jacalin-like lectin-domain-containing secreted proteins were enriched in S. graminicola compared to other oomycetes. Of a total of 1220 genes encoding putative secreted proteins, 91 significantly changed their expression levels during the infection of plant tissues compared to the sporangia and zoospore stages of the S. graminicola lifecycle. We established the draft genome sequence of a downy mildew pathogen that infects Gramineae plants. Based on this sequence and our transcriptome analysis, we generated a catalog of in planta-induced candidate effector genes, providing a solid foundation from which to identify the effectors causing phyllody.
Devailly, Guillaume; Mantsoki, Anna; Joshi, Anagha
2016-11-01
Better protocols and decreasing costs have made high-throughput sequencing experiments accessible even to small experimental laboratories. However, comparing one or a few experiments generated by an individual lab with the vast amount of relevant data freely available in the public domain may be hampered by a lack of bioinformatics expertise. Though several tools, including genome browsers, allow such comparison at the level of a single gene, they do not provide a genome-wide view. We developed Heat*seq, a web tool that allows genome-scale comparison of high-throughput experiments provided by a user (chromatin immunoprecipitation followed by sequencing, RNA sequencing and Cap Analysis of Gene Expression) to the data in the public domain. Heat*seq currently contains over 12 000 experiments across diverse tissues and cell types in human, mouse and Drosophila. Heat*seq displays interactive correlation heatmaps, with the ability to dynamically subset datasets to contextualize user experiments. High-quality figures and tables are produced and can be downloaded in multiple formats. Web application: http://www.heatstarseq.roslin.ed.ac.uk/ Source code: https://github.com/gdevailly CONTACT: Guillaume.Devailly@roslin.ed.ac.uk or Anagha.Joshi@roslin.ed.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Isolation of a cDNA Encoding a Granule-Bound 152-Kilodalton Starch-Branching Enzyme in Wheat
Båga, Monica; Nair, Ramesh B.; Repellin, Anne; Scoles, Graham J.; Chibbar, Ravindra N.
2000-01-01
Screening of a wheat (Triticum aestivum) cDNA library for starch-branching enzyme I (SBEI) genes combined with 5′-rapid amplification of cDNA ends resulted in isolation of a 4,563-bp composite cDNA, Sbe1c. Based on sequence alignment to characterized SBEI cDNA clones isolated from plants, the SBEIc predicted from the cDNA sequence was produced with a transit peptide directing the polypeptide into plastids. Furthermore, the predicted mature form of SBEIc was much larger (152 kD) than previously characterized plant SBEI (80–100 kD) and contained a partial duplication of SBEI sequences. The first SBEI domain showed high amino acid similarity to a 74-kD wheat SBEI-like protein that is inactive as a branching enzyme when expressed in Escherichia coli. The second SBEI domain on SBEIc was identical in sequence to a functional 87-kD SBEI produced in the wheat endosperm. Immunoblot analysis of proteins produced in developing wheat kernels demonstrated that the 152-kD SBEIc was, in contrast to the 87- to 88-kD SBEI, preferentially associated with the starch granules. Proteins similar in size and recognized by wheat SBEI antibodies were also present in Triticum monococcum, Triticum tauschii, and Triticum turgidum subsp. durum. PMID:10982440
Identification of a novel vitivirus from grapevines in New Zealand.
Blouin, Arnaud G; Keenan, Sandi; Napier, Kathryn R; Barrero, Roberto A; MacDiarmid, Robin M
2018-01-01
We report the sequence of a novel vitivirus from Vitis vinifera obtained using two high-throughput sequencing (HTS) strategies on RNA. The initial discovery from small-RNA sequencing was confirmed by HTS of the total RNA and by Sanger sequencing. The new virus has a genome structure similar to that reported for other vitiviruses, with five open reading frames (ORFs) coding for the conserved domains described for members of that genus. Phylogenetic analysis of the complete genome sequence confirmed its affiliation to the genus Vitivirus, with the closest described viruses being grapevine virus E (GVE) and Agave tequilana leaf virus (ATLV). However, the virus we report is distinct and shares only 51% amino acid sequence identity with GVE in the replicase polyprotein and 66.8% amino acid sequence identity with ATLV in the coat protein. This is well below the threshold determined by the ICTV for species demarcation, and we propose that this virus represents a new species. It is provisionally named "grapevine virus G".
MOCASSIN-prot: a multi-objective clustering approach for protein similarity networks.
Keel, Brittney N; Deng, Bo; Moriyama, Etsuko N
2018-04-15
Proteins often include multiple conserved domains. Various evolutionary events, including duplication and loss of domains, domain shuffling, and sequence divergence, contribute to generating complexities in protein structures and, consequently, in their functions. The evolutionary history of proteins is hence best modeled through networks that incorporate information from both sequence divergence and domain content. Here, a game-theoretic approach proposed for protein network construction is adapted into the framework of multi-objective optimization and extended to incorporate a clustering refinement procedure. The new method, MOCASSIN-prot, was applied to cluster multi-domain proteins from ten genomes. The performance of MOCASSIN-prot was compared against two protein clustering methods, Markov clustering (TRIBE-MCL) and spectral clustering (SCPS). We showed that, compared to these two methods, MOCASSIN-prot, which uses both domain composition and quantitative sequence similarity information, generates fewer false positives. It achieves more functionally coherent protein clusters and better differentiates protein families. MOCASSIN-prot, implemented in Perl and Matlab, is freely available at http://bioinfolab.unl.edu/emlab/MOCASSINprot. Contact: emoriyama2@unl.edu. Supplementary data are available at Bioinformatics online.
Delport, Wayne; Ferguson, J Willem H; Bloomer, Paulette
2002-06-01
We determined the mitochondrial DNA control region sequences of six Bucerotiformes. Hornbills have the typical avian gene order, and their control region is similar to other avian control regions in that it is partitioned into three domains: two variable domains that flank a central conserved domain. Two characteristics of the hornbill control region sequence differ from those of other birds. First, domain I is AT rich as opposed to AC rich, and second, the control region is approximately 500 bp longer than that of other birds. Both these deviations from typical avian control region sequence can be explained by repeat motifs in domain I of the hornbill control region. The repeat motifs probably originated from a duplication of CSB-1, as has been determined in chicken, quail, and snow goose. Furthermore, the hornbill repeat motifs probably arose before the divergence of hornbills from each other but after the divergence of hornbills from other avian taxa. The mitochondrial control region of hornbills is suitable for both phylogenetic and population studies, with domains I and II probably more suited to population and phylogenetic analyses, respectively.
2010-01-01
Background Intragenic tandem repeats occur throughout all domains of life and impart functional and structural variability to diverse translation products. Repeat proteins confer distinctive surface phenotypes to many unicellular organisms, including those with minimal genomes such as the wall-less bacterial monoderms, Mollicutes. One such repeat pattern in this clade is distributed in a manner suggesting its exchange by horizontal gene transfer (HGT). Expanding genome sequence databases reveal the pattern in a widening range of bacteria, and recently among eucaryotic microbes. We examined the genomic flux and consequences of the motif by determining its distribution, predicted structural features and association with membrane-targeted proteins. Results Using a refined hidden Markov model, we document a 25-residue protein sequence motif tandemly arrayed in variable-number repeats in ORFs lacking assigned functions. It appears sporadically in unicellular microbes from disparate bacterial and eucaryotic clades, representing diverse lifestyles and ecological niches that include host parasitic, marine and extreme environments. Tracts of the repeats predict a malleable configuration of recurring domains, with conserved hydrophobic residues forming an amphipathic secondary structure in which hydrophilic residues endow extensive sequence variation. Many ORFs with these domains also have membrane-targeting sequences that predict assorted topologies; others may comprise reservoirs of sequence variants. We demonstrate expressed variants among surface lipoproteins that distinguish closely related animal pathogens belonging to a subgroup of the Mollicutes. DNA sequences encoding the tandem domains display dyad symmetry. Moreover, in some taxa the domains occur in ORFs selectively associated with mobile elements. 
These features, a punctate phylogenetic distribution, and different patterns of dispersal in genomes of related taxa, suggest that the repeat may be disseminated by HGT and intra-genomic shuffling. Conclusions We describe novel features of PARCELs (Palindromic Amphipathic Repeat Coding ELements), a set of widely distributed repeat protein domains and coding sequences that were likely acquired through HGT by diverse unicellular microbes, further mobilized and diversified within genomes, and co-opted for expression in the membrane proteome of some taxa. Disseminated by multiple gene-centric vehicles, ORFs harboring these elements enhance accessory gene pools as part of the "mobilome" connecting genomes of various clades, in taxa sharing common niches. PMID:20626840
Smith, Colin A; Kortemme, Tanja
2011-01-01
Predicting the set of sequences that are tolerated by a protein or protein interface, while maintaining a desired function, is useful for characterizing protein interaction specificity and for computationally designing sequence libraries to engineer proteins with new functions. Here we provide a general method, a detailed set of protocols, and several benchmarks and analyses for estimating tolerated sequences using flexible backbone protein design implemented in the Rosetta molecular modeling software suite. The input to the method is at least one experimentally determined three-dimensional protein structure or high-quality model. The starting structure(s) are expanded or refined into a conformational ensemble using Monte Carlo simulations consisting of backrub backbone and side chain moves in Rosetta. The method then uses a combination of simulated annealing and genetic algorithm optimization methods to enrich for low-energy sequences for the individual members of the ensemble. To emphasize certain functional requirements (e.g. forming a binding interface), interactions between and within parts of the structure (e.g. domains) can be reweighted in the scoring function. Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. We provide an extensive description of the protocol and its parameters, all source code, example analysis scripts and three tests applying this method to finding sequences predicted to stabilize proteins or protein interfaces. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.
ASGARD: an open-access database of annotated transcriptomes for emerging model arthropod species.
Zeng, Victor; Extavour, Cassandra G
2012-01-01
The increased throughput and decreased cost of next-generation sequencing (NGS) have shifted the bottleneck in genomic research from sequencing to annotation, analysis and accessibility. This is particularly challenging for research communities working on organisms that lack the basic infrastructure of a sequenced genome, or an efficient way to utilize whatever sequence data may be available. Here we present a new database, the Assembled Searchable Giant Arthropod Read Database (ASGARD). This database is a repository and search engine for transcriptomic data from arthropods that are of high interest to multiple research communities but currently lack sequenced genomes. We demonstrate the functionality and utility of ASGARD using de novo assembled transcriptomes from the milkweed bug Oncopeltus fasciatus, the cricket Gryllus bimaculatus and the amphipod crustacean Parhyale hawaiensis. We have annotated these transcriptomes to assign putative orthology, coding region determination, protein domain identification and Gene Ontology (GO) term annotation to all possible assembly products. ASGARD allows users to search all assemblies by orthology annotation, GO term annotation or Basic Local Alignment Search Tool. User-friendly features of ASGARD include search term auto-completion suggestions based on database content, the ability to download assembly product sequences in FASTA format, direct links to NCBI data for predicted orthologs and graphical representation of the location of protein domains and matches to similar sequences from the NCBI non-redundant database. ASGARD will be a useful repository for transcriptome data from future NGS studies on these and other emerging model arthropods, regardless of sequencing platform, assembly or annotation status. This database thus provides easy, one-stop access to multi-species annotated transcriptome information. 
We anticipate that this database will be useful for members of multiple research communities, including developmental biology, physiology, evolutionary biology, ecology, comparative genomics and phylogenomics. Database URL: asgard.rc.fas.harvard.edu.
The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families.
Yooseph, Shibu; Sutton, Granger; Rusch, Douglas B; Halpern, Aaron L; Williamson, Shannon J; Remington, Karin; Eisen, Jonathan A; Heidelberg, Karla B; Manning, Gerard; Li, Weizhong; Jaroszewski, Lukasz; Cieplak, Piotr; Miller, Christopher S; Li, Huiying; Mashiyama, Susan T; Joachimiak, Marcin P; van Belle, Christopher; Chandonia, John-Marc; Soergel, David A; Zhai, Yufeng; Natarajan, Kannan; Lee, Shaun; Raphael, Benjamin J; Bafna, Vineet; Friedman, Robert; Brenner, Steven E; Godzik, Adam; Eisenberg, David; Dixon, Jack E; Taylor, Susan S; Strausberg, Robert L; Frazier, Marvin; Venter, J Craig
2007-03-01
Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.
Ran, Kun; Yang, Hongqiang; Sun, Xiaoli; Li, Qiang; Jiang, Qianqian; Zhang, Weiwei; Shen, Wei
2014-05-01
Vacuolar processing enzymes (VPEs) have received considerable attention recently, as they exhibit caspase-1-like cleavage activity and regulate the process of PCD. However, knowledge about their detailed characteristics and structures is relatively limited. In this study, a gamma vacuolar processing enzyme gene, MhVPEγ, was isolated from the leaves of Malus hupehensis (Ramp) Rehd. var pinyiensis Jiang. The protein encoded by MhVPEγ comprises 494 amino acids, with a signal peptide and a transmembrane helix at the N-terminus, a peptidase_C13 domain, and a vacuolar sorting signal at the C-terminus. Subsequently, a genome-walking approach was used to isolate its upstream sequence. Computational analysis identified several promoter motifs with putative MeJA-, ABA- and light-responsive characteristics, as well as typical elements universally found in promoters, such as the TATA-box and CAAT-box. The MhVPEγ transcript level was enhanced during wounding treatment, and the WUN-motif, one of the cis-acting regulatory elements in the upstream sequence, may regulate its expression. In silico-constructed 3D models revealed that MhCPYL interacts with MhVPEγ in a manner resembling the "Induced Fit-Lock and Key" model, providing molecular conformation evidence that CPY is a direct substrate of VPEγ. This study is a first step toward understanding the molecular mechanism of VPEγ and CPYL interactions.