protein sequence analyses: Topics by Science.gov

Sample records for protein sequence analyses

MALDI Top-Down sequencing: calling N- and C-terminal protein sequences with high confidence and speed.

PubMed

Suckau, Detlev; Resemann, Anja

2009-12-01

The ability to match Top-Down protein sequencing (TDS) results by MALDI-TOF to protein sequences by classical protein database searching was evaluated in this work. Resulting from these analyses were the protein identity, the simultaneous assignment of the N- and C-termini and protein sequences of up to 70 residues from either terminus. In combination with de novo sequencing using the MALDI-TDS data, even fusion proteins were assigned and the detailed sequence around the fusion site was elucidated. MALDI-TDS allowed to efficiently match protein sequences quickly and to validate recombinant protein structures-in particular, protein termini-on the level of undigested proteins.
On the Role of Aggregation Prone Regions in Protein Evolution, Stability, and Enzymatic Catalysis: Insights from Diverse Analyses

PubMed Central

Buck, Patrick M.; Kumar, Sandeep; Singh, Satish K.

2013-01-01

The various roles that aggregation prone regions (APRs) are capable of playing in proteins are investigated here via comprehensive analyses of multiple non-redundant datasets containing randomly generated amino acid sequences, monomeric proteins, intrinsically disordered proteins (IDPs) and catalytic residues. Results from this study indicate that the aggregation propensities of monomeric protein sequences have been minimized compared to random sequences with uniform and natural amino acid compositions, as observed by a lower average aggregation propensity and fewer APRs that are shorter in length and more often punctuated by gate-keeper residues. However, evidence for evolutionary selective pressure to disrupt these sequence regions among homologous proteins is inconsistent. APRs are less conserved than average sequence identity among closely related homologues (≥80% sequence identity with a parent) but APRs are more conserved than average sequence identity among homologues that have at least 50% sequence identity with a parent. Structural analyses of APRs indicate that APRs are three times more likely to contain ordered versus disordered residues and that APRs frequently contribute more towards stabilizing proteins than equal length segments from the same protein. Catalytic residues and APRs were also found to be in structural contact significantly more often than expected by random chance. Our findings suggest that proteins have evolved by optimizing their risk of aggregation for cellular environments by both minimizing aggregation prone regions and by conserving those that are important for folding and function. In many cases, these sequence optimizations are insufficient to develop recombinant proteins into commercial products. Rational design strategies aimed at improving protein solubility for biotechnological purposes should carefully evaluate the contributions made by candidate APRs, targeted for disruption, towards protein structure and activity. PMID:24146608
Implication of the cause of differences in 3D structures of proteins with high sequence identity based on analyses of amino acid sequences and 3D structures.

PubMed

Matsuoka, Masanari; Sugita, Masatake; Kikuchi, Takeshi

2014-09-18

Proteins that share a high sequence homology while exhibiting drastically different 3D structures are investigated in this study. Recently, artificial proteins related to the sequences of the GA and IgG binding GB domains of human serum albumin have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but exhibit different 3D structures, namely, a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to using bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem. It was hard to distinguish which structure a given sequence would take only with the results of ordinary analyses like BLAST and conservation analyses. However, in addition to these analyses, with the analysis based on the inter-residue average distance statistics and our sequence tendency analysis, we could infer which part would play an important role in its structural formation. The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.
MannDB: A microbial annotation database for protein characterization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhou, C; Lam, M; Smith, J

2006-05-19

MannDB was created to meet a need for rapid, comprehensive automated protein sequence analyses to support selection of proteins suitable as targets for driving the development of reagents for pathogen or protein toxin detection. Because a large number of open-source tools were needed, it was necessary to produce a software system to scale the computations for whole-proteome analysis. Thus, we built a fully automated system for executing software tools and for storage, integration, and display of automated protein sequence analysis and annotation data. MannDB is a relational database that organizes data resulting from fully automated, high-throughput protein-sequence analyses using open-sourcemore » tools. Types of analyses provided include predictions of cleavage, chemical properties, classification, features, functional assignment, post-translational modifications, motifs, antigenicity, and secondary structure. Proteomes (lists of hypothetical and known proteins) are downloaded and parsed from Genbank and then inserted into MannDB, and annotations from SwissProt are downloaded when identifiers are found in the Genbank entry or when identical sequences are identified. Currently 36 open-source tools are run against MannDB protein sequences either on local systems or by means of batch submission to external servers. In addition, BLAST against protein entries in MvirDB, our database of microbial virulence factors, is performed. A web client browser enables viewing of computational results and downloaded annotations, and a query tool enables structured and free-text search capabilities. When available, links to external databases, including MvirDB, are provided. MannDB contains whole-proteome analyses for at least one representative organism from each category of biological threat organism listed by APHIS, CDC, HHS, NIAID, USDA, USFDA, and WHO. MannDB comprises a large number of genomes and comprehensive protein sequence analyses representing organisms listed as high-priority agents on the websites of several governmental organizations concerned with bio-terrorism. MannDB provides the user with a BLAST interface for comparison of native and non-native sequences and a query tool for conveniently selecting proteins of interest. In addition, the user has access to a web-based browser that compiles comprehensive and extensive reports.« less
Tardigrade workbench: comparing stress-related proteins, sequence-similar and functional protein clusters as well as RNA elements in tardigrades

PubMed Central

2009-01-01

Background Tardigrades represent an animal phylum with extraordinary resistance to environmental stress. Results To gain insights into their stress-specific adaptation potential, major clusters of related and similar proteins are identified, as well as specific functional clusters delineated comparing all tardigrades and individual species (Milnesium tardigradum, Hypsibius dujardini, Echiniscus testudo, Tulinus stephaniae, Richtersius coronifer) and functional elements in tardigrade mRNAs are analysed. We find that 39.3% of the total sequences clustered in 58 clusters of more than 20 proteins. Among these are ten tardigrade specific as well as a number of stress-specific protein clusters. Tardigrade-specific functional adaptations include strong protein, DNA- and redox protection, maintenance and protein recycling. Specific regulatory elements regulate tardigrade mRNA stability such as lox P DICE elements whereas 14 other RNA elements of higher eukaryotes are not found. Further features of tardigrade specific adaption are rapidly identified by sequence and/or pattern search on the web-tool tardigrade analyzer http://waterbear.bioapps.biozentrum.uni-wuerzburg.de. The work-bench offers nucleotide pattern analysis for promotor and regulatory element detection (tardigrade specific; nrdb) as well as rapid COG search for function assignments including species-specific repositories of all analysed data. Conclusion Different protein clusters and regulatory elements implicated in tardigrade stress adaptations are analysed including unpublished tardigrade sequences. PMID:19821996
Tardigrade workbench: comparing stress-related proteins, sequence-similar and functional protein clusters as well as RNA elements in tardigrades.

PubMed

Förster, Frank; Liang, Chunguang; Shkumatov, Alexander; Beisser, Daniela; Engelmann, Julia C; Schnölzer, Martina; Frohme, Marcus; Müller, Tobias; Schill, Ralph O; Dandekar, Thomas

2009-10-12

Tardigrades represent an animal phylum with extraordinary resistance to environmental stress. To gain insights into their stress-specific adaptation potential, major clusters of related and similar proteins are identified, as well as specific functional clusters delineated comparing all tardigrades and individual species (Milnesium tardigradum, Hypsibius dujardini, Echiniscus testudo, Tulinus stephaniae, Richtersius coronifer) and functional elements in tardigrade mRNAs are analysed. We find that 39.3% of the total sequences clustered in 58 clusters of more than 20 proteins. Among these are ten tardigrade specific as well as a number of stress-specific protein clusters. Tardigrade-specific functional adaptations include strong protein, DNA- and redox protection, maintenance and protein recycling. Specific regulatory elements regulate tardigrade mRNA stability such as lox P DICE elements whereas 14 other RNA elements of higher eukaryotes are not found. Further features of tardigrade specific adaption are rapidly identified by sequence and/or pattern search on the web-tool tardigrade analyzer http://waterbear.bioapps.biozentrum.uni-wuerzburg.de. The work-bench offers nucleotide pattern analysis for promotor and regulatory element detection (tardigrade specific; nrdb) as well as rapid COG search for function assignments including species-specific repositories of all analysed data. Different protein clusters and regulatory elements implicated in tardigrade stress adaptations are analysed including unpublished tardigrade sequences.
Using SQL Databases for Sequence Similarity Searching and Analysis.

PubMed

Pearson, William R; Mackey, Aaron J

2017-09-13

Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.
Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification.

PubMed

Sinclair, Robert M; Ravantti, Janne J; Bamford, Dennis H

2017-04-15

Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. Copyright © 2017 Sinclair et al.
Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification

PubMed Central

Sinclair, Robert M.; Ravantti, Janne J.

2017-01-01

ABSTRACT Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. PMID:28122979
Comparative analyses of putative toxin gene homologs from an Old World viper, Daboia russelii

PubMed Central

Krishnan, Neeraja M.

2017-01-01

Availability of snake genome sequences has opened up exciting areas of research on comparative genomics and gene diversity. One of the challenges in studying snake genomes is the acquisition of biological material from live animals, especially from the venomous ones, making the process cumbersome and time-consuming. Here, we report comparative sequence analyses of putative toxin gene homologs from Russell’s viper (Daboia russelii) using whole-genome sequencing data obtained from shed skin. When compared with the major venom proteins in Russell’s viper studied previously, we found 45–100% sequence similarity between the venom proteins and their putative homologs in the skin. Additionally, comparative analyses of 20 putative toxin gene family homologs provided evidence of unique sequence motifs in nerve growth factor (NGF), platelet derived growth factor (PDGF), Kunitz/Bovine pancreatic trypsin inhibitor (Kunitz BPTI), cysteine-rich secretory proteins, antigen 5, andpathogenesis-related1 proteins (CAP) and cysteine-rich secretory protein (CRISP). In those derived proteins, we identified V11 and T35 in the NGF domain; F23 and A29 in the PDGF domain; N69, K2 and A5 in the CAP domain; and Q17 in the CRISP domain to be responsible for differences in the largest pockets across the protein domain structures in crotalines, viperines and elapids from the in silico structure-based analysis. Similarly, residues F10, Y11 and E20 appear to play an important role in the protein structures across the kunitz protein domain of viperids and elapids. Our study highlights the usefulness of shed skin in obtaining good quality high-molecular weight DNA for comparative genomic studies, and provides evidence towards the unique features and evolution of putative venom gene homologs in vipers. PMID:29230357
Stratification of co-evolving genomic groups using ranked phylogenetic profiles

PubMed Central

Freilich, Shiri; Goldovsky, Leon; Gottlieb, Assaf; Blanc, Eric; Tsoka, Sophia; Ouzounis, Christos A

2009-01-01

Background Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with a significant impact to genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database. Results The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach - a combination of sequence searches, statistical estimation and clustering - analyses the degree of sequence divergence between sets of protein sequences and allows the classification of protein sequences according to the species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples. Conclusion Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and the detection of species groups in metagenomics samples. PMID:19860884
Complete genome sequence of 285P, a novel T7-like polyvalent E. coli bacteriophage.

PubMed

Xu, Bin; Ma, Xiangyu; Xiong, Hongyan; Li, Yafei

2014-06-01

Bacteriophages are considered potential biological agents for the control of infectious diseases and environmental disinfection. Here, we describe a novel T7-like polyvalent Escherichia coli bacteriophage, designated "285P," which can lyse several strains of E. coli. The genome, which consists of 39,270 base pairs with a G+C content of 48.73 %, was sequenced and annotated. Forty-three potential open reading frames were identified using bioinformatics tools. Based on whole-genome sequence comparison, phage 285P was identified as a novel strain of subgroup T7. It showed strongest sequence similarity to Kluyvera phage Kvp1. The phylogenetic analyses of both non-structural proteins (endonuclease gp3, amidase gp3.5, DNA primase/helicase gp4, DNA polymerase gp5, and exonuclease gp6) and structural protein (tail fiber protein gp17) led to the identification of 285P as T7-like phage. Sodium dodecyl sulfate-polyacrylamide gel electrophoresis and matrix-assisted laser desorption/ionization time-of-flight mass spectrometric analyses verified the annotation of the structural proteins (major capsid protein gp10a, tail protein gp12, and tail fiber protein gp17).
A traditional evolutionary history of foot-and-mouth disease viruses in Southeast Asia challenged by analyses of non-structural protein coding sequences

USDA-ARS?s Scientific Manuscript database

Molecular epidemiology and evolution of foot-and-mouth disease virus (FMDV) are widely studied using genomic sequences encoding VP1, the capsid protein containing the most relevant antigenic domains. Although sequencing of the full viral genome is not used as a routine diagnostic or surveillance too...
HIPPI: highly accurate protein family classification with ensembles of HMMs.

PubMed

Nguyen, Nam-Phuong; Nute, Michael; Mirarab, Siavash; Warnow, Tandy

2016-11-11

Given a new biological sequence, detecting membership in a known family is a basic step in many bioinformatics analyses, with applications to protein structure and function prediction and metagenomic taxon identification and abundance profiling, among others. Yet family identification of sequences that are distantly related to sequences in public databases or that are fragmentary remains one of the more difficult analytical problems in bioinformatics. We present a new technique for family identification called HIPPI (Hierarchical Profile Hidden Markov Models for Protein family Identification). HIPPI uses a novel technique to represent a multiple sequence alignment for a given protein family or superfamily by an ensemble of profile hidden Markov models computed using HMMER. An evaluation of HIPPI on the Pfam database shows that HIPPI has better overall precision and recall than blastp, HMMER, and pipelines based on HHsearch, and maintains good accuracy even for fragmentary query sequences and for protein families with low average pairwise sequence identity, both conditions where other methods degrade in accuracy. HIPPI provides accurate protein family identification and is robust to difficult model conditions. Our results, combined with observations from previous studies, show that ensembles of profile Hidden Markov models can better represent multiple sequence alignments than a single profile Hidden Markov model, and thus can improve downstream analyses for various bioinformatic tasks. Further research is needed to determine the best practices for building the ensemble of profile Hidden Markov models. HIPPI is available on GitHub at https://github.com/smirarab/sepp .
An effective approach for annotation of protein families with low sequence similarity and conserved motifs: identifying GDSL hydrolases across the plant kingdom.

PubMed

Vujaklija, Ivan; Bielen, Ana; Paradžik, Tina; Biđin, Siniša; Goldstein, Pavle; Vujaklija, Dušica

2016-02-18

The massive accumulation of protein sequences arising from the rapid development of high-throughput sequencing, coupled with automatic annotation, results in high levels of incorrect annotations. In this study, we describe an approach to decrease annotation errors of protein families characterized by low overall sequence similarity. The GDSL lipolytic family comprises proteins with multifunctional properties and high potential for pharmaceutical and industrial applications. The number of proteins assigned to this family has increased rapidly over the last few years. In particular, the natural abundance of GDSL enzymes reported recently in plants indicates that they could be a good source of novel GDSL enzymes. We noticed that a significant proportion of annotated sequences lack specific GDSL motif(s) or catalytic residue(s). Here, we applied motif-based sequence analyses to identify enzymes possessing conserved GDSL motifs in selected proteomes across the plant kingdom. Motif-based HMM scanning (Viterbi decoding-VD and posterior decoding-PD) and the here described PD/VD protocol were successfully applied on 12 selected plant proteomes to identify sequences with GDSL motifs. A significant number of identified GDSL sequences were novel. Moreover, our scanning approach successfully detected protein sequences lacking at least one of the essential motifs (171/820) annotated by Pfam profile search (PfamA) as GDSL. Based on these analyses we provide a curated list of GDSL enzymes from the selected plants. CLANS clustering and phylogenetic analysis helped us to gain a better insight into the evolutionary relationship of all identified GDSL sequences. Three novel GDSL subfamilies as well as unreported variations in GDSL motifs were discovered in this study. In addition, analyses of selected proteomes showed a remarkable expansion of GDSL enzymes in the lycophyte, Selaginella moellendorffii. Finally, we provide a general motif-HMM scanner which is easily accessible through the graphical user interface ( http://compbio.math.hr/ ). Our results show that scanning with a carefully parameterized motif-HMM is an effective approach for annotation of protein families with low sequence similarity and conserved motifs. The results of this study expand current knowledge and provide new insights into the evolution of the large GDSL-lipase family in land plants.
Variability and genetic structure of the population of watermelon mosaic virus infecting melon in Spain.

PubMed

Moreno, I M; Malpica, J M; Díaz-Pendón, J A; Moriones, E; Fraile, A; García-Arenal, F

2004-01-05

The genetic structure of the population of Watermelon mosaic virus (WMV) in Spain was analysed by the biological and molecular characterisation of isolates sampled from its main host plant, melon. The population was a highly homogeneous one, built of a single pathotype, and comprising isolates closely related genetically. There was indication of temporal replacement of genotypes, but not of spatial structure of the population. Analyses of nucleotide sequences in three genomic regions, that is, in the cistrons for the P1, cylindrical inclusion (CI) and capsid (CP) proteins, showed lower similar values of nucleotide diversity for the P1 than for the CI or CP cistrons. The CI protein and the CP were under tighter evolutionary constraints than the P1 protein. Also, for the CI and CP cistrons, but not for the P1 cistron, two groups of sequences, defining two genetic strains, were apparent. Thus, different genomic regions of WMV show different evolutionary dynamics. Interestingly, for the CI and CP cistrons, sequences were clustered into two regions of the sequence space, defining the two strains above, and no intermediary sequences were identified. Recombinant isolates were found, accounting for at least 7% of the population. These recombinants presented two interesting features: (i) crossover points were detected between the analysed regions in the CI and CP cistrons, but not between those in the P1 and CI cistrons, (ii) crossover points were not observed within the analysed coding regions for the P1, CI or CP proteins. This indicates strong selection against isolates with recombinant proteins, even when originated from closely related strains. Hence, data indicate that genotypes of WMV, generated by mutation or recombination, outside of acceptable, discrete, regions in the evolutionary space, are eliminated from the virus population by negative selection.
Phylogenetic Relationship of Necoclí Virus to Other South American Hantaviruses (Bunyaviridae: Hantavirus).

PubMed

Montoya-Ruiz, Carolina; Cajimat, Maria N B; Milazzo, Mary Louise; Diaz, Francisco J; Rodas, Juan David; Valbuena, Gustavo; Fulhorst, Charles F

2015-07-01

The results of a previous study suggested that Cherrie's cane rat (Zygodontomys cherriei) is the principal host of Necoclí virus (family Bunyaviridae, genus Hantavirus) in Colombia. Bayesian analyses of complete nucleocapsid protein gene sequences and complete glycoprotein precursor gene sequences in this study confirmed that Necoclí virus is phylogenetically closely related to Maporal virus, which is principally associated with the delicate pygmy rice rat (Oligoryzomys delicatus) in western Venezuela. In pairwise comparisons, nonidentities between the complete amino acid sequence of the nucleocapsid protein of Necoclí virus and the complete amino acid sequences of the nucleocapsid proteins of other hantaviruses were ≥8.7%. Likewise, nonidentities between the complete amino acid sequence of the glycoprotein precursor of Necoclí virus and the complete amino acid sequences of the glycoprotein precursors of other hantaviruses were ≥11.7%. Collectively, the unique association of Necoclí virus with Z. cherriei in Colombia, results of the Bayesian analyses of complete nucleocapsid protein gene sequences and complete glycoprotein precursor gene sequences, and results of the pairwise comparisons of amino acid sequences strongly support the notion that Necoclí virus represents a novel species in the genus Hantavirus. Further work is needed to determine whether Calabazo virus (a hantavirus associated with Z. brevicauda cherriei in Panama) and Necoclí virus are conspecific.
A Protein Domain and Family Based Approach to Rare Variant Association Analysis.

PubMed

Richardson, Tom G; Shihab, Hashem A; Rivas, Manuel A; McCarthy, Mark I; Campbell, Colin; Timpson, Nicholas J; Gaunt, Tom R

2016-01-01

It has become common practice to analyse large scale sequencing data with statistical approaches based around the aggregation of rare variants within the same gene. We applied a novel approach to rare variant analysis by collapsing variants together using protein domain and family coordinates, regarded to be a more discrete definition of a biologically functional unit. Using Pfam definitions, we collapsed rare variants (Minor Allele Frequency ≤ 1%) together in three different ways 1) variants within single genomic regions which map to individual protein domains 2) variants within two individual protein domain regions which are predicted to be responsible for a protein-protein interaction 3) all variants within combined regions from multiple genes responsible for coding the same protein domain (i.e. protein families). A conventional collapsing analysis using gene coordinates was also undertaken for comparison. We used UK10K sequence data and investigated associations between regions of variants and lipid traits using the sequence kernel association test (SKAT). We observed no strong evidence of association between regions of variants based on Pfam domain definitions and lipid traits. Quantile-Quantile plots illustrated that the overall distributions of p-values from the protein domain analyses were comparable to that of a conventional gene-based approach. Deviations from this distribution suggested that collapsing by either protein domain or gene definitions may be favourable depending on the trait analysed. We have collapsed rare variants together using protein domain and family coordinates to present an alternative approach over collapsing across conventionally used gene-based regions. Although no strong evidence of association was detected in these analyses, future studies may still find value in adopting these approaches to detect previously unidentified association signals.
Multiplex single-molecule interaction profiling of DNA-barcoded proteins.

PubMed

Gu, Liangcai; Li, Chao; Aach, John; Hill, David E; Vidal, Marc; Church, George M

2014-11-27

In contrast with advances in massively parallel DNA sequencing, high-throughput protein analyses are often limited by ensemble measurements, individual analyte purification and hence compromised quality and cost-effectiveness. Single-molecule protein detection using optical methods is limited by the number of spectrally non-overlapping chromophores. Here we introduce a single-molecular-interaction sequencing (SMI-seq) technology for parallel protein interaction profiling leveraging single-molecule advantages. DNA barcodes are attached to proteins collectively via ribosome display or individually via enzymatic conjugation. Barcoded proteins are assayed en masse in aqueous solution and subsequently immobilized in a polyacrylamide thin film to construct a random single-molecule array, where barcoding DNAs are amplified into in situ polymerase colonies (polonies) and analysed by DNA sequencing. This method allows precise quantification of various proteins with a theoretical maximum array density of over one million polonies per square millimetre. Furthermore, protein interactions can be measured on the basis of the statistics of colocalized polonies arising from barcoding DNAs of interacting proteins. Two demanding applications, G-protein coupled receptor and antibody-binding profiling, are demonstrated. SMI-seq enables 'library versus library' screening in a one-pot assay, simultaneously interrogating molecular binding affinity and specificity.
Sequence analyses and evolutionary relationships among the energy-coupling proteins Enzyme I and HPr of the bacterial phosphoenolpyruvate: sugar phosphotransferase system.

PubMed Central

Reizer, J.; Hoischen, C.; Reizer, A.; Pham, T. N.; Saier, M. H.

1993-01-01

We have previously reported the overexpression, purification, and biochemical properties of the Bacillus subtilis Enzyme I of the phosphoenolpyruvate: sugar phosphotransferase system (PTS) (Reizer, J., et al., 1992, J. Biol. Chem. 267, 9158-9169). We now report the sequencing of the ptsI gene of B. subtilis encoding Enzyme I (570 amino acids and 63,076 Da). Putative transcriptional regulatory signals are identified, and the pts operon is shown to be subject to carbon source-dependent regulation. Multiple alignments of the B. subtilis Enzyme I with (1) six other sequenced Enzymes I of the PTS from various bacterial species, (2) phosphoenolpyruvate synthase of Escherichia coli, and (3) bacterial and plant pyruvate: phosphate dikinases (PPDKs) revealed regions of sequence similarity as well as divergence. Statistical analyses revealed that these three types of proteins comprise a homologous family, and the phylogenetic tree of the 11 sequenced protein members of this family was constructed. This tree was compared with that of the 12 sequence HPr proteins or protein domains. Antibodies raised against the B. subtilis and E. coli Enzymes I exhibited immunological cross-reactivity with each other as well as with PPDK of Bacteroides symbiosus, providing support for the evolutionary relationships of these proteins suggested from the sequence comparisons. Putative flexible linkers tethering the N-terminal and the C-terminal domains of protein members of the Enzyme I family were identified, and their potential significance with regard to Enzyme I function is discussed. The codon choice pattern of the B. subtilis and E. coli ptsI and ptsH genes was found to exhibit a bias toward optimal codons in these organisms.(ABSTRACT TRUNCATED AT 250 WORDS) PMID:7686067

Analysis of Ribosome Inactivating Protein (RIP): A Bioinformatics Approach

NASA Astrophysics Data System (ADS)

Jothi, G. Edward Gnana; Majilla, G. Sahaya Jose; Subhashini, D.; Deivasigamani, B.

2012-10-01

In spite of the medical advances in recent years, the world is in need of different sources to encounter certain health issues.Ribosome Inactivating Proteins (RIPs) were found to be one among them. In order to get easy access about RIPs, there is a need to analyse RIPs towards constructing a database on RIPs. Also, multiple sequence alignment was done towards screening for homologues of significant RIPs from rare sources against RIPs from easily available sources in terms of similarity. Protein sequences were retrieved from SWISS-PROT and are further analysed using pair wise and multiple sequence alignment.Analysis shows that, 151 RIPs have been characterized to date. Amongst them, there are 87 type I, 37 type II, 1 type III and 25 unknown RIPs. The sequence length information of various RIPs about the availability of full or partial sequence was also found. The multiple sequence alignment of 37 type I RIP using the online server Multalin, indicates the presence of 20 conserved residues. Pairwise alignment and multiple sequence alignment of certain selected RIPs in two groups namely Group I and Group II were carried out and the consensus level was found to be 98%, 98% and 90% respectively.
Bioinformatic Analysis of Strawberry GSTF12 Gene

NASA Astrophysics Data System (ADS)

Wang, Xiran; Jiang, Leiyu; Tang, Haoru

2018-01-01

GSTF12 has always been known as a key factor of proanthocyanins accumulate in plant testa. Through bioinformatics analysis of the nucleotide and encoded protein sequence of GSTF12, it is more advantageous to the study of genes related to anthocyanin biosynthesis accumulation pathway. Therefore, we chosen GSTF12 gene of 11 kinds species, downloaded their nucleotide and protein sequence from NCBI as the research object, found strawberry GSTF12 gene via bioinformation analyse, constructed phylogenetic tree. At the same time, we analysed the strawberry GSTF12 gene of physical and chemical properties and its protein structure and so on. The phylogenetic tree showed that Strawberry and petunia were closest relative. By the protein prediction, we found that the protein owed one proper signal peptide without obvious transmembrane regions.
Coevolution analysis of Hepatitis C virus genome to identify the structural and functional dependency network of viral proteins

NASA Astrophysics Data System (ADS)

Champeimont, Raphaël; Laine, Elodie; Hu, Shuang-Wei; Penin, Francois; Carbone, Alessandra

2016-05-01

A novel computational approach of coevolution analysis allowed us to reconstruct the protein-protein interaction network of the Hepatitis C Virus (HCV) at the residue resolution. For the first time, coevolution analysis of an entire viral genome was realized, based on a limited set of protein sequences with high sequence identity within genotypes. The identified coevolving residues constitute highly relevant predictions of protein-protein interactions for further experimental identification of HCV protein complexes. The method can be used to analyse other viral genomes and to predict the associated protein interaction networks.
Determinants of the rate of protein sequence evolution

PubMed Central

Zhang, Jianzhi; Yang, Jian-Rong

2015-01-01

The rate and mechanism of protein sequence evolution have been central questions in evolutionary biology since the 1960s. Although the rate of protein sequence evolution depends primarily on the level of functional constraint, exactly what constitutes functional constraint has remained unclear. The increasing availability of genomic data has allowed for much needed empirical examinations on the nature of functional constraint. These studies found that the evolutionary rate of a protein is predominantly influenced by its expression level rather than functional importance. A combination of theoretical and empirical analyses have identified multiple mechanisms behind these observations and demonstrated a prominent role that selection against errors in molecular and cellular processes plays in protein evolution. PMID:26055156
Complete cpDNA genome sequence of Smilax china and phylogenetic placement of Liliales--influences of gene partitions and taxon sampling.

PubMed

Liu, Juan; Qi, Zhe-Chen; Zhao, Yun-Peng; Fu, Cheng-Xin; Jenny Xiang, Qiu-Yun

2012-09-01

The complete nucleotide sequence of the chloroplast genome (cpDNA) of Smilax china L. (Smilacaceae) is reported. It is the first complete cp genome sequence in Liliales. Genomic analyses were conducted to examine the rate and pattern of cpDNA genome evolution in Smilax relative to other major lineages of monocots. The cpDNA genomic sequences were combined with those available for Lilium to evaluate the phylogenetic position of Liliales and to investigate the influence of taxon sampling, gene sampling, gene function, natural selection, and substitution rate on phylogenetic inference in monocots. Phylogenetic analyses using sequence data of gene groups partitioned according to gene function, selection force, and total substitution rate demonstrated evident impacts of these factors on phylogenetic inference of monocots and the placement of Liliales, suggesting potential evolutionary convergence or adaptation of some cpDNA genes in monocots. Our study also demonstrated that reduced taxon sampling reduced the bootstrap support for the placement of Liliales in the cpDNA phylogenomic analysis. Analyses of sequences of 77 protein genes with some missing data and sequences of 81 genes (all protein genes plus the rRNA genes) support a sister relationship of Liliales to the commelinids-Asparagales clade, consistent with the APG III system. Analyses of 63 cpDNA protein genes for 32 taxa with few missing data, however, support a sister relationship of Liliales (represented by Smilax and Lilium) to Dioscoreales-Pandanales. Topology tests indicated that these two alignments do not significantly differ given any of these three cpDNA genomic sequence data sets. Furthermore, we found no saturation effect of the data, suggesting that the cpDNA genomic sequence data used in the study are appropriate for monocot phylogenetic study and long-branch attraction is unlikely to be the cause to explain the result of two well-supported, conflict placements of Liliales. Further analyses using sufficient nuclear data remain necessary to evaluate these two phylogenetic hypotheses regarding the position of Liliales and to address the causes of signal conflict among genes and partitions. Copyright © 2012 Elsevier Inc. All rights reserved.
Ribosomal RNA maturation in Schizosaccharomyces pombe is dependent on a large ribonucleoprotein complex of the internal transcribed spacer 1.

PubMed

Lalev, A I; Abeyrathne, P D; Nazar, R N

2000-09-08

The interdependency of steps in the processing of pre-rRNA in Schizosaccharomyces pombe suggests that RNA processing, at least in part, acts as a quality control mechanism which helps assure that only functional RNA is incorporated into mature ribosomes. To determine further the role of the transcribed spacer regions in rRNA processing and to detect interactions which underlie the interdependencies, the ITS1 sequence was examined for its ability to form ribonucleoprotein complexes with cellular proteins. When incubated with protein extract, the spacer formed a specific large RNP. This complex was stable to fractionation by agarose or polyacrylamide gel electrophoresis. Modification exclusion analyses indicated that the proteins interact with a helical domain which is conserved in the internal transcribed spacers. Mutagenic analyses confirmed an interaction with this sequence and indicated that this domain is critical to the efficient maturation of the precursor RNA. The protein constituents, purified by affinity chromatography using the ITS1 sequence, retained an ability to form stable RNP. Protein analyses of gel purified complex, prepared with affinity-purified proteins, indicated at least 20 protein components ranging in size from 20-200 kDa. Peptide mapping by Maldi-Toff mass spectroscopy identified eight hypothetical RNA binding proteins which included four different RNA-binding motifs. Another protein was putatively identified as a pseudouridylate synthase. Additional RNA constituents were not detected. The significance of this complex with respect to rRNA maturation and interdependence in rRNA processing is discussed. Copyright 2000 Academic Press.
Preservation of protein clefts in comparative models.

PubMed

Piedra, David; Lois, Sergi; de la Cruz, Xavier

2008-01-16

Comparative, or homology, modelling of protein structures is the most widely used prediction method when the target protein has homologues of known structure. Given that the quality of a model may vary greatly, several studies have been devoted to identifying the factors that influence modelling results. These studies usually consider the protein as a whole, and only a few provide a separate discussion of the behaviour of biologically relevant features of the protein. Given the value of the latter for many applications, here we extended previous work by analysing the preservation of native protein clefts in homology models. We chose to examine clefts because of their role in protein function/structure, as they are usually the locus of protein-protein interactions, host the enzymes' active site, or, in the case of protein domains, can also be the locus of domain-domain interactions that lead to the structure of the whole protein. We studied how the largest cleft of a protein varies in comparative models. To this end, we analysed a set of 53507 homology models that cover the whole sequence identity range, with a special emphasis on medium and low similarities. More precisely we examined how cleft quality - measured using six complementary parameters related to both global shape and local atomic environment, depends on the sequence identity between target and template proteins. In addition to this general analysis, we also explored the impact of a number of factors on cleft quality, and found that the relationship between quality and sequence identity varies depending on cleft rank amongst the set of protein clefts (when ordered according to size), and number of aligned residues. We have examined cleft quality in homology models at a range of seq.id. levels. Our results provide a detailed view of how quality is affected by distinct parameters and thus may help the user of comparative modelling to determine the final quality and applicability of his/her cleft models. In addition, the large variability in model quality that we observed within each sequence bin, with good models present even at low sequence identities (between 20% and 30%), indicates that properly developed identification methods could be used to recover good cleft models in this sequence range.
Sequence analyses of fimbriae subunit FimA proteins on Actinomyces naeslundii genospecies 1 and 2 and Actinomyces odontolyticus with variant carbohydrate binding specificities

PubMed Central

Drobni, Mirva; Hallberg, Kristina; Öhman, Ulla; Birve, Anna; Persson, Karina; Johansson, Ingegerd; Strömberg, Nicklas

2006-01-01

Background Actinomyces naeslundii genospecies 1 and 2 express type-2 fimbriae (FimA subunit polymers) with variant Galβ binding specificities and Actinomyces odontolyticus a sialic acid specificity to colonize different oral surfaces. However, the fimbrial nature of the sialic acid binding property and sequence information about FimA proteins from multiple strains are lacking. Results Here we have sequenced fimA genes from strains of A.naeslundii genospecies 1 (n = 4) and genospecies 2 (n = 4), both of which harboured variant Galβ-dependent hemagglutination (HA) types, and from A.odontolyticus PK984 with a sialic acid-dependent HA pattern. Three unique subtypes of FimA proteins with 63.8–66.4% sequence identity were present in strains of A. naeslundii genospecies 1 and 2 and A. odontolyticus. The generally high FimA sequence identity (>97.2%) within a genospecies revealed species specific sequences or segments that coincided with binding specificity. All three FimA protein variants contained a signal peptide, pilin motif, E box, proline-rich segment and an LPXTG sorting motif among other conserved segments for secretion, assembly and sorting of fimbrial proteins. The highly conserved pilin, E box and LPXTG motifs are present in fimbriae proteins from other Gram-positive bacteria. Moreover, only strains of genospecies 1 were agglutinated with type-2 fimbriae antisera derived from A. naeslundii genospecies 1 strain 12104, emphasizing that the overall folding of FimA may generate different functionalities. Western blot analyses with FimA antisera revealed monomers and oligomers of FimA in whole cell protein extracts and a purified recombinant FimA preparation, indicating a sortase-independent oligomerization of FimA. Conclusion The genus Actinomyces involves a diversity of unique FimA proteins with conserved pilin, E box and LPXTG motifs, depending on subspecies and associated binding specificity. In addition, a sortase independent oligomerization of FimA subunit proteins in solution was indicated. PMID:16686953
A comparative genomics perspective on the genetic content of the alkaliphilic haloarchaeon Natrialba magadii ATCC 43099T

PubMed Central

2012-01-01

Background Natrialba magadii is an aerobic chemoorganotrophic member of the Euryarchaeota and is a dual extremophile requiring alkaline conditions and hypersalinity for optimal growth. The genome sequence of Nab. magadii type strain ATCC 43099 was deciphered to obtain a comprehensive insight into the genetic content of this haloarchaeon and to understand the basis of some of the cellular functions necessary for its survival. Results The genome of Nab. magadii consists of four replicons with a total sequence of 4,443,643 bp and encodes 4,212 putative proteins, some of which contain peptide repeats of various lengths. Comparative genome analyses facilitated the identification of genes encoding putative proteins involved in adaptation to hypersalinity, stress response, glycosylation, and polysaccharide biosynthesis. A proton-driven ATP synthase and a variety of putative cytochromes and other proteins supporting aerobic respiration and electron transfer were encoded by one or more of Nab. magadii replicons. The genome encodes a number of putative proteases/peptidases as well as protein secretion functions. Genes encoding putative transcriptional regulators, basal transcription factors, signal perception/transduction proteins, and chemotaxis/phototaxis proteins were abundant in the genome. Pathways for the biosynthesis of thiamine, riboflavin, heme, cobalamin, coenzyme F420 and other essential co-factors were deduced by in depth sequence analyses. However, approximately 36% of Nab. magadii protein coding genes could not be assigned a function based on Blast analysis and have been annotated as encoding hypothetical or conserved hypothetical proteins. Furthermore, despite extensive comparative genomic analyses, genes necessary for survival in alkaline conditions could not be identified in Nab. magadii. Conclusions Based on genomic analyses, Nab. magadii is predicted to be metabolically versatile and it could use different carbon and energy sources to sustain growth. Nab. magadii has the genetic potential to adapt to its milieu by intracellular accumulation of inorganic cations and/or neutral organic compounds. The identification of Nab. magadii genes involved in coenzyme biosynthesis is a necessary step toward further reconstruction of the metabolic pathways in halophilic archaea and other extremophiles. The knowledge gained from the genome sequence of this haloalkaliphilic archaeon is highly valuable in advancing the applications of extremophiles and their enzymes. PMID:22559199
Using relational databases for improved sequence similarity searching and large-scale genomic analyses.

PubMed

Mackey, Aaron J; Pearson, William R

2004-10-01

Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. Relational databases are essential for management and analysis of large-scale sequence analyses, and can also be used to improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes using relational databases to improve the efficiency of sequence similarity searching and to demonstrate various large-scale genomic analyses of homology-related data. This unit describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. These include basic use of the database to generate a novel sequence library subset, how to extend and use seqdb_demo for the storage of sequence similarity search results and making use of various kinds of stored search results to address aspects of comparative genomic analysis.
Prediction of the protein components of a putative Calanus finmarchicus (Crustacea, Copepoda) circadian signaling system using a de novo assembled transcriptome

PubMed Central

Christie, Andrew E.; Fontanilla, Tiana M.; Nesbit, Katherine T.; Lenz, Petra H.

2013-01-01

Diel vertical migration and seasonal diapause are critical life history events for the copepod Calanus finmarchicus. While much is known about these behaviors phenomenologically, little is known about their molecular underpinnings. Recent studies in insects suggest that some circadian genes/proteins also contribute to the establishment of seasonal diapause. Thus, it is possible that in Calanus these distinct timing regimes share some genetic components. To begin to address this possibility, we used the well-established Drosophila melanogaster circadian system as a reference for mining clock transcripts from a 200,000+ sequence Calanus transcriptome; the proteins encoded by the identified transcripts were also deduced and characterized. Sequences encoding homologs of the Drosophila core clock proteins CLOCK, CYCLE, PERIOD and TIMELESS were identified, as was one encoding CRYPTOCHROME 2, a core clock protein in ancestral insect systems, but absent in Drosophila. Calanus transcripts encoding proteins known to modulate the Drosophila core clock were also identified and characterized, e.g. CLOCKWORK ORANGE, DOUBLETIME, SHAGGY and VRILLE. Alignment and structural analyses of the deduced Calanus proteins with their Drosophila counterparts revealed extensive sequence conservation, particularly in functional domains. Interestingly, reverse BLAST analyses of these sequences against all arthropod proteins typically revealed non-Drosophila isoforms to be most similar to the Calanus queries. This, in combination with the presence of both CRYPTOCHROME 1 (a clock input pathway protein) and CRYPTOCHROME 2 in Calanus, suggests that the organization of the copepod circadian system is an ancestral one, more similar to that of insects like Danaus plexippus than to that of Drosophila. PMID:23727418
DNA Multiple Sequence Alignment Guided by Protein Domains: The MSA-PAD 2.0 Method.

PubMed

Balech, Bachir; Monaco, Alfonso; Perniola, Michele; Santamaria, Monica; Donvito, Giacinto; Vicario, Saverio; Maggi, Giorgio; Pesole, Graziano

2018-01-01

Multiple sequence alignment (MSA) is a fundamental component in many DNA sequence analyses including metagenomics studies and phylogeny inference. When guided by protein profiles, DNA multiple alignments assume a higher precision and robustness. Here we present details of the use of the upgraded version of MSA-PAD (2.0), which is a DNA multiple sequence alignment framework able to align DNA sequences coding for single/multiple protein domains guided by PFAM or user-defined annotations. MSA-PAD has two alignment strategies, called "Gene" and "Genome," accounting for coding domains order and genomic rearrangements, respectively. Novel options were added to the present version, where the MSA can be guided by protein profiles provided by the user. This allows MSA-PAD 2.0 to run faster and to add custom protein profiles sometimes not present in PFAM database according to the user's interest. MSA-PAD 2.0 is currently freely available as a Web application at https://recasgateway.cloud.ba.infn.it/ .
Mathematical methods for protein science

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hart, W.; Istrail, S.; Atkins, J.

1997-12-31

Understanding the structure and function of proteins is a fundamental endeavor in molecular biology. Currently, over 100,000 protein sequences have been determined by experimental methods. The three dimensional structure of the protein determines its function, but there are currently less than 4,000 structures known to atomic resolution. Accordingly, techniques to predict protein structure from sequence have an important role in aiding the understanding of the Genome and the effects of mutations in genetic disease. The authors describe current efforts at Sandia to better understand the structure of proteins through rigorous mathematical analyses of simple lattice models. The efforts have focusedmore » on two aspects of protein science: mathematical structure prediction, and inverse protein folding.« less
The Phyre2 web portal for protein modelling, prediction and analysis

PubMed Central

Kelley, Lawrence A; Mezulis, Stefans; Yates, Christopher M; Wass, Mark N; Sternberg, Michael JE

2017-01-01

Summary Phyre2 is a suite of tools available on the web to predict and analyse protein structure, function and mutations. The focus of Phyre2 is to provide biologists with a simple and intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre, the original version of the server for which we previously published a protocol. In this updated protocol, we describe Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites, and analyse the effect of amino-acid variants (e.g. nsSNPs) for a user’s protein sequence. Users are guided through results by a simple interface at a level of detail determined by them. This protocol will guide a user from submitting a protein sequence to interpreting the secondary and tertiary structure of their models, their domain composition and model quality. A range of additional available tools is described to find a protein structure in a genome, to submit large number of sequences at once and to automatically run weekly searches for proteins difficult to model. The server is available at http://www.sbg.bio.ic.ac.uk/phyre2. A typical structure prediction will be returned between 30mins and 2 hours after submission. PMID:25950237
Molecular cloning and evolutionary analysis of the calcium-modulated contractile protein, centrin, in green algae and land plants.

PubMed

Bhattacharya, D; Steinkötter, J; Melkonian, M

1993-12-01

Centrin (= caltractin) is a ubiquitous, cytoskeletal protein which is a member of the EF-hand superfamily of calcium-binding proteins. A centrin-coding cDNA was isolated and characterized from the prasinophyte green alga Scherffelia dubia. Centrin PCR amplification primers were used to isolate partial, homologous cDNA sequences from the green algae Tetraselmis striata and Spermatozopsis similis. Annealing analyses suggested that centrin is a single-copy-coding region in T. striata and S. similis and other green algae studied. Centrin-coding regions from S. dubia, S. similis and T. striata encode four colinear EF-hand domains which putatively bind calcium. Phylogenetic analyses, including homologous sequences from Chlamydomonas reinhardtii and the land plant Atriplex nummularia, demonstrate that the domains of centrins are congruent and arose from the two-fold duplication of an ancestral EF hand with Domains 1+3 and Domains 2+4 clustering. The domains of centrins are also congruent with those of calmodulins demonstrating that, like calmodulin, centrin is an ancient protein which arose within the ancestor of all eukaryotes via gene duplication. Phylogenetic relationships inferred from centrin-coding region comparisons mirror results of small subunit ribosomal RNA sequence analyses suggesting that centrin-coding regions are useful evolutionary markers within the green algae.
Identification of a Herbal Powder by Deoxyribonucleic Acid Barcoding and Structural Analyses.

PubMed

Sheth, Bhavisha P; Thaker, Vrinda S

2015-10-01

Authentic identification of plants is essential for exploiting their medicinal properties as well as to stop the adulteration and malpractices with the trade of the same. To identify a herbal powder obtained from a herbalist in the local vicinity of Rajkot, Gujarat, using deoxyribonucleic acid (DNA) barcoding and molecular tools. The DNA was extracted from a herbal powder and selected Cassia species, followed by the polymerase chain reaction (PCR) and sequencing of the rbcL barcode locus. Thereafter the sequences were subjected to National Center for Biotechnology Information (NCBI) basic local alignment search tool (BLAST) analysis, followed by the protein three-dimension structure determination of the rbcL protein from the herbal powder and Cassia species namely Cassia fistula, Cassia tora and Cassia javanica (sequences obtained in the present study), Cassia Roxburghii, and Cassia abbreviata (sequences retrieved from Genbank). Further, the multiple and pairwise structural alignment were carried out in order to identify the herbal powder. The nucleotide sequences obtained from the selected species of Cassia were submitted to Genbank (Accession No. JX141397, JX141405, JX141420). The NCBI BLAST analysis of the rbcL protein from the herbal powder showed an equal sequence similarity (with reference to different parameters like E value, maximum identity, total score, query coverage) to C. javanica and C. roxburghii. In order to solve the ambiguities of the BLAST result, a protein structural approach was implemented. The protein homology models obtained in the present study were submitted to the protein model database (PM0079748-PM0079753). The pairwise structural alignment of the herbal powder (as template) and C. javanica and C. roxburghii (as targets individually) revealed a close similarity of the herbal powder with C. javanica. A strategy as used here, incorporating the integrated use of DNA barcoding and protein structural analyses could be adopted, as a novel rapid and economic procedure, especially in cases when protein coding loci are considered. Authentic identification of plants is essential for exploiting their medicinal properties as well as to stop the adulteration and malpractices with the trade of the same. A herbal powder was obtained from a herbalist in the local vicinity of Rajkot, Gujarat. An integrated approach using DNA barcoding and structural analyses was carried out to identify the herbal powder. The herbal powder was identified as Cassia javanica L.
Characterization of a Novel Polerovirus Infecting Maize in China

PubMed Central

Chen, Sha; Jiang, Guangzhuang; Wu, Jianxiang; Liu, Yong; Qian, Yajuan; Zhou, Xueping

2016-01-01

A novel virus, tentatively named Maize Yellow Mosaic Virus (MaYMV), was identified from the field-grown maize plants showing yellow mosaic symptoms on the leaves collected from the Yunnan Province of China by the deep sequencing of small RNAs. The complete 5642 nucleotide (nt)-long genome of the MaYMV shared the highest nucleotide sequence identity (73%) to Maize Yellow Dwarf Virus-RMV. Sequence comparisons and phylogenetic analyses suggested that MaYMV represents a new member of the genus Polerovirus in the family Luteoviridae. Furthermore, the P0 protein encoded by MaYMV was demonstrated to inhibit both local and systemic RNA silencing by co-infiltration assays using transgenic Nicotiana benthamiana line 16c carrying the GFP reporter gene, which further supported the identification of a new polerovirus. The biologically-active cDNA clone of MaYMV was generated by inserting the full-length cDNA of MaYMV into the binary vector pCB301. RT-PCR and Northern blot analyses showed that this clone was systemically infectious upon agro-inoculation into N. benthamiana. Subsequently, 13 different isolates of MaYMV from field-grown maize plants in different geographical locations of Yunnan and Guizhou provinces of China were sequenced. Analyses of their molecular variation indicate that the 3′ half of P3–P5 read-through protein coding region was the most variable, whereas the coat protein- (CP-) and movement protein- (MP-)coding regions were the most conserved. PMID:27136578
Characterization of a Novel Polerovirus Infecting Maize in China.

PubMed

Chen, Sha; Jiang, Guangzhuang; Wu, Jianxiang; Liu, Yong; Qian, Yajuan; Zhou, Xueping

2016-04-28

A novel virus, tentatively named Maize Yellow Mosaic Virus (MaYMV), was identified from the field-grown maize plants showing yellow mosaic symptoms on the leaves collected from the Yunnan Province of China by the deep sequencing of small RNAs. The complete 5642 nucleotide (nt)-long genome of the MaYMV shared the highest nucleotide sequence identity (73%) to Maize Yellow Dwarf Virus-RMV. Sequence comparisons and phylogenetic analyses suggested that MaYMV represents a new member of the genus Polerovirus in the family Luteoviridae. Furthermore, the P0 protein encoded by MaYMV was demonstrated to inhibit both local and systemic RNA silencing by co-infiltration assays using transgenic Nicotiana benthamiana line 16c carrying the GFP reporter gene, which further supported the identification of a new polerovirus. The biologically-active cDNA clone of MaYMV was generated by inserting the full-length cDNA of MaYMV into the binary vector pCB301. RT-PCR and Northern blot analyses showed that this clone was systemically infectious upon agro-inoculation into N. benthamiana. Subsequently, 13 different isolates of MaYMV from field-grown maize plants in different geographical locations of Yunnan and Guizhou provinces of China were sequenced. Analyses of their molecular variation indicate that the 3' half of P3-P5 read-through protein coding region was the most variable, whereas the coat protein- (CP-) and movement protein- (MP-)coding regions were the most conserved.
UET: a database of evolutionarily-predicted functional determinants of protein sequences that cluster as functional sites in protein structures.

PubMed

Lua, Rhonald C; Wilson, Stephen J; Konecki, Daniel M; Wilkins, Angela D; Venner, Eric; Morgan, Daniel H; Lichtarge, Olivier

2016-01-04

The structure and function of proteins underlie most aspects of biology and their mutational perturbations often cause disease. To identify the molecular determinants of function as well as targets for drugs, it is central to characterize the important residues and how they cluster to form functional sites. The Evolutionary Trace (ET) achieves this by ranking the functional and structural importance of the protein sequence positions. ET uses evolutionary distances to estimate functional distances and correlates genotype variations with those in the fitness phenotype. Thus, ET ranks are worse for sequence positions that vary among evolutionarily closer homologs but better for positions that vary mostly among distant homologs. This approach identifies functional determinants, predicts function, guides the mutational redesign of functional and allosteric specificity, and interprets the action of coding sequence variations in proteins, people and populations. Now, the UET database offers pre-computed ET analyses for the protein structure databank, and on-the-fly analysis of any protein sequence. A web interface retrieves ET rankings of sequence positions and maps results to a structure to identify functionally important regions. This UET database integrates several ways of viewing the results on the protein sequence or structure and can be found at http://mammoth.bcm.tmc.edu/uet/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
CDSbank: taxonomy-aware extraction, selection, renaming and formatting of protein-coding DNA or amino acid sequences.

PubMed

Hazes, Bart

2014-02-28

Protein-coding DNA sequences and their corresponding amino acid sequences are routinely used to study relationships between sequence, structure, function, and evolution. The rapidly growing size of sequence databases increases the power of such comparative analyses but it makes it more challenging to prepare high quality sequence data sets with control over redundancy, quality, completeness, formatting, and labeling. Software tools for some individual steps in this process exist but manual intervention remains a common and time consuming necessity. CDSbank is a database that stores both the protein-coding DNA sequence (CDS) and amino acid sequence for each protein annotated in Genbank. CDSbank also stores Genbank feature annotation, a flag to indicate incomplete 5' and 3' ends, full taxonomic data, and a heuristic to rank the scientific interest of each species. This rich information allows fully automated data set preparation with a level of sophistication that aims to meet or exceed manual processing. Defaults ensure ease of use for typical scenarios while allowing great flexibility when needed. Access is via a free web server at http://hazeslab.med.ualberta.ca/CDSbank/. CDSbank presents a user-friendly web server to download, filter, format, and name large sequence data sets. Common usage scenarios can be accessed via pre-programmed default choices, while optional sections give full control over the processing pipeline. Particular strengths are: extract protein-coding DNA sequences just as easily as amino acid sequences, full access to taxonomy for labeling and filtering, awareness of incomplete sequences, and the ability to take one protein sequence and extract all synonymous CDS or identical protein sequences in other species. Finally, CDSbank can also create labeled property files to, for instance, annotate or re-label phylogenetic trees.

Genome-wide profiling of DNA-binding proteins using barcode-based multiplex Solexa sequencing.

PubMed

Raghav, Sunil Kumar; Deplancke, Bart

2012-01-01

Chromatin immunoprecipitation (ChIP) is a commonly used technique to detect the in vivo binding of proteins to DNA. ChIP is now routinely paired to microarray analysis (ChIP-chip) or next-generation sequencing (ChIP-Seq) to profile the DNA occupancy of proteins of interest on a genome-wide level. Because ChIP-chip introduces several biases, most notably due to the use of a fixed number of probes, ChIP-Seq has quickly become the method of choice as, depending on the sequencing depth, it is more sensitive, quantitative, and provides a greater binding site location resolution. With the ever increasing number of reads that can be generated per sequencing run, it has now become possible to analyze several samples simultaneously while maintaining sufficient sequence coverage, thus significantly reducing the cost per ChIP-Seq experiment. In this chapter, we provide a step-by-step guide on how to perform multiplexed ChIP-Seq analyses. As a proof-of-concept, we focus on the genome-wide profiling of RNA Polymerase II as measuring its DNA occupancy at different stages of any biological process can provide insights into the gene regulatory mechanisms involved. However, the protocol can also be used to perform multiplexed ChIP-Seq analyses of other DNA-binding proteins such as chromatin modifiers and transcription factors.
Predicting protein crystallization propensity from protein sequence

PubMed Central

2011-01-01

The high-throughput structure determination pipelines developed by structural genomics programs offer a unique opportunity for data mining. One important question is how protein properties derived from a primary sequence correlate with the protein’s propensity to yield X-ray quality crystals (crystallizability) and 3D X-ray structures. A set of protein properties were computed for over 1,300 proteins that expressed well but were insoluble, and for ~720 unique proteins that resulted in X-ray structures. The correlation of the protein’s iso-electric point and grand average hydropathy (GRAVY) with crystallizability was analyzed for full length and domain constructs of protein targets. In a second step, several additional properties that can be calculated from the protein sequence were added and evaluated. Using statistical analyses we have identified a set of the attributes correlating with a protein’s propensity to crystallize and implemented a Support Vector Machine (SVM) classifier based on these. We have created applications to analyze and provide optimal boundary information for query sequences and to visualize the data. These tools are available via the web site http://bioinformatics.anl.gov/cgi-bin/tools/pdpredictor. PMID:20177794
Analysis of functional redundancies within the Arabidopsis TCP transcription factor family.

PubMed

Danisman, Selahattin; van Dijk, Aalt D J; Bimbo, Andrea; van der Wal, Froukje; Hennig, Lars; de Folter, Stefan; Angenent, Gerco C; Immink, Richard G H

2013-12-01

Analyses of the functions of TEOSINTE-LIKE1, CYCLOIDEA, and PROLIFERATING CELL FACTOR1 (TCP) transcription factors have been hampered by functional redundancy between its individual members. In general, putative functionally redundant genes are predicted based on sequence similarity and confirmed by genetic analysis. In the TCP family, however, identification is impeded by relatively low overall sequence similarity. In a search for functionally redundant TCP pairs that control Arabidopsis leaf development, this work performed an integrative bioinformatics analysis, combining protein sequence similarities, gene expression data, and results of pair-wise protein-protein interaction studies for the 24 members of the Arabidopsis TCP transcription factor family. For this, the work completed any lacking gene expression and protein-protein interaction data experimentally and then performed a comprehensive prediction of potential functional redundant TCP pairs. Subsequently, redundant functions could be confirmed for selected predicted TCP pairs by genetic and molecular analyses. It is demonstrated that the previously uncharacterized class I TCP19 gene plays a role in the control of leaf senescence in a redundant fashion with TCP20. Altogether, this work shows the power of combining classical genetic and molecular approaches with bioinformatics predictions to unravel functional redundancies in the TCP transcription factor family.
Transterm: a database to aid the analysis of regulatory sequences in mRNAs

PubMed Central

Jacobs, Grant H.; Chen, Augustine; Stevens, Stewart G.; Stockwell, Peter A.; Black, Michael A.; Tate, Warren P.; Brown, Chris M.

2009-01-01

Messenger RNAs, in addition to coding for proteins, may contain regulatory elements that affect how the protein is translated. These include protein and microRNA-binding sites. Transterm (http://mRNA.otago.ac.nz/Transterm.html) is a database of regions and elements that affect translation with two major unique components. The first is integrated results of analysis of general features that affect translation (initiation, elongation, termination) for species or strains in Genbank, processed through a standard pipeline. The second is curated descriptions of experimentally determined regulatory elements that function as translational control elements in mRNAs. Transterm focuses on protein binding sites, particularly those in 3′-untranslated regions (3′-UTR). For this release the interface has been extensively updated based on user feedback. The data is now accessible by strain rather than species, for example there are 10 Escherichia coli strains (genomes) analysed separately. In addition to providing a repository of data, the database also provides tools for users to query their own mRNA sequences. Users can search sequences for Transterm or user defined regulatory elements, including protein or miRNA targets. Transterm also provides a central core of links to related resources for complementary analyses. PMID:18984623
Structural and functional analysis of an enhancer GPEI having a phorbol 12-O-tetradecanoate 13-acetate responsive element-like sequence found in the rat glutathione transferase P gene.

PubMed

Okuda, A; Imagawa, M; Maeda, Y; Sakai, M; Muramatsu, M

1989-10-05

We have recently identified a typical enhancer, termed GPEI, located about 2.5 kilobases upstream from the transcription initiation site of the rat glutathione transferase P gene. Analyses of 5' and 3' deletion mutants revealed that the cis-acting sequence of GPEI contained the phorbol 12-O-tetradecanoate 13-acetate responsive element (TRE)-like sequence in it. For the maximal activity, however, GPEI required an adjacent upstream sequence of about 19 base pairs in addition to the TRE-like sequence. With the DNA binding gel-shift assay, we could detect protein(s) that specifically binds to the TRE-like sequence of GPEI fragment, which was possibly c-jun.c-fos complex or a similar protein complex. The sequence immediately upstream of the TRE-like sequence did not have any activity by itself, but augmented the latter activity by about 5-fold.
Differentiated evolutionary relationships among chordates from comparative alignments of multiple sequences of MyoD and MyoG myogenic regulatory factors.

PubMed

Oliani, L C; Lidani, K C F; Gabriel, J E

2015-10-16

MyoD and MyoG are transcription factors that have essential roles in myogenic lineage determination and muscle differentiation. The purpose of this study was to compare multiple amino acid sequences of myogenic regulatory proteins to infer evolutionary relationships among chordates. Protein sequences from Mus musculus (P10085 and P12979), human Homo sapiens (P15172 and P15173), bovine Bos taurus (Q7YS82 and Q7YS81), wild pig Sus scrofa (P49811 and P49812), quail Coturnix coturnix (P21572 and P34060), chicken Gallus gallus (P16075 and P17920), rat Rattus norvegicus (Q02346 and P20428), domestic water buffalo Bubalus bubalis (D2SP11 and A7L034), and sheep Ovis aries (Q90477 and D3YKV7) were searched from a non-redundant protein sequence database UniProtKB/Swiss-Prot, and subsequently analyzed using the Mega6.0 software. MyoD evolutionary analyses revealed the presence of three main clusters with all mammals branched in one cluster, members of the order Rodentia (mouse and rat) in a second branch linked to the first, and birds of the order Galliformes (chicken and quail) remaining isolated in a third. MyoG evolutionary analyses aligned sequences in two main clusters, all mammalian specimens grouped in different sub-branches, and birds clustered in a second branch. These analyses suggest that the evolution of MyoD and MyoG was driven by different pathways.
Mass spectrometry: Raw protein from the top down

NASA Astrophysics Data System (ADS)

Breuker, Kathrin

2018-02-01

Mass spectrometry is a powerful technique for analysing proteins, yet linking higher-order protein structure to amino acid sequence and post-translational modifications is far from simple. Now, a native top-down method has been developed that can provide information on higher-order protein structure and different proteoforms at the same time.
Inferences from structural comparison: flexibility, secondary structure wobble and sequence alignment optimization.

PubMed

Zhang, Gaihua; Su, Zhen

2012-01-01

Work on protein structure prediction is very useful in biological research. To evaluate their accuracy, experimental protein structures or their derived data are used as the 'gold standard'. However, as proteins are dynamic molecular machines with structural flexibility such a standard may be unreliable. To investigate the influence of the structure flexibility, we analysed 3,652 protein structures of 137 unique sequences from 24 protein families. The results showed that (1) the three-dimensional (3D) protein structures were not rigid: the root-mean-square deviation (RMSD) of the backbone Cα of structures with identical sequences was relatively large, with the average of the maximum RMSD from each of the 137 sequences being 1.06 Å; (2) the derived data of the 3D structure was not constant, e.g. the highest ratio of the secondary structure wobble site was 60.69%, with the sequence alignments from structural comparisons of two proteins in the same family sometimes being completely different. Proteins may have several stable conformations and the data derived from resolved structures as a 'gold standard' should be optimized before being utilized as criteria to evaluate the prediction methods, e.g. sequence alignment from structural comparison. Helix/β-sheet transition exists in normal free proteins. The coil ratio of the 3D structure could affect its resolution as determined by X-ray crystallography.
A curated gluten protein sequence database to support development of proteomics methods for determination of gluten in gluten-free foods.

PubMed

Bromilow, Sophie; Gethings, Lee A; Buckley, Mike; Bromley, Mike; Shewry, Peter R; Langridge, James I; Clare Mills, E N

2017-06-23

The unique physiochemical properties of wheat gluten enable a diverse range of food products to be manufactured. However, gluten triggers coeliac disease, a condition which is treated using a gluten-free diet. Analytical methods are required to confirm if foods are gluten-free, but current immunoassay-based methods can unreliable and proteomic methods offer an alternative but require comprehensive and well annotated sequence databases which are lacking for gluten. A manually a curated database (GluPro V1.0) of gluten proteins, comprising 630 discrete unique full length protein sequences has been compiled. It is representative of the different types of gliadin and glutenin components found in gluten. An in silico comparison of their coeliac toxicity was undertaken by analysing the distribution of coeliac toxic motifs. This demonstrated that whilst the α-gliadin proteins contained more toxic motifs, these were distributed across all gluten protein sub-types. Comparison of annotations observed using a discovery proteomics dataset acquired using ion mobility MS/MS showed that more reliable identifications were obtained using the GluPro V1.0 database compared to the complete reviewed Viridiplantae database. This highlights the value of a curated sequence database specifically designed to support the proteomic workflows and the development of methods to detect and quantify gluten. We have constructed the first manually curated open-source wheat gluten protein sequence database (GluPro V1.0) in a FASTA format to support the application of proteomic methods for gluten protein detection and quantification. We have also analysed the manually verified sequences to give the first comprehensive overview of the distribution of sequences able to elicit a reaction in coeliac disease, the prevalent form of gluten intolerance. Provision of this database will improve the reliability of gluten protein identification by proteomic analysis, and aid the development of targeted mass spectrometry methods in line with Codex Alimentarius Commission requirements for foods designed to meet the needs of gluten intolerant individuals. Copyright © 2017. Published by Elsevier B.V.
Classification of viral zoonosis through receptor pattern analysis.

PubMed

Bae, Se-Eun; Son, Hyeon Seok

2011-04-13

Viral zoonosis, the transmission of a virus from its primary vertebrate reservoir species to humans, requires ubiquitous cellular proteins known as receptor proteins. Zoonosis can occur not only through direct transmission from vertebrates to humans, but also through intermediate reservoirs or other environmental factors. Viruses can be categorized according to genotype (ssDNA, dsDNA, ssRNA and dsRNA viruses). Among them, the RNA viruses exhibit particularly high mutation rates and are especially problematic for this reason. Most zoonotic viruses are RNA viruses that change their envelope proteins to facilitate binding to various receptors of host species. In this study, we sought to predict zoonotic propensity through the analysis of receptor characteristics. We hypothesized that the major barrier to interspecies virus transmission is that receptor sequences vary among species--in other words, that the specific amino acid sequence of the receptor determines the ability of the viral envelope protein to attach to the cell. We analysed host-cell receptor sequences for their hydrophobicity/hydrophilicity characteristics. We then analysed these properties for similarities among receptors of different species and used a statistical discriminant analysis to predict the likelihood of transmission among species. This study is an attempt to predict zoonosis through simple computational analysis of receptor sequence differences. Our method may be useful in predicting the zoonotic potential of newly discovered viral strains.
Predicting the tolerated sequences for proteins and protein interfaces using RosettaBackrub flexible backbone design.

PubMed

Smith, Colin A; Kortemme, Tanja

2011-01-01

Predicting the set of sequences that are tolerated by a protein or protein interface, while maintaining a desired function, is useful for characterizing protein interaction specificity and for computationally designing sequence libraries to engineer proteins with new functions. Here we provide a general method, a detailed set of protocols, and several benchmarks and analyses for estimating tolerated sequences using flexible backbone protein design implemented in the Rosetta molecular modeling software suite. The input to the method is at least one experimentally determined three-dimensional protein structure or high-quality model. The starting structure(s) are expanded or refined into a conformational ensemble using Monte Carlo simulations consisting of backrub backbone and side chain moves in Rosetta. The method then uses a combination of simulated annealing and genetic algorithm optimization methods to enrich for low-energy sequences for the individual members of the ensemble. To emphasize certain functional requirements (e.g. forming a binding interface), interactions between and within parts of the structure (e.g. domains) can be reweighted in the scoring function. Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. We provide an extensive description of the protocol and its parameters, all source code, example analysis scripts and three tests applying this method to finding sequences predicted to stabilize proteins or protein interfaces. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.
Selfish DNA in protein-coding genes of Rickettsia.

PubMed

Ogata, H; Audic, S; Barbe, V; Artiguenave, F; Fournier, P E; Raoult, D; Claverie, J M

2000-10-13

Rickettsia conorii, the aetiological agent of Mediterranean spotted fever, is an intracellular bacterium transmitted by ticks. Preliminary analyses of the nearly complete genome sequence of R. conorii have revealed 44 occurrences of a previously undescribed palindromic repeat (150 base pairs long) throughout the genome. Unexpectedly, this repeat was found inserted in-frame within 19 different R. conorii open reading frames likely to encode functional proteins. We found the same repeat in proteins of other Rickettsia species. The finding of a mobile element inserted in many unrelated genes suggests the potential role of selfish DNA in the creation of new protein sequences.
An Interspecies Comparative Analysis of the Predicted Secretomes of the Necrotrophic Plant Pathogens Sclerotinia sclerotiorum and Botrytis cinerea

PubMed Central

2015-01-01

Phytopathogenic fungi form intimate associations with host plant species and cause disease. To be successful, fungal pathogens communicate with a susceptible host through the secretion of proteinaceous effectors, hydrolytic enzymes and metabolites. Sclerotinia sclerotiorum and Botrytis cinerea are economically important necrotrophic fungal pathogens that cause disease on numerous crop species. Here, a powerful bioinformatics pipeline was used to predict the refined S. sclerotiorum and B. cinerea secretomes, identifying 432 and 499 proteins respectively. Analyses focusing on S. sclerotiorum revealed that 16% of the secretome encoding genes resided in small, sequence heterogeneous, gene clusters that were distributed over 13 of the 16 predicted chromosomes. Functional analyses highlighted the importance of plant cell hydrolysis, oxidation-reduction processes and the redox state to the S. sclerotiorum and B. cinerea secretomes and potentially host infection. Only 8% of the predicted proteins were distinct between the two secretomes. In contrast to S. sclerotiorum, the B. cinerea secretome lacked CFEM- or LysM-containing proteins. The 115 fungal and oomycete genome comparison identified 30 proteins specific to S. sclerotiorum and B. cinerea, plus 11 proteins specific to S. sclerotiorum and 32 proteins specific to B. cinerea. Expressed sequence tag (EST) and proteomic analyses showed that 246 S. sclerotiorum secretome encoding genes had EST support, including 101 which were only expressed in vitro and 49 which were only expressed in planta, whilst 42 predicted proteins were experimentally proven to be secreted. These detailed in silico analyses of two important necrotrophic pathogens will permit informed choices to be made when candidate effector proteins are selected for function analyses in planta. PMID:26107498
@TOME-2: a new pipeline for comparative modeling of protein-ligand complexes.

PubMed

Pons, Jean-Luc; Labesse, Gilles

2009-07-01

@TOME 2.0 is new web pipeline dedicated to protein structure modeling and small ligand docking based on comparative analyses. @TOME 2.0 allows fold recognition, template selection, structural alignment editing, structure comparisons, 3D-model building and evaluation. These tasks are routinely used in sequence analyses for structure prediction. In our pipeline the necessary software is efficiently interconnected in an original manner to accelerate all the processes. Furthermore, we have also connected comparative docking of small ligands that is performed using protein-protein superposition. The input is a simple protein sequence in one-letter code with no comment. The resulting 3D model, protein-ligand complexes and structural alignments can be visualized through dedicated Web interfaces or can be downloaded for further studies. These original features will aid in the functional annotation of proteins and the selection of templates for molecular modeling and virtual screening. Several examples are described to highlight some of the new functionalities provided by this pipeline. The server and its documentation are freely available at http://abcis.cbs.cnrs.fr/AT2/
The First Complete Mitochondrial Genome Sequences for Stomatopod Crustaceans: Implications for Phylogeny

DOE Office of Scientific and Technical Information (OSTI.GOV)

Swinstrom, Kirsten; Caldwell, Roy; Fourcade, H. Matthew

2005-09-07

We report the first complete mitochondrial genome sequences of stomatopods and compare their features to each other and to those of other crustaceans. Phylogenetic analyses of the concatenated mitochondrial protein-coding sequences were used to explore relationships within the Stomatopoda, within the malacostracan crustaceans, and among crustaceans and insects. Although these analyses support the monophyly of both Malacostraca and, within it, Stomatopoda, it also confirms the view of a paraphyletic Crustacea, with Malacostraca being more closely related to insects than to the branchiopod crustaceans.
Complete amino acid sequences of the ribosomal proteins L25, L29 and L31 from the archaebacterium Halobacterium marismortui.

PubMed

Hatakeyama, T; Kimura, M

1988-03-15

Ribosomal proteins were extracted from 50S ribosomal subunits of the archaebacterium Halobacterium marismortui by decreasing the concentration of Mg2+ and K+, and the proteins were separated and purified by ion-exchange column chromatography on DEAE-cellulose. Ten proteins were purified to homogeneity and three of these proteins were subjected to sequence analysis. The complete amino acid sequences of the ribosomal proteins L25, L29 and L31 were established by analyses of the peptides obtained by enzymatic digestion with trypsin, Staphylococcus aureus protease, chymotrypsin and lysylendopeptidase. Proteins L25, L29 and L31 consist of 84, 115 and 95 amino acid residues with the molecular masses of 9472 Da, 12293 Da and 10418 Da respectively. A comparison of their sequences with those of other large-ribosomal-subunit proteins from other organisms revealed that protein L25 from H. marismortui is homologous to protein L23 from Escherichia coli (34.6%), Bacillus stearothermophilus (41.8%), and tobacco chloroplasts (16.3%) as well as to protein L25 from yeast (38.0%). Proteins L29 and L31 do not appear to be homologous to any other ribosomal proteins whose structures are so far known.
The Universal Protein Resource (UniProt): an expanding universe of protein information.

PubMed

Wu, Cathy H; Apweiler, Rolf; Bairoch, Amos; Natale, Darren A; Barker, Winona C; Boeckmann, Brigitte; Ferro, Serenella; Gasteiger, Elisabeth; Huang, Hongzhan; Lopez, Rodrigo; Magrane, Michele; Martin, Maria J; Mazumder, Raja; O'Donovan, Claire; Redaschi, Nicole; Suzek, Baris

2006-01-01

The Universal Protein Resource (UniProt) provides a central resource on protein sequences and functional annotation with three database components, each addressing a key need in protein bioinformatics. The UniProt Knowledgebase (UniProtKB), comprising the manually annotated UniProtKB/Swiss-Prot section and the automatically annotated UniProtKB/TrEMBL section, is the preeminent storehouse of protein annotation. The extensive cross-references, functional and feature annotations and literature-based evidence attribution enable scientists to analyse proteins and query across databases. The UniProt Reference Clusters (UniRef) speed similarity searches via sequence space compression by merging sequences that are 100% (UniRef100), 90% (UniRef90) or 50% (UniRef50) identical. Finally, the UniProt Archive (UniParc) stores all publicly available protein sequences, containing the history of sequence data with links to the source databases. UniProt databases continue to grow in size and in availability of information. Recent and upcoming changes to database contents, formats, controlled vocabularies and services are described. New download availability includes all major releases of UniProtKB, sequence collections by taxonomic division and complete proteomes. A bibliography mapping service has been added, and an ID mapping service will be available soon. UniProt databases can be accessed online at http://www.uniprot.org or downloaded at ftp://ftp.uniprot.org/pub/databases/.
Content of intrinsic disorder influences the outcome of cell-free protein synthesis.

PubMed

Tokmakov, Alexander A; Kurotani, Atsushi; Ikeda, Mariko; Terazawa, Yumiko; Shirouzu, Mikako; Stefanov, Vasily; Sakurai, Tetsuya; Yokoyama, Shigeyuki

2015-09-11

Cell-free protein synthesis is used to produce proteins with various structural traits. Recent bioinformatics analyses indicate that more than half of eukaryotic proteins possess long intrinsically disordered regions. However, no systematic study concerning the connection between intrinsic disorder and expression success of cell-free protein synthesis has been presented until now. To address this issue, we examined correlations of the experimentally observed cell-free protein expression yields with the contents of intrinsic disorder bioinformatically predicted in the expressed sequences. This analysis revealed strong relationships between intrinsic disorder and protein amenability to heterologous cell-free expression. On the one hand, elevated disorder content was associated with the increased ratio of soluble expression. On the other hand, overall propensity for detectable protein expression decreased with disorder content. We further demonstrated that these tendencies are rooted in some distinct features of intrinsically disordered regions, such as low hydrophobicity, elevated surface accessibility and high abundance of sequence motifs for proteolytic degradation, including sites of ubiquitination and PEST sequences. Our findings suggest that identification of intrinsically disordered regions in the expressed amino acid sequences can be of practical use for predicting expression success and optimizing cell-free protein synthesis.
Sequence Alignment to Predict Across Species Susceptibility ...

EPA Pesticide Factsheets

Conservation of a molecular target across species can be used as a line-of-evidence to predict the likelihood of chemical susceptibility. The web-based Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool was developed to simplify, streamline, and quantitatively assess protein sequence/structural similarity across taxonomic groups as a means to predict relative intrinsic susceptibility. The intent of the tool is to allow for evaluation of any potential protein target, so it is amenable to variable degrees of protein characterization, depending on available information about the chemical/protein interaction and the molecular target itself. To allow for flexibility in the analysis, a layered strategy was adopted for the tool. The first level of the SeqAPASS analysis compares primary amino acid sequences to a query sequence, calculating a metric for sequence similarity (including detection of candidate orthologs), the second level evaluates sequence similarity within selected domains (e.g., ligand-binding domain, DNA binding domain), and the third level of analysis compares individual amino acid residue positions identified as being of importance for protein conformation and/or ligand binding upon chemical perturbation. Each level of the SeqAPASS analysis provides increasing evidence to apply toward rapid, screening-level assessments of probable cross species susceptibility. Such analyses can support prioritization of chemicals for further ev
Insights into rubber biosynthesis from transcriptome analysis of Hevea brasiliensis latex.

PubMed

Chow, Keng-See; Wan, Kiew-Lian; Isa, Mohd Noor Mat; Bahari, Azlina; Tan, Siang-Hee; Harikrishna, K; Yeang, Hoong-Yeet

2007-01-01

Hevea brasiliensis is the most widely cultivated species for commercial production of natural rubber (cis-polyisoprene). In this study, 10,040 expressed sequence tags (ESTs) were generated from the latex of the rubber tree, which represents the cytoplasmic content of a single cell type, in order to analyse the latex transcription profile with emphasis on rubber biosynthesis-related genes. A total of 3,441 unique transcripts (UTs) were obtained after quality editing and assembly of EST sequences. Functional classification of UTs according to the Gene Ontology convention showed that 73.8% were related to genes of unknown function. Among highly expressed ESTs, a significant proportion encoded proteins related to rubber biosynthesis and stress or defence responses. Sequences encoding rubber particle membrane proteins (RPMPs) belonging to three protein families accounted for 12% of the ESTs. Characterization of these ESTs revealed nine RPMP variants (7.9-27 kDa) including the 14 kDa REF (rubber elongation factor) and 22 kDa SRPP (small rubber particle protein). The expression of multiple RPMP isoforms in latex was shown using antibodies against REF and SRPP. Both EST and quantitative reverse transcription-PCR (QRT-PCR) analyses demonstrated REF and SRPP to be the most abundant transcripts in latex. Besides rubber biosynthesis, comparative sequence analysis showed that the RPMPs are highly similar to sequences in the plant kingdom having stress-related functions. Implications of the RPMP function in cis-polyisoprene biosynthesis in the context of transcript abundance and differential gene expression are discussed.

Large-Scale Concatenation cDNA Sequencing

PubMed Central

Yu, Wei; Andersson, Björn; Worley, Kim C.; Muzny, Donna M.; Ding, Yan; Liu, Wen; Ricafrente, Jennifer Y.; Wentland, Meredith A.; Lennon, Greg; Gibbs, Richard A.

1997-01-01

A total of 100 kb of DNA derived from 69 individual human brain cDNA clones of 0.7–2.0 kb were sequenced by concatenated cDNA sequencing (CCS), whereby multiple individual DNA fragments are sequenced simultaneously in a single shotgun library. The method yielded accurate sequences and a similar efficiency compared with other shotgun libraries constructed from single DNA fragments (>20 kb). Computer analyses were carried out on 65 cDNA clone sequences and their corresponding end sequences to examine both nucleic acid and amino acid sequence similarities in the databases. Thirty-seven clones revealed no DNA database matches, 12 clones generated exact matches (≥98% identity), and 16 clones generated nonexact matches (57%–97% identity) to either known human or other species genes. Of those 28 matched clones, 8 had corresponding end sequences that failed to identify similarities. In a protein similarity search, 27 clone sequences displayed significant matches, whereas only 20 of the end sequences had matches to known protein sequences. Our data indicate that full-length cDNA insert sequences provide significantly more nucleic acid and protein sequence similarity matches than expressed sequence tags (ESTs) for database searching. [All 65 cDNA clone sequences described in this paper have been submitted to the GenBank data library under accession nos. U79240–U79304.] PMID:9110174
Evolutionary mechanisms involved in the virulence of infectious salmon anaemia virus (ISAV), a piscine orthomyxovirus

DOE Office of Scientific and Technical Information (OSTI.GOV)

Markussen, Turhan; Jonassen, Christine Monceyron; Numanovic, Sanela

2008-05-10

Infectious salmon anaemia virus (ISAV) is an orthomyxovirus causing a multisystemic, emerging disease in Atlantic salmon. Here we present, for the first time, detailed sequence analyses of the full-genome sequence of a presumed avirulent isolate displaying a full-length hemagglutinin-esterase (HE) gene (HPR0), and compare this with full-genome sequences of 11 Norwegian ISAV isolates from clinically diseased fish. These analyses revealed the presence of a virulence marker right upstream of the putative cleavage site R{sub 267} in the fusion (F) protein, suggesting a Q{sub 266} {yields} L{sub 266} substitution to be a prerequisite for virulence. To gain virulence in isolates lackingmore » this substitution, a sequence insertion near the cleavage site seems to be required. This strongly suggests the involvement of a protease recognition pattern at the cleavage site of the fusion protein as a determinant of virulence, as seen in highly pathogenic influenza A virus H5 or H7 and the paramyxovirus Newcastle disease virus.« less
ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes.

PubMed

Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim

2010-03-01

Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith-Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. The database can be accessed through http://proteinworlddb.org
Adhesive Proteins of Stalked and Acorn Barnacles Display Homology with Low Sequence Similarities

PubMed Central

Jonker, Jaimie-Leigh; Abram, Florence; Pires, Elisabete; Varela Coelho, Ana; Grunwald, Ingo; Power, Anne Marie

2014-01-01

Barnacle adhesion underwater is an important phenomenon to understand for the prevention of biofouling and potential biotechnological innovations, yet so far, identifying what makes barnacle glue proteins ‘sticky’ has proved elusive. Examination of a broad range of species within the barnacles may be instructive to identify conserved adhesive domains. We add to extensive information from the acorn barnacles (order Sessilia) by providing the first protein analysis of a stalked barnacle adhesive, Lepas anatifera (order Lepadiformes). It was possible to separate the L. anatifera adhesive into at least 10 protein bands using SDS-PAGE. Intense bands were present at approximately 30, 70, 90 and 110 kilodaltons (kDa). Mass spectrometry for protein identification was followed by de novo sequencing which detected 52 peptides of 7–16 amino acids in length. None of the peptides matched published or unpublished transcriptome sequences, but some amino acid sequence similarity was apparent between L. anatifera and closely-related Dosima fascicularis. Antibodies against two acorn barnacle proteins (ab-cp-52k and ab-cp-68k) showed cross-reactivity in the adhesive glands of L. anatifera. We also analysed the similarity of adhesive proteins across several barnacle taxa, including Pollicipes pollicipes (a stalked barnacle in the order Scalpelliformes). Sequence alignment of published expressed sequence tags clearly indicated that P. pollicipes possesses homologues for the 19 kDa and 100 kDa proteins in acorn barnacles. Homology aside, sequence similarity in amino acid and gene sequences tended to decline as taxonomic distance increased, with minimum similarities of 18–26%, depending on the gene. The results indicate that some adhesive proteins (e.g. 100 kDa) are more conserved within barnacles than others (20 kDa). PMID:25295513
Adhesive proteins of stalked and acorn barnacles display homology with low sequence similarities.

PubMed

Jonker, Jaimie-Leigh; Abram, Florence; Pires, Elisabete; Varela Coelho, Ana; Grunwald, Ingo; Power, Anne Marie

2014-01-01

Barnacle adhesion underwater is an important phenomenon to understand for the prevention of biofouling and potential biotechnological innovations, yet so far, identifying what makes barnacle glue proteins 'sticky' has proved elusive. Examination of a broad range of species within the barnacles may be instructive to identify conserved adhesive domains. We add to extensive information from the acorn barnacles (order Sessilia) by providing the first protein analysis of a stalked barnacle adhesive, Lepas anatifera (order Lepadiformes). It was possible to separate the L. anatifera adhesive into at least 10 protein bands using SDS-PAGE. Intense bands were present at approximately 30, 70, 90 and 110 kilodaltons (kDa). Mass spectrometry for protein identification was followed by de novo sequencing which detected 52 peptides of 7-16 amino acids in length. None of the peptides matched published or unpublished transcriptome sequences, but some amino acid sequence similarity was apparent between L. anatifera and closely-related Dosima fascicularis. Antibodies against two acorn barnacle proteins (ab-cp-52k and ab-cp-68k) showed cross-reactivity in the adhesive glands of L. anatifera. We also analysed the similarity of adhesive proteins across several barnacle taxa, including Pollicipes pollicipes (a stalked barnacle in the order Scalpelliformes). Sequence alignment of published expressed sequence tags clearly indicated that P. pollicipes possesses homologues for the 19 kDa and 100 kDa proteins in acorn barnacles. Homology aside, sequence similarity in amino acid and gene sequences tended to decline as taxonomic distance increased, with minimum similarities of 18-26%, depending on the gene. The results indicate that some adhesive proteins (e.g. 100 kDa) are more conserved within barnacles than others (20 kDa).
Use of conserved key amino acid positions to morph protein folds.

PubMed

Reddy, Boojala V B; Li, Wilfred W; Bourne, Philip E

2002-07-15

By using three-dimensional (3D) structure alignments and a previously published method to determine Conserved Key Amino Acid Positions (CKAAPs) we propose a theoretical method to design mutations that can be used to morph the protein folds. The original Paracelsus challenge, met by several groups, called for the engineering of a stable but different structure by modifying less than 50% of the amino acid residues. We have used the sequences from the Protein Data Bank (PDB) identifiers 1ROP, and 2CRO, which were previously used in the Paracelsus challenge by those groups, and suggest mutation to CKAAPs to morph the protein fold. The total number of mutations suggested is less than 40% of the starting sequence theoretically improving the challenge results. From secondary structure prediction experiments of the proposed mutant sequence structures, we observe that each of the suggested mutant protein sequences likely folds to a different, non-native potentially stable target structure. These results are an early indicator that analyses using structure alignments leading to CKAAPs of a given structure are of value in protein engineering experiments. Copyright 2002 Wiley Periodicals, Inc.
Using GenBank.

PubMed

Wheeler, David

2007-01-01

GenBank(R) is a comprehensive database of publicly available DNA sequences for more than 205,000 named organisms and for more than 60,000 within the embryophyta, obtained through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Daily data exchange with the European Molecular Biology Laboratory (EMBL) in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the National Center for Biotechnology Information (NCBI) retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases with taxonomy, genome, mapping, protein structure, and domain information and the biomedical journal literature through PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available through FTP. GenBank usage scenarios ranging from local analyses of the data available through FTP to online analyses supported by the NCBI Web-based tools are discussed. To access GenBank and its related retrieval and analysis services, go to the NCBI Homepage at http://www.ncbi.nlm.nih.gov.
Experimental strategies for the identification and characterization of adhesive proteins in animals: a review

PubMed Central

Hennebert, Elise; Maldonado, Barbara; Ladurner, Peter; Flammang, Patrick; Santos, Romana

2015-01-01

Adhesive secretions occur in both aquatic and terrestrial animals, in which they perform diverse functions. Biological adhesives can therefore be remarkably complex and involve a large range of components with different functions and interactions. However, being mainly protein based, biological adhesives can be characterized by classical molecular methods. This review compiles experimental strategies that were successfully used to identify, characterize and obtain the full-length sequence of adhesive proteins from nine biological models: echinoderms, barnacles, tubeworms, mussels, sticklebacks, slugs, velvet worms, spiders and ticks. A brief description and practical examples are given for a variety of tools used to study adhesive molecules at different levels from genes to secreted proteins. In most studies, proteins, extracted from secreted materials or from adhesive organs, are analysed for the presence of post-translational modifications and submitted to peptide sequencing. The peptide sequences are then used directly for a BLAST search in genomic or transcriptomic databases, or to design degenerate primers to perform RT-PCR, both allowing the recovery of the sequence of the cDNA coding for the investigated protein. These sequences can then be used for functional validation and recombinant production. In recent years, the dual proteomic and transcriptomic approach has emerged as the best way leading to the identification of novel adhesive proteins and retrieval of their complete sequences. PMID:25657842
Multi-level machine learning prediction of protein-protein interactions in Saccharomyces cerevisiae.

PubMed

Zubek, Julian; Tatjewski, Marcin; Boniecki, Adam; Mnich, Maciej; Basu, Subhadip; Plewczynski, Dariusz

2015-01-01

Accurate identification of protein-protein interactions (PPI) is the key step in understanding proteins' biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences, however only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for prediction of protein-protein interactions. We start with the carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB) database. First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Secondly, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein-protein interaction is done using the 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale, i.e., residue-level prediction. The level-II predictor improves the results further by a more complex learning paradigm. We perform 30-fold macro-scale, i.e., protein-level cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods provide results below 0.6 threshold (recall, precision, AUC). Our results demonstrate that multi-scale sequence features aggregation procedure is able to improve the machine learning results by more than 10% as compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from: http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent).
Origin and spread of photosynthesis based upon conserved sequence features in key bacteriochlorophyll biosynthesis proteins.

PubMed

Gupta, Radhey S

2012-11-01

The origin of photosynthesis and how this capability has spread to other bacterial phyla remain important unresolved questions. I describe here a number of conserved signature indels (CSIs) in key proteins involved in bacteriochlorophyll (Bchl) biosynthesis that provide important insights in these regards. The proteins BchL and BchX, which are essential for Bchl biosynthesis, are derived by gene duplication in a common ancestor of all phototrophs. More ancient gene duplication gave rise to the BchX-BchL proteins and the NifH protein of the nitrogenase complex. The sequence alignment of NifH-BchX-BchL proteins contain two CSIs that are uniquely shared by all NifH and BchX homologs, but not by any BchL homologs. These CSIs and phylogenetic analysis of NifH-BchX-BchL protein sequences strongly suggest that the BchX homologs are ancestral to BchL and that the Bchl-based anoxygenic photosynthesis originated prior to the chlorophyll (Chl)-based photosynthesis in cyanobacteria. Another CSI in the BchX-BchL sequence alignment that is uniquely shared by all BchX homologs and the BchL sequences from Heliobacteriaceae, but absent in all other BchL homologs, suggests that the BchL homologs from Heliobacteriaceae are primitive in comparison to all other photosynthetic lineages. Several other identified CSIs in the BchN homologs are commonly shared by all proteobacterial homologs and a clade consisting of the marine unicellular Cyanobacteria (Clade C). These CSIs in conjunction with the results of phylogenetic analyses and pair-wise sequence similarity on the BchL, BchN, and BchB proteins, where the homologs from Clade C Cyanobacteria and Proteobacteria exhibited close relationship, provide strong evidence that these two groups have incurred lateral gene transfers. Additionally, phylogenetic analyses and several CSIs in the BchL-N-B proteins that are uniquely shared by all Chlorobi and Chloroflexi homologs provide evidence that the genes for these proteins have also been laterally transferred between these groups. Other results and observations reported here indicate that the genes for the BchL-N-B proteins in Proteobacteria are derived from the Clade C Cyanobacteria, whereas those in Chlorobi were acquired from Chloroflexus or related bacteria by means of LGTs. Some implications of these observations regarding the origin and spread of photosynthesis are discussed.
Sequence analyses reveal that a TPR-DP module, surrounded by recombinable flanking introns, could be at the origin of eukaryotic Hop and Hip TPR-DP domains and prokaryotic GerD proteins.

PubMed

Hernández Torres, Jorge; Papandreou, Nikolaos; Chomilier, Jacques

2009-05-01

The co-chaperone Hop [heat shock protein (HSP) organising protein] is known to bind both Hsp70 and Hsp90. Hop comprises three repeats of a tetratricopeptide repeat (TPR) domain, each consisting of three TPR motifs. The first and last TPR domains are followed by a domain containing several dipeptide (DP) repeats called the DP domain. These analyses suggest that the hop genes result from successive recombination events of an ancestral TPR-DP module. From a hydrophobic cluster analysis of homologous Hop protein sequences derived from gene families, we can postulate that shifts in the open reading frames are at the origin of the present sequences. Moreover, these shifts can be related to the presence or absence of biological function. We propose to extend the family of Hop co-chaperons into the kingdom of bacteria, as several structurally related genes have been identified by hydrophobic cluster analysis. We also provide evidence of common structural characteristics between hop and hip genes, suggesting a shared precursor of ancestral TPR-DP domains.
Investigation of the protein osteocalcin of Camelops hesternus: Sequence, structure and phylogenetic implications

NASA Astrophysics Data System (ADS)

Humpula, James F.; Ostrom, Peggy H.; Gandhi, Hasand; Strahler, John R.; Walker, Angela K.; Stafford, Thomas W.; Smith, James J.; Voorhies, Michael R.; George Corner, R.; Andrews, Phillip C.

2007-12-01

Ancient DNA sequences offer an extraordinary opportunity to unravel the evolutionary history of ancient organisms. Protein sequences offer another reservoir of genetic information that has recently become tractable through the application of mass spectrometric techniques. The extent to which ancient protein sequences resolve phylogenetic relationships, however, has not been explored. We determined the osteocalcin amino acid sequence from the bone of an extinct Camelid (21 ka, Camelops hesternus) excavated from Isleta Cave, New Mexico and three bones of extant camelids: bactrian camel ( Camelus bactrianus); dromedary camel ( Camelus dromedarius) and guanaco ( Llama guanacoe) for a diagenetic and phylogenetic assessment. There was no difference in sequence among the four taxa. Structural attributes observed in both modern and ancient osteocalcin include a post-translation modification, Hyp 9, deamidation of Gln 35 and Gln 39, and oxidation of Met 36. Carbamylation of the N-terminus in ancient osteocalcin may result in blockage and explain previous difficulties in sequencing ancient proteins via Edman degradation. A phylogenetic analysis using osteocalcin sequences of 25 vertebrate taxa was conducted to explore osteocalcin protein evolution and the utility of osteocalcin sequences for delineating phylogenetic relationships. The maximum likelihood tree closely reflected generally recognized taxonomic relationships. For example, maximum likelihood analysis recovered rodents, birds and, within hominins, the Homo-Pan-Gorilla trichotomy. Within Artiodactyla, character state analysis showed that a substitution of Pro 4 for His 4 defines the Capra-Ovis clade within Artiodactyla. Homoplasy in our analysis indicated that osteocalcin evolution is not a perfect indicator of species evolution. Limited sequence availability prevented assigning functional significance to sequence changes. Our preliminary analysis of osteocalcin evolution represents an initial step towards a complete character analysis aimed at determining the evolutionary history of this functionally significant protein. We emphasize that ancient protein sequencing and phylogenetic analyses using amino acid sequences must pay close attention to post-translational modifications, amino acid substitutions due to diagenetic alteration and the impacts of isobaric amino acids on mass shifts and sequence alignments.
Genetic analyses of bone morphogenetic protein 2, 4 and 7 in congenital combined pituitary hormone deficiency.

PubMed

Breitfeld, Jana; Martens, Susanne; Klammt, Jürgen; Schlicke, Marina; Pfäffle, Roland; Krause, Kerstin; Weidle, Kerstin; Schleinitz, Dorit; Stumvoll, Michael; Führer, Dagmar; Kovacs, Peter; Tönjes, Anke

2013-12-01

The complex process of development of the pituitary gland is regulated by a number of signalling molecules and transcription factors. Mutations in these factors have been identified in rare cases of congenital hypopituitarism but for most subjects with combined pituitary hormone deficiency (CPHD) genetic causes are unknown. Bone morphogenetic proteins (BMPs) affect induction and growth of the pituitary primordium and thus represent plausible candidates for mutational screening of patients with CPHD. We sequenced BMP2, 4 and 7 in 19 subjects with CPHD. For validation purposes, novel genetic variants were genotyped in 1046 healthy subjects. Additionally, potential functional relevance for most promising variants has been assessed by phylogenetic analyses and prediction of effects on protein structure. Sequencing revealed two novel variants and confirmed 30 previously known polymorphisms and mutations in BMP2, 4 and 7. Although phylogenetic analyses indicated that these variants map within strongly conserved gene regions, there was no direct support for their impact on protein structure when applying predictive bioinformatics tools. A mutation in the BMP4 coding region resulting in an amino acid exchange (p.Arg300Pro) appeared most interesting among the identified variants. Further functional analyses are required to ultimately map the relevance of these novel variants in CPHD.
Genetic analyses of bone morphogenetic protein 2, 4 and 7 in congenital combined pituitary hormone deficiency

PubMed Central

2013-01-01

Background The complex process of development of the pituitary gland is regulated by a number of signalling molecules and transcription factors. Mutations in these factors have been identified in rare cases of congenital hypopituitarism but for most subjects with combined pituitary hormone deficiency (CPHD) genetic causes are unknown. Bone morphogenetic proteins (BMPs) affect induction and growth of the pituitary primordium and thus represent plausible candidates for mutational screening of patients with CPHD. Methods We sequenced BMP2, 4 and 7 in 19 subjects with CPHD. For validation purposes, novel genetic variants were genotyped in 1046 healthy subjects. Additionally, potential functional relevance for most promising variants has been assessed by phylogenetic analyses and prediction of effects on protein structure. Results Sequencing revealed two novel variants and confirmed 30 previously known polymorphisms and mutations in BMP2, 4 and 7. Although phylogenetic analyses indicated that these variants map within strongly conserved gene regions, there was no direct support for their impact on protein structure when applying predictive bioinformatics tools. Conclusions A mutation in the BMP4 coding region resulting in an amino acid exchange (p.Arg300Pro) appeared most interesting among the identified variants. Further functional analyses are required to ultimately map the relevance of these novel variants in CPHD. PMID:24289245
Expansion for the Brachylophosaurus canadensis Collagen I Sequence and Additional Evidence of the Preservation of Cretaceous Protein.

PubMed

Schroeter, Elena R; DeHart, Caroline J; Cleland, Timothy P; Zheng, Wenxia; Thomas, Paul M; Kelleher, Neil L; Bern, Marshall; Schweitzer, Mary H

2017-02-03

Sequence data from biomolecules such as DNA and proteins, which provide critical information for evolutionary studies, have been assumed to be forever outside the reach of dinosaur paleontology. Proteins, which are predicted to have greater longevity than DNA, have been recovered from two nonavian dinosaurs, but these results remain controversial. For proteomic data derived from extinct Mesozoic organisms to reach their greatest potential for investigating questions of phylogeny and paleobiology, it must be shown that peptide sequences can be reliably and reproducibly obtained from fossils and that fragmentary sequences for ancient proteins can be increasingly expanded. To test the hypothesis that peptides can be repeatedly detected and validated from fossil tissues many millions of years old, we applied updated extraction methodology, high-resolution mass spectrometry, and bioinformatics analyses on a Brachylophosaurus canadensis specimen (MOR 2598) from which collagen I peptides were recovered in 2009. We recovered eight peptide sequences of collagen I: two identical to peptides recovered in 2009 and six new peptides. Phylogenetic analyses place the recovered sequences within basal archosauria. When only the new sequences are considered, B. canadensis is grouped more closely to crocodylians, but when all sequences (current and those reported in 2009) are analyzed, B. canadensis is placed more closely to basal birds. The data robustly support the hypothesis of an endogenous origin for these peptides, confirm the idea that peptides can survive in specimens tens of millions of years old, and bolster the validity of the 2009 study. Furthermore, the new data expand the coverage of B. canadensis collagen I (a 33.6% increase in collagen I alpha 1 and 116.7% in alpha 2). Finally, this study demonstrates the importance of reexamining previously studied specimens with updated methods and instrumentation, as we obtained roughly the same amount of sequence data as the previous study with substantially less sample material. Data are available via ProteomeXchange with identifier PXD005087.
Data Portal | Office of Cancer Clinical Proteomics Research

Cancer.gov

The CPTAC Data Portal is a centralized repository for the public dissemination of proteomic sequence datasets collected by CPTAC, along with corresponding genomic sequence datasets. In addition, available are analyses of CPTAC's raw mass spectrometry-based data files (mapping of spectra to peptide sequences and protein identification) by individual investigators from CPTAC and by a Common Data Analysis Pipeline.
Evidence for the principle of minimal frustration in the evolution of protein folding landscapes.

PubMed

Tzul, Franco O; Vasilchuk, Daniel; Makhatadze, George I

2017-02-28

Theoretical and experimental studies have firmly established that protein folding can be described by a funneled energy landscape. This funneled energy landscape is the result of foldable protein sequences evolving following the principle of minimal frustration, which allows proteins to rapidly fold to their native biologically functional conformations. For a protein family with a given functional fold, the principle of minimal frustration suggests that, independent of sequence, all proteins within this family should fold with similar rates. However, depending on the optimal living temperature of the organism, proteins also need to modulate their thermodynamic stability. Consequently, the difference in thermodynamic stability should be primarily caused by differences in the unfolding rates. To test this hypothesis experimentally, we performed comprehensive thermodynamic and kinetic analyses of 15 different proteins from the thioredoxin family. Eight of these thioredoxins were extant proteins from psychrophilic, mesophilic, or thermophilic organisms. The other seven protein sequences were obtained using ancestral sequence reconstruction and can be dated back over 4 billion years. We found that all studied proteins fold with very similar rates but unfold with rates that differ up to three orders of magnitude. The unfolding rates correlate well with the thermodynamic stability of the proteins. Moreover, proteins that unfold slower are more resistant to proteolysis. These results provide direct experimental support to the principle of minimal frustration hypothesis.
Protein Sequence Annotation Tool (PSAT): A centralized web-based meta-server for high-throughput sequence annotations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Leung, Elo; Huang, Amy; Cadag, Eithon

In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein fasta data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resultingmore » functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequencebased genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.« less
Protein Sequence Annotation Tool (PSAT): A centralized web-based meta-server for high-throughput sequence annotations

DOE PAGES

Leung, Elo; Huang, Amy; Cadag, Eithon; ...

2016-01-20

In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein fasta data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resultingmore » functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequencebased genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.« less
KinFin: Software for Taxon-Aware Analysis of Clustered Protein Sequences.

PubMed

Laetsch, Dominik R; Blaxter, Mark L

2017-10-05

The field of comparative genomics is concerned with the study of similarities and differences between the information encoded in the genomes of organisms. A common approach is to define gene families by clustering protein sequences based on sequence similarity, and analyze protein cluster presence and absence in different species groups as a guide to biology. Due to the high dimensionality of these data, downstream analysis of protein clusters inferred from large numbers of species, or species with many genes, is nontrivial, and few solutions exist for transparent, reproducible, and customizable analyses. We present KinFin, a streamlined software solution capable of integrating data from common file formats and delivering aggregative annotation of protein clusters. KinFin delivers analyses based on systematic taxonomy of the species analyzed, or on user-defined, groupings of taxa, for example, sets based on attributes such as life history traits, organismal phenotypes, or competing phylogenetic hypotheses. Results are reported through graphical and detailed text output files. We illustrate the utility of the KinFin pipeline by addressing questions regarding the biology of filarial nematodes, which include parasites of veterinary and medical importance. We resolve the phylogenetic relationships between the species and explore functional annotation of proteins in clusters in key lineages and between custom taxon sets, identifying gene families of interest. KinFin can easily be integrated into existing comparative genomic workflows, and promotes transparent and reproducible analysis of clustered protein data. Copyright © 2017 Laetsch and Blaxter.

A proteomics study of barley powdery mildew haustoria.

PubMed

Godfrey, Dale; Zhang, Ziguo; Saalbach, Gerhard; Thordal-Christensen, Hans

2009-06-01

A number of fungal and oomycete plant pathogens of major economic importance feed on their hosts by means of haustoria, which they place inside living plant cells. The underlying mechanisms are poorly understood, partly due to difficulty in preparing haustoria. We have therefore developed a procedure for isolating haustoria from the barley powdery mildew fungus (Blumeria graminis f.sp. hordei, Bgh). We subsequently aimed to understand the molecular mechanisms of haustoria through a study of their proteome. Extracted proteins were digested using trypsin, separated by LC, and analysed by MS/MS. Searches of a custom Bgh EST sequence database and the NCBI-NR fungal protein database, using the MS/MS data, identified 204 haustoria proteins. The majority of the proteins appear to have roles in protein metabolic pathways and biological energy production. Surprisingly, pyruvate decarboxylase (PDC), involved in alcoholic fermentation and commonly abundant in fungi and plants, was absent in our Bgh proteome data set. A sequence encoding this enzyme was also absent in our EST sequence database. Significantly, BLAST searches of the recently available Bgh genome sequence data also failed to identify a sequence encoding this enzyme, strongly indicating that Bgh does not have a gene for PDC.
STAT1:DNA sequence-dependent binding modulation by phosphorylation, protein:protein interactions and small-molecule inhibition

PubMed Central

Bonham, Andrew J.; Wenta, Nikola; Osslund, Leah M.; Prussin, Aaron J.; Vinkemeier, Uwe; Reich, Norbert O.

2013-01-01

The DNA-binding specificity and affinity of the dimeric human transcription factor (TF) STAT1, were assessed by total internal reflectance fluorescence protein-binding microarrays (TIRF-PBM) to evaluate the effects of protein phosphorylation, higher-order polymerization and small-molecule inhibition. Active, phosphorylated STAT1 showed binding preferences consistent with prior characterization, whereas unphosphorylated STAT1 showed a weak-binding preference for one-half of the GAS consensus site, consistent with recent models of STAT1 structure and function in response to phosphorylation. This altered-binding preference was further tested by use of the inhibitor LLL3, which we show to disrupt STAT1 binding in a sequence-dependent fashion. To determine if this sequence-dependence is specific to STAT1 and not a general feature of human TF biology, the TF Myc/Max was analysed and tested with the inhibitor Mycro3. Myc/Max inhibition by Mycro3 is sequence independent, suggesting that the sequence-dependent inhibition of STAT1 may be specific to this system and a useful target for future inhibitor design. PMID:23180800
Phylogenetic and Protein Sequence Analysis of Bacterial Chemoreceptors.

PubMed

Ortega, Davi R; Zhulin, Igor B

2018-01-01

Identifying chemoreceptors in sequenced bacterial genomes, revealing their domain architecture, inferring their evolutionary relationships, and comparing them to chemoreceptors of known function become important steps in genome annotation and chemotaxis research. Here, we describe bioinformatics procedures that enable such analyses, using two closely related bacterial genomes as examples.
RNA-ID, a Powerful Tool for Identifying and Characterizing Regulatory Sequences.

PubMed

Brule, C E; Dean, K M; Grayhack, E J

2016-01-01

The identification and analysis of sequences that regulate gene expression is critical because regulated gene expression underlies biology. RNA-ID is an efficient and sensitive method to discover and investigate regulatory sequences in the yeast Saccharomyces cerevisiae, using fluorescence-based assays to detect green fluorescent protein (GFP) relative to a red fluorescent protein (RFP) control in individual cells. Putative regulatory sequences can be inserted either in-frame or upstream of a superfolder GFP fusion protein whose expression, like that of RFP, is driven by the bidirectional GAL1,10 promoter. In this chapter, we describe the methodology to identify and study cis-regulatory sequences in the RNA-ID system, explaining features and variations of the RNA-ID reporter, as well as some applications of this system. We describe in detail the methods to analyze a single regulatory sequence, from construction of a single GFP variant to assay of variants by flow cytometry, as well as modifications required to screen libraries of different strains simultaneously. We also describe subsequent analyses of regulatory sequences. © 2016 Elsevier Inc. All rights reserved.
Interaction between core protein of classical swine fever virus with cellular IQGAP1 proetin appears essential for virulence in swine

USDA-ARS?s Scientific Manuscript database

Here we show that IQGAP1, a cellular protein that plays a pivotal role as a regulator of the cytoskeleton affecting cell adhesion, polarization and migration, interacts with Classical Swine Fever Virus (CSFV) Core protein. Sequence analyses identified a defined set of residues within CSFV Core prote...
Homology analyses of the protein sequences of fatty acid synthases from chicken liver, rat mammary gland, and yeast

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chang, Soo-Ik; Hammes, G.G.

1989-11-01

Homology analyses of the protein sequences of chicken liver and rat mammary gland fatty acid synthases were carried out. The amino acid sequences of the chicken and rat enzymes are 67% identical. If conservative substitutions are allowed, 78% of the amino acids are matched. A region of low homologies exists between the functional domains, in particular around amino acid residues 1059-1264 of the chicken enzyme. Homologies between the active sites of chicken and rat and of chicken and yeast enzymes have been analyzed by an alignment method. A high degree of homology exists between the active sites of the chickenmore » and rat enzymes. However, the chicken and yeast enzymes show a lower degree of homology. The DADPH-binding dinucleotide folds of the {beta}-ketoacyl reductase and the enoyl reductase sites were identified by comparison with a known consensus sequence for the DADP- and FAD-binding dinucleotide folds. The active sites of all of the enzymes are primarily in hydrophobic regions of the protein. This study suggests that the genes for the functional domains of fatty acid synthase were originally separated, and these genes were connected to each other by using different connecting nucleotide sequences in different species. An alternative explanation for the differences in rat and chicken is a common ancestry and mutations in the joining regions during evolution.« less
Mapping the miRNA interactome by crosslinking ligation and sequencing of hybrids (CLASH)

PubMed Central

Helwak, Aleksandra; Tollervey, David

2014-01-01

RNA-RNA interactions play critical roles in many cellular processes but studying them is difficult and laborious. Here, we describe an experimental procedure, termed crosslinking ligation and sequencing of hybrids (CLASH), which allows high-throughput identification of sites of RNA-RNA interaction. During CLASH, a tagged bait protein is UV crosslinked in vivo to stabilise RNA interactions and purified under denaturing conditions. RNAs associated with the bait protein are partially truncated, and the ends of RNA-duplexes are ligated together. Following linker addition, cDNA library preparation and high-throughput sequencing, the ligated duplexes give rise to chimeric cDNAs, which unambiguously identify RNA-RNA interaction sites independent of bioinformatic predictions. This protocol is optimized for studying miRNA targets bound by Argonaute proteins, but should be easily adapted for other RNA-binding proteins and classes of RNA. The protocol requires around 5 days to complete, excluding the time required for high-throughput sequencing and bioinformatic analyses. PMID:24577361
Integrated databanks access and sequence/structure analysis services at the PBIL.

PubMed

Perrière, Guy; Combet, Christophe; Penel, Simon; Blanchet, Christophe; Thioulouse, Jean; Geourjon, Christophe; Grassot, Julien; Charavay, Céline; Gouy, Manolo; Duret, Laurent; Deléage, Gilbert

2003-07-01

The World Wide Web server of the PBIL (Pôle Bioinformatique Lyonnais) provides on-line access to sequence databanks and to many tools of nucleic acid and protein sequence analyses. This server allows to query nucleotide sequence banks in the EMBL and GenBank formats and protein sequence banks in the SWISS-PROT and PIR formats. The query engine on which our data bank access is based is the ACNUC system. It allows the possibility to build complex queries to access functional zones of biological interest and to retrieve large sequence sets. Of special interest are the unique features provided by this system to query the data banks of gene families developed at the PBIL. The server also provides access to a wide range of sequence analysis methods: similarity search programs, multiple alignments, protein structure prediction and multivariate statistics. An originality of this server is the integration of these two aspects: sequence retrieval and sequence analysis. Indeed, thanks to the introduction of re-usable lists, it is possible to perform treatments on large sets of data. The PBIL server can be reached at: http://pbil.univ-lyon1.fr.
Evolutionary Influenced Interaction Pattern as Indicator for the Investigation of Natural Variants Causing Nephrogenic Diabetes Insipidus

PubMed Central

Labudde, Dirk

2015-01-01

The importance of short membrane sequence motifs has been shown in many works and emphasizes the related sequence motif analysis. Together with specific transmembrane helix-helix interactions, the analysis of interacting sequence parts is helpful for understanding the process during membrane protein folding and in retaining the three-dimensional fold. Here we present a simple high-throughput analysis method for deriving mutational information of interacting sequence parts. Applied on aquaporin water channel proteins, our approach supports the analysis of mutational variants within different interacting subsequences and finally the investigation of natural variants which cause diseases like, for example, nephrogenic diabetes insipidus. In this work we demonstrate a simple method for massive membrane protein data analysis. As shown, the presented in silico analyses provide information about interacting sequence parts which are constrained by protein evolution. We present a simple graphical visualization medium for the representation of evolutionary influenced interaction pattern pairs (EIPPs) adapted to mutagen investigations of aquaporin-2, a protein whose mutants are involved in the rare endocrine disorder known as nephrogenic diabetes insipidus, and membrane proteins in general. Furthermore, we present a new method to derive new evolutionary variations within EIPPs which can be used for further mutagen laboratory investigations. PMID:26180540
Evolutionary Influenced Interaction Pattern as Indicator for the Investigation of Natural Variants Causing Nephrogenic Diabetes Insipidus.

PubMed

Grunert, Steffen; Labudde, Dirk

2015-01-01

The importance of short membrane sequence motifs has been shown in many works and emphasizes the related sequence motif analysis. Together with specific transmembrane helix-helix interactions, the analysis of interacting sequence parts is helpful for understanding the process during membrane protein folding and in retaining the three-dimensional fold. Here we present a simple high-throughput analysis method for deriving mutational information of interacting sequence parts. Applied on aquaporin water channel proteins, our approach supports the analysis of mutational variants within different interacting subsequences and finally the investigation of natural variants which cause diseases like, for example, nephrogenic diabetes insipidus. In this work we demonstrate a simple method for massive membrane protein data analysis. As shown, the presented in silico analyses provide information about interacting sequence parts which are constrained by protein evolution. We present a simple graphical visualization medium for the representation of evolutionary influenced interaction pattern pairs (EIPPs) adapted to mutagen investigations of aquaporin-2, a protein whose mutants are involved in the rare endocrine disorder known as nephrogenic diabetes insipidus, and membrane proteins in general. Furthermore, we present a new method to derive new evolutionary variations within EIPPs which can be used for further mutagen laboratory investigations.
Microsporidia, amitochondrial protists, possess a 70-kDa heat shock protein gene of mitochondrial evolutionary origin.

PubMed

Peyretaillade, E; Broussolle, V; Peyret, P; Méténier, G; Gouy, M; Vivarès, C P

1998-06-01

An intronless gene encoding a protein of 592 amino acid residues with similarity to 70-kDa heat shock proteins (HSP70s) has been cloned and sequenced from the amitochondrial protist Encephalitozoon cuniculi (phylum Microsporidia). Southern blot analyses show the presence of a single gene copy located on chromosome XI. The encoded protein exhibits an N-terminal hydrophobic leader sequence and two motifs shared by proteobacterial and mitochondrially expressed HSP70 homologs. Phylogenetic analysis using maximum likelihood and evolutionary distances place the E. cuniculi sequence in the cluster of mitochondrially expressed HSP70s, with a higher evolutionary rate than those of homologous sequences. Similar results were obtained after cloning a fragment of the homologous gene in the closely related species E. hellem. The presence of a nuclear targeting signal-like sequence supports a role of the Encephalitozoon HSP70 as a molecular chaperone of nuclear proteins. No evidence for cytosolic or endoplasmic reticulum forms of HSP70 was obtained through PCR amplification. These data suggest that Encephalitozoon species have evolved from an ancestor bearing mitochondria, which is in disagreement with the postulated presymbiotic origin of Microsporidia. The specific role and intracellular localization of the mitochondrial HSP70-like protein remain to be elucidated.
Complete genomic sequence of Powassan virus: evaluation of genetic elements in tick-borne versus mosquito-borne flaviviruses.

PubMed

Mandl, C W; Holzmann, H; Kunz, C; Heinz, F X

1993-05-01

The complete nucleotide sequence of the positive-stranded RNA genome of the tick-borne flavivirus Powassan (10,839 nucleotides) was elucidated and the amino acid sequence of all viral proteins was derived. Based on this sequence as well as serological data, Powassan virus represents the most divergent member of the tick-borne serocomplex within the genus flaviviruses, family Flaviviridae. The primary nucleotide sequence and potential RNA secondary structures of the Powassan virus genome as well as the protein sequences and the reactivities of the virion with a panel of monoclonal antibodies were compared to other tick-borne and mosquito-borne flaviviruses. These analyses corroborated significant differences between tick-borne and mosquito-borne flaviviruses, but also emphasized structural elements that are conserved among both vector groups. The comparisons among tick-borne flaviviruses revealed conserved sequence elements that might represent important determinants of the tick-borne flavivirus phenotype.
HomPPI: a class of sequence homology based protein-protein interface prediction methods

PubMed Central

2011-01-01

Background Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available at http://homppi.cs.iastate.edu/. Conclusions Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners. PMID:21682895
ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes

PubMed Central

Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim

2010-01-01

Motivation: Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith–Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid™, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. Availability: The database can be accessed through http://proteinworlddb.org Contact: otto@fiocruz.br PMID:20089515
Transcription factor IID in the Archaea: sequences in the Thermococcus celer genome would encode a product closely related to the TATA-binding protein of eukaryotes

NASA Technical Reports Server (NTRS)

Marsh, T. L.; Reich, C. I.; Whitelock, R. B.; Olsen, G. J.; Woese, C. R. (Principal Investigator)

1994-01-01

The first step in transcription initiation in eukaryotes is mediated by the TATA-binding protein, a subunit of the transcription factor IID complex. We have cloned and sequenced the gene for a presumptive homolog of this eukaryotic protein from Thermococcus celer, a member of the Archaea (formerly archaebacteria). The protein encoded by the archaeal gene is a tandem repeat of a conserved domain, corresponding to the repeated domain in its eukaryotic counterparts. Molecular phylogenetic analyses of the two halves of the repeat are consistent with the duplication occurring before the divergence of the archael and eukaryotic domains. In conjunction with previous observations of similarity in RNA polymerase subunit composition and sequences and the finding of a transcription factor IIB-like sequence in Pyrococcus woesei (a relative of T. celer) it appears that major features of the eukaryotic transcription apparatus were well-established before the origin of eukaryotic cellular organization. The divergence between the two halves of the archael protein is less than that between the halves of the individual eukaryotic sequences, indicating that the average rate of sequence change in the archael protein has been less than in its eukaryotic counterparts. To the extent that this lower rate applies to the genome as a whole, a clearer picture of the early genes (and gene families) that gave rise to present-day genomes is more apt to emerge from the study of sequences from the Archaea than from the corresponding sequences from eukaryotes.
Sialotranscriptomics of Rhipicephalus zambeziensis reveals intricate expression profiles of secretory proteins and suggests tight temporal transcriptional regulation during blood-feeding.

PubMed

de Castro, Minique Hilda; de Klerk, Daniel; Pienaar, Ronel; Rees, D Jasper G; Mans, Ben J

2017-08-10

Ticks secrete a diverse mixture of secretory proteins into the host to evade its immune response and facilitate blood-feeding, making secretory proteins attractive targets for the production of recombinant anti-tick vaccines. The largely neglected tick species, Rhipicephalus zambeziensis, is an efficient vector of Theileria parva in southern Africa but its available sequence information is limited. Next generation sequencing has advanced sequence availability for ticks in recent years and has assisted the characterisation of secretory proteins. This study focused on the de novo assembly and annotation of the salivary gland transcriptome of R. zambeziensis and the temporal expression of secretory protein transcripts in female and male ticks, before the onset of feeding and during early and late feeding. The sialotranscriptome of R. zambeziensis yielded 23,631 transcripts from which 13,584 non-redundant proteins were predicted. Eighty-six percent of these contained a predicted start and stop codon and were estimated to be putatively full-length proteins. A fifth (2569) of the predicted proteins were annotated as putative secretory proteins and explained 52% of the expression in the transcriptome. Expression analyses revealed that 2832 transcripts were differentially expressed among feeding time points and 1209 between the tick sexes. The expression analyses further indicated that 57% of the annotated secretory protein transcripts were differentially expressed. Dynamic expression profiles of secretory protein transcripts were observed during feeding of female ticks. Whereby a number of transcripts were upregulated during early feeding, presumably for feeding site establishment and then during late feeding, 52% of these were downregulated, indicating that transcripts were required at specific feeding stages. This suggested that secretory proteins are under stringent transcriptional regulation that fine-tunes their expression in salivary glands during feeding. No open reading frames were predicted for 7947 transcripts. This class represented 17% of the differentially expressed transcripts, suggesting a potential transcriptional regulatory function of long non-coding RNA in tick blood-feeding. The assembled sialotranscriptome greatly expands the sequence availability of R. zambeziensis, assists in our understanding of the transcription of secretory proteins during blood-feeding and will be a valuable resource for future vaccine candidate selection.
Application of the major capsid protein as a marker of the phylogenetic diversity of Emiliania huxleyi viruses.

PubMed

Rowe, Janet M; Fabre, Marie-Françoise; Gobena, Daniel; Wilson, William H; Wilhelm, Steven W

2011-05-01

Studies of the Phycodnaviridae have traditionally relied on the DNA polymerase (pol) gene as a biomarker. However, recent investigations have suggested that the major capsid protein (MCP) gene may be a reliable phylogenetic biomarker. We used MCP gene amplicons gathered across the North Atlantic to assess the diversity of Emiliania huxleyi-infecting Phycodnaviridae. Nucleotide sequences were examined across >6000 km of open ocean, with comparisons between concentrates of the virus-size fraction of seawater and of lysates generated by exposing host strains to these same virus concentrates. Analyses revealed that many sequences were only sampled once, while several were over-represented. Analyses also revealed nucleotide sequences distinct from previous coastal isolates. Examination of lysed cultures revealed a new richness in phylogeny, as MCP sequences previously unrepresented within the existing collection of E. huxleyi viruses (EhV) were associated with viruses lysing cultures. Sequences were compared with previously described EhV MCP sequences from the North Sea and a Norwegian Fjord, as well as from the Gulf of Maine. Principal component analysis indicates that location-specific distinctions exist despite the presence of sequences common across these environments. Overall, this investigation provides new sequence data and an assessment on the use of the MCP gene. © 2011 Federation of European Microbiological Societies Published by Blackwell Publishing Ltd. All rights reserved.
An archaebacterial homologue of the essential eubacterial cell division protein FtsZ.

PubMed

Baumann, P; Jackson, S P

1996-06-25

Life falls into three fundamental domains--Archaea, Bacteria, and Eucarya (formerly archaebacteria, eubacteria, and eukaryotes,. respectively). Though Archaea lack nuclei and share many morphological features with Bacteria, molecular analyses, principally of the transcription and translation machineries, have suggested that Archaea are more related to Eucarya than to Bacteria. Currently, little is known about the archaeal cell division apparatus. In Bacteria, a crucial component of the cell division machinery is FtsZ, a GTPase that localizes to a ring at the site of septation. Interestingly, FtsZ is distantly related in sequence to eukaryotic tubulins, which also interact with GTP and are components of the eukaryotic cell cytoskeleton. By screening for the ability to bind radiolabeled nucleotides, we have identified a protein of the hyperthermophilic archaeon Pyrococcus woesei that interacts tightly and specifically with GTP. Furthermore, through screening an expression library of P. woesei genomic DNA, we have cloned the gene encoding this protein. Sequence comparisons reveal that the P. woesei GTP-binding protein is strikingly related in sequence to eubacterial FtsZ and is marginally more similar to eukaryotic tubulins than are bacterial FtsZ proteins. Phylogenetic analyses reinforce the notion that there is an evolutionary linkage between FtsZ and tubulins. These findings suggest that the archaeal cell division apparatus may be fundamentally similar to that of Bacteria and lead us to consider the evolutionary relationships between Archaea, Bacteria, and Eucarya.
Multiplex single-molecule interaction profiling of DNA barcoded proteins

PubMed Central

Gu, Liangcai; Li, Chao; Aach, John; Hill, David E.; Vidal, Marc; Church, George M.

2014-01-01

In contrast with advances in massively parallel DNA sequencing1, high-throughput protein analyses2-4 are often limited by ensemble measurements, individual analyte purification and hence compromised quality and cost-effectiveness. Single-molecule (SM) protein detection achieved using optical methods5 is limited by the number of spectrally nonoverlapping chromophores. Here, we introduce a single molecular interaction-sequencing (SMI-Seq) technology for parallel protein interaction profiling leveraging SM advantages. DNA barcodes are attached to proteins collectively via ribosome display6 or individually via enzymatic conjugation. Barcoded proteins are assayed en masse in aqueous solution and subsequently immobilized in a polyacrylamide (PAA) thin film to construct a random SM array, where barcoding DNAs are amplified into in situ polymerase colonies (polonies)7 and analyzed by DNA sequencing. This method allows precise quantification of various proteins with a theoretical maximum array density of over one million polonies per square millimeter. Furthermore, protein interactions can be measured based on the statistics of colocalized polonies arising from barcoding DNAs of interacting proteins. Two demanding applications, G-protein coupled receptor (GPCR) and antibody binding profiling, were demonstrated. SMI-Seq enables “library vs. library” screening in a one-pot assay, simultaneously interrogating molecular binding affinity and specificity. PMID:25252978
Sequence analyses reveal that a TPR–DP module, surrounded by recombinable flanking introns, could be at the origin of eukaryotic Hop and Hip TPR–DP domains and prokaryotic GerD proteins

PubMed Central

Papandreou, Nikolaos; Chomilier, Jacques

2008-01-01

The co-chaperone Hop [heat shock protein (HSP) organising protein] is known to bind both Hsp70 and Hsp90. Hop comprises three repeats of a tetratricopeptide repeat (TPR) domain, each consisting of three TPR motifs. The first and last TPR domains are followed by a domain containing several dipeptide (DP) repeats called the DP domain. These analyses suggest that the hop genes result from successive recombination events of an ancestral TPR–DP module. From a hydrophobic cluster analysis of homologous Hop protein sequences derived from gene families, we can postulate that shifts in the open reading frames are at the origin of the present sequences. Moreover, these shifts can be related to the presence or absence of biological function. We propose to extend the family of Hop co-chaperons into the kingdom of bacteria, as several structurally related genes have been identified by hydrophobic cluster analysis. We also provide evidence of common structural characteristics between hop and hip genes, suggesting a shared precursor of ancestral TPR–DP domains. Electronic supplementary material The online version of this article (doi:10.1007/s12192-008-0083-8) contains supplementary material, which is available to authorized users. PMID:18987995

StralSV: assessment of sequence variability within similar 3D structures and application to polio RNA-dependent RNA polymerase.

PubMed

Zemla, Adam T; Lang, Dorothy M; Kostova, Tanya; Andino, Raul; Ecale Zhou, Carol L

2011-06-02

Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory--still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could help overcome these difficulties by facilitating the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV (structure-alignment sequence variability), a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus, and we demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique, or that share structural similarity with proteins that would be considered distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local structural alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position. StralSV is provided as a web service at http://proteinmodel.org/AS2TS/STRALSV/.
Programming molecular self-assembly of intrinsically disordered proteins containing sequences of low complexity

NASA Astrophysics Data System (ADS)

Simon, Joseph R.; Carroll, Nick J.; Rubinstein, Michael; Chilkoti, Ashutosh; López, Gabriel P.

2017-06-01

Dynamic protein-rich intracellular structures that contain phase-separated intrinsically disordered proteins (IDPs) composed of sequences of low complexity (SLC) have been shown to serve a variety of important cellular functions, which include signalling, compartmentalization and stabilization. However, our understanding of these structures and our ability to synthesize models of them have been limited. We present design rules for IDPs possessing SLCs that phase separate into diverse assemblies within droplet microenvironments. Using theoretical analyses, we interpret the phase behaviour of archetypal IDP sequences and demonstrate the rational design of a vast library of multicomponent protein-rich structures that ranges from uniform nano-, meso- and microscale puncta (distinct protein droplets) to multilayered orthogonally phase-separated granular structures. The ability to predict and program IDP-rich assemblies in this fashion offers new insights into (1) genetic-to-molecular-to-macroscale relationships that encode hierarchical IDP assemblies, (2) design rules of such assemblies in cell biology and (3) molecular-level engineering of self-assembled recombinant IDP-rich materials.
Characterization and N-terminal sequencing of a calcium binding protein from the calcareous concretion organic matrix of the terrestrial crustacean Orchestia cavimana.

PubMed

Luquet, G; Testenière, O; Graf, F

1996-04-16

We extracted proteins from the organic matrix of calcareous concretions, which represents the calcium storage form in a terrestrial crustacean. Electrophoretic analyses of water-soluble organic-matrix proteinaceous components revealed 11 polypeptides, 6 of which are probably glycosylated. Among the unglycosylated proteins, we characterized a 23 kDa polypeptide, with an isoelectric point of 5.5, which is able to bind calcium. Its N-terminal sequence is rich in acidic amino acids (essentially aspartic acid). All these characteristics suggest its involvement in the calcium precipitation process within the successive layers of the organic matrix.
A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics

PubMed Central

Tang, Haixu; Li, Sujun; Ye, Yuzhen

2016-01-01

Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach, in which complete or fragmental protein-coding genes are first directly predicted from metagenomic (and/or metatranscriptomic) sequences or from their assemblies, and the resulting protein sequences are then used as the reference database for peptide/protein identification from MS/MS spectra. This approach is often limited because protein coding genes predicted from metagenomes are incomplete and fragmental. In this paper, we present a graph-centric approach to improving metagenome-guided peptide and protein identification in metaproteomics. Our method exploits the de Bruijn graph structure reported by metagenome assembly algorithms to generate a comprehensive database of protein sequences encoded in the community. We tested our method using several public metaproteomic datasets with matched metagenomic and metatranscriptomic sequencing data acquired from complex microbial communities in a biological wastewater treatment plant. The results showed that many more peptides and proteins can be identified when assembly graphs were utilized, improving the characterization of the proteins expressed in the microbial communities. The additional proteins we identified contribute to the characterization of important pathways such as those involved in degradation of chemical hazards. Our tools are released as open-source software on github at https://github.com/COL-IU/Graph2Pro. PMID:27918579
Evolution and function of CAG/polyglutamine repeats in protein–protein interaction networks

PubMed Central

Schaefer, Martin H.; Wanker, Erich E.; Andrade-Navarro, Miguel A.

2012-01-01

Expanded runs of consecutive trinucleotide CAG repeats encoding polyglutamine (polyQ) stretches are observed in the genes of a large number of patients with different genetic diseases such as Huntington's and several Ataxias. Protein aggregation, which is a key feature of most of these diseases, is thought to be triggered by these expanded polyQ sequences in disease-related proteins. However, polyQ tracts are a normal feature of many human proteins, suggesting that they have an important cellular function. To clarify the potential function of polyQ repeats in biological systems, we systematically analyzed available information stored in sequence and protein interaction databases. By integrating genomic, phylogenetic, protein interaction network and functional information, we obtained evidence that polyQ tracts in proteins stabilize protein interactions. This happens most likely through structural changes whereby the polyQ sequence extends a neighboring coiled-coil region to facilitate its interaction with a coiled-coil region in another protein. Alteration of this important biological function due to polyQ expansion results in gain of abnormal interactions, leading to pathological effects like protein aggregation. Our analyses suggest that research on polyQ proteins should shift focus from expanded polyQ proteins into the characterization of the influence of the wild-type polyQ on protein interactions. PMID:22287626
Molecular Cloning and Characterization of cDNA Encoding a Putative Stress-Induced Heat-Shock Protein from Camelus dromedarius

PubMed Central

Elrobh, Mohamed S.; Alanazi, Mohammad S.; Khan, Wajahatullah; Abduljaleel, Zainularifeen; Al-Amri, Abdullah; Bazzi, Mohammad D.

2011-01-01

Heat shock proteins are ubiquitous, induced under a number of environmental and metabolic stresses, with highly conserved DNA sequences among mammalian species. Camelus dromedaries (the Arabian camel) domesticated under semi-desert environments, is well adapted to tolerate and survive against severe drought and high temperatures for extended periods. This is the first report of molecular cloning and characterization of full length cDNA of encoding a putative stress-induced heat shock HSPA6 protein (also called HSP70B′) from Arabian camel. A full-length cDNA (2417 bp) was obtained by rapid amplification of cDNA ends (RACE) and cloned in pET-b expression vector. The sequence analysis of HSPA6 gene showed 1932 bp-long open reading frame encoding 643 amino acids. The complete cDNA sequence of the Arabian camel HSPA6 gene was submitted to NCBI GeneBank (accession number HQ214118.1). The BLAST analysis indicated that C. dromedaries HSPA6 gene nucleotides shared high similarity (77–91%) with heat shock gene nucleotide of other mammals. The deduced 643 amino acid sequences (accession number ADO12067.1) showed that the predicted protein has an estimated molecular weight of 70.5 kDa with a predicted isoelectric point (pI) of 6.0. The comparative analyses of camel HSPA6 protein sequences with other mammalian heat shock proteins (HSPs) showed high identity (80–94%). Predicted camel HSPA6 protein structure using Protein 3D structural analysis high similarities with human and mouse HSPs. Taken together, this study indicates that the cDNA sequences of HSPA6 gene and its amino acid and protein structure from the Arabian camel are highly conserved and have similarities with other mammalian species. PMID:21845074
Analysis of functional redundancies within the Arabidopsis TCP transcription factor family

PubMed Central

Danisman, Selahattin; de Folter, Stefan; Immink, Richard G. H.

2013-01-01

Analyses of the functions of TEOSINTE-LIKE1, CYCLOIDEA, and PROLIFERATING CELL FACTOR1 (TCP) transcription factors have been hampered by functional redundancy between its individual members. In general, putative functionally redundant genes are predicted based on sequence similarity and confirmed by genetic analysis. In the TCP family, however, identification is impeded by relatively low overall sequence similarity. In a search for functionally redundant TCP pairs that control Arabidopsis leaf development, this work performed an integrative bioinformatics analysis, combining protein sequence similarities, gene expression data, and results of pair-wise protein–protein interaction studies for the 24 members of the Arabidopsis TCP transcription factor family. For this, the work completed any lacking gene expression and protein–protein interaction data experimentally and then performed a comprehensive prediction of potential functional redundant TCP pairs. Subsequently, redundant functions could be confirmed for selected predicted TCP pairs by genetic and molecular analyses. It is demonstrated that the previously uncharacterized class I TCP19 gene plays a role in the control of leaf senescence in a redundant fashion with TCP20. Altogether, this work shows the power of combining classical genetic and molecular approaches with bioinformatics predictions to unravel functional redundancies in the TCP transcription factor family. PMID:24129704
TIMPs of parasitic helminths - a large-scale analysis of high-throughput sequence datasets.

PubMed

Cantacessi, Cinzia; Hofmann, Andreas; Pickering, Darren; Navarro, Severine; Mitreva, Makedonka; Loukas, Alex

2013-05-30

Tissue inhibitors of metalloproteases (TIMPs) are a multifunctional family of proteins that orchestrate extracellular matrix turnover, tissue remodelling and other cellular processes. In parasitic helminths, such as hookworms, TIMPs have been proposed to play key roles in the host-parasite interplay, including invasion of and establishment in the vertebrate animal hosts. Currently, knowledge of helminth TIMPs is limited to a small number of studies on canine hookworms, whereas no information is available on the occurrence of TIMPs in other parasitic helminths causing neglected diseases. In the present study, we conducted a large-scale investigation of TIMP proteins of a range of neglected human parasites including the hookworm Necator americanus, the roundworm Ascaris suum, the liver flukes Clonorchis sinensis and Opisthorchis viverrini, as well as the schistosome blood flukes. This entailed mining available transcriptomic and/or genomic sequence datasets for the presence of homologues of known TIMPs, predicting secondary structures of defined protein sequences, systematic phylogenetic analyses and assessment of differential expression of genes encoding putative TIMPs in the developmental stages of A. suum, N. americanus and Schistosoma haematobium which infect the mammalian hosts. A total of 15 protein sequences with high homology to known eukaryotic TIMPs were predicted from the complement of sequence data available for parasitic helminths and subjected to in-depth bioinformatic analyses. Supported by the availability of gene manipulation technologies such as RNA interference and/or transgenesis, this work provides a basis for future functional explorations of helminth TIMPs and, in particular, of their role/s in fundamental biological pathways linked to long-term establishment in the vertebrate hosts, with a view towards the development of novel approaches for the control of neglected helminthiases.
TIMPs of parasitic helminths – a large-scale analysis of high-throughput sequence datasets

PubMed Central

2013-01-01

Background Tissue inhibitors of metalloproteases (TIMPs) are a multifunctional family of proteins that orchestrate extracellular matrix turnover, tissue remodelling and other cellular processes. In parasitic helminths, such as hookworms, TIMPs have been proposed to play key roles in the host-parasite interplay, including invasion of and establishment in the vertebrate animal hosts. Currently, knowledge of helminth TIMPs is limited to a small number of studies on canine hookworms, whereas no information is available on the occurrence of TIMPs in other parasitic helminths causing neglected diseases. Methods In the present study, we conducted a large-scale investigation of TIMP proteins of a range of neglected human parasites including the hookworm Necator americanus, the roundworm Ascaris suum, the liver flukes Clonorchis sinensis and Opisthorchis viverrini, as well as the schistosome blood flukes. This entailed mining available transcriptomic and/or genomic sequence datasets for the presence of homologues of known TIMPs, predicting secondary structures of defined protein sequences, systematic phylogenetic analyses and assessment of differential expression of genes encoding putative TIMPs in the developmental stages of A. suum, N. americanus and Schistosoma haematobium which infect the mammalian hosts. Results A total of 15 protein sequences with high homology to known eukaryotic TIMPs were predicted from the complement of sequence data available for parasitic helminths and subjected to in-depth bioinformatic analyses. Conclusions Supported by the availability of gene manipulation technologies such as RNA interference and/or transgenesis, this work provides a basis for future functional explorations of helminth TIMPs and, in particular, of their role/s in fundamental biological pathways linked to long-term establishment in the vertebrate hosts, with a view towards the development of novel approaches for the control of neglected helminthiases. PMID:23721526
Candidate chemosensory ionotropic receptors in a Lepidoptera.

PubMed

Olivier, V; Monsempes, C; François, M-C; Poivet, E; Jacquin-Joly, E

2011-04-01

A new family of candidate chemosensory ionotropic receptors (IRs) related to ionotropic glutamate receptors (iGluRs) was recently discovered in Drosophila melanogaster. Through Blast analyses of an expressed sequenced tag library prepared from male antennae of the noctuid moth Spodoptera littoralis, we identified 12 unigenes encoding proteins related to D. melanogaster and Bombyx mori IRs. Their full length sequences were obtained and the analyses of their expression patterns suggest that they were exclusively expressed or clearly enriched in chemosensory organs. The deduced protein sequences were more similar to B. mori and D. melanogaster IRs than to iGluRs and showed considerable variations in the predicted ligand-binding domains; none have the three glutamate-interacting residues found in iGluRs, suggesting different binding specificities. Our data suggest that we identified members of the insect IR chemosensory receptor family in S. littoralis and we report here the first demonstration of IR expression in Lepidoptera. © 2010 The Authors. Insect Molecular Biology © 2010 The Royal Entomological Society.
Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing.

PubMed

Kanda, Kojun; Pflug, James M; Sproul, John S; Dasenko, Mark A; Maddison, David R

2015-01-01

In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced.
Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing

PubMed Central

Dasenko, Mark A.

2015-01-01

In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced. PMID:26716693
Sequence and structural analyses of nuclear export signals in the NESdb database

PubMed Central

Xu, Darui; Farmer, Alicia; Collett, Garen; Grishin, Nick V.; Chook, Yuh Min

2012-01-01

We compiled >200 nuclear export signal (NES)–containing CRM1 cargoes in a database named NESdb. We analyzed the sequences and three-dimensional structures of natural, experimentally identified NESs and of false-positive NESs that were generated from the database in order to identify properties that might distinguish the two groups of sequences. Analyses of amino acid frequencies, sequence logos, and agreement with existing NES consensus sequences revealed strong preferences for the Φ1-X3-Φ2-X2-Φ3-X-Φ4 pattern and for negatively charged amino acids in the nonhydrophobic positions of experimentally identified NESs but not of false positives. Strong preferences against certain hydrophobic amino acids in the hydrophobic positions were also revealed. These findings led to a new and more precise NES consensus. More important, three-dimensional structures are now available for 68 NESs within 56 different cargo proteins. Analyses of these structures showed that experimentally identified NESs are more likely than the false positives to adopt α-helical conformations that transition to loops at their C-termini and more likely to be surface accessible within their protein domains or be present in disordered or unobserved parts of the structures. Such distinguishing features for real NESs might be useful in future NES prediction efforts. Finally, we also tested CRM1-binding of 40 NESs that were found in the 56 structures. We found that 16 of the NES peptides did not bind CRM1, hence illustrating how NESs are easily misidentified. PMID:22833565
Molecular characterisation of Atlantic salmon paramyxovirus (ASPV): A novel paramyxovirus associated with proliferative gill inflammation

USGS Publications Warehouse

Falk, K.; Batts, W.N.; Kvellestad, A.; Kurath, G.; Wiik-Nielsen, J.; Winton, J.R.

2008-01-01

Atlantic salmon paramyxovirus (ASPV) was isolated in 1995 from gills of farmed Atlantic salmon suffering from proliferative gill inflammation. The complete genome sequence of ASPV was determined, revealing a genome 16,968 nucleotides in length consisting of six non-overlapping genes coding for the nucleo- (N), phospho- (P), matrix- (M), fusion- (F), haemagglutinin-neuraminidase- (HN) and large polymerase (L) proteins in the order 3???-N-P-M-F-HN-L-5???. The various conserved features related to virus replication found in most paramyxoviruses were also found in ASPV. These include: conserved and complementary leader and trailer sequences, tri-nucleotide intergenic regions and highly conserved transcription start and stop signal sequences. The P gene expression strategy of ASPV was like that of the respiro-, morbilli- and henipaviruses, which express the P and C proteins from the primary transcript and edit a portion of the mRNA to encode V and W proteins. Sequence similarities among various features related to virus replication, pairwise comparisons of all deduced ASPV protein sequences with homologous regions from other members of the family Paramyxoviridae, and phylogenetic analyses of these amino acid sequences suggested that ASPV was a novel member of the sub-family Paramyxovirinae, most closely related to the respiroviruses. ?? 2008 Elsevier B.V. All rights reserved.
Solid-phase proximity ligation assays for individual or parallel protein analyses with readout via real-time PCR or sequencing.

PubMed

Nong, Rachel Yuan; Wu, Di; Yan, Junhong; Hammond, Maria; Gu, Gucci Jijuan; Kamali-Moghaddam, Masood; Landegren, Ulf; Darmanis, Spyros

2013-06-01

Solid-phase proximity ligation assays share properties with the classical sandwich immunoassays for protein detection. The proteins captured via antibodies on solid supports are, however, detected not by single antibodies with detectable functions, but by pairs of antibodies with attached DNA strands. Upon recognition by these sets of three antibodies, pairs of DNA strands brought in proximity are joined by ligation. The ligated reporter DNA strands are then detected via methods such as real-time PCR or next-generation sequencing (NGS). We describe how to construct assays that can offer improved detection specificity by virtue of recognition by three antibodies, as well as enhanced sensitivity owing to reduced background and amplified detection. Finally, we also illustrate how the assays can be applied for parallel detection of proteins, taking advantage of the oligonucleotide ligation step to avoid background problems that might arise with multiplexing. The protocol for the singleplex solid-phase proximity ligation assay takes ~5 h. The multiplex version of the assay takes 7-8 h depending on whether quantitative PCR (qPCR) or sequencing is used as the readout. The time for the sequencing-based protocol includes the library preparation but not the actual sequencing, as times may vary based on the choice of sequencing platform.
Characterization of tannase protein sequences of bacteria and fungi: an in silico study.

PubMed

Banerjee, Amrita; Jana, Arijit; Pati, Bikash R; Mondal, Keshab C; Das Mohapatra, Pradeep K

2012-04-01

The tannase protein sequences of 149 bacteria and 36 fungi were retrieved from NCBI database. Among them only 77 bacterial and 31 fungal tannase sequences were taken which have different amino acid compositions. These sequences were analysed for different physical and chemical properties, superfamily search, multiple sequence alignment, phylogenetic tree construction and motif finding to find out the functional motif and the evolutionary relationship among them. The superfamily search for these tannase exposed the occurrence of proline iminopeptidase-like, biotin biosynthesis protein BioH, O-acetyltransferase, carboxylesterase/thioesterase 1, carbon-carbon bond hydrolase, haloperoxidase, prolyl oligopeptidase, C-terminal domain and mycobacterial antigens families and alpha/beta hydrolase superfamily. Some bacterial and fungal sequence showed similarity with different families individually. The multiple sequence alignment of these tannase protein sequences showed conserved regions at different stretches with maximum homology from amino acid residues 389-469 and 482-523 which could be used for designing degenerate primers or probes specific for tannase producing bacterial and fungal species. Phylogenetic tree showed two different clusters; one has only bacteria and another have both fungi and bacteria showing some relationship between these different genera. Although in second cluster near about all fungal species were found together in a corner which indicates the sequence level similarity among fungal genera. The distributions of fourteen motifs analysis revealed Motif 1 with a signature amino acid sequence of 29 amino acids, i.e. GCSTGGREALKQAQRWPHDYDGIIANNPA, was uniformly observed in 83.3 % of studied tannase sequences representing its participation with the structure and enzymatic function.
In silico analysis of β-mannanases and β-mannosidase from Aspergillus flavus and Trichoderma virens UKM1

NASA Astrophysics Data System (ADS)

Yee, Chai Sin; Murad, Abdul Munir Abdul; Bakar, Farah Diba Abu

2013-11-01

A gene encoding an endo-β-1,4-mannanase from Trichoderma virens UKM1 (manTV) and Aspergillus flavus UKM1 (manAF) was analysed with bioinformatic tools. In addition, A. flavus NRRL 3357 genome database was screened for a β-mannosidase gene and analysed (mndA-AF). These three genes were analysed to understand their gene properties. manTV and manAF both consists of 1,332-bp and 1,386-bp nucleotides encoding 443 and 461 amino acid residues, respectively. Both the endo-β-1,4-mannanases belong to the glycosyl hydrolase family 5 and contain a carbohydrate-binding module family 1 (CBM1). On the other hand, mndA-AF which is a 2,745-bp gene encodes a protein sequence of 914 amino acid residues. This β-mannosidase belongs to the glycosyl hydrolase family 2. Predicted molecular weight of manTV, manAF and mndA-AF are 47.74 kDa, 49.71 kDa and 103 kDa, respectively. All three predicted protein sequences possessed signal peptide sequence and are highly conserved among other fungal β-mannanases and β-mannosidases.
QuickProbs 2: Towards rapid construction of high-quality alignments of large protein families

PubMed Central

Gudyś, Adam; Deorowicz, Sebastian

2017-01-01

The ever-increasing size of sequence databases caused by the development of high throughput sequencing, poses to multiple alignment algorithms one of the greatest challenges yet. As we show, well-established techniques employed for increasing alignment quality, i.e., refinement and consistency, are ineffective when large protein families are investigated. We present QuickProbs 2, an algorithm for multiple sequence alignment. Based on probabilistic models, equipped with novel column-oriented refinement and selective consistency, it offers outstanding accuracy. When analysing hundreds of sequences, Quick-Probs 2 is noticeably better than ClustalΩ and MAFFT, the previous leaders for processing numerous protein families. In the case of smaller sets, for which consistency-based methods are the best performing, QuickProbs 2 is also superior to the competitors. Due to low computational requirements of selective consistency and utilization of massively parallel architectures, presented algorithm has similar execution times to ClustalΩ, and is orders of magnitude faster than full consistency approaches, like MSAProbs or PicXAA. All these make QuickProbs 2 an excellent tool for aligning families ranging from few, to hundreds of proteins. PMID:28139687
@TOME-2: a new pipeline for comparative modeling of protein–ligand complexes

PubMed Central

Pons, Jean-Luc; Labesse, Gilles

2009-01-01

@TOME 2.0 is new web pipeline dedicated to protein structure modeling and small ligand docking based on comparative analyses. @TOME 2.0 allows fold recognition, template selection, structural alignment editing, structure comparisons, 3D-model building and evaluation. These tasks are routinely used in sequence analyses for structure prediction. In our pipeline the necessary software is efficiently interconnected in an original manner to accelerate all the processes. Furthermore, we have also connected comparative docking of small ligands that is performed using protein–protein superposition. The input is a simple protein sequence in one-letter code with no comment. The resulting 3D model, protein–ligand complexes and structural alignments can be visualized through dedicated Web interfaces or can be downloaded for further studies. These original features will aid in the functional annotation of proteins and the selection of templates for molecular modeling and virtual screening. Several examples are described to highlight some of the new functionalities provided by this pipeline. The server and its documentation are freely available at http://abcis.cbs.cnrs.fr/AT2/ PMID:19443448
Apple Macintosh programs for nucleic and protein sequence analyses.

PubMed Central

Bellon, B

1988-01-01

This paper describes a package of programs for handling and analyzing nucleic acid and protein sequences using the Apple Macintosh microcomputer. There are three important features of these programs: first, because of the now classical Macintosh interface the programs can be easily used by persons with little or no computer experience. Second, it is possible to save all the data, written in an editable scrolling text window or drawn in a graphic window, as files that can be directly used either as word processing documents or as picture documents. Third, sequences can be easily exchanged with any other computer. The package is composed of thirteen programs, written in Pascal programming language. PMID:2832832

Identification and correction of abnormal, incomplete and mispredicted proteins in public databases.

PubMed

Nagy, Alinda; Hegyi, Hédi; Farkas, Krisztina; Tordai, Hedvig; Kozma, Evelin; Bányai, László; Patthy, László

2008-08-27

Despite significant improvements in computational annotation of genomes, sequences of abnormal, incomplete or incorrectly predicted genes and proteins remain abundant in public databases. Since the majority of incomplete, abnormal or mispredicted entries are not annotated as such, these errors seriously affect the reliability of these databases. Here we describe the MisPred approach that may provide an efficient means for the quality control of databases. The current version of the MisPred approach uses five distinct routines for identifying abnormal, incomplete or mispredicted entries based on the principle that a sequence is likely to be incorrect if some of its features conflict with our current knowledge about protein-coding genes and proteins: (i) conflict between the predicted subcellular localization of proteins and the absence of the corresponding sequence signals; (ii) presence of extracellular and cytoplasmic domains and the absence of transmembrane segments; (iii) co-occurrence of extracellular and nuclear domains; (iv) violation of domain integrity; (v) chimeras encoded by two or more genes located on different chromosomes. Analyses of predicted EnsEMBL protein sequences of nine deuterostome (Homo sapiens, Mus musculus, Rattus norvegicus, Monodelphis domestica, Gallus gallus, Xenopus tropicalis, Fugu rubripes, Danio rerio and Ciona intestinalis) and two protostome species (Caenorhabditis elegans and Drosophila melanogaster) have revealed that the absence of expected signal peptides and violation of domain integrity account for the majority of mispredictions. Analyses of sequences predicted by NCBI's GNOMON annotation pipeline show that the rates of mispredictions are comparable to those of EnsEMBL. Interestingly, even the manually curated UniProtKB/Swiss-Prot dataset is contaminated with mispredicted or abnormal proteins, although to a much lesser extent than UniProtKB/TrEMBL or the EnsEMBL or GNOMON-predicted entries. MisPred works efficiently in identifying errors in predictions generated by the most reliable gene prediction tools such as the EnsEMBL and NCBI's GNOMON pipelines and also guides the correction of errors. We suggest that application of the MisPred approach will significantly improve the quality of gene predictions and the associated databases.
CaMELS: In silico prediction of calmodulin binding proteins and their binding sites.

PubMed

Abbasi, Wajid Arshad; Asif, Amina; Andleeb, Saiqa; Minhas, Fayyaz Ul Amir Afsar

2017-09-01

Due to Ca 2+ -dependent binding and the sequence diversity of Calmodulin (CaM) binding proteins, identifying CaM interactions and binding sites in the wet-lab is tedious and costly. Therefore, computational methods for this purpose are crucial to the design of such wet-lab experiments. We present an algorithm suite called CaMELS (CalModulin intEraction Learning System) for predicting proteins that interact with CaM as well as their binding sites using sequence information alone. CaMELS offers state of the art accuracy for both CaM interaction and binding site prediction and can aid biologists in studying CaM binding proteins. For CaM interaction prediction, CaMELS uses protein sequence features coupled with a large-margin classifier. CaMELS models the binding site prediction problem using multiple instance machine learning with a custom optimization algorithm which allows more effective learning over imprecisely annotated CaM-binding sites during training. CaMELS has been extensively benchmarked using a variety of data sets, mutagenic studies, proteome-wide Gene Ontology enrichment analyses and protein structures. Our experiments indicate that CaMELS outperforms simple motif-based search and other existing methods for interaction and binding site prediction. We have also found that the whole sequence of a protein, rather than just its binding site, is important for predicting its interaction with CaM. Using the machine learning model in CaMELS, we have identified important features of protein sequences for CaM interaction prediction as well as characteristic amino acid sub-sequences and their relative position for identifying CaM binding sites. Python code for training and evaluating CaMELS together with a webserver implementation is available at the URL: http://faculty.pieas.edu.pk/fayyaz/software.html#camels. © 2017 Wiley Periodicals, Inc.
Structure and function of homodomain-leucine zipper (HD-Zip) proteins.

PubMed

Elhiti, Mohamed; Stasolla, Claudio

2009-02-01

Homeodomain-leucine zipper (HD-Zip) proteins are transcription factors unique to plants and are encoded by more than 25 genes in Arabidopsis thaliana. Based on sequence analyses these proteins have been classified into four distinct groups: HD-Zip I-IV. HD-Zip proteins are characterized by the presence of two functional domains; a homeodomain (HD) responsible for DNA binding and a leucine zipper domain (Zip) located immediately C-terminal to the homeodomain and involved in protein-protein interaction. Despite sequence similarities HD-ZIP proteins participate in a variety of processes during plant growth and development. HD-Zip I proteins are generally involved in responses related to abiotic stress, abscisic acid (ABA), blue light, de-etiolation and embryogenesis. HD-Zip II proteins participate in light response, shade avoidance and auxin signalling. Members of the third group (HD-Zip III) control embryogenesis, leaf polarity, lateral organ initiation and meristem function. HD-Zip IV proteins play significant roles during anthocyanin accumulation, differentiation of epidermal cells, trichome formation and root development.
Tutorial on Protein Ontology Resources

PubMed Central

Arighi, Cecilia; Drabkin, Harold; Christie, Karen R.; Ross, Karen; Natale, Darren

2017-01-01

The Protein Ontology (PRO) is the reference ontology for proteins in the Open Biomedical Ontologies (OBO) foundry and consists of three sub-ontologies representing protein classes of homologous genes, proteoforms (e.g., splice isoforms, sequence variants, and post-translationally modified forms), and protein complexes. PRO defines classes of proteins and protein complexes, both species-specific and species non-specific, and indicates their relationships in a hierarchical framework, supporting accurate protein annotation at the appropriate level of granularity, analyses of protein conservation across species, and semantic reasoning. In this first section of this chapter, we describe the PRO framework including categories of PRO terms and the relationship of PRO to other ontologies and protein resources. Next, we provide a tutorial about the PRO website (proconsortium.org) where users can browse and search the PRO hierarchy, view reports on individual PRO terms, and visualize relationships among PRO terms in a hierarchical table view, a multiple sequence alignment view, and a Cytoscape network view. Finally, we describe several examples illustrating the unique and rich information available in PRO. PMID:28150233
Sequencing and comparative genomic analysis of 1227 Felis catus cDNA sequences enriched for developmental, clinical and nutritional phenotypes

PubMed Central

2012-01-01

Background The feline genome is valuable to the veterinary and model organism genomics communities because the cat is an obligate carnivore and a model for endangered felids. The initial public release of the Felis catus genome assembly provided a framework for investigating the genomic basis of feline biology. However, the entire set of protein coding genes has not been elucidated. Results We identified and characterized 1227 protein coding feline sequences, of which 913 map to public sequences and 314 are novel. These sequences have been deposited into NCBI's genbank database and complement public genomic resources by providing additional protein coding sequences that fill in some of the gaps in the feline genome assembly. Through functional and comparative genomic analyses, we gained an understanding of the role of these sequences in feline development, nutrition and health. Specifically, we identified 104 orthologs of human genes associated with Mendelian disorders. We detected negative selection within sequences with gene ontology annotations associated with intracellular trafficking, cytoskeleton and muscle functions. We detected relatively less negative selection on protein sequences encoding extracellular networks, apoptotic pathways and mitochondrial gene ontology annotations. Additionally, we characterized feline cDNA sequences that have mouse orthologs associated with clinical, nutritional and developmental phenotypes. Together, this analysis provides an overview of the value of our cDNA sequences and enhances our understanding of how the feline genome is similar to, and different from other mammalian genomes. Conclusions The cDNA sequences reported here expand existing feline genomic resources by providing high-quality sequences annotated with comparative genomic information providing functional, clinical, nutritional and orthologous gene information. PMID:22257742
Genomic perspectives of spider silk genes through target capture sequencing: Conservation of stabilization mechanisms and homology-based structural models of spidroin terminal regions.

PubMed

Collin, Matthew A; Clarke, Thomas H; Ayoub, Nadia A; Hayashi, Cheryl Y

2018-07-01

A powerful system for studying protein aggregation, particularly rapid self-assembly, is spider silk. Spider silks are proteinaceous and silk proteins are synthesized and stored within silk glands as liquid dope. As needed, liquid dope is near-instantaneously transformed into solid fibers or viscous adhesives. The dominant constituents of silks are spidroins (spider fibroins) and their terminal domains are vital for the tight control of silk self-assembly. To better understand spidroin termini, we used target capture and deep sequencing to identify spidroin gene sequences from six species representing the araneoid families of Araneidae, Nephilidae, and Theridiidae. We obtained 145 terminal regions, of which 103 are newly annotated here, as well as novel variants within nine diverse spidroin types. Our comparative analyses demonstrated the conservation of acidic, basic, and cysteine amino acid residues across spidroin types that had been proposed to be important for monomer stability, dimer formation, and self-assembly from a limited sampling of spidroins. Computational, protein homology modeling revealed areas of spidroin terminal regions that are highly conserved in three-dimensions despite sequence divergence across spidroin types. Analyses of our dense sampling of terminal regions suggest that most spidroins share stabilization mechanisms, dimer formation, and tertiary structure, despite producing functionally distinct materials. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
Adjacent DNA sequences modulate Sox9 transcriptional activation at paired Sox sites in three chondrocyte-specific enhancer elements

PubMed Central

Bridgewater, Laura C.; Walker, Marlan D.; Miller, Gwen C.; Ellison, Trevor A.; Holsinger, L. Daniel; Potter, Jennifer L.; Jackson, Todd L.; Chen, Reuben K.; Winkel, Vicki L.; Zhang, Zhaoping; McKinney, Sandra; de Crombrugghe, Benoit

2003-01-01

Expression of the type XI collagen gene Col11a2 is directed to cartilage by at least three chondrocyte-specific enhancer elements, two in the 5′ region and one in the first intron of the gene. The three enhancers each contain two heptameric sites with homology to the Sox protein-binding consensus sequence. The two sites are separated by 3 or 4 bp and arranged in opposite orientation to each other. Targeted mutational analyses of these three enhancers showed that in the intronic enhancer, as in the other two enhancers, both Sox sites in a pair are essential for enhancer activity. The transcription factor Sox9 binds as a dimer at the paired sites, and the introduction of insertion mutations between the sites demonstrated that physical interactions between the adjacently bound proteins are essential for enhancer activity. Additional mutational analyses demonstrated that although Sox9 binding at the paired Sox sites is necessary for enhancer activity, it alone is not sufficient. Adjacent DNA sequences in each enhancer are also required, and mutation of those sequences can eliminate enhancer activity without preventing Sox9 binding. The data suggest a new model in which adjacently bound proteins affect the DNA bend angle produced by Sox9, which in turn determines whether an active transcriptional enhancer complex is assembled. PMID:12595563
Molecular cloning and nucleotide sequences of the genes for two essential proteins constituting a novel enzyme system for heptaprenyl diphosphate synthesis.

PubMed

Koike-Takeshita, A; Koyama, T; Obata, S; Ogura, K

1995-08-04

The genes encoding two dissociable components essential for Bacillus stearothermophilus heptaprenyl diphosphate synthase (all-trans-hexparenyl-diphosphate:isopentenyl-diphosphate hexaprenyl-trans-transferase, EC 2.5.1.30) were cloned, and their nucleotide sequences were determined. Sequence analyses revealed the presence of three open reading frames within 2,350 base pairs, designated as ORF-1, ORF-2, and ORF-3 in order of nucleotide sequence, which encode proteins of 220, 234, and 323 amino acids, respectively. Deletion experiments have shown that expression of the enzymatic activity requires the presence of ORF-1 and ORF-3, but ORF-2 is not essential. As a result, this enzyme was proved genetically to consist of two different protein compounds with molecular masses of 25 kDa (Component I) and 36 kDa (Component II), encoded by two of the three tandem genes. The protein encoded by ORF-1 has no similarity to any protein so far registered. However, the protein encoded by ORF-3 shows a 32% similarity to the farnesyl diphosphate synthase of the same bacterium and has seven highly conserved regions that have been shown typical in prenyltransferases (Koyama, T., Obata, S., Osabe, M., Takeshita, A., Yokoyama, K., Uchida, M., Nishino, T., and Ogura, K. (1993) J. Biochem. (Tokyo) 113, 355-363).
Horizontal gene transfer in Histophilus somni and its role in the evolution of pathogenic strain 2336, as determined by comparative genomic analyses

PubMed Central

2011-01-01

Background Pneumonia and myocarditis are the most commonly reported diseases due to Histophilus somni, an opportunistic pathogen of the reproductive and respiratory tracts of cattle. Thus far only a few genes involved in metabolic and virulence functions have been identified and characterized in H. somni using traditional methods. Analyses of the genome sequences of several Pasteurellaceae species have provided insights into their biology and evolution. In view of the economic and ecological importance of H. somni, the genome sequence of pneumonia strain 2336 has been determined and compared to that of commensal strain 129Pt and other members of the Pasteurellaceae. Results The chromosome of strain 2336 (2,263,857 bp) contained 1,980 protein coding genes, whereas the chromosome of strain 129Pt (2,007,700 bp) contained only 1,792 protein coding genes. Although the chromosomes of the two strains differ in size, their average GC content, gene density (total number of genes predicted on the chromosome), and percentage of sequence (number of genes) that encodes proteins were similar. The chromosomes of these strains also contained a number of discrete prophage regions and genomic islands. One of the genomic islands in strain 2336 contained genes putatively involved in copper, zinc, and tetracycline resistance. Using the genome sequence data and comparative analyses with other members of the Pasteurellaceae, several H. somni genes that may encode proteins involved in virulence (e.g., filamentous haemaggutinins, adhesins, and polysaccharide biosynthesis/modification enzymes) were identified. The two strains contained a total of 17 ORFs that encode putative glycosyltransferases and some of these ORFs had characteristic simple sequence repeats within them. Most of the genes/loci common to both the strains were located in different regions of the two chromosomes and occurred in opposite orientations, indicating genome rearrangement since their divergence from a common ancestor. Conclusions Since the genome of strain 129Pt was ~256,000 bp smaller than that of strain 2336, these genomes provide yet another paradigm for studying evolutionary gene loss and/or gain in regard to virulence repertoire and pathogenic ability. Analyses of the complete genome sequences revealed that bacteriophage- and transposon-mediated horizontal gene transfer had occurred at several loci in the chromosomes of strains 2336 and 129Pt. It appears that these mobile genetic elements have played a major role in creating genomic diversity and phenotypic variability among the two H. somni strains. PMID:22111657
Horizontal gene transfer in Histophilus somni and its role in the evolution of pathogenic strain 2336, as determined by comparative genomic analyses.

PubMed

Siddaramappa, Shivakumara; Challacombe, Jean F; Duncan, Alison J; Gillaspy, Allison F; Carson, Matthew; Gipson, Jenny; Orvis, Joshua; Zaitshik, Jeremy; Barnes, Gentry; Bruce, David; Chertkov, Olga; Detter, J Chris; Han, Cliff S; Tapia, Roxanne; Thompson, Linda S; Dyer, David W; Inzana, Thomas J

2011-11-23

Pneumonia and myocarditis are the most commonly reported diseases due to Histophilus somni, an opportunistic pathogen of the reproductive and respiratory tracts of cattle. Thus far only a few genes involved in metabolic and virulence functions have been identified and characterized in H. somni using traditional methods. Analyses of the genome sequences of several Pasteurellaceae species have provided insights into their biology and evolution. In view of the economic and ecological importance of H. somni, the genome sequence of pneumonia strain 2336 has been determined and compared to that of commensal strain 129Pt and other members of the Pasteurellaceae. The chromosome of strain 2336 (2,263,857 bp) contained 1,980 protein coding genes, whereas the chromosome of strain 129Pt (2,007,700 bp) contained only 1,792 protein coding genes. Although the chromosomes of the two strains differ in size, their average GC content, gene density (total number of genes predicted on the chromosome), and percentage of sequence (number of genes) that encodes proteins were similar. The chromosomes of these strains also contained a number of discrete prophage regions and genomic islands. One of the genomic islands in strain 2336 contained genes putatively involved in copper, zinc, and tetracycline resistance. Using the genome sequence data and comparative analyses with other members of the Pasteurellaceae, several H. somni genes that may encode proteins involved in virulence (e.g., filamentous haemaggutinins, adhesins, and polysaccharide biosynthesis/modification enzymes) were identified. The two strains contained a total of 17 ORFs that encode putative glycosyltransferases and some of these ORFs had characteristic simple sequence repeats within them. Most of the genes/loci common to both the strains were located in different regions of the two chromosomes and occurred in opposite orientations, indicating genome rearrangement since their divergence from a common ancestor. Since the genome of strain 129Pt was ~256,000 bp smaller than that of strain 2336, these genomes provide yet another paradigm for studying evolutionary gene loss and/or gain in regard to virulence repertoire and pathogenic ability. Analyses of the complete genome sequences revealed that bacteriophage- and transposon-mediated horizontal gene transfer had occurred at several loci in the chromosomes of strains 2336 and 129Pt. It appears that these mobile genetic elements have played a major role in creating genomic diversity and phenotypic variability among the two H. somni strains.
An archaebacterial homologue of the essential eubacterial cell division protein FtsZ.

PubMed Central

Baumann, P; Jackson, S P

1996-01-01

Life falls into three fundamental domains--Archaea, Bacteria, and Eucarya (formerly archaebacteria, eubacteria, and eukaryotes,. respectively). Though Archaea lack nuclei and share many morphological features with Bacteria, molecular analyses, principally of the transcription and translation machineries, have suggested that Archaea are more related to Eucarya than to Bacteria. Currently, little is known about the archaeal cell division apparatus. In Bacteria, a crucial component of the cell division machinery is FtsZ, a GTPase that localizes to a ring at the site of septation. Interestingly, FtsZ is distantly related in sequence to eukaryotic tubulins, which also interact with GTP and are components of the eukaryotic cell cytoskeleton. By screening for the ability to bind radiolabeled nucleotides, we have identified a protein of the hyperthermophilic archaeon Pyrococcus woesei that interacts tightly and specifically with GTP. Furthermore, through screening an expression library of P. woesei genomic DNA, we have cloned the gene encoding this protein. Sequence comparisons reveal that the P. woesei GTP-binding protein is strikingly related in sequence to eubacterial FtsZ and is marginally more similar to eukaryotic tubulins than are bacterial FtsZ proteins. Phylogenetic analyses reinforce the notion that there is an evolutionary linkage between FtsZ and tubulins. These findings suggest that the archaeal cell division apparatus may be fundamentally similar to that of Bacteria and lead us to consider the evolutionary relationships between Archaea, Bacteria, and Eucarya. Images Fig. 1 Fig. 2 PMID:8692886
Comparative analysis of amino acid composition in the active site of nirk gene encoding copper-containing nitrite reductase (CuNiR) in bacterial spp.

PubMed

Adhikari, Utpal Kumar; Rahman, M Mizanur

2017-04-01

The nirk gene encoding the copper-containing nitrite reductase (CuNiR), a key catalytic enzyme in the environmental denitrification process that helps to produce nitric oxide from nitrite. The molecular mechanism of denitrification process is definitely complex and in this case a theoretical investigation has been conducted to know the sequence information and amino acid composition of the active site of CuNiR enzyme using various Bioinformatics tools. 10 Fasta formatted sequences were retrieved from the NCBI database and the domain and disordered regions identification and phylogenetic analyses were done on these sequences. The comparative modeling of protein was performed through Modeller 9v14 program and visualized by PyMOL tools. Validated protein models were deposited in the Protein Model Database (PMDB) (PMDB id: PM0080150 to PM0080159). Active sites of nirk encoding CuNiR enzyme were identified by Castp server. The PROCHECK showed significant scores for four protein models in the most favored regions of the Ramachandran plot. Active sites and cavities prediction exhibited that the amino acid, namely Glycine, Alanine, Histidine, Aspartic acid, Glutamic acid, Threonine, and Glutamine were common in four predicted protein models. The present in silico study anticipates that active site analyses result will pave the way for further research on the complex denitrification mechanism of the selected species in the experimental laboratory. Copyright © 2016. Published by Elsevier Ltd.
[Complete genome sequencing and analyses of rabies viruses isolated from wild animals (Chinese Ferret-Badger) in Zhejiang province].

PubMed

Lei, Yong-Liang; Wang, Xiao-Guang; Liu, Fu-Ming; Chen, Xiu-Ying; Ye, Bi-Feng; Mei, Jian-Hua; Lan, Jin-Quan; Tang, Qing

2009-08-01

Based on sequencing the full-length genomes of two Chinese Ferret-Badger, we analyzed the properties of rabies viruses genetic variation in molecular level to get information on prevalence and variation of rabies viruses in Zhejiang, and to enrich the genome database of rabies viruses street strains isolated from Chinese wildlife. Overlapped fragments were amplified by RT-PCR and full-length genomes were assembled to analyze the nucleotide and deduced protein similarities and phylogenetic analyses of the N genes from Chinese Ferret-Badger, sika deer, vole, dog. Vaccine strains were then determined. The two full-length genomes were completely sequenced to find out that they had the same genetic structure with 11 923 nts including 58 nts-Leader, 1353 nts-NP, 894 nts-PP, 609 nts-MP, 1575 nts-GP, 6386 nts-LP, and 2, 5, 5 nts- intergenic regions (IGRs), 423 nts-Pseudogene-like sequence (Psi), 70 nts-Trailer. The two full-length genomes were in accordance with the properties of Rhabdoviridae Lyssa virus by blast and multi-sequence alignment. The nucleotide and amino acid sequences among Chinese strains had the highest similarity, especially among animals of the same species. Of the two full-length genomes, the similarity in amino acid level was dramatically higher than that in nucleotide level, so that the nucleotide mutations happened in these two genomes were most probably as synonymous mutations. Compared to the referenced rabies viruses, the lengths of the five protein coding regions did not show any changes or recombination, but only with a few-point mutations. It was evident that the five proteins appeared to be stable. The variation sites and types of the two ferret badgers genomes were similar to the referenced vaccine or street strains. The two strains were genotype 1 according to the multi-sequence and phylogenetic analyses, which possessing the distinct geographyphic characteristics of China. All the evidence suggested a cue that these two ferret badgers rabies viruses were likely to be street virus that already circulating in wildlife.
ESTuber db: an online database for Tuber borchii EST sequences.

PubMed

Lazzari, Barbara; Caprera, Andrea; Cosentino, Cristian; Stella, Alessandra; Milanesi, Luciano; Viotti, Angelo

2007-03-08

The ESTuber database (http://www.itb.cnr.it/estuber) includes 3,271 Tuber borchii expressed sequence tags (EST). The dataset consists of 2,389 sequences from an in-house prepared cDNA library from truffle vegetative hyphae, and 882 sequences downloaded from GenBank and representing four libraries from white truffle mycelia and ascocarps at different developmental stages. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts. Data were collected in a MySQL database, which can be queried via a php-based web interface. Sequences included in the ESTuber db were clustered and annotated against three databases: the GenBank nr database, the UniProtKB database and a third in-house prepared database of fungi genomic sequences. An algorithm was implemented to infer statistical classification among Gene Ontology categories from the ontology occurrences deduced from the annotation procedure against the UniProtKB database. Ontologies were also deduced from the annotation of more than 130,000 EST sequences from five filamentous fungi, for intra-species comparison purposes. Further analyses were performed on the ESTuber db dataset, including tandem repeats search and comparison of the putative protein dataset inferred from the EST sequences to the PROSITE database for protein patterns identification. All the analyses were performed both on the complete sequence dataset and on the contig consensus sequences generated by the EST assembly procedure. The resulting web site is a resource of data and links related to truffle expressed genes. The Sequence Report and Contig Report pages are the web interface core structures which, together with the Text search utility and the Blast utility, allow easy access to the data stored in the database.
Circadian clock protein KaiC forms ATP-dependent hexameric rings and binds DNA

PubMed Central

Mori, Tetsuya; Saveliev, Sergei V.; Xu, Yao; Stafford, Walter F.; Cox, Michael M.; Inman, Ross B.; Johnson, Carl H.

2002-01-01

KaiC from Synechococcus elongatus PCC 7942 (KaiC) is an essential circadian clock protein in cyanobacteria. Previous sequence analyses suggested its inclusion in the RecA/DnaB superfamily. A characteristic of the proteins of this superfamily is that they form homohexameric complexes that bind DNA. We show here that KaiC also forms ring complexes with a central pore that can be visualized by electron microscopy. A combination of analytical ultracentrifugation and chromatographic analyses demonstrates that these complexes are hexameric. The association of KaiC molecules into hexamers depends on the presence of ATP. The KaiC sequence does not include the obvious DNA-binding motifs found in RecA or DnaB. Nevertheless, KaiC binds forked DNA substrates. These data support the inclusion of KaiC into the RecA/DnaB superfamily and have important implications for enzymatic activity of KaiC in the circadian clock mechanism that regulates global changes in gene expression patterns. PMID:12477935
Circadian clock protein KaiC forms ATP-dependent hexameric rings and binds DNA.

PubMed

Mori, Tetsuya; Saveliev, Sergei V; Xu, Yao; Stafford, Walter F; Cox, Michael M; Inman, Ross B; Johnson, Carl H

2002-12-24

KaiC from Synechococcus elongatus PCC 7942 (KaiC) is an essential circadian clock protein in cyanobacteria. Previous sequence analyses suggested its inclusion in the RecADnaB superfamily. A characteristic of the proteins of this superfamily is that they form homohexameric complexes that bind DNA. We show here that KaiC also forms ring complexes with a central pore that can be visualized by electron microscopy. A combination of analytical ultracentrifugation and chromatographic analyses demonstrates that these complexes are hexameric. The association of KaiC molecules into hexamers depends on the presence of ATP. The KaiC sequence does not include the obvious DNA-binding motifs found in RecA or DnaB. Nevertheless, KaiC binds forked DNA substrates. These data support the inclusion of KaiC into the RecADnaB superfamily and have important implications for enzymatic activity of KaiC in the circadian clock mechanism that regulates global changes in gene expression patterns.
Complete Mitochondrial Genome of Echinostoma hortense (Digenea: Echinostomatidae).

PubMed

Liu, Ze-Xuan; Zhang, Yan; Liu, Yu-Ting; Chang, Qiao-Cheng; Su, Xin; Fu, Xue; Yue, Dong-Mei; Gao, Yuan; Wang, Chun-Ren

2016-04-01

Echinostoma hortense (Digenea: Echinostomatidae) is one of the intestinal flukes with medical importance in humans. However, the mitochondrial (mt) genome of this fluke has not been known yet. The present study has determined the complete mt genome sequences of E. hortense and assessed the phylogenetic relationships with other digenean species for which the complete mt genome sequences are available in GenBank using concatenated amino acid sequences inferred from 12 protein-coding genes. The mt genome of E. hortense contained 12 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 non-coding region. The length of the mt genome of E. hortense was 14,994 bp, which was somewhat smaller than those of other trematode species. Phylogenetic analyses based on concatenated nucleotide sequence datasets for all 12 protein-coding genes using maximum parsimony (MP) method showed that E. hortense and Hypoderaeum conoideum gathered together, and they were closer to each other than to Fasciolidae and other echinostomatid trematodes. The availability of the complete mt genome sequences of E. hortense provides important genetic markers for diagnostics, population genetics, and evolutionary studies of digeneans.
Complete Mitochondrial Genome of Echinostoma hortense (Digenea: Echinostomatidae)

PubMed Central

Liu, Ze-Xuan; Zhang, Yan; Liu, Yu-Ting; Chang, Qiao-Cheng; Su, Xin; Fu, Xue; Yue, Dong-Mei; Gao, Yuan; Wang, Chun-Ren

2016-01-01

Echinostoma hortense (Digenea: Echinostomatidae) is one of the intestinal flukes with medical importance in humans. However, the mitochondrial (mt) genome of this fluke has not been known yet. The present study has determined the complete mt genome sequences of E. hortense and assessed the phylogenetic relationships with other digenean species for which the complete mt genome sequences are available in GenBank using concatenated amino acid sequences inferred from 12 protein-coding genes. The mt genome of E. hortense contained 12 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 non-coding region. The length of the mt genome of E. hortense was 14,994 bp, which was somewhat smaller than those of other trematode species. Phylogenetic analyses based on concatenated nucleotide sequence datasets for all 12 protein-coding genes using maximum parsimony (MP) method showed that E. hortense and Hypoderaeum conoideum gathered together, and they were closer to each other than to Fasciolidae and other echinostomatid trematodes. The availability of the complete mt genome sequences of E. hortense provides important genetic markers for diagnostics, population genetics, and evolutionary studies of digeneans. PMID:27180575
Characterization of the complete mitochondrial genome of Marshallagia marshalli and phylogenetic implications for the superfamily Trichostrongyloidea.

PubMed

Sun, Miao-Miao; Han, Liang; Zhang, Fu-Kai; Zhou, Dong-Hui; Wang, Shu-Qing; Ma, Jun; Zhu, Xing-Quan; Liu, Guo-Hua

2018-01-01

Marshallagia marshalli (Nematoda: Trichostrongylidae) infection can lead to serious parasitic gastroenteritis in sheep, goat, and wild ruminant, causing significant socioeconomic losses worldwide. Up to now, the study concerning the molecular biology of M. marshalli is limited. Herein, we sequenced the complete mitochondrial (mt) genome of M. marshalli and examined its phylogenetic relationship with selected members of the superfamily Trichostrongyloidea using Bayesian inference (BI) based on concatenated mt amino acid sequence datasets. The complete mt genome sequence of M. marshalli is 13,891 bp, including 12 protein-coding genes, 22 transfer RNA genes, and 2 ribosomal RNA genes. All protein-coding genes are transcribed in the same direction. Phylogenetic analyses based on concatenated amino acid sequences of the 12 protein-coding genes supported the monophylies of the families Haemonchidae, Molineidae, and Dictyocaulidae with strong statistical support, but rejected the monophyly of the family Trichostrongylidae. The determination of the complete mt genome sequence of M. marshalli provides novel genetic markers for studying the systematics, population genetics, and molecular epidemiology of M. marshalli and its congeners.
The complete DNA sequence of lymphocystis disease virus.

PubMed

Tidona, C A; Darai, G

1997-04-14

Lymphocystis disease virus (LCDV) is the causative agent of lymphocystis disease, which has been reported to occur in over 100 different fish species worldwide. LCDV is a member of the family Iridoviridae and the type species of the genus Lymphocystivirus. The virions contain a single linear double-stranded DNA molecule, which is circularly permuted, terminally redundant, and heavily methylated at cytosines in CpG sequences. The complete nucleotide sequence of LCDV-1 (flounder isolate) was determined by automated cycle sequencing and primer walking. The genome of LCDV-1 is 102.653 bp in length and contains 195 open reading frames with coding capacities ranging from 40 to 1199 amino acids. Computer-assisted analyses of the deduced amino acid sequences led to the identification of several putative gene products with significant homologies to entries in protein data banks, such as the two major subunits of the viral DNA-dependent RNA polymerase, DNA polymerase, several protein kinases, two subunits of the ribonucleoside diphosphate reductase, DNA methyltransferase, the viral major capsid protein, insulin-like growth factor, and tumor necrosis factor receptor homolog.

GWFASTA: server for FASTA search in eukaryotic and microbial genomes.

PubMed

Issac, Biju; Raghava, G P S

2002-09-01

Similarity searches are a powerful method for solving important biological problems such as database scanning, evolutionary studies, gene prediction, and protein structure prediction. FASTA is a widely used sequence comparison tool for rapid database scanning. Here we describe the GWFASTA server that was developed to assist the FASTA user in similarity searches against partially and/or completely sequenced genomes. GWFASTA consists of more than 60 microbial genomes, eight eukaryote genomes, and proteomes of annotatedgenomes. Infact, it provides the maximum number of databases for similarity searching from a single platform. GWFASTA allows the submission of more than one sequence as a single query for a FASTA search. It also provides integrated post-processing of FASTA output, including compositional analysis of proteins, multiple sequences alignment, and phylogenetic analysis. Furthermore, it summarizes the search results organism-wise for prokaryotes and chromosome-wise for eukaryotes. Thus, the integration of different tools for sequence analyses makes GWFASTA a powerful toolfor biologists.
Surface Proteins of Gram-Positive Pathogens: Using Crystallography to Uncover Novel Features in Drug and Vaccine Candidates

NASA Astrophysics Data System (ADS)

Baker, Edward N.; Proft, Thomas; Kang, Haejoo

Proteins displayed on the cell surfaces of pathogenic organisms are the front-line troops of bacterial attack, playing critical roles in colonization, infection and virulence. Although such proteins can often be recognized from genome sequence data, through characteristic sequence motifs, their functions are often unknown. One such group of surface proteins is attached to the cell surface of Gram-positive pathogens through the action of sortase enzymes. Some of these proteins are now known to form pili: long filamentous structures that mediate attachment to human cells. Crystallographic analyses of these and other cell surface proteins have uncovered novel features in their structure, assembly and stability, including the presence of inter- and intramolecular isopeptide crosslinks. This improved understanding of structures on the bacterial cell surface offers opportunities for the development of some new drug targets and for novel approaches to vaccine design.
Structural flexibility and protein adaptation to temperature: Molecular dynamics analysis of malate dehydrogenases of marine molluscs.

PubMed

Dong, Yun-Wei; Liao, Ming-Ling; Meng, Xian-Liang; Somero, George N

2018-02-06

Orthologous proteins of species adapted to different temperatures exhibit differences in stability and function that are interpreted to reflect adaptive variation in structural "flexibility." However, quantifying flexibility and comparing flexibility across proteins has remained a challenge. To address this issue, we examined temperature effects on cytosolic malate dehydrogenase (cMDH) orthologs from differently thermally adapted congeners of five genera of marine molluscs whose field body temperatures span a range of ∼60 °C. We describe consistent patterns of convergent evolution in adaptation of function [temperature effects on K M of cofactor (NADH)] and structural stability (rate of heat denaturation of activity). To determine how these differences depend on flexibilities of overall structure and of regions known to be important in binding and catalysis, we performed molecular dynamics simulation (MDS) analyses. MDS analyses revealed a significant negative correlation between adaptation temperature and heat-induced increase of backbone atom movements [root mean square deviation (rmsd) of main-chain atoms]. Root mean square fluctuations (RMSFs) of movement by individual amino acid residues varied across the sequence in a qualitatively similar pattern among orthologs. Regions of sequence involved in ligand binding and catalysis-termed mobile regions 1 and 2 (MR1 and MR2), respectively-showed the largest values for RMSF. Heat-induced changes in RMSF values across the sequence and, importantly, in MR1 and MR2 were greatest in cold-adapted species. MDS methods are shown to provide powerful tools for examining adaptation of enzymes by providing a quantitative index of protein flexibility and identifying sequence regions where adaptive change in flexibility occurs.
Genetic diversity of the merozoite surface protein-3 gene in Plasmodium falciparum populations in Thailand.

PubMed

Pattaradilokrat, Sittiporn; Sawaswong, Vorthon; Simpalipan, Phumin; Kaewthamasorn, Morakot; Siripoon, Napaporn; Harnyuttanakorn, Pongchai

2016-10-21

An effective malaria vaccine is an urgently needed tool to fight against human malaria, the most deadly parasitic disease of humans. One promising candidate is the merozoite surface protein-3 (MSP-3) of Plasmodium falciparum. This antigenic protein, encoded by the merozoite surface protein (msp-3) gene, is polymorphic and classified according to size into the two allelic types of K1 and 3D7. A recent study revealed that both the K1 and 3D7 alleles co-circulated within P. falciparum populations in Thailand, but the extent of the sequence diversity and variation within each allelic type remains largely unknown. The msp-3 gene was sequenced from 59 P. falciparum samples collected from five endemic areas (Mae Hong Son, Kanchanaburi, Ranong, Trat and Ubon Ratchathani) in Thailand and analysed for nucleotide sequence diversity, haplotype diversity and deduced amino acid sequence diversity. The gene was also subject to population genetic analysis (F st ) and neutrality tests (Tajima's D, Fu and Li D* and Fu and Li' F* tests) to determine any signature of selection. The sequence analyses revealed eight unique DNA haplotypes and seven amino acid sequence variants, with a haplotype and nucleotide diversity of 0.828 and 0.049, respectively. Neutrality tests indicated that the polymorphism detected in the alanine heptad repeat region of MSP-3 was maintained by positive diversifying selection, suggesting its role as a potential target of protective immune responses and supporting its role as a vaccine candidate. Comparison of MSP-3 variants among parasite populations in Thailand, India and Nigeria also inferred a close genetic relationship between P. falciparum populations in Asia. This study revealed the extent of the msp-3 gene diversity in P. falciparum in Thailand, providing the fundamental basis for the better design of future blood stage malaria vaccines against P. falciparum.
Closely related dermatophyte species produce different patterns of secreted proteins.

PubMed

Giddey, Karin; Favre, Bertrand; Quadroni, Manfredo; Monod, Michel

2007-02-01

Dermatophytes are the most common infectious agents responsible for superficial mycosis in humans and animals. Various species in this group of fungi show overlapping characteristics. We investigated the possibility that closely related dermatophyte species with different behaviours secrete distinct proteins when grown in the same culture medium. Protein patterns from culture filtrates of several strains of the same species were very similar. In contrast, secreted protein profiles from various species were different, and so a specific signature could be associated with each of the six analysed species. In particular, protein patterns were useful to distinguish Trichophyton tonsurans from Trichophyton equinum, which cannot be differentiated by ribosomal DNA sequencing. The secreted proteases Sub2, Sub6 and Sub7 of the subtilisin family, as well as Mep3 and Mep4 of the fungalisin family were identified. SUB6, SUB7, MEP3 and MEP4 genes were cloned and sequenced. Although the protein sequence of each protease was highly conserved across species, their level of secretion by the various species was not equivalent. These results suggest that a switch of habitat could be related to a differential expression of genes encoding homologous secreted proteins.
Codon influence on protein expression in E. coli correlates with mRNA levels

PubMed Central

Boël, Grégory; Wong, Kam-Ho; Su, Min; Luff, Jon; Valecha, Mayank; Everett, John K.; Acton, Thomas B.; Xiao, Rong; Montelione, Gaetano T.; Aalberts, Daniel P.; Hunt, John F.

2016-01-01

Degeneracy in the genetic code, which enables a single protein to be encoded by a multitude of synonymous gene sequences, has an important role in regulating protein expression, but substantial uncertainty exists concerning the details of this phenomenon. Here we analyze the sequence features influencing protein expression levels in 6,348 experiments using bacteriophage T7 polymerase to synthesize messenger RNA in Escherichia coli. Logistic regression yields a new codon-influence metric that correlates only weakly with genomic codon-usage frequency, but strongly with global physiological protein concentrations and also mRNA concentrations and lifetimes in vivo. Overall, the codon content influences protein expression more strongly than mRNA-folding parameters, although the latter dominate in the initial ~16 codons. Genes redesigned based on our analyses are transcribed with unaltered efficiency but translated with higher efficiency in vitro. The less efficiently translated native sequences show greatly reduced mRNA levels in vivo. Our results suggest that codon content modulates a kinetic competition between protein elongation and mRNA degradation that is a central feature of the physiology and also possibly the regulation of translation in E. coli. PMID:26760206
ProtPhylo: identification of protein-phenotype and protein-protein functional associations via phylogenetic profiling.

PubMed

Cheng, Yiming; Perocchi, Fabiana

2015-07-01

ProtPhylo is a web-based tool to identify proteins that are functionally linked to either a phenotype or a protein of interest based on co-evolution. ProtPhylo infers functional associations by comparing protein phylogenetic profiles (co-occurrence patterns of orthology relationships) for more than 9.7 million non-redundant protein sequences from all three domains of life. Users can query any of 2048 fully sequenced organisms, including 1678 bacteria, 255 eukaryotes and 115 archaea. In addition, they can tailor ProtPhylo to a particular kind of biological question by choosing among four main orthology inference methods based either on pair-wise sequence comparisons (One-way Best Hits and Best Reciprocal Hits) or clustering of orthologous proteins across multiple species (OrthoMCL and eggNOG). Next, ProtPhylo ranks phylogenetic neighbors of query proteins or phenotypic properties using the Hamming distance as a measure of similarity between pairs of phylogenetic profiles. Candidate hits can be easily and flexibly prioritized by complementary clues on subcellular localization, known protein-protein interactions, membrane spanning regions and protein domains. The resulting protein list can be quickly exported into a csv text file for further analyses. ProtPhylo is freely available at http://www.protphylo.org. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Draft genome sequence for virulent and avirulent strains of Xanthomonas arboricola isolated from Prunus spp. in Spain.

PubMed

Garita-Cambronero, Jerson; Palacio-Bielsa, Ana; López, María M; Cubero, Jaime

2016-01-01

Xanthomonas arboricola is a species in genus Xanthomonas which is mainly comprised of plant pathogens. Among the members of this taxon, X. arboricola pv. pruni, the causal agent of bacterial spot disease of stone fruits and almond, is distributed worldwide although it is considered a quarantine pathogen in the European Union. Herein, we report the draft genome sequence, the classification, the annotation and the sequence analyses of a virulent strain, IVIA 2626.1, and an avirulent strain, CITA 44, of X. arboricola associated with Prunus spp. The draft genome sequence of IVIA 2626.1 consists of 5,027,671 bp, 4,720 protein coding genes and 50 RNA encoding genes. The draft genome sequence of strain CITA 44 consists of 4,760,482 bp, 4,250 protein coding genes and 56 RNA coding genes. Initial comparative analyses reveals differences in the presence of structural and regulatory components of the type IV pilus, the type III secretion system, the type III effectors as well as variations in the number of the type IV secretion systems. The genome sequence data for these strains will facilitate the development of molecular diagnostics protocols that differentiate virulent and avirulent strains. In addition, comparative genome analysis will provide insights into the plant-pathogen interaction during the bacterial spot disease process.
The primary structure of the thymidine kinase gene of fish lymphocystis disease virus.

PubMed

Schnitzler, P; Handermann, M; Szépe, O; Darai, G

1991-06-01

The DNA nucleotide sequence of the thymidine kinase (TK) gene of fish lymphocystis disease virus (FLDV) which has been localized between the coordinates 0.678 to 0.688 of the viral genome was determined. The analysis of the DNA nucleotide sequence located between the recognition sites of HindIII (0.669 map unit; nucleotide position 1) and AccI (nucleotide position 2032) revealed the presence of an open reading frame of 954 bp on the lower strand of this region between nucleotide positions 1868 (ATG) and 915 (TAA). It encodes for a protein of 318 amino acid residues. The evolutionary relationships of the TK gene of FLDV to the other known TK genes was investigated using the method of progressive sequence alignment. These analyses revealed a high degree of diversity between the protein sequence of FLDV TK gene and the amino acid composition of other TKs tested. However, significant conservations were detected at several regions of amino acid residues of the FLDV TK protein when compared to the amino acid sequence of TKs of African swine fever virus, fowlpox virus, shope fibroma virus, and vaccinia virus and to the amino acid sequences of the cellular cytoplasmic TK of chicken, mouse, and man.
SeqFIRE: a web application for automated extraction of indel regions and conserved blocks from protein multiple sequence alignments.

PubMed

Ajawatanawong, Pravech; Atkinson, Gemma C; Watson-Haigh, Nathan S; Mackenzie, Bryony; Baldauf, Sandra L

2012-07-01

Analyses of multiple sequence alignments generally focus on well-defined conserved sequence blocks, while the rest of the alignment is largely ignored or discarded. This is especially true in phylogenomics, where large multigene datasets are produced through automated pipelines. However, some of the most powerful phylogenetic markers have been found in the variable length regions of multiple alignments, particularly insertions/deletions (indels) in protein sequences. We have developed Sequence Feature and Indel Region Extractor (SeqFIRE) to enable the automated identification and extraction of indels from protein sequence alignments. The program can also extract conserved blocks and identify fast evolving sites using a combination of conservation and entropy. All major variables can be adjusted by the user, allowing them to identify the sets of variables most suited to a particular analysis or dataset. Thus, all major tasks in preparing an alignment for further analysis are combined in a single flexible and user-friendly program. The output includes a numbered list of indels, alignments in NEXUS format with indels annotated or removed and indel-only matrices. SeqFIRE is a user-friendly web application, freely available online at www.seqfire.org/.
An integrated native mass spectrometry and top-down proteomics method that connects sequence to structure and function of macromolecular complexes

NASA Astrophysics Data System (ADS)

Li, Huilin; Nguyen, Hong Hanh; Ogorzalek Loo, Rachel R.; Campuzano, Iain D. G.; Loo, Joseph A.

2018-02-01

Mass spectrometry (MS) has become a crucial technique for the analysis of protein complexes. Native MS has traditionally examined protein subunit arrangements, while proteomics MS has focused on sequence identification. These two techniques are usually performed separately without taking advantage of the synergies between them. Here we describe the development of an integrated native MS and top-down proteomics method using Fourier-transform ion cyclotron resonance (FTICR) to analyse macromolecular protein complexes in a single experiment. We address previous concerns of employing FTICR MS to measure large macromolecular complexes by demonstrating the detection of complexes up to 1.8 MDa, and we demonstrate the efficacy of this technique for direct acquirement of sequence to higher-order structural information with several large complexes. We then summarize the unique functionalities of different activation/dissociation techniques. The platform expands the ability of MS to integrate proteomics and structural biology to provide insights into protein structure, function and regulation.
Ribosomes slide on lysine-encoding homopolymeric A stretches

PubMed Central

Koutmou, Kristin S; Schuller, Anthony P; Brunelle, Julie L; Radhakrishnan, Aditya; Djuranovic, Sergej; Green, Rachel

2015-01-01

Protein output from synonymous codons is thought to be equivalent if appropriate tRNAs are sufficiently abundant. Here we show that mRNAs encoding iterated lysine codons, AAA or AAG, differentially impact protein synthesis: insertion of iterated AAA codons into an ORF diminishes protein expression more than insertion of synonymous AAG codons. Kinetic studies in E. coli reveal that differential protein production results from pausing on consecutive AAA-lysines followed by ribosome sliding on homopolymeric A sequence. Translation in a cell-free expression system demonstrates that diminished output from AAA-codon-containing reporters results from premature translation termination on out of frame stop codons following ribosome sliding. In eukaryotes, these premature termination events target the mRNAs for Nonsense-Mediated-Decay (NMD). The finding that ribosomes slide on homopolymeric A sequences explains bioinformatic analyses indicating that consecutive AAA codons are under-represented in gene-coding sequences. Ribosome ‘sliding’ represents an unexpected type of ribosome movement possible during translation. DOI: http://dx.doi.org/10.7554/eLife.05534.001 PMID:25695637
Using Synthetic Nanopores for Single-Molecule Analyses: Detecting SNPs, Trapping DNA Molecules, and the Prospects for Sequencing DNA

ERIC Educational Resources Information Center

Dimitrov, Valentin V.

2009-01-01

This work focuses on studying properties of DNA molecules and DNA-protein interactions using synthetic nanopores, and it examines the prospects of sequencing DNA using synthetic nanopores. We have developed a method for discriminating between alleles that uses a synthetic nanopore to measure the binding of a restriction enzyme to DNA. There exists…
Evolution and molecular epidemiology of classical swine fever virus during a multi-annual outbreak amongst European wild boar.

PubMed

Goller, Katja V; Gabriel, Claudia; Dimna, Mireille Le; Le Potier, Marie-Frédérique; Rossi, Sophie; Staubach, Christoph; Merboth, Matthias; Beer, Martin; Blome, Sandra

2016-03-01

Classical swine fever is a viral disease of pigs that carries tremendous socio-economic impact. In outbreak situations, genetic typing is carried out for the purpose of molecular epidemiology in both domestic pigs and wild boar. These analyses are usually based on harmonized partial sequences. However, for high-resolution analyses towards the understanding of genetic variability and virus evolution, full-genome sequences are more appropriate. In this study, a unique set of representative virus strains was investigated that was collected during an outbreak in French free-ranging wild boar in the Vosges-du-Nord mountains between 2003 and 2007. Comparative sequence and evolutionary analyses of the nearly full-length sequences showed only slow evolution of classical swine fever virus strains over the years and no impact of vaccination on mutation rates. However, substitution rates varied amongst protein genes; furthermore, a spatial and temporal pattern could be observed whereby two separate clusters were formed that coincided with physical barriers.
Import of honeybee prepromelittin into the endoplasmic reticulum: structural basis for independence of SRP and docking protein.

PubMed Central

Müller, G; Zimmermann, R

1987-01-01

Honeybee prepromelittin is correctly processed and imported by dog pancreas microsomes. Insertion of prepromelittin into microsomal membranes, as assayed by signal sequence removal, does not depend on signal recognition particle (SRP) and docking protein. We addressed the question as to how prepromelittin bypasses the SRP/docking protein system. Hybrid proteins between prepromelittin, or carboxy-terminally truncated derivatives, and the cytoplasmic protein dihydrofolate reductase from mouse were constructed. These hybrid proteins were analysed for membrane insertion and sequestration into microsomes. The results suggest the following: (i) The signal sequence of prepromelittin is capable of interacting with the SRP/docking protein system, but this interaction is not mandatory for membrane insertion; this is related to the small size of prepromelittin. (ii) In prepromelittin a cluster of negatively charged amino acids must be balanced by a cluster of positively charged amino acids in order to allow membrane insertion. (iii) In general, a signal sequence can be sufficient to mediate membrane insertion independently of SRP and docking protein in the case of short precursor proteins; however, the presence and distribution of charged amino acids within the mature part of these precursors can play distinct roles. Images Fig. 3. Fig. 4. Fig. 5. Fig. 6. Fig. 7. Fig. 8. Fig. 9. PMID:2820722
P2RP: a Web-based framework for the identification and analysis of regulatory proteins in prokaryotic genomes.

PubMed

Barakat, Mohamed; Ortet, Philippe; Whitworth, David E

2013-04-20

Regulatory proteins (RPs) such as transcription factors (TFs) and two-component system (TCS) proteins control how prokaryotic cells respond to changes in their external and/or internal state. Identification and annotation of TFs and TCSs is non-trivial, and between-genome comparisons are often confounded by different standards in annotation. There is a need for user-friendly, fast and convenient tools to allow researchers to overcome the inherent variability in annotation between genome sequences. We have developed the web-server P2RP (Predicted Prokaryotic Regulatory Proteins), which enables users to identify and annotate TFs and TCS proteins within their sequences of interest. Users can input amino acid or genomic DNA sequences, and predicted proteins therein are scanned for the possession of DNA-binding domains and/or TCS domains. RPs identified in this manner are categorised into families, unambiguously annotated, and a detailed description of their features generated, using an integrated software pipeline. P2RP results can then be outputted in user-specified formats. Biologists have an increasing need for fast and intuitively usable tools, which is why P2RP has been developed as an interactive system. As well as assisting experimental biologists to interrogate novel sequence data, it is hoped that P2RP will be built into genome annotation pipelines and re-annotation processes, to increase the consistency of RP annotation in public genomic sequences. P2RP is the first publicly available tool for predicting and analysing RP proteins in users' sequences. The server is freely available and can be accessed along with documentation at http://www.p2rp.org.
Comparative genomics and evolution of the amylase-binding proteins of oral streptococci.

PubMed

Haase, Elaine M; Kou, Yurong; Sabharwal, Amarpreet; Liao, Yu-Chieh; Lan, Tianying; Lindqvist, Charlotte; Scannapieco, Frank A

2017-04-20

Successful commensal bacteria have evolved to maintain colonization in challenging environments. The oral viridans streptococci are pioneer colonizers of dental plaque biofilm. Some of these bacteria have adapted to life in the oral cavity by binding salivary α-amylase, which hydrolyzes dietary starch, thus providing a source of nutrition. Oral streptococcal species bind α-amylase by expressing a variety of amylase-binding proteins (ABPs). Here we determine the genotypic basis of amylase binding where proteins of diverse size and function share a common phenotype. ABPs were detected in culture supernatants of 27 of 59 strains representing 13 oral Streptococcus species screened using the amylase-ligand binding assay. N-terminal sequences from ABPs of diverse size were obtained from 18 strains representing six oral streptococcal species. Genome sequencing and BLAST searches using N-terminal sequences, protein size, and key words identified the gene associated with each ABP. Among the sequenced ABPs, 14 matched amylase-binding protein A (AbpA), 6 matched amylase-binding protein B (AbpB), and 11 unique ABPs were identified as peptidoglycan-binding, glutamine ABC-type transporter, hypothetical, or choline-binding proteins. Alignment and phylogenetic analyses performed to ascertain evolutionary relationships revealed that ABPs cluster into at least six distinct, unrelated families (AbpA, AbpB, and four novel ABPs) with no phylogenetic evidence that one group evolved from another, and no single ancestral gene found within each group. AbpA-like sequences can be divided into five subgroups based on the N-terminal sequences. Comparative genomics focusing on the abpA gene locus provides evidence of horizontal gene transfer. The acquisition of an ABP by oral streptococci provides an interesting example of adaptive evolution.
Echinoderm phosphorylated matrix proteins UTMP16 and UTMP19 have different functions in sea urchin tooth mineralization.

PubMed

Alvares, Keith; Dixit, Saryu N; Lux, Elizabeth; Veis, Arthur

2009-09-18

Studies of mineralization of embryonic spicules and of the sea urchin genome have identified several putative mineralization-related proteins. These predicted proteins have not been isolated or confirmed in mature mineralized tissues. Mature Lytechinus variegatus teeth were demineralized with 0.6 N HCl after prior removal of non-mineralized constituents with 4.0 M guanidinium HCl. The HCl-extracted proteins were fractionated on ceramic hydroxyapatite and separated into bound and unbound pools. Gel electrophoresis compared the protein distributions. The differentially present bands were purified and digested with trypsin, and the tryptic peptides were separated by high pressure liquid chromatography. NH2-terminal sequences were determined by Edman degradation and compared with the genomic sequence bank data. Two of the putative mineralization-related proteins were found. Their complete amino acid sequences were cloned from our L. variegatus cDNA library. Apatite-binding UTMP16 was found to be present in two isoforms; both isoforms had a signal sequence, a Ser-Asp-rich extracellular matrix domain, and a transmembrane and cytosolic insertion sequence. UTMP19, although rich in Glu and Thr did not bind to apatite. It had neither signal peptide nor transmembrane domain but did have typical nuclear localization and nuclear exit signal sequences. Both proteins were phosphorylated and good substrates for phosphatase. Immunolocalization studies with anti-UTMP16 show it to concentrate at the syncytial membranes in contact with the mineral. On the basis of our TOF-SIMS analyses of magnesium ion and Asp mapping of the mineral phase composition, we speculate that UTMP16 may be important in establishing the high magnesium columns that fuse the calcite plates together to enhance the mechanical strength of the mineralized tooth.
ProteomeVis: a web app for exploration of protein properties from structure to sequence evolution across organisms' proteomes.

PubMed

Razban, Rostam M; Gilson, Amy I; Durfee, Niamh; Strobelt, Hendrik; Dinkla, Kasper; Choi, Jeong-Mo; Pfister, Hanspeter; Shakhnovich, Eugene I

2018-05-08

Protein evolution spans time scales and its effects span the length of an organism. A web app named ProteomeVis is developed to provide a comprehensive view of protein evolution in the S. cerevisiae and E. coli proteomes. ProteomeVis interactively creates protein chain graphs, where edges between nodes represent structure and sequence similarities within user-defined ranges, to study the long time scale effects of protein structure evolution. The short time scale effects of protein sequence evolution are studied by sequence evolutionary rate (ER) correlation analyses with protein properties that span from the molecular to the organismal level. We demonstrate the utility and versatility of ProteomeVis by investigating the distribution of edges per node in organismal protein chain universe graphs (oPCUGs) and putative ER determinants. S. cerevisiae and E. coli oPCUGs are scale-free with scaling constants of 1.79 and 1.56, respectively. Both scaling constants can be explained by a previously reported theoretical model describing protein structure evolution (Dokholyan et al., 2002). Protein abundance most strongly correlates with ER among properties in ProteomeVis, with Spearman correlations of -0.49 (p-value<10-10) and -0.46 (p-value<10-10) for S. cerevisiae and E. coli, respectively. This result is consistent with previous reports that found protein expression to be the most important ER determinant (Zhang and Yang, 2015). ProteomeVis is freely accessible at http://proteomevis.chem.harvard.edu. Supplementary data are available at Bioinformatics. shakhnovich@chemistry.harvard.edu.
StralSV: assessment of sequence variability within similar 3D structures and application to polio RNA-dependent RNA polymerase

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zemla, A; Lang, D; Kostova, T

2010-11-29

Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory - still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could overcome these difficulties and facilitatemore » the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV, a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus and demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique or that shared structural similarity with structures that are distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position.« less

Osteoblast-specific factor 2: cloning of a putative bone adhesion protein with homology with the insect protein fasciclin I.

PubMed Central

Takeshita, S; Kikuno, R; Tezuka, K; Amann, E

1993-01-01

A cDNA library prepared from the mouse osteoblastic cell line MC3T3-E1 was screened for the presence of specifically expressed genes by employing a combined subtraction hybridization/differential screening approach. A cDNA was identified and sequenced which encodes a protein designated osteoblast-specific factor 2 (OSF-2) comprising 811 amino acids. OSF-2 has a typical signal sequence, followed by a cysteine-rich domain, a fourfold repeated domain and a C-terminal domain. The protein lacks a typical transmembrane region. The fourfold repeated domain of OSF-2 shows homology with the insect protein fasciclin I. RNA analyses revealed that OSF-2 is expressed in bone and to a lesser extent in lung, but not in other tissues. Mouse OSF-2 cDNA was subsequently used as a probe to clone the human counterpart. Mouse and human OSF-2 show a high amino acid sequence conservation except for the signal sequence and two regions in the C-terminal domain in which 'in-frame' insertions or deletions are observed, implying alternative splicing events. On the basis of the amino acid sequence homology with fasciclin I, we suggest that OSF-2 functions as a homophilic adhesion molecule in bone formation. Images Figure 3 Figure 4 Figure 5 Figure 6 PMID:8363580
A Nitrile Hydratase in the Eukaryote Monosiga brevicollis

PubMed Central

Foerstner, Konrad U.; Doerks, Tobias; Muller, Jean; Raes, Jeroen; Bork, Peer

2008-01-01

Bacterial nitrile hydratase (NHases) are important industrial catalysts and waste water remediation tools. In a global computational screening of conventional and metagenomic sequence data for NHases, we detected the two usually separated NHase subunits fused in one protein of the choanoflagellate Monosiga brevicollis, a recently sequenced unicellular model organism from the closest sister group of Metazoa. This is the first time that an NHase is found in eukaryotes and the first time it is observed as a fusion protein. The presence of an intron, subunit fusion and expressed sequence tags covering parts of the gene exclude contamination and suggest a functional gene. Phylogenetic analyses and genomic context imply a probable ancient horizontal gene transfer (HGT) from proteobacteria. The newly discovered NHase might open biotechnological routes due to its unconventional structure, its new type of host and its apparent integration into eukaryotic protein networks. PMID:19096720
Identification and Characterization of Putative Integron-Like Elements of the Heavy-Metal-Hypertolerant Strains of Pseudomonas spp.

PubMed

Ciok, Anna; Adamczuk, Marcin; Bartosik, Dariusz; Dziewit, Lukasz

2016-11-28

Pseudomonas strains isolated from the heavily contaminated Lubin copper mine and Zelazny Most post-flotation waste reservoir in Poland were screened for the presence of integrons. This analysis revealed that two strains carried homologous DNA regions composed of a gene encoding a DNA_BRE_C domain-containing tyrosine recombinase (with no significant sequence similarity to other integrases of integrons) plus a three-component array of putative integron gene cassettes. The predicted gene cassettes encode three putative polypeptides with homology to (i) transmembrane proteins, (ii) GCN5 family acetyltransferases, and (iii) hypothetical proteins of unknown function (homologous proteins are encoded by the gene cassettes of several class 1 integrons). Comparative sequence analyses identified three structural variants of these novel integron-like elements within the sequenced bacterial genomes. Analysis of their distribution revealed that they are found exclusively in strains of the genus Pseudomonas .
Data on publications, structural analyses, and queries used to build and utilize the AlloRep database.

PubMed

Sousa, Filipa L; Parente, Daniel J; Hessman, Jacob A; Chazelle, Allen; Teichmann, Sarah A; Swint-Kruse, Liskin

2016-09-01

The AlloRep database (www.AlloRep.org) (Sousa et al., 2016) [1] compiles extensive sequence, mutagenesis, and structural information for the LacI/GalR family of transcription regulators. Sequence alignments are presented for >3000 proteins in 45 paralog subfamilies and as a subsampled alignment of the whole family. Phenotypic and biochemical data on almost 6000 mutants have been compiled from an exhaustive search of the literature; citations for these data are included herein. These data include information about oligomerization state, stability, DNA binding and allosteric regulation. Protein structural data for 65 proteins are presented as easily-accessible, residue-contact networks. Finally, this article includes example queries to enable the use of the AlloRep database. See the related article, "AlloRep: a repository of sequence, structural and mutagenesis data for the LacI/GalR transcription regulators" (Sousa et al., 2016) [1].
[Inverse PCR amplification of the complete major capsid protein gene of lymphocystis disease virus isolated from Rachycentron canadum and the phylogenetic analysis of the virus].

PubMed

Fu, Xiao-Zhe; Shi, Cun-Bin; Li, Ning-Qiu; Pan, Hou-Jun; Chang, Ou-Qin; Wu, Shu-Qin

2007-09-01

The major capsid protein of lymphocystis disease virus isolated from Rachycentron canadum (LCDV-rc) was amplified and analysed. The 457bp DNA core fragment was amplified with the degenerate primers designed according to the conserved sequences of MCP gene of iridoviruses, then the flaking sequences adjacent to the core region were amplified by inverse PCR, and the complete sequence was obtained by combining all of them. The open reading frame of the gene is 1380bp in length, encoding a putative protein of 459 aa with molecular weight 51.12 kD and pI 6.87. Constructing the phylogenetic tree for comparing the MCP amino acid of iridoviruses, the results indicated that LCDV-rc is most homologous to the other Lymphocystis viruses and all of them constitute a branch. Accordingly LCDV-rc is identified as Lymphocystivirus.
CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences

PubMed Central

2012-01-01

Background The complete sequences of chloroplast genomes provide wealthy information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, powerful computational tools annotating the genome sequences are in urgent need. Results We have developed a web server CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar, full-length protein, cDNA and rRNA sequences by integrating results from Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes and inverted repeats (IR) are identified using tRNAscan, ARAGORN and vmatch respectively. Third, it calculates the summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extractions of protein and mRNA sequences for given list of genes and species. The annotation results in GFF3 format can be edited using any compatible annotation editing tools. The edited annotations can then be uploaded to CPGAVAS for update and re-analyses repeatedly. Using known chloroplast genome sequences as test set, we show that CPGAVAS performs comparably to another application DOGMA, while having several superior functionalities. Conclusions CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensible tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas. PMID:23256920
Five Complete Chloroplast Genome Sequences from Diospyros: Genome Organization and Comparative Analysis.

PubMed

Fu, Jianmin; Liu, Huimin; Hu, Jingjing; Liang, Yuqin; Liang, Jinjun; Wuyun, Tana; Tan, Xiaofeng

2016-01-01

Diospyros is the largest genus in Ebenaceae, comprising more than 500 species with remarkable economic value, especially Diospyros kaki Thunb., which has traditionally been an important food resource in China, Korea, and Japan. Complete chloroplast (cp) genomes from D. kaki, D. lotus L., D. oleifera Cheng., D. glaucifolia Metc., and Diospyros 'Jinzaoshi' were sequenced using Illumina sequencing technology. This is the first cp genome reported in Ebenaceae. The cp genome sequences of Diospyros ranged from 157,300 to 157,784 bp in length, presenting a typical quadripartite structure with two inverted repeats each separated by one large and one small single-copy region. For each cp genome, 134 genes were annotated, including 80 protein-coding, 31 tRNA, and 4 rRNA unique genes. In all, 179 repeats and 283 single sequence repeats were identified. Four hypervariable regions, namely, intergenic region of trnQ_rps16, trnV_ndhC, and psbD_trnT, and intron of ndhA, were identified in the Diospyros genomes. Phylogenetic analyses based on the whole cp genome, protein-coding, and intergenic and intron sequences indicated that D. oleifera is closely related to D. kaki and could be used as a model plant for future research on D. kaki; to our knowledge, this is proposed for the first time. Further, these analyses together with two large deletions (301 and 140 bp) in the cp genome of D. 'Jinzaoshi', support its placement as a new species in Diospyros. Both maximum parsimony and likelihood analyses for 19 taxa indicated the basal position of Ericales in asterids and suggested that Ebenaceae is monophyletic in Ericales.
Five Complete Chloroplast Genome Sequences from Diospyros: Genome Organization and Comparative Analysis

PubMed Central

Hu, Jingjing; Liang, Yuqin; Liang, Jinjun; Wuyun, Tana; Tan, Xiaofeng

2016-01-01

Diospyros is the largest genus in Ebenaceae, comprising more than 500 species with remarkable economic value, especially Diospyros kaki Thunb., which has traditionally been an important food resource in China, Korea, and Japan. Complete chloroplast (cp) genomes from D. kaki, D. lotus L., D. oleifera Cheng., D. glaucifolia Metc., and Diospyros ‘Jinzaoshi’ were sequenced using Illumina sequencing technology. This is the first cp genome reported in Ebenaceae. The cp genome sequences of Diospyros ranged from 157,300 to 157,784 bp in length, presenting a typical quadripartite structure with two inverted repeats each separated by one large and one small single-copy region. For each cp genome, 134 genes were annotated, including 80 protein-coding, 31 tRNA, and 4 rRNA unique genes. In all, 179 repeats and 283 single sequence repeats were identified. Four hypervariable regions, namely, intergenic region of trnQ_rps16, trnV_ndhC, and psbD_trnT, and intron of ndhA, were identified in the Diospyros genomes. Phylogenetic analyses based on the whole cp genome, protein-coding, and intergenic and intron sequences indicated that D. oleifera is closely related to D. kaki and could be used as a model plant for future research on D. kaki; to our knowledge, this is proposed for the first time. Further, these analyses together with two large deletions (301 and 140 bp) in the cp genome of D. ‘Jinzaoshi’, support its placement as a new species in Diospyros. Both maximum parsimony and likelihood analyses for 19 taxa indicated the basal position of Ericales in asterids and suggested that Ebenaceae is monophyletic in Ericales. PMID:27442423
Complete genome sequence of Menghai rhabdovirus, a novel mosquito-borne rhabdovirus from China.

PubMed

Sun, Qiang; Zhao, Qiumin; An, Xiaoping; Guo, Xiaofang; Zuo, Shuqing; Zhang, Xianglilan; Pei, Guangqian; Liu, Wenli; Cheng, Shi; Wang, Yunfei; Shu, Peng; Mi, Zhiqiang; Huang, Yong; Zhang, Zhiyi; Tong, Yigang; Zhou, Hongning; Zhang, Jiusong

2017-04-01

Menghai rhabdovirus (MRV) was isolated from Aedes albopictus in Menghai county of Yunnan Province, China, in August 2010. Whole-genome sequencing of MRV was performed using an Ion PGM™ Sequencer. We found that MRV is a single-stranded, negative-sense RNA virus. The complete genome of MRV has 10,744 nt, with short inverted repeat termini, encoding five typical rhabdovirus proteins (N, P, M, G, and L) and an additional small hypothetical protein. Nucleotide BLAST analysis using the BLASTn method showed that the genome sequence most similar to that of MRV is that of Arboretum virus (NC_025393.1), with a Max score of 322, query coverage of 14%, and 66% identity. Genomic and phylogenetic analyses both demonstrated that MRV should be considered a member of a novel species of the family Rhabdoviridae.
The maize stripe virus major noncapsid protein messenger RNA transcripts contain heterogeneous leader sequences at their 5' termini.

PubMed

Huiet, L; Feldstein, P A; Tsai, J H; Falk, B W

1993-12-01

Primer extension analyses and a PCR-based cloning strategy were used to identify and characterize 5' nucleotide sequences on the maize stripe virus (MStV) RNA4 mRNA transcripts encoding the major noncapsid protein (NCP). Direct RNA sequence analysis by primer extension showed that the NCP mRNA transcripts had 10-15 nucleotides beyond the 5' terminus of the MStV RNA4 nucleotide sequence. MStV genomic RNAs isolated from ribonucleoprotein particles (RNPs) lacked the additional 5' nucleotides. cDNA clones representing the 5' region of the mRNA transcripts were constructed, and the nucleotide sequences of the 5' regions were determined for 16 clones. Each was found to have a distinct 10-15 nucleotide sequence immediately 5' of the MStV RNA4 sequence. Eleven of 16 clones had the correct MStV RNA4 5' nucleotide sequence, while five showed minor variations at or near the 5' most MStV RNA4 nucleotide. These characteristics show strong similarities to other viral mRNA transcripts which are synthesized by cap snatching.
Variability and molecular typing of the woody-tree infecting prunus necrotic ringspot ilarvirus.

PubMed

Vasková, D; Petrzik, K; Karesová, R

2000-01-01

The 3'-part of the movement protein gene, the intergenic region and the complete coat protein gene of sixteen isolates of Prunus necrotic ringspot virus (PNRSV) from five different host species from the Czech Republic were sequenced in order to search for the bases of extensive variability of viroses caused by this pathogen. According to phylogenetic analyses all the 46 isolates sequenced to date split into three main groups, which correlated to a certain extend with their geographic origin. Modelled serological properties showed that all the new isolates belong to one serotype.
Virome Assembly and Annotation: A Surprise in the Namib Desert

PubMed Central

Hesse, Uljana; van Heusden, Peter; Kirby, Bronwyn M.; Olonade, Israel; van Zyl, Leonardo J.; Trindade, Marla

2017-01-01

Sequencing, assembly, and annotation of environmental virome samples is challenging. Methodological biases and differences in species abundance result in fragmentary read coverage; sequence reconstruction is further complicated by the mosaic nature of viral genomes. In this paper, we focus on biocomputational aspects of virome analysis, emphasizing latent pitfalls in sequence annotation. Using simulated viromes that mimic environmental data challenges we assessed the performance of five assemblers (CLC-Workbench, IDBA-UD, SPAdes, RayMeta, ABySS). Individual analyses of relevant scaffold length fractions revealed shortcomings of some programs in reconstruction of viral genomes with excessive read coverage (IDBA-UD, RayMeta), and in accurate assembly of scaffolds ≥50 kb (SPAdes, RayMeta, ABySS). The CLC-Workbench assembler performed best in terms of genome recovery (including highly covered genomes) and correct reconstruction of large scaffolds; and was used to assemble a virome from a copper rich site in the Namib Desert. We found that scaffold network analysis and cluster-specific read reassembly improved reconstruction of sequences with excessive read coverage, and that strict data filtering for non-viral sequences prior to downstream analyses was essential. In this study we describe novel viral genomes identified in the Namib Desert copper site virome. Taxonomic affiliations of diverse proteins in the dataset and phylogenetic analyses of circovirus-like proteins indicated links to the marine habitat. Considering additional evidence from this dataset we hypothesize that viruses may have been carried from the Atlantic Ocean into the Namib Desert by fog and wind, highlighting the impact of the extended environment on an investigated niche in metagenome studies. PMID:28167933
Proteomic analysis of the cyanobacterium of the Azolla symbiosis: identity, adaptation, and NifH modification.

PubMed

Ekman, Martin; Tollbäck, Petter; Bergman, Birgitta

2008-01-01

Cyanobacteria are able to form stable nitrogen-fixing symbioses with diverse eukaryotes. To extend our understanding of adaptations imposed by plant hosts, two-dimensional gel electrophoresis and mass spectrometry (MS) were used for comparative protein expression profiling of a cyanobacterium (cyanobiont) dwelling in leaf cavities of the water-fern Azolla filiculoides. Homology-based protein identification using peptide mass fingerprinting [matrix-assisted laser desorption ionization-time of flight (MALDI-TOF-MS)], tandem MS analyses, and sequence homology searches resulted in an identification success rate of 79% of proteins analysed in the unsequenced cyanobiont. Compared with a free-living strain, processes related to energy production, nitrogen and carbon metabolism, and stress-related functions were up-regulated in the cyanobiont while photosynthesis and metabolic turnover rates were down-regulated, stressing a slow heterotrophic mode of growth, as well as high heterocyst frequencies and nitrogen-fixing capacities. The first molecular data set on the nature of the NifH post-translational modification in cyanobacteria was also obtained: peptide mass spectra of the protein demonstrated the presence of a 300-400 Da protein modification localized to a specific 13 amino acid sequence, within the part of the protein that is ADP-ribosylated in other bacteria and close to the active site of nitrogenase. Furthermore, the distribution of the highest scoring database hits for the identified proteins points to the possibility of using proteomic data in taxonomy.
Isolation, expression, and chromosomal localization of the human mitochondrial capsule selenoprotein gene (MCSP)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aho, Hanne; Schwemmer, M.; Tessmann, D.

1996-03-01

The mitochondrial capsule selenoprotein (MCS) (HGMW-approved symbol MCSP) is one of three proteins that are important for the maintenance and stabilization of the crescent structure of the sperm mitochondria. We describe here the isolation of a cDNA, the exon-intron organization, the expression, and the chromosomal localization of the human MCS gene. Nucleotide sequence analysis of the human and mouse MCS cDNAs reveals that the 5{prime}- and 3{prime}-untranslated sequences are more conserved (71%) than the coding sequences (59%). The open reading frame encodes a 116-amino-acid protein and lacks the UGA codons, which have been reported to encode the selenocysteines in themore » N-terminal of the deduced mouse protein. The deduced human protein shows a low degree of amino acid sequence identity to the mouse protein. The deduced human protein shows a low degree of amino acid sequence identity to the mouse protein (39%). The most striking homology lies in the dicysteine motifs. Northern and Southern zooblot analyses reveal that the MCS gene in human, baboon, and bovine is more conserved than its counterparts in mouse and rat. The single intron in the human MCS gene is approximately 6 kb and interrupts the 5{prime}-untranslated region at a position equivalent to that in the mouse and rat genes. Northern blot and in situ hybridization experiments demonstrate that the expression of the human MCS gene is restricted to haploid spermatids. The human gene was assigned to q21 of chromosome 1. 30 refs., 9 figs.« less
Deciphering the complexities of the wheat flour proteome using quantitative two-dimensional electrophoresis, three proteases and tandem mass spectrometry

PubMed Central

2011-01-01

Background Wheat flour is one of the world's major food ingredients, in part because of the unique end-use qualities conferred by the abundant glutamine- and proline-rich gluten proteins. Many wheat flour proteins also present dietary problems for consumers with celiac disease or wheat allergies. Despite the importance of these proteins it has been particularly challenging to use MS/MS to distinguish the many proteins in a flour sample and relate them to gene sequences. Results Grain from the extensively characterized spring wheat cultivar Triticum aestivum 'Butte 86' was milled to white flour from which proteins were extracted, then separated and quantified by 2-DE. Protein spots were identified by separate digestions with three proteases, followed by tandem mass spectrometry analysis of the peptides. The spectra were used to interrogate an improved protein sequence database and results were integrated using the Scaffold program. Inclusion of cultivar specific sequences in the database greatly improved the results, and 233 spots were identified, accounting for 93.1% of normalized spot volume. Identified proteins were assigned to 157 wheat sequences, many for proteins unique to wheat and nearly 40% from Butte 86. Alpha-gliadins accounted for 20.4% of flour protein, low molecular weight glutenin subunits 18.0%, high molecular weight glutenin subunits 17.1%, gamma-gliadins 12.2%, omega-gliadins 10.5%, amylase/protease inhibitors 4.1%, triticins 1.6%, serpins 1.6%, purinins 0.9%, farinins 0.8%, beta-amylase 0.5%, globulins 0.4%, other enzymes and factors 1.9%, and all other 3%. Conclusions This is the first successful effort to identify the majority of abundant flour proteins for a single wheat cultivar, relate them to individual gene sequences and estimate their relative levels. Many genes for wheat flour proteins are not expressed, so this study represents further progress in describing the expressed wheat genome. Use of cultivar-specific contigs helped to overcome the difficulties of matching peptides to gene sequences for members of highly similar, rapidly evolving storage protein families. Prospects for simplifying this process for routine analyses are discussed. The ability to measure expression levels for individual flour protein genes complements information gained from efforts to sequence the wheat genome and is essential for studies of effects of environment on gene expression. PMID:21314956
Large-scale identification and characterization of alternative splicing variants of human gene transcripts using 56 419 completely sequenced and manually annotated full-length cDNAs

PubMed Central

Takeda, Jun-ichi; Suzuki, Yutaka; Nakao, Mitsuteru; Barrero, Roberto A.; Koyanagi, Kanako O.; Jin, Lihua; Motono, Chie; Hata, Hiroko; Isogai, Takao; Nagai, Keiichi; Otsuki, Tetsuji; Kuryshev, Vladimir; Shionyu, Masafumi; Yura, Kei; Go, Mitiko; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Wiemann, Stefan; Nomura, Nobuo; Sugano, Sumio; Gojobori, Takashi; Imanishi, Tadashi

2006-01-01

We report the first genome-wide identification and characterization of alternative splicing in human gene transcripts based on analysis of the full-length cDNAs. Applying both manual and computational analyses for 56 419 completely sequenced and precisely annotated full-length cDNAs selected for the H-Invitational human transcriptome annotation meetings, we identified 6877 alternative splicing genes with 18 297 different alternative splicing variants. A total of 37 670 exons were involved in these alternative splicing events. The encoded protein sequences were affected in 6005 of the 6877 genes. Notably, alternative splicing affected protein motifs in 3015 genes, subcellular localizations in 2982 genes and transmembrane domains in 1348 genes. We also identified interesting patterns of alternative splicing, in which two distinct genes seemed to be bridged, nested or having overlapping protein coding sequences (CDSs) of different reading frames (multiple CDS). In these cases, completely unrelated proteins are encoded by a single locus. Genome-wide annotations of alternative splicing, relying on full-length cDNAs, should lay firm groundwork for exploring in detail the diversification of protein function, which is mediated by the fast expanding universe of alternative splicing variants. PMID:16914452
Introduction to bioinformatics.

PubMed

Can, Tolga

2014-01-01

Bioinformatics is an interdisciplinary field mainly involving molecular biology and genetics, computer science, mathematics, and statistics. Data intensive, large-scale biological problems are addressed from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps: Collect statistics from biological data. Build a computational model. Solve a computational modeling problem. Test and evaluate a computational algorithm. This chapter gives a brief introduction to bioinformatics by first providing an introduction to biological terminology and then discussing some classical bioinformatics problems organized by the types of data sources. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function and includes subproblems such as identification of homologs, multiple sequence alignment, searching sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data is usually represented as matrices and analysis of microarray data mostly involves statistics analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein-protein interaction networks are usually modeled as graphs and graph theoretic approaches are used to solve associated problems such as construction and analysis of large-scale networks.
Generation and analysis of expressed sequence tags from the bone marrow of Chinese Sika deer.

PubMed

Yao, Baojin; Zhao, Yu; Zhang, Mei; Li, Juan

2012-03-01

Sika deer is one of the best-known and highly valued animals of China. Despite its economic, cultural, and biological importance, there has not been a large-scale sequencing project for Sika deer to date. With the ultimate goal of sequencing the complete genome of this organism, we first established a bone marrow cDNA library for Sika deer and generated a total of 2,025 reads. After processing the sequences, 2,017 high-quality expressed sequence tags (ESTs) were obtained. These ESTs were assembled into 1,157 unigenes, including 238 contigs and 919 singletons. Comparative analyses indicated that 888 (76.75%) of the unigenes had significant matches to sequences in the non-redundant protein database, In addition to highly expressed genes, such as stearoyl-CoA desaturase, cytochrome c oxidase, adipocyte-type fatty acid-binding protein, adiponectin and thymosin beta-4, we also obtained vascular endothelial growth factor-A and heparin-binding growth-associated molecule, both of which are of great importance for angiogenesis research. There were 244 (21.09%) unigenes with no significant match to any sequence in current protein or nucleotide databases, and these sequences may represent genes with unknown function in Sika deer. Open reading frame analysis of the sequences was performed using the getorf program. In addition, the sequences were functionally classified using the gene ontology hierarchy, clusters of orthologous groups of proteins and Kyoto encyclopedia of genes and genomes databases. Analysis of ESTs described in this paper provides an important resource for the transcriptome exploration of Sika deer, and will also facilitate further studies on functional genomics, gene discovery and genome annotation of Sika deer.
Targeted Quantitation of Proteins by Mass Spectrometry

PubMed Central

2013-01-01

Quantitative measurement of proteins is one of the most fundamental analytical tasks in a biochemistry laboratory, but widely used immunochemical methods often have limited specificity and high measurement variation. In this review, we discuss applications of multiple-reaction monitoring (MRM) mass spectrometry, which allows sensitive, precise quantitative analyses of peptides and the proteins from which they are derived. Systematic development of MRM assays is permitted by databases of peptide mass spectra and sequences, software tools for analysis design and data analysis, and rapid evolution of tandem mass spectrometer technology. Key advantages of MRM assays are the ability to target specific peptide sequences, including variants and modified forms, and the capacity for multiplexing that allows analysis of dozens to hundreds of peptides. Different quantitative standardization methods provide options that balance precision, sensitivity, and assay cost. Targeted protein quantitation by MRM and related mass spectrometry methods can advance biochemistry by transforming approaches to protein measurement. PMID:23517332
Targeted quantitation of proteins by mass spectrometry.

PubMed

Liebler, Daniel C; Zimmerman, Lisa J

2013-06-04

Quantitative measurement of proteins is one of the most fundamental analytical tasks in a biochemistry laboratory, but widely used immunochemical methods often have limited specificity and high measurement variation. In this review, we discuss applications of multiple-reaction monitoring (MRM) mass spectrometry, which allows sensitive, precise quantitative analyses of peptides and the proteins from which they are derived. Systematic development of MRM assays is permitted by databases of peptide mass spectra and sequences, software tools for analysis design and data analysis, and rapid evolution of tandem mass spectrometer technology. Key advantages of MRM assays are the ability to target specific peptide sequences, including variants and modified forms, and the capacity for multiplexing that allows analysis of dozens to hundreds of peptides. Different quantitative standardization methods provide options that balance precision, sensitivity, and assay cost. Targeted protein quantitation by MRM and related mass spectrometry methods can advance biochemistry by transforming approaches to protein measurement.

Molecular interaction networks in the analyses of sequence variation and proteomics data.

PubMed

Stelzl, Ulrich

2013-12-01

Protein-protein interaction networks are typically generated in standard cell lines or model organisms as it is prohibitively difficult to record large interaction datasets from specific tissues or disease models at a reasonable pace. Although the interaction data are of high confidence, they thus do not reflect in vivo relationships as such. A wealth of physiologically relevant protein information, obtained under different conditions and from different systems, is available including information on genetic variation, protein levels, and PTMs. However, these data are difficult to assess comprehensively because the relationships between the entities remain elusive from the measurements. Here, we exemplarily highlight recent studies that gained deeper insight from genetic variation, protein, and PTM measurements using interaction information pointing toward the importance and potential of interaction networks for the interpretation of sequencing and proteomics data. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Applying the Concept of Peptide Uniqueness to Anti-Polio Vaccination.

PubMed

Kanduc, Darja; Fasano, Candida; Capone, Giovanni; Pesce Delfino, Antonella; Calabrò, Michele; Polimeno, Lorenzo

2015-01-01

Although rare, adverse events may associate with anti-poliovirus vaccination thus possibly hampering global polio eradication worldwide. To design peptide-based anti-polio vaccines exempt from potential cross-reactivity risks and possibly able to reduce rare potential adverse events such as the postvaccine paralytic poliomyelitis due to the tendency of the poliovirus genome to mutate. Proteins from poliovirus type 1, strain Mahoney, were analyzed for amino acid sequence identity to the human proteome at the pentapeptide level, searching for sequences that (1) have zero percent of identity to human proteins, (2) are potentially endowed with an immunologic potential, and (3) are highly conserved among poliovirus strains. Sequence analyses produced a set of consensus epitopic peptides potentially able to generate specific anti-polio immune responses exempt from cross-reactivity with the human host. Peptide sequences unique to poliovirus proteins and conserved among polio strains might help formulate a specific and universal anti-polio vaccine able to react with multiple viral strains and exempt from the burden of possible cross-reactions with human proteins. As an additional advantage, using a peptide-based vaccine instead of current anti-polio DNA vaccines would eliminate the rare post-polio poliomyelitis cases and other disabling symptoms that may appear following vaccination.
ST proteins, a new family of plant tandem repeat proteins with a DUF2775 domain mainly found in Fabaceae and Asteraceae.

PubMed

Albornos, Lucía; Martín, Ignacio; Iglesias, Rebeca; Jiménez, Teresa; Labrador, Emilia; Dopico, Berta

2012-11-07

Many proteins with tandem repeats in their sequence have been described and classified according to the length of the repeats: I) Repeats of short oligopeptides (from 2 to 20 amino acids), including structural cell wall proteins and arabinogalactan proteins. II) Repeats that range in length from 20 to 40 residues, including proteins with a well-established three-dimensional structure often involved in mediating protein-protein interactions. (III) Longer repeats in the order of 100 amino acids that constitute structurally and functionally independent units. Here we analyse ShooT specific (ST) proteins, a family of proteins with tandem repeats of unknown function that were first found in Leguminosae, and their possible similarities to other proteins with tandem repeats. ST protein sequences were only found in dicotyledonous plants, limited to several plant families, mainly the Fabaceae and the Asteraceae. ST mRNAs accumulate mainly in the roots and under biotic interactions. Most ST proteins have one or several Domain(s) of Unknown Function 2775 (DUF2775). All deduced ST proteins have a signal peptide, indicating that these proteins enter the secretory pathway, and the mature proteins have tandem repeat oligopeptides that share a hexapeptide (E/D)FEPRP followed by 4 partially conserved amino acids, which could determine a putative N-glycosylation signal, and a fully conserved tyrosine. In a phylogenetic tree, the sequences clade according to taxonomic group. A possible involvement in symbiosis and abiotic stress as well as in plant cell elongation is suggested, although different STs could play different roles in plant development. We describe a new family of proteins called ST whose presence is limited to the plant kingdom, specifically to a few families of dicotyledonous plants. They present 20 to 40 amino acid tandem repeat sequences with different characteristics (signal peptide, DUF2775 domain, conservative repeat regions) from the described group of 20 to 40 amino acid tandem repeat proteins and also from known cell wall proteins with repeat sequences. Several putative roles in plant physiology can be inferred from the characteristics found.
ST proteins, a new family of plant tandem repeat proteins with a DUF2775 domain mainly found in Fabaceae and Asteraceae

PubMed Central

2012-01-01

Background Many proteins with tandem repeats in their sequence have been described and classified according to the length of the repeats: I) Repeats of short oligopeptides (from 2 to 20 amino acids), including structural cell wall proteins and arabinogalactan proteins. II) Repeats that range in length from 20 to 40 residues, including proteins with a well-established three-dimensional structure often involved in mediating protein-protein interactions. (III) Longer repeats in the order of 100 amino acids that constitute structurally and functionally independent units. Here we analyse ShooT specific (ST) proteins, a family of proteins with tandem repeats of unknown function that were first found in Leguminosae, and their possible similarities to other proteins with tandem repeats. Results ST protein sequences were only found in dicotyledonous plants, limited to several plant families, mainly the Fabaceae and the Asteraceae. ST mRNAs accumulate mainly in the roots and under biotic interactions. Most ST proteins have one or several Domain(s) of Unknown Function 2775 (DUF2775). All deduced ST proteins have a signal peptide, indicating that these proteins enter the secretory pathway, and the mature proteins have tandem repeat oligopeptides that share a hexapeptide (E/D)FEPRP followed by 4 partially conserved amino acids, which could determine a putative N-glycosylation signal, and a fully conserved tyrosine. In a phylogenetic tree, the sequences clade according to taxonomic group. A possible involvement in symbiosis and abiotic stress as well as in plant cell elongation is suggested, although different STs could play different roles in plant development. Conclusions We describe a new family of proteins called ST whose presence is limited to the plant kingdom, specifically to a few families of dicotyledonous plants. They present 20 to 40 amino acid tandem repeat sequences with different characteristics (signal peptide, DUF2775 domain, conservative repeat regions) from the described group of 20 to 40 amino acid tandem repeat proteins and also from known cell wall proteins with repeat sequences. Several putative roles in plant physiology can be inferred from the characteristics found. PMID:23134664
Evolution of the vertebrate insulin receptor substrate (Irs) gene family.

PubMed

Al-Salam, Ahmad; Irwin, David M

2017-06-23

Insulin receptor substrate (Irs) proteins are essential for insulin signaling as they allow downstream effectors to dock with, and be activated by, the insulin receptor. A family of four Irs proteins have been identified in mice, however the gene for one of these, IRS3, has been pseudogenized in humans. While it is known that the Irs gene family originated in vertebrates, it is not known when it originated and which members are most closely related to each other. A better understanding of the evolution of Irs genes and proteins should provide insight into the regulation of metabolism by insulin. Multiple genes for Irs proteins were identified in a wide variety of vertebrate species. Phylogenetic and genomic neighborhood analyses indicate that this gene family originated very early in vertebrae evolution. Most Irs genes were duplicated and retained in fish after the fish-specific genome duplication. Irs genes have been lost of various lineages, including Irs3 in primates and birds and Irs1 in most fish. Irs3 and Irs4 experienced an episode of more rapid protein sequence evolution on the ancestral mammalian lineage. Comparisons of the conservation of the proteins sequences among Irs paralogs show that domains involved in binding to the plasma membrane and insulin receptors are most strongly conserved, while divergence has occurred in sequences involved in interacting with downstream effector proteins. The Irs gene family originated very early in vertebrate evolution, likely through genome duplications, and in parallel with duplications of other components of the insulin signaling pathway, including insulin and the insulin receptor. While the N-terminal sequences of these proteins are conserved among the paralogs, changes in the C-terminal sequences likely allowed changes in biological function.
Analysis and Prediction of Myristoylation Sites Using the mRMR Method, the IFS Method and an Extreme Learning Machine Algorithm.

PubMed

Wang, ShaoPeng; Zhang, Yu-Hang; Huang, GuoHua; Chen, Lei; Cai, Yu-Dong

2017-01-01

Myristoylation is an important hydrophobic post-translational modification that is covalently bound to the amino group of Gly residues on the N-terminus of proteins. The many diverse functions of myristoylation on proteins, such as membrane targeting, signal pathway regulation and apoptosis, are largely due to the lipid modification, whereas abnormal or irregular myristoylation on proteins can lead to several pathological changes in the cell. To better understand the function of myristoylated sites and to correctly identify them in protein sequences, this study conducted a novel computational investigation on identifying myristoylation sites in protein sequences. A training dataset with 196 positive and 84 negative peptide segments were obtained. Four types of features derived from the peptide segments following the myristoylation sites were used to specify myristoylatedand non-myristoylated sites. Then, feature selection methods including maximum relevance and minimum redundancy (mRMR), incremental feature selection (IFS), and a machine learning algorithm (extreme learning machine method) were adopted to extract optimal features for the algorithm to identify myristoylation sites in protein sequences, thereby building an optimal prediction model. As a result, 41 key features were extracted and used to build an optimal prediction model. The effectiveness of the optimal prediction model was further validated by its performance on a test dataset. Furthermore, detailed analyses were also performed on the extracted 41 features to gain insight into the mechanism of myristoylation modification. This study provided a new computational method for identifying myristoylation sites in protein sequences. We believe that it can be a useful tool to predict myristoylation sites from protein sequences. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Low-pass sequencing for microbial comparative genomics

PubMed Central

Goo, Young Ah; Roach, Jared; Glusman, Gustavo; Baliga, Nitin S; Deutsch, Kerry; Pan, Min; Kennedy, Sean; DasSarma, Shiladitya; Victor Ng, Wailap; Hood, Leroy

2004-01-01

Background We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1) the metabolically versatile Haloarcula marismortui; (2) the non-pigmented Natrialba asiatica; (3) the psychrophile Halorubrum lacusprofundi and (4) the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single pass genomic sequences per genome were obtained. The data were analyzed by comparative genomic analyses using the completed Halobacterium sp. NRC-1 genome as a reference. Low-pass shotgun sequencing is a simple, inexpensive, and rapid approach that can readily be performed on any cultured microbe. Results As expected, the four archaeal halophiles analyzed exhibit both bacterial and eukaryotic characteristics as well as uniquely archaeal traits. All five halophiles exhibit greater than sixty percent GC content and low isoelectric points (pI) for their predicted proteins. Multiple insertion sequence (IS) elements, often involved in genome rearrangements, were identified in H. lacusprofundi and H. marismortui. The core biological functions that govern cellular and genetic mechanisms of H. sp. NRC-1 appear to be conserved in these four other halophiles. Multiple TATA box binding protein (TBP) and transcription factor IIB (TFB) homologs were identified from most of the four shotgunned halophiles. The reconstructed molecular tree of all five halophiles shows a large divergence between these species, but with the closest relationship being between H. sp. NRC-1 and H. lacusprofundi. Conclusion Despite the diverse habitats of these species, all five halophiles share (1) high GC content and (2) low protein isoelectric points, which are characteristics associated with environmental exposure to UV radiation and hypersalinity, respectively. Identification of multiple IS elements in the genome of H. lacusprofundi and H. marismortui suggest that genome structure and dynamic genome reorganization might be similar to that previously observed in the IS-element rich genome of H. sp. NRC-1. Identification of multiple TBP and TFB homologs in these four halophiles are consistent with the hypothesis that different types of complex transcriptional regulation may occur through multiple TBP-TFB combinations in response to rapidly changing environmental conditions. Low-pass shotgun sequence analyses of genomes permit extensive and diverse analyses, and should be generally useful for comparative microbial genomics. PMID:14718067
HotSpot Wizard 3.0: web server for automated design of mutations and smart libraries based on sequence input information.

PubMed

Sumbalova, Lenka; Stourac, Jan; Martinek, Tomas; Bednar, David; Damborsky, Jiri

2018-05-23

HotSpot Wizard is a web server used for the automated identification of hotspots in semi-rational protein design to give improved protein stability, catalytic activity, substrate specificity and enantioselectivity. Since there are three orders of magnitude fewer protein structures than sequences in bioinformatic databases, the major limitation to the usability of previous versions was the requirement for the protein structure to be a compulsory input for the calculation. HotSpot Wizard 3.0 now accepts the protein sequence as input data. The protein structure for the query sequence is obtained either from eight repositories of homology models or is modeled using Modeller and I-Tasser. The quality of the models is then evaluated using three quality assessment tools-WHAT_CHECK, PROCHECK and MolProbity. During follow-up analyses, the system automatically warns the users whenever they attempt to redesign poorly predicted parts of their homology models. The second main limitation of HotSpot Wizard's predictions is that it identifies suitable positions for mutagenesis, but does not provide any reliable advice on particular substitutions. A new module for the estimation of thermodynamic stabilities using the Rosetta and FoldX suites has been introduced which prevents destabilizing mutations among pre-selected variants entering experimental testing. HotSpot Wizard is freely available at http://loschmidt.chemi.muni.cz/hotspotwizard.
Wheat-specific gene, ribosomal protein l21, used as the endogenous reference gene for qualitative and real-time quantitative polymerase chain reaction detection of transgenes.

PubMed

Liu, Yi-Ke; Li, He-Ping; Huang, Tao; Cheng, Wei; Gao, Chun-Sheng; Zuo, Dong-Yun; Zhao, Zheng-Xi; Liao, Yu-Cai

2014-10-29

Wheat-specific ribosomal protein L21 (RPL21) is an endogenous reference gene suitable for genetically modified (GM) wheat identification. This taxon-specific RPL21 sequence displayed high homogeneity in different wheat varieties. Southern blots revealed 1 or 3 copies, and sequence analyses showed one amplicon in common wheat. Combined analyses with sequences from common wheat (AABBDD) and three diploid ancestral species, Triticum urartu (AA), Aegilops speltoides (BB), and Aegilops tauschii (DD), demonstrated the presence of this amplicon in the AA genome. Using conventional qualitative polymerase chain reaction (PCR), the limit of detection was 2 copies of wheat haploid genome per reaction. In the quantitative real-time PCR assay, limits of detection and quantification were about 2 and 8 haploid genome copies, respectively, the latter of which is 2.5-4-fold lower than other reported wheat endogenous reference genes. Construct-specific PCR assays were developed using RPL21 as an endogenous reference gene, and as little as 0.5% of GM wheat contents containing Arabidopsis NPR1 were properly quantified.
MEICPS: substitution mutations to engineer intracellular protein stability.

PubMed

Reddy, B V; Ramesh, P; Tiwari, S

1998-01-01

In MEICPS, results from earlier analyses are utilized to suggest possible substitution point mutations to engineer intracellular stability using a given sequence or structure of the protein. From bvbreddy@ccmb.ap.nic.in. This program needs data from other software, PSA and SSTRUC, available from sali@tamika.rockefeller.edu and tom@cryst.bioc.cam.ac.uk, respectively. bvbreddy@ccmb.ap.nic.in
Venomic and antivenomic analyses of the Central American coral snake, Micrurus nigrocinctus (Elapidae).

PubMed

Fernández, Julián; Alape-Girón, Alberto; Angulo, Yamileth; Sanz, Libia; Gutiérrez, José María; Calvete, Juan J; Lomonte, Bruno

2011-04-01

The proteome of the venom of Micrurus nigrocinctus (Central American coral snake) was analyzed by a "venomics" approach. Nearly 50 venom peaks were resolved by RP-HPLC, revealing a complex protein composition. Comparative analyses of venoms from individual specimens revealed that such complexity is an intrinsic feature of this species, rather than the sum of variable individual patterns of simpler composition. Proteins related to eight distinct families were identified by MS/MS de novo peptide sequencing or N-terminal sequencing: phospholipase A(2) (PLA(2)), three-finger toxin (3FTx), l-amino acid oxidase, C-type lectin/lectin-like, metalloproteinase, serine proteinase, ohanin, and nucleotidase. PLA(2)s and 3FTxs are predominant, representing 48 and 38% of the venom proteins, respectively. Within 3FTxs, several isoforms of short-chain α-neurotoxins as well as muscarinic-like toxins and proteins with similarity to long-chain κ-2 bungarotoxin were identified. PLA(2)s are also highly diverse, and a toxicity screening showed that they mainly exert myotoxicity, although some are lethal and may contribute to the known presynaptic neurotoxicity of this venom. An antivenomic characterization of a therapeutic monospecific M. nigrocinctus equine antivenom revealed differences in immunorecognition of venom proteins that correlate with their molecular mass, with the weakest recognition observed toward 3FTxs.
Sequence variation of koala retrovirus transmembrane protein p15E among koalas from different geographic regions.

PubMed

Ishida, Yasuko; McCallister, Chelsea; Nikolaidis, Nikolas; Tsangaras, Kyriakos; Helgen, Kristofer M; Greenwood, Alex D; Roca, Alfred L

2015-01-15

The koala retrovirus (KoRV), which is transitioning from an exogenous to an endogenous form, has been associated with high mortality in koalas. For other retroviruses, the envelope protein p15E has been considered a candidate for vaccine development. We therefore examined proviral sequence variation of KoRV p15E in a captive Queensland and three wild southern Australian koalas. We generated 163 sequences with intact open reading frames, which grouped into 39 distinct haplotypes. Sixteen distinct haplotypes comprising 139 of the sequences (85%) coded for the same polypeptide. Among the remaining 23 haplotypes, 22 were detected only once among the sequences, and each had 1 or 2 non-synonymous differences from the majority sequence. Several analyses suggested that p15E was under purifying selection. Important epitopes and domains were highly conserved across the p15E sequences and in previously reported exogenous KoRVs. Overall, these results support the potential use of p15E for KoRV vaccine development. Copyright © 2014 Elsevier Inc. All rights reserved.
CoSMoS: Conserved Sequence Motif Search in the proteome

PubMed Central

Liu, Xiao I; Korde, Neeraj; Jakob, Ursula; Leichert, Lars I

2006-01-01

Background With the ever-increasing number of gene sequences in the public databases, generating and analyzing multiple sequence alignments becomes increasingly time consuming. Nevertheless it is a task performed on a regular basis by researchers in many labs. Results We have now created a database called CoSMoS to find the occurrences and at the same time evaluate the significance of sequence motifs and amino acids encoded in the whole genome of the model organism Escherichia coli K12. We provide a precomputed set of multiple sequence alignments for each individual E. coli protein with all of its homologues in the RefSeq database. The alignments themselves, information about the occurrence of sequence motifs together with information on the conservation of each of the more than 1.3 million amino acids encoded in the E. coli genome can be accessed via the web interface of CoSMoS. Conclusion CoSMoS is a valuable tool to identify highly conserved sequence motifs, to find regions suitable for mutational studies in functional analyses and to predict important structural features in E. coli proteins. PMID:16433915
'Candidatus Phytoplasma phoenicium' associated with almond witches'-broom disease: from draft genome to genetic diversity among strain populations.

PubMed

Quaglino, Fabio; Kube, Michael; Jawhari, Maan; Abou-Jawdah, Yusuf; Siewert, Christin; Choueiri, Elia; Sobh, Hana; Casati, Paola; Tedeschi, Rosemarie; Lova, Marina Molino; Alma, Alberto; Bianco, Piero Attilio

2015-07-30

Almond witches'-broom (AlmWB), a devastating disease of almond, peach and nectarine in Lebanon, is associated with 'Candidatus Phytoplasma phoenicium'. In the present study, we generated a draft genome sequence of 'Ca. P. phoenicium' strain SA213, representative of phytoplasma strain populations from different host plants, and determined the genetic diversity among phytoplasma strain populations by phylogenetic analyses of 16S rRNA, groEL, tufB and inmp gene sequences. Sequence-based typing and phylogenetic analysis of the gene inmp, coding an integral membrane protein, distinguished AlmWB-associated phytoplasma strains originating from diverse host plants, whereas their 16S rRNA, tufB and groEL genes shared 100 % sequence identity. Moreover, dN/dS analysis indicated positive selection acting on inmp gene. Additionally, the analysis of 'Ca. P. phoenicium' draft genome revealed the presence of integral membrane proteins and effector-like proteins and potential candidates for interaction with hosts. One of the integral membrane proteins was predicted as BI-1, an inhibitor of apoptosis-promoting Bax factor. Bioinformatics analyses revealed the presence of putative BI-1 in draft and complete genomes of other 'Ca. Phytoplasma' species. The genetic diversity within 'Ca. P. phoenicium' strain populations in Lebanon suggested that AlmWB disease could be associated with phytoplasma strains derived from the adaptation of an original strain to diverse hosts. Moreover, the identification of a putative inhibitor of apoptosis-promoting Bax factor (BI-1) in 'Ca. P. phoenicium' draft genome and within genomes of other 'Ca. Phytoplasma' species suggested its potential role as a phytoplasma fitness-increasing factor by modification of the host-defense response.
A novel NOTCH3 mutation identified in patients with oral cancer by whole exome sequencing.

PubMed

Yi, Yanjun; Tian, Zhuowei; Ju, Houyu; Ren, Guoxin; Hu, Jingzhou

2017-06-01

Oral cancer is a serious disease caused by environmental factors and/or susceptible genes. In the present study, in order to identify useful genetic biomarkers for cancer prediction and prevention, and for personalized treatment, we detected somatic mutations in 5 pairs of oral cancer tissues and blood samples using whole exome sequencing (WES). Finally, we confirmed a novel nonsense single-nucleotide polymorphism (SNP; chr19:15288426A>C) in the NOTCH3 gene with sanger sequencing, which resulted in a N1438T mutation in the protein sequence. Using multiple in silico analyses, this variant was found to mildly damaging effects on the NOTCH3 gene, which was supported by the results from analyses using PANTHER, SNAP and SNPs&GO. However, further analysis using Mutation Taster revealed that this SNP had a probability of 0.9997 to be 'disease causing'. In addition, we performed 3D structure simulation analysis and the results suggested that this variant had little effect on the solubility and hydrophobicity of the protein and thus on its function; however, it decreased the stability of the protein by increasing the total energy following minimization (-1,051.39 kcal/mol for the mutant and -1,229.84 kcal/mol for the native) and decreasing one stabilizing residue of the protein. Less stability of the N1438T mutant was also supported by analysis using I-Mutant with a DDG value of -1.67. Overall, the present study identified and confirmed a novel mutation in the NOTCH3 gene, which may decrease the stability of NOTCH3, and may thus prove to be helpful in cancer prognosis.
The 30 kDa protein co-purified with chick liver glutathione S-transferases is a carbonyl reductase.

PubMed

Tsai, S P; Wang, L Y; Yeh, H I; Tam, M F

1996-02-08

An unidentified 30 kDa protein was co-purified with chick liver glutathione S-transferases from S-hexylglutathione affinity column. The protein was isolated to apparent homogeneity with chromatofocusing. The molecular mass of the protein was determined to be 30 277 +/- 3 dalton by mass spectrometry. The protein was digested with Achromobacter proteinase I. Amino-acid sequence analyses of the resulting peptides show a high degree of identity with those of human carbonyl reductase. The protein is active with menadione as substrate. Thus, it is identified as chick liver carbonyl reductase.
Complete genome sequence of Fer-de-Lance Virus reveals a novel gene in reptilian Paramyxoviruses

USGS Publications Warehouse

Kurath, G.; Batts, W.N.; Ahne, W.; Winton, J.R.

2004-01-01

The complete RNA genome sequence of the archetype reptilian paramyxovirus, Fer-de-Lance virus (FDLV), has been determined. The genome is 15,378 nucleotides in length and consists of seven nonoverlapping genes in the order 3??? N-U-P-M-F-HN-L 5???, coding for the nucleocapsid, unknown, phospho-, matrix, fusion, hemagglutinin-neuraminidase, and large polymerase proteins, respectively. The gene junctions contain highly conserved transcription start and stop signal sequences and tri-nucleotide intergenic regions similar to those of other Paramyxoviridae. The FDLV P gene expression strategy is like that of rubulaviruses, which express the accessory V protein from the primary transcript and edit a portion of the mRNA to encode P and I proteins. There is also an overlapping open reading frame potentially encoding a small basic protein in the P gene. The gene designated U (unknown), encodes a deduced protein of 19.4 kDa that has no counterpart in other paramyxoviruses and has no similarity with sequences in the National Center for Biotechnology Information database. Active transcription of the U gene in infected cells was demonstrated by Northern blot analysis, and bicistronic N-U mRNA was also evident. The genomes of two other snake paramyxovirus genotypes were also found to have U genes, with 11 to 16% nucleotide divergence from the FDLV U gene. Pairwise comparisons of amino acid identities and phylogenetic analyses of all deduced FDLV protein sequences with homologous sequences from other Paramyxoviridae indicate that FDLV represents a new genus within the subfamily Paramyxovirinae. We suggest the name Ferlavirus for the new genus, with FDLV as the type species.
Rebelling for a Reason: Protein Structural “Outliers”

PubMed Central

Arumugam, Gandhimathi; Nair, Anu G.; Hariharaputran, Sridhar; Ramanathan, Sowdhamini

2013-01-01

Analysis of structural variation in domain superfamilies can reveal constraints in protein evolution which aids protein structure prediction and classification. Structure-based sequence alignment of distantly related proteins, organized in PASS2 database, provides clues about structurally conserved regions among different functional families. Some superfamily members show large structural differences which are functionally relevant. This paper analyses the impact of structural divergence on function for multi-member superfamilies, selected from the PASS2 superfamily alignment database. Functional annotations within superfamilies, with structural outliers or ‘rebels’, are discussed in the context of structural variations. Overall, these data reinforce the idea that functional similarities cannot be extrapolated from mere structural conservation. The implication for fold-function prediction is that the functional annotations can only be inherited with very careful consideration, especially at low sequence identities. PMID:24073209
PLAAC: a web and command-line application to identify proteins with prion-like amino acid composition.

PubMed

Lancaster, Alex K; Nutter-Upham, Andrew; Lindquist, Susan; King, Oliver D

2014-09-01

Prions are self-templating protein aggregates that stably perpetuate distinct biological states and are of keen interest to researchers in both evolutionary and biomedical science. The best understood prions are from yeast and have a prion-forming domain with strongly biased amino acid composition, most notably enriched for Q or N. PLAAC is a web application that scans protein sequences for domains with P: rion- L: ike A: mino A: cid C: omposition. Users can upload sequence files, or paste sequences directly into a textbox. PLAAC ranks the input sequences by several summary scores and allows scores along sequences to be visualized. Text output files can be downloaded for further analyses, and visualizations saved in PDF and PNG formats. http://plaac.wi.mit.edu/. The Ruby-based web framework and the command-line software (implemented in Java, with visualization routines in R) are available at http://github.com/whitehead/plaac under the MIT license. All software can be run under OS X, Windows and Unix. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
A network biology approach to understanding the importance of chameleon proteins in human physiology and pathology.

PubMed

Bahramali, Golnaz; Goliaei, Bahram; Minuchehr, Zarrin; Marashi, Sayed-Amir

2017-02-01

Chameleon proteins are proteins which include sequences that can adopt α-helix-β-strand (HE-chameleon) or α-helix-coil (HC-chameleon) or β-strand-coil (CE-chameleon) structures to operate their crucial biological functions. In this study, using a network-based approach, we examined the chameleon proteins to give a better knowledge on these proteins. We focused on proteins with identical chameleon sequences with more than or equal to seven residues long in different PDB entries, which adopt HE-chameleon, HC-chameleon, and CE-chameleon structures in the same protein. One hundred and ninety-one human chameleon proteins were identified via our in-house program. Then, protein-protein interaction (PPI) networks, Gene ontology (GO) enrichment, disease network, and pathway enrichment analyses were performed for our derived data set. We discovered that there are chameleon sequences which reside in protein-protein interaction regions between two proteins critical for their dual function. Analysis of the PPI networks for chameleon proteins introduced five hub proteins, namely TP53, EGFR, HSP90AA1, PPARA, and HIF1A, which were presented in four PPI clusters. The outcomes demonstrate that the chameleon regions are in critical domains of these proteins and are important in the development and treatment of human cancers. The present report is the first network-based functional study of chameleon proteins using computational approaches and might provide a new perspective for understanding the mechanisms of diseases helping us in developing new medical therapies along with discovering new proteins with chameleon properties which are highly important in cancer.

[Sequencing and analysis of complete genome of rabies viruses isolated from Chinese Ferret-Badger and dog in Zhejiang province].

PubMed

Lei, Yong-Liang; Wang, Xiao-Guang; Tao, Xiao-Yan; Li, Hao; Meng, Sheng-Li; Chen, Xiu-Ying; Liu, Fu-Ming; Ye, Bi-Feng; Tang, Qing

2010-01-01

Based on sequencing the full-length genomes of four Chinese Ferret-Badger and dog, we analyze the properties of rabies viruses genetic variation in molecular level, get the information about rabies viruses prevalence and variation in Zhejiang, and enrich the genome database of rabies viruses street strains isolated from China. Rabies viruses in suckling mice were isolated, overlapped fragments were amplified by RT-PCR and full-length genomes were assembled to analyze the nucleotide and deduced protein similarities and phylogenetic analyses from Chinese Ferret-Badger, dog, sika deer, vole, used vaccine strain were determined. The four full-length genomes were sequenced completely and had the same genetic structure with the length of 11, 923 nts or 11, 925 nts including 58 nts-Leader, 1353 nts-NP, 894 nts-PP, 609 nts-MP, 1575 nts-GP, 6386 nts-LP, and 2, 5, 5 nts- intergenic regions(IGRs), 423 nts-Pseudogene-like sequence (psi), 70 nts-Trailer. The four full-length genomes were in accordance with the properties of Rhabdoviridae Lyssa virus by BLAST and multi-sequence alignment. The nucleotide and amino acid sequences among Chinese strains had the highest similarity, especially among animals of the same species. Of the four full-length genomes, the similarity in amino acid level was dramatically higher than that in nucleotide level, so the nucleotide mutations happened in these four genomes were most synonymous mutations. Compared with the reference rabies viruses, the lengths of the five protein coding regions had no change, no recombination, only with a few point mutations. It was evident that the five proteins appeared to be stable. The variation sites and types of the four genomes were similar to the reference vaccine or street strains. And the four strains were genotype 1 according to the multi-sequence and phylogenetic analyses, which possessed the distinct district characteristics of China. Therefore, these four rabies viruses are likely to be street viruses already existing in the natural world.
The chordate proteome history database.

PubMed

Levasseur, Anthony; Paganini, Julien; Dainat, Jacques; Thompson, Julie D; Poch, Olivier; Pontarotti, Pierre; Gouret, Philippe

2012-01-01

The chordate proteome history database (http://ioda.univ-provence.fr) comprises some 20,000 evolutionary analyses of proteins from chordate species. Our main objective was to characterize and study the evolutionary histories of the chordate proteome, and in particular to detect genomic events and automatic functional searches. Firstly, phylogenetic analyses based on high quality multiple sequence alignments and a robust phylogenetic pipeline were performed for the whole protein and for each individual domain. Novel approaches were developed to identify orthologs/paralogs, and predict gene duplication/gain/loss events and the occurrence of new protein architectures (domain gains, losses and shuffling). These important genetic events were localized on the phylogenetic trees and on the genomic sequence. Secondly, the phylogenetic trees were enhanced by the creation of phylogroups, whereby groups of orthologous sequences created using OrthoMCL were corrected based on the phylogenetic trees; gene family size and gene gain/loss in a given lineage could be deduced from the phylogroups. For each ortholog group obtained from the phylogenetic or the phylogroup analysis, functional information and expression data can be retrieved. Database searches can be performed easily using biological objects: protein identifier, keyword or domain, but can also be based on events, eg, domain exchange events can be retrieved. To our knowledge, this is the first database that links group clustering, phylogeny and automatic functional searches along with the detection of important events occurring during genome evolution, such as the appearance of a new domain architecture.
Mixed heterolobosean and novel gregarine lineage genes from culture ATCC 50646: Long-branch artefacts, not lateral gene transfer, distort α-tubulin phylogeny.

PubMed

Cavalier-Smith, Thomas

2015-04-01

Contradictory and confusing results can arise if sequenced 'monoprotist' samples really contain DNA of very different species. Eukaryote-wide phylogenetic analyses using five genes from the amoeboflagellate culture ATCC 50646 previously implied it was an undescribed percolozoan related to percolatean flagellates (Stephanopogon, Percolomonas). Contrastingly, three phylogenetic analyses of 18S rRNA alone, did not place it within Percolozoa, but as an isolated deep-branching excavate. I resolve that contradiction by sequence phylogenies for all five genes individually, using up to 652 taxa. Its 18S rRNA sequence (GQ377652) is near-identical to one from stained-glass windows, somewhat more distant from one from cooling-tower water, all three related to terrestrial actinocephalid gregarines Hoplorhynchus and Pyxinia. All four protein-gene sequences (Hsp90; α-tubulin; β-tubulin; actin) are from an amoeboflagellate heterolobosean percolozoan, not especially deeply branching. Contrary to previous conclusions from trees combining protein and rRNA sequences or rDNA trees including Eozoa only, this culture does not represent a major novel deep-branching eukaryote lineage distinct from Heterolobosea, and thus lacks special significance for deep eukaryote phylogeny, though the rDNA sequence is important for gregarine phylogeny. α-Tubulin trees for over 250 eukaryotes refute earlier suggestions of lateral gene transfer within eukaryotes, being largely congruent with morphology and other gene trees. Copyright © 2015. Published by Elsevier GmbH.
In Silico Analysis of Correlations between Protein Disorder and Post-Translational Modifications in Algae

PubMed Central

Kurotani, Atsushi; Sakurai, Tetsuya

2015-01-01

Recent proteome analyses have reported that intrinsically disordered regions (IDRs) of proteins play important roles in biological processes. In higher plants whose genomes have been sequenced, the correlation between IDRs and post-translational modifications (PTMs) has been reported. The genomes of various eukaryotic algae as common ancestors of plants have also been sequenced. However, no analysis of the relationship to protein properties such as structure and PTMs in algae has been reported. Here, we describe correlations between IDR content and the number of PTM sites for phosphorylation, glycosylation, and ubiquitination, and between IDR content and regions rich in proline, glutamic acid, serine, and threonine (PEST) and transmembrane helices in the sequences of 20 algae proteomes. Phosphorylation, O-glycosylation, ubiquitination, and PEST preferentially occurred in disordered regions. In contrast, transmembrane helices were favored in ordered regions. N-glycosylation tended to occur in ordered regions in most of the studied algae; however, it correlated positively with disordered protein content in diatoms. Additionally, we observed that disordered protein content and the number of PTM sites were significantly increased in the species-specific protein clusters compared to common protein clusters among the algae. Moreover, there were specific relationships between IDRs and PTMs among the algae from different groups. PMID:26307970
In Silico Analysis of Correlations between Protein Disorder and Post-Translational Modifications in Algae.

PubMed

Kurotani, Atsushi; Sakurai, Tetsuya

2015-08-20

Recent proteome analyses have reported that intrinsically disordered regions (IDRs) of proteins play important roles in biological processes. In higher plants whose genomes have been sequenced, the correlation between IDRs and post-translational modifications (PTMs) has been reported. The genomes of various eukaryotic algae as common ancestors of plants have also been sequenced. However, no analysis of the relationship to protein properties such as structure and PTMs in algae has been reported. Here, we describe correlations between IDR content and the number of PTM sites for phosphorylation, glycosylation, and ubiquitination, and between IDR content and regions rich in proline, glutamic acid, serine, and threonine (PEST) and transmembrane helices in the sequences of 20 algae proteomes. Phosphorylation, O-glycosylation, ubiquitination, and PEST preferentially occurred in disordered regions. In contrast, transmembrane helices were favored in ordered regions. N-glycosylation tended to occur in ordered regions in most of the studied algae; however, it correlated positively with disordered protein content in diatoms. Additionally, we observed that disordered protein content and the number of PTM sites were significantly increased in the species-specific protein clusters compared to common protein clusters among the algae. Moreover, there were specific relationships between IDRs and PTMs among the algae from different groups.
Developmental Regulation of Genes Encoding Universal Stress Proteins in Schistosoma mansoni

PubMed Central

Isokpehi, Raphael D.; Mahmud, Ousman; Mbah, Andreas N.; Simmons, Shaneka S.; Avelar, Lívia; Rajnarayanan, Rajendram V.; Udensi, Udensi K.; Ayensu, Wellington K.; Cohly, Hari H.; Brown, Shyretha D.; Dates, Centdrika R.; Hentz, Sonya D.; Hughes, Shawntae J.; Smith-McInnis, Dominique R.; Patterson, Carvey O.; Sims, Jennifer N.; Turner, Kelisha T.; Williams, Baraka S.; Johnson, Matilda O.; Adubi, Taiwo; Mbuh, Judith V.; Anumudu, Chiaka I.; Adeoye, Grace O.; Thomas, Bolaji N.; Nashiru, Oyekanmi; Oliveira, Guilherme

2011-01-01

The draft nuclear genome sequence of the snail-transmitted, dimorphic, parasitic, platyhelminth Schistosoma mansoni revealed eight genes encoding proteins that contain the Universal Stress Protein (USP) domain. Schistosoma mansoni is a causative agent of human schistosomiasis, a severe and debilitating Neglected Tropical Disease (NTD) of poverty, which is endemic in at least 76 countries. The availability of the genome sequences of Schistosoma species presents opportunities for bioinformatics and genomics analyses of associated gene families that could be targets for understanding schistosomiasis ecology, intervention, prevention and control. Proteins with the USP domain are known to provide bacteria, archaea, fungi, protists and plants with the ability to respond to diverse environmental stresses. In this research investigation, the functional annotations of the USP genes and predicted nucleotide and protein sequences were initially verified. Subsequently, sequence clusters and distinctive features of the sequences were determined. A total of twelve ligand binding sites were predicted based on alignment to the ATP-binding universal stress protein from Methanocaldococcus jannaschii. In addition, six USP sequences showed the presence of ATP-binding motif residues indicating that they may be regulated by ATP. Public domain gene expression data and RT-PCR assays confirmed that all the S. mansoni USP genes were transcribed in at least one of the developmental life cycle stages of the helminth. Six of these genes were up-regulated in the miracidium, a free-swimming stage that is critical for transmission to the snail intermediate host. It is possible that during the intra-snail stages, S. mansoni gene transcripts for universal stress proteins are low abundant and are induced to perform specialized functions triggered by environmental stressors such as oxidative stress due to hydrogen peroxide that is present in the snail hemocytes. This report serves to catalyze the formation of a network of researchers to understand the function and regulation of the universal stress proteins encoded in genomes of schistosomes and their snail intermediate hosts. PMID:22084571
Predicting the binding preference of transcription factors to individual DNA k-mers.

PubMed

Alleyne, Trevis M; Peña-Castillo, Lourdes; Badis, Gwenael; Talukder, Shaheynoor; Berger, Michael F; Gehrke, Andrew R; Philippakis, Anthony A; Bulyk, Martha L; Morris, Quaid D; Hughes, Timothy R

2009-04-15

Recognition of specific DNA sequences is a central mechanism by which transcription factors (TFs) control gene expression. Many TF-binding preferences, however, are unknown or poorly characterized, in part due to the difficulty associated with determining their specificity experimentally, and an incomplete understanding of the mechanisms governing sequence specificity. New techniques that estimate the affinity of TFs to all possible k-mers provide a new opportunity to study DNA-protein interaction mechanisms, and may facilitate inference of binding preferences for members of a given TF family when such information is available for other family members. We employed a new dataset consisting of the relative preferences of mouse homeodomains for all eight-base DNA sequences in order to ask how well we can predict the binding profiles of homeodomains when only their protein sequences are given. We evaluated a panel of standard statistical inference techniques, as well as variations of the protein features considered. Nearest neighbour among functionally important residues emerged among the most effective methods. Our results underscore the complexity of TF-DNA recognition, and suggest a rational approach for future analyses of TF families.
ProteinSeq: High-Performance Proteomic Analyses by Proximity Ligation and Next Generation Sequencing

PubMed Central

Vänelid, Johan; Siegbahn, Agneta; Ericsson, Olle; Fredriksson, Simon; Bäcklin, Christofer; Gut, Marta; Heath, Simon; Gut, Ivo Glynne; Wallentin, Lars; Gustafsson, Mats G.; Kamali-Moghaddam, Masood; Landegren, Ulf

2011-01-01

Despite intense interest, methods that provide enhanced sensitivity and specificity in parallel measurements of candidate protein biomarkers in numerous samples have been lacking. We present herein a multiplex proximity ligation assay with readout via realtime PCR or DNA sequencing (ProteinSeq). We demonstrate improved sensitivity over conventional sandwich assays for simultaneous analysis of sets of 35 proteins in 5 µl of blood plasma. Importantly, we observe a minimal tendency to increased background with multiplexing, compared to a sandwich assay, suggesting that higher levels of multiplexing are possible. We used ProteinSeq to analyze proteins in plasma samples from cardiovascular disease (CVD) patient cohorts and matched controls. Three proteins, namely P-selectin, Cystatin-B and Kallikrein-6, were identified as putative diagnostic biomarkers for CVD. The latter two have not been previously reported in the literature and their potential roles must be validated in larger patient cohorts. We conclude that ProteinSeq is promising for screening large numbers of proteins and samples while the technology can provide a much-needed platform for validation of diagnostic markers in biobank samples and in clinical use. PMID:21980495
Phenotype–genotype correlation in Hirschsprung disease is illuminated by comparative analysis of the RET protein sequence

PubMed Central

Kashuk, Carl S.; Stone, Eric A.; Grice, Elizabeth A.; Portnoy, Matthew E.; Green, Eric D.; Sidow, Arend; Chakravarti, Aravinda; McCallion, Andrew S.

2005-01-01

The ability to discriminate between deleterious and neutral amino acid substitutions in the genes of patients remains a significant challenge in human genetics. The increasing availability of genomic sequence data from multiple vertebrate species allows inclusion of sequence conservation and physicochemical properties of residues to be used for functional prediction. In this study, the RET receptor tyrosine kinase serves as a model disease gene in which a broad spectrum (≥116) of disease-associated mutations has been identified among patients with Hirschsprung disease and multiple endocrine neoplasia type 2. We report the alignment of the human RET protein sequence with the orthologous sequences of 12 non-human vertebrates (eight mammalian, one avian, and three teleost species), their comparative analysis, the evolutionary topology of the RET protein, and predicted tolerance for all published missense mutations. We show that, although evolutionary conservation alone provides significant information to predict the effect of a RET mutation, a model that combines comparative sequence data with analysis of physiochemical properties in a quantitative framework provides far greater accuracy. Although the ability to discern the impact of a mutation is imperfect, our analyses permit substantial discrimination between predicted functional classes of RET mutations and disease severity even for a multigenic disease such as Hirschsprung disease. PMID:15956201
Functional region prediction with a set of appropriate homologous sequences-an index for sequence selection by integrating structure and sequence information with spatial statistics

PubMed Central

2012-01-01

Background The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions. Results We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is the maximum. The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence-based methods. Conclusions Appropriate homologous sequences are selected automatically and objectively by the index. Such sequence selection improved the performance of functional region prediction. As far as we know, this is the first approach in which spatial statistics have been applied to protein analyses. Such integration of structure and sequence information would be useful for other bioinformatics problems. PMID:22643026
Single-cell sequencing provides clues about the host interactions of segmented filamentous bacteria (SFB)

PubMed Central

Pamp, Sünje J.; Harrington, Eoghan D.; Quake, Stephen R.; Relman, David A.; Blainey, Paul C.

2012-01-01

Segmented filamentous bacteria (SFB) are host-specific intestinal symbionts that comprise a distinct clade within the Clostridiaceae, designated Candidatus Arthromitus. SFB display a unique life cycle within the host, involving differentiation into multiple cell types. The latter include filaments that attach intimately to intestinal epithelial cells, and from which “holdfasts” and spores develop. SFB induce a multifaceted immune response, leading to host protection from intestinal pathogens. Cultivation resistance has hindered characterization of these enigmatic bacteria. In the present study, we isolated five SFB filaments from a mouse using a microfluidic device equipped with laser tweezers, generated genome sequences from each, and compared these sequences with each other, as well as to recently published SFB genome sequences. Based on the resulting analyses, SFB appear to be dependent on the host for a variety of essential nutrients. SFB have a relatively high abundance of predicted proteins devoted to cell cycle control and to envelope biogenesis, and have a group of SFB-specific autolysins and a dynamin-like protein. Among the five filament genomes, an average of 8.6% of predicted proteins were novel, including a family of secreted SFB-specific proteins. Four ADP-ribosyltransferase (ADPRT) sequence types, and a myosin-cross-reactive antigen (MCRA) protein were discovered; we hypothesize that they are involved in modulation of host responses. The presence of polymorphisms among mouse SFB genomes suggests the evolution of distinct SFB lineages. Overall, our results reveal several aspects of SFB adaptation to the mammalian intestinal tract. PMID:22434425
Exploring the Genomic Roadmap and Molecular Phylogenetics Associated with MODY Cascades Using Computational Biology.

PubMed

Chakraborty, Chiranjib; Bandyopadhyay, Sanghamitra; Doss, C George Priya; Agoramoorthy, Govindasamy

2015-04-01

Maturity onset diabetes of the young (MODY) is a metabolic and genetic disorder. It is different from type 1 and type 2 diabetes with low occurrence level (1-2%) among all diabetes. This disorder is a consequence of β-cell dysfunction. Till date, 11 subtypes of MODY have been identified, and all of them can cause gene mutations. However, very little is known about the gene mapping, molecular phylogenetics, and co-expression among MODY genes and networking between cascades. This study has used latest servers and software such as VarioWatch, ClustalW, MUSCLE, G Blocks, Phylogeny.fr, iTOL, WebLogo, STRING, and KEGG PATHWAY to perform comprehensive analyses of gene mapping, multiple sequences alignment, molecular phylogenetics, protein-protein network design, co-expression analysis of MODY genes, and pathway development. The MODY genes are located in chromosomes-2, 7, 8, 9, 11, 12, 13, 17, and 20. Highly aligned block shows Pro, Gly, Leu, Arg, and Pro residues are highly aligned in the positions of 296, 386, 437, 455, 456 and 598, respectively. Alignment scores inform us that HNF1A and HNF1B proteins have shown high sequence similarity among MODY proteins. Protein-protein network design shows that HNF1A, HNF1B, HNF4A, NEUROD1, PDX1, PAX4, INS, and GCK are strongly connected, and the co-expression analyses between MODY genes also show distinct association between HNF1A and HNF4A genes. This study has used latest tools of bioinformatics to develop a rapid method to assess the evolutionary relationship, the network development, and the associations among eleven MODY genes and cascades. The prediction of sequence conservation, molecular phylogenetics, protein-protein network and the association between the MODY cascades enhances opportunities to get more insights into the less-known MODY disease.
Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome

PubMed Central

Margulies, Elliott H.; Cooper, Gregory M.; Asimenos, George; Thomas, Daryl J.; Dewey, Colin N.; Siepel, Adam; Birney, Ewan; Keefe, Damian; Schwartz, Ariel S.; Hou, Minmei; Taylor, James; Nikolaev, Sergey; Montoya-Burgos, Juan I.; Löytynoja, Ari; Whelan, Simon; Pardi, Fabio; Massingham, Tim; Brown, James B.; Bickel, Peter; Holmes, Ian; Mullikin, James C.; Ureta-Vidal, Abel; Paten, Benedict; Stone, Eric A.; Rosenbloom, Kate R.; Kent, W. James; Bouffard, Gerard G.; Guan, Xiaobin; Hansen, Nancy F.; Idol, Jacquelyn R.; Maduro, Valerie V.B.; Maskeri, Baishali; McDowell, Jennifer C.; Park, Morgan; Thomas, Pamela J.; Young, Alice C.; Blakesley, Robert W.; Muzny, Donna M.; Sodergren, Erica; Wheeler, David A.; Worley, Kim C.; Jiang, Huaiyang; Weinstock, George M.; Gibbs, Richard A.; Graves, Tina; Fulton, Robert; Mardis, Elaine R.; Wilson, Richard K.; Clamp, Michele; Cuff, James; Gnerre, Sante; Jaffe, David B.; Chang, Jean L.; Lindblad-Toh, Kerstin; Lander, Eric S.; Hinrichs, Angie; Trumbower, Heather; Clawson, Hiram; Zweig, Ann; Kuhn, Robert M.; Barber, Galt; Harte, Rachel; Karolchik, Donna; Field, Matthew A.; Moore, Richard A.; Matthewson, Carrie A.; Schein, Jacqueline E.; Marra, Marco A.; Antonarakis, Stylianos E.; Batzoglou, Serafim; Goldman, Nick; Hardison, Ross; Haussler, David; Miller, Webb; Pachter, Lior; Green, Eric D.; Sidow, Arend

2007-01-01

A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization. PMID:17567995
HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

PubMed

Wan, Shixiang; Zou, Quan

2017-01-01

Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. Extreme increase in next-generation sequencing results in shortage of efficient ultra-large biological sequence alignment approaches for coping with different sequence types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files more than 1 GB) sequence analyses. Based on HAlign and Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. The experiments in the DNA and protein large scale data sets, which are more than 1GB files, showed that HAlign II could save time and space. It outperformed the current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resource. THAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II with open-source codes and datasets was established at http://lab.malab.cn/soft/halign.
Engineering A-kinase Anchoring Protein (AKAP)-selective Regulatory Subunits of Protein Kinase A (PKA) through Structure-based Phage Selection*

PubMed Central

Gold, Matthew G.; Fowler, Douglas M.; Means, Christopher K.; Pawson, Catherine T.; Stephany, Jason J.; Langeberg, Lorene K.; Fields, Stanley; Scott, John D.

2013-01-01

PKA is retained within distinct subcellular environments by the association of its regulatory type II (RII) subunits with A-kinase anchoring proteins (AKAPs). Conventional reagents that universally disrupt PKA anchoring are patterned after a conserved AKAP motif. We introduce a phage selection procedure that exploits high-resolution structural information to engineer RII mutants that are selective for a particular AKAP. Selective RII (RSelect) sequences were obtained for eight AKAPs following competitive selection screening. Biochemical and cell-based experiments validated the efficacy of RSelect proteins for AKAP2 and AKAP18. These engineered proteins represent a new class of reagents that can be used to dissect the contributions of different AKAP-targeted pools of PKA. Molecular modeling and high-throughput sequencing analyses revealed the molecular basis of AKAP-selective interactions and shed new light on native RII-AKAP interactions. We propose that this structure-directed evolution strategy might be generally applicable for the investigation of other protein interaction surfaces. PMID:23625929
Genes encoding calmodulin-binding proteins in the Arabidopsis genome

NASA Technical Reports Server (NTRS)

Reddy, Vaka S.; Ali, Gul S.; Reddy, Anireddy S N.

2002-01-01

Analysis of the recently completed Arabidopsis genome sequence indicates that approximately 31% of the predicted genes could not be assigned to functional categories, as they do not show any sequence similarity with proteins of known function from other organisms. Calmodulin (CaM), a ubiquitous and multifunctional Ca(2+) sensor, interacts with a wide variety of cellular proteins and modulates their activity/function in regulating diverse cellular processes. However, the primary amino acid sequence of the CaM-binding domain in different CaM-binding proteins (CBPs) is not conserved. One way to identify most of the CBPs in the Arabidopsis genome is by protein-protein interaction-based screening of expression libraries with CaM. Here, using a mixture of radiolabeled CaM isoforms from Arabidopsis, we screened several expression libraries prepared from flower meristem, seedlings, or tissues treated with hormones, an elicitor, or a pathogen. Sequence analysis of 77 positive clones that interact with CaM in a Ca(2+)-dependent manner revealed 20 CBPs, including 14 previously unknown CBPs. In addition, by searching the Arabidopsis genome sequence with the newly identified and known plant or animal CBPs, we identified a total of 27 CBPs. Among these, 16 CBPs are represented by families with 2-20 members in each family. Gene expression analysis revealed that CBPs and CBP paralogs are expressed differentially. Our data suggest that Arabidopsis has a large number of CBPs including several plant-specific ones. Although CaM is highly conserved between plants and animals, only a few CBPs are common to both plants and animals. Analysis of Arabidopsis CBPs revealed the presence of a variety of interesting domains. Our analyses identified several hypothetical proteins in the Arabidopsis genome as CaM targets, suggesting their involvement in Ca(2+)-mediated signaling networks.
Distribution of cold adaptation proteins in microbial mats in Lake Joyce, Antarctica: Analysis of metagenomic data by using two bioinformatics tools.

PubMed

Koo, Hyunmin; Hakim, Joseph A; Fisher, Phillip R E; Grueneberg, Alexander; Andersen, Dale T; Bej, Asim K

2016-01-01

In this study, we report the distribution and abundance of cold-adaptation proteins in microbial mat communities in the perennially ice-covered Lake Joyce, located in the McMurdo Dry Valleys, Antarctica. We have used MG-RAST and R code bioinformatics tools on Illumina HiSeq2000 shotgun metagenomic data and compared the filtering efficacy of these two methods on cold-adaptation proteins. Overall, the abundance of cold-shock DEAD-box protein A (CSDA), antifreeze proteins (AFPs), fatty acid desaturase (FAD), trehalose synthase (TS), and cold-shock family of proteins (CSPs) were present in all mat samples at high, moderate, or low levels, whereas the ice nucleation protein (INP) was present only in the ice and bulbous mat samples at insignificant levels. Considering the near homogeneous temperature profile of Lake Joyce (0.08-0.29 °C), the distribution and abundance of these proteins across various mat samples predictively correlated with known functional attributes necessary for microbial communities to thrive in this ecosystem. The comparison of the MG-RAST and the R code methods showed dissimilar occurrences of the cold-adaptation protein sequences, though with insignificant ANOSIM (R = 0.357; p-value = 0.012), ADONIS (R(2) = 0.274; p-value = 0.03) and STAMP (p-values = 0.521-0.984) statistical analyses. Furthermore, filtering targeted sequences using the R code accounted for taxonomic groups by avoiding sequence redundancies, whereas the MG-RAST provided total counts resulting in a higher sequence output. The results from this study revealed for the first time the distribution of cold-adaptation proteins in six different types of microbial mats in Lake Joyce, while suggesting a simpler and more manageable user-defined method of R code, as compared to a web-based MG-RAST pipeline.
Biomolecular characterization and protein sequences of the Campanian hadrosaur B. canadensis.

PubMed

Schweitzer, Mary H; Zheng, Wenxia; Organ, Chris L; Avci, Recep; Suo, Zhiyong; Freimark, Lisa M; Lebleu, Valerie S; Duncan, Michael B; Vander Heiden, Matthew G; Neveu, John M; Lane, William S; Cottrell, John S; Horner, John R; Cantley, Lewis C; Kalluri, Raghu; Asara, John M

2009-05-01

Molecular preservation in non-avian dinosaurs is controversial. We present multiple lines of evidence that endogenous proteinaceous material is preserved in bone fragments and soft tissues from an 80-million-year-old Campanian hadrosaur, Brachylophosaurus canadensis [Museum of the Rockies (MOR) 2598]. Microstructural and immunological data are consistent with preservation of multiple bone matrix and vessel proteins, and phylogenetic analyses of Brachylophosaurus collagen sequenced by mass spectrometry robustly support the bird-dinosaur clade, consistent with an endogenous source for these collagen peptides. These data complement earlier results from Tyrannosaurus rex (MOR 1125) and confirm that molecular preservation in Cretaceous dinosaurs is not a unique event.
From algae to angiosperms–inferring the phylogeny of green plants (Viridiplantae) from 360 plastid genomes

PubMed Central

2014-01-01

Background Next-generation sequencing has provided a wealth of plastid genome sequence data from an increasingly diverse set of green plants (Viridiplantae). Although these data have helped resolve the phylogeny of numerous clades (e.g., green algae, angiosperms, and gymnosperms), their utility for inferring relationships across all green plants is uncertain. Viridiplantae originated 700-1500 million years ago and may comprise as many as 500,000 species. This clade represents a major source of photosynthetic carbon and contains an immense diversity of life forms, including some of the smallest and largest eukaryotes. Here we explore the limits and challenges of inferring a comprehensive green plant phylogeny from available complete or nearly complete plastid genome sequence data. Results We assembled protein-coding sequence data for 78 genes from 360 diverse green plant taxa with complete or nearly complete plastid genome sequences available from GenBank. Phylogenetic analyses of the plastid data recovered well-supported backbone relationships and strong support for relationships that were not observed in previous analyses of major subclades within Viridiplantae. However, there also is evidence of systematic error in some analyses. In several instances we obtained strongly supported but conflicting topologies from analyses of nucleotides versus amino acid characters, and the considerable variation in GC content among lineages and within single genomes affected the phylogenetic placement of several taxa. Conclusions Analyses of the plastid sequence data recovered a strongly supported framework of relationships for green plants. This framework includes: i) the placement of Zygnematophyceace as sister to land plants (Embryophyta), ii) a clade of extant gymnosperms (Acrogymnospermae) with cycads + Ginkgo sister to remaining extant gymnosperms and with gnetophytes (Gnetophyta) sister to non-Pinaceae conifers (Gnecup trees), and iii) within the monilophyte clade (Monilophyta), Equisetales + Psilotales are sister to Marattiales + leptosporangiate ferns. Our analyses also highlight the challenges of using plastid genome sequences in deep-level phylogenomic analyses, and we provide suggestions for future analyses that will likely incorporate plastid genome sequence data for thousands of species. We particularly emphasize the importance of exploring the effects of different partitioning and character coding strategies. PMID:24533922
Genetic analysis of duck circovirus in Pekin ducks from South Korea.

PubMed

Cha, S-Y; Kang, M; Cho, J-G; Jang, H-K

2013-11-01

The genetic organization of the 24 duck circovirus (DuCV) strains detected in commercial Pekin ducks from South Korea between 2011 and 2012 is described in this study. Multiple sequence alignment and phylogenetic analyses were performed on the 24 viral genome sequences as well as on 45 genome sequences available from the GenBank database. Phylogenetic analyses based on the genomic and open reading frame 2/cap sequences demonstrated that all DuCV strains belonged to genotype 1 and were designated in a subcluster under genotype 1. Analysis of the capsid protein amino acid sequences of the 24 Korean DuCV strains showed 10 substitutions compared with that of other genotype 1 strains. Our analysis showed that genotype 1 is predominant and circulating in South Korea. These present results serve as incentive to add more data to the DuCV database and provide insight to conduct further intensive study on the geographic relationships among these virus strains.

First complete genome sequence of vanilla mosaic strain of Dasheen mosaic virus isolated from the Cook Islands.

PubMed

Puli'uvea, Christopher; Khan, Subuhi; Chang, Wee-Leong; Valmonte, Gardette; Pearson, Michael N; Higgins, Colleen M

2017-02-01

We present the first complete genome of vanilla mosaic virus (VanMV). The VanMV genomic structure is consistent with that of a potyvirus, containing a single open reading frame (ORF) encoding a polyprotein of 3139 amino acids. Motif analyses indicate the polyprotein can be cleaved into the expected ten individual proteins; other recognised potyvirus motifs are also present. As expected, the VanMV genome shows high sequence similarity to the published Dasheen mosaic virus (DsMV) genome sequences; comparisons with DsMV continue to support VanMV as a vanilla infecting strain of DsMV. Phylogenetic analyses indicate that VanMV and DsMV share a common ancestor, with VanMV having the closest relationship with DsMV strains from the South Pacific.
Confirmation of translatability and functionality certifies the dual endothelin1/VEGFsp receptor (DEspR) protein.

PubMed

Herrera, Victoria L M; Steffen, Martin; Moran, Ann Marie; Tan, Glaiza A; Pasion, Khristine A; Rivera, Keith; Pappin, Darryl J; Ruiz-Opazo, Nelson

2016-06-14

In contrast to rat and mouse databases, the NCBI gene database lists the human dual-endothelin1/VEGFsp receptor (DEspR, formerly Dear) as a unitary transcribed pseudogene due to a stop [TGA]-codon at codon#14 in automated DNA and RNA sequences. However, re-analysis is needed given prior single gene studies detected a tryptophan [TGG]-codon#14 by manual Sanger sequencing, demonstrated DEspR translatability and functionality, and since the demonstration of actual non-translatability through expression studies, the standard-of-excellence for pseudogene designation, has not been performed. Re-analysis must meet UNIPROT criteria for demonstration of a protein's existence at the highest (protein) level, which a priori, would override DNA- or RNA-based deductions. To dissect the nucleotide sequence discrepancy, we performed Maxam-Gilbert sequencing and reviewed 727 RNA-seq entries. To comply with the highest level multiple UNIPROT criteria for determining DEspR's existence, we performed various experiments using multiple anti-DEspR monoclonal antibodies (mAbs) targeting distinct DEspR epitopes with one spanning the contested tryptophan [TGG]-codon#14, assessing: (a) DEspR protein expression, (b) predicted full-length protein size, (c) sequence-predicted protein-specific properties beyond codon#14: receptor glycosylation and internalization, (d) protein-partner interactions, and (e) DEspR functionality via DEspR-inhibition effects. Maxam-Gilbert sequencing and some RNA-seq entries demonstrate two guanines, hence a tryptophan [TGG]-codon#14 within a compression site spanning an error-prone compression sequence motif. Western blot analysis using anti-DEspR mAbs targeting distinct DEspR epitopes detect the identical glycosylated 17.5 kDa pull-down protein. Decrease in DEspR-protein size after PNGase-F digest demonstrates post-translational glycosylation, concordant with the consensus-glycosylation site beyond codon#14. Like other small single-transmembrane proteins, mass spectrometry analysis of anti-DEspR mAb pull-down proteins do not detect DEspR, but detect DEspR-protein interactions with proteins implicated in intracellular trafficking and cancer. FACS analyses also detect DEspR-protein in different human cancer stem-like cells (CSCs). DEspR-inhibition studies identify DEspR-roles in CSC survival and growth. Live cell imaging detects fluorescently-labeled anti-DEspR mAb targeted-receptor internalization, concordant with the single internalization-recognition sequence also located beyond codon#14. Data confirm translatability of DEspR, the full-length DEspR protein beyond codon#14, and elucidate DEspR-specific functionality. Along with detection of the tryptophan [TGG]-codon#14 within an error-prone compression site, cumulative data demonstrating DEspR protein existence fulfill multiple UNIPROT criteria, thus refuting its pseudogene designation.
Identification and sequence analyses of novel lipase encoding novel thermophillic bacilli isolated from Armenian geothermal springs.

PubMed

Shahinyan, Grigor; Margaryan, Armine; Panosyan, Hovik; Trchounian, Armen

2017-05-02

Among the huge diversity of thermophilic bacteria mainly bacilli have been reported as active thermostable lipase producers. Geothermal springs serve as the main source for isolation of thermostable lipase producing bacilli. Thermostable lipolytic enzymes, functioning in the harsh conditions, have promising applications in processing of organic chemicals, detergent formulation, synthesis of biosurfactants, pharmaceutical processing etc. In order to study the distribution of lipase-producing thermophilic bacilli and their specific lipase protein primary structures, three lipase producers from different genera were isolated from mesothermal (27.5-70 °C) springs distributed on the territory of Armenia and Nagorno Karabakh. Based on phenotypic characteristics and 16S rRNA gene sequencing the isolates were identified as Geobacillus sp., Bacillus licheniformis and Anoxibacillus flavithermus strains. The lipase genes of isolates were sequenced by using initially designed primer sets. Multiple alignments generated from primary structures of the lipase proteins and annotated lipase protein sequences, conserved regions analysis and amino acid composition have illustrated the similarity (98-99%) of the lipases with true lipases (family I) and GDSL esterase family (family II). A conserved sequence block that determines the thermostability has been identified in the multiple alignments of the lipase proteins. The results are spreading light on the lipase producing bacilli distribution in geothermal springs in Armenia and Nagorno Karabakh. Newly isolated bacilli strains could be prospective source for thermostable lipases and their genes.
Functional Genomics Analysis of Singapore Grouper Iridovirus: Complete Sequence Determination and Proteomic Analysis

PubMed Central

Song, Wen Jun; Qin, Qi Wei; Qiu, Jin; Huang, Can Hua; Wang, Fan; Hew, Choy Leong

2004-01-01

Here we report the complete genome sequence of Singapore grouper iridovirus (SGIV). Sequencing of the random shotgun and restriction endonuclease genomic libraries showed that the entire SGIV genome consists of 140,131 nucleotide bp. One hundred sixty-two open reading frames (ORFs) from the sense and antisense DNA strands, coding for lengths varying from 41 to 1,268 amino acids, were identified. Computer-assisted analyses of the deduced amino acid sequences revealed that 77 of the ORFs exhibited homologies to known virus genes, 23 of which matched functional iridovirus proteins. Forty-two putative conserved domains or signatures were detected in the National Center for Biotechnology Information CD-Search database and PROSITE database. An assortment of enzyme activities involved in DNA replication, transcription, nucleotide metabolism, cell signaling, etc., were identified. Viruses were cultured on a cell line derived from the embryonated egg of the grouper Epinephelus tauvina, isolated, and purified by sucrose gradient ultracentrifugation. The protein extract from the purified virions was analyzed by polyacrylamide gel electrophoresis followed by in-gel digestion of protein bands. Matrix-assisted laser desorption ionization-time of flight mass spectrometry and database searching led to identification of 26 proteins. Twenty of these represented novel or previously unidentified genes, which were further confirmed by reverse transcription-PCR (RT-PCR) and DNA sequencing of their respective RT-PCR products. PMID:15507645
The genomic sequence of ectromelia virus, the causative agent of mousepox.

PubMed

Chen, Nanhai; Danila, Maria I; Feng, Zehua; Buller, R Mark L; Wang, Chunlin; Han, Xiaosi; Lefkowitz, Elliot J; Upton, Chris

2003-12-05

Ectromelia virus is the causative agent of mousepox, an acute exanthematous disease of mouse colonies in Europe, Japan, China, and the U.S. The Moscow, Hampstead, and NIH79 strains are the most thoroughly studied with the Moscow strain being the most infectious and virulent for the mouse. In the late 1940s mousepox was proposed as a model for the study of the pathogenesis of smallpox and generalized vaccinia in humans. Studies in the last five decades from a succession of investigators have resulted in a detailed description of the virologic and pathologic disease course in genetically susceptible and resistant inbred and out-bred mice. We report the DNA sequence of the left-hand end, the predicted right-hand terminal repeat, and central regions of the genome of the Moscow strain of ectromelia virus (approximately 177,500 bp), which together with the previously sequenced right-hand end, yields a genome of 209,771 bp. We identified 175 potential genes specifying proteins of between 53 and 1924 amino acids, and 29 regions containing sequences related to genes predicted in other poxviruses, but unlikely to encode for functional proteins in ectromelia virus. The translated protein sequences were compared with the protein database for structure/function relationships, and these analyses were used to investigate poxvirus evolution and to attempt to explain at the cellular and molecular level the well-characterized features of the ectromelia virus natural life cycle.
Computational analysis of protein-protein interfaces involving an alpha helix: insights for terphenyl-like molecules binding.

PubMed

Isvoran, Adriana; Craciun, Dana; Martiny, Virginie; Sperandio, Olivier; Miteva, Maria A

2013-06-14

Protein-Protein Interactions (PPIs) are key for many cellular processes. The characterization of PPI interfaces and the prediction of putative ligand binding sites and hot spot residues are essential to design efficient small-molecule modulators of PPI. Terphenyl and its derivatives are small organic molecules known to mimic one face of protein-binding alpha-helical peptides. In this work we focus on several PPIs mediated by alpha-helical peptides. We performed computational sequence- and structure-based analyses in order to evaluate several key physicochemical and surface properties of proteins known to interact with alpha-helical peptides and/or terphenyl and its derivatives. Sequence-based analysis revealed low sequence identity between some of the analyzed proteins binding alpha-helical peptides. Structure-based analysis was performed to calculate the volume, the fractal dimension roughness and the hydrophobicity of the binding regions. Besides the overall hydrophobic character of the binding pockets, some specificities were detected. We showed that the hydrophobicity is not uniformly distributed in different alpha-helix binding pockets that can help to identify key hydrophobic hot spots. The presence of hydrophobic cavities at the protein surface with a more complex shape than the entire protein surface seems to be an important property related to the ability of proteins to bind alpha-helical peptides and low molecular weight mimetics. Characterization of similarities and specificities of PPI binding sites can be helpful for further development of small molecules targeting alpha-helix binding proteins.
Transcriptome Sequencing and Developmental Regulation of Gene Expression in Anopheles aquasalis

PubMed Central

Silva, Maria C. P.; Lopes, Adriana R.; Barros, Michele S.; Sá-Nunes, Anderson; Kojin, Bianca B.; Carvalho, Eneas; Suesdek, Lincoln; Silva-Neto, Mário Alberto C.; James, Anthony A.; Capurro, Margareth L.

2014-01-01

Background Anopheles aquasalis is a major malaria vector in coastal areas of South and Central America where it breeds preferentially in brackish water. This species is very susceptible to Plasmodium vivax and it has been already incriminated as responsible vector in malaria outbreaks. There has been no high-throughput investigation into the sequencing of An. aquasalis genes, transcripts and proteins despite its epidemiological relevance. Here we describe the sequencing, assembly and annotation of the An. aquasalis transcriptome. Methodology/Principal Findings A total of 419 thousand cDNA sequence reads, encompassing 164 million nucleotides, were assembled in 7544 contigs of ≥2 sequences, and 1999 singletons. The majority of the An. aquasalis transcripts encode proteins with their closest counterparts in another neotropical malaria vector, An. darlingi. Several analyses in different protein databases were used to annotate and predict the putative functions of the deduced An. aquasalis proteins. Larval and adult-specific transcripts were represented by 121 and 424 contig sequences, respectively. Fifty-one transcripts were only detected in blood-fed females. The data also reveal a list of transcripts up- or down-regulated in adult females after a blood meal. Transcripts associated with immunity, signaling networks and blood feeding and digestion are discussed. Conclusions/Significance This study represents the first large-scale effort to sequence the transcriptome of An. aquasalis. It provides valuable information that will facilitate studies on the biology of this species and may lead to novel strategies to reduce malaria transmission on the South American continent. The An. aquasalis transcriptome is accessible at http://exon.niaid.nih.gov/transcriptome/An_aquasalis/Anaquexcel.xlsx. PMID:25033462
Influence of Molecular Resolution on Sequence-Based Discovery of Ecological Diversity among Synechococcus Populations in an Alkaline Siliceous Hot Spring Microbial Mat ▿ †

PubMed Central

Melendrez, Melanie C.; Lange, Rachel K.; Cohan, Frederick M.; Ward, David M.

2011-01-01

Previous research has shown that sequences of 16S rRNA genes and 16S-23S rRNA internal transcribed spacer regions may not have enough genetic resolution to define all ecologically distinct Synechococcus populations (ecotypes) inhabiting alkaline, siliceous hot spring microbial mats. To achieve higher molecular resolution, we studied sequence variation in three protein-encoding loci sampled by PCR from 60°C and 65°C sites in the Mushroom Spring mat (Yellowstone National Park, WY). Sequences were analyzed using the ecotype simulation (ES) and AdaptML algorithms to identify putative ecotypes. Between 4 and 14 times more putative ecotypes were predicted from variation in protein-encoding locus sequences than from variation in 16S rRNA and 16S-23S rRNA internal transcribed spacer sequences. The number of putative ecotypes predicted depended on the number of sequences sampled and the molecular resolution of the locus. Chao estimates of diversity indicated that few rare ecotypes were missed. Many ecotypes hypothesized by sequence analyses were different in their habitat specificities, suggesting different adaptations to temperature or other parameters that vary along the flow channel. PMID:21169433
MUC1-ARF-A Novel MUC1 Protein That Resides in the Nucleus and Is Expressed by Alternate Reading Frame Translation of MUC1 mRNA.

PubMed

Chalick, Michael; Jacobi, Oded; Pichinuk, Edward; Garbar, Christian; Bensussan, Armand; Meeker, Alan; Ziv, Ravit; Zehavi, Tania; Smorodinsky, Nechama I; Hilkens, John; Hanisch, Franz-Georg; Rubinstein, Daniel B; Wreschner, Daniel H

2016-01-01

Translation of mRNA in alternate reading frames (ARF) is a naturally occurring process heretofore underappreciated as a generator of protein diversity. The MUC1 gene encodes MUC1-TM, a signal-transducing trans-membrane protein highly expressed in human malignancies. Here we show that an AUG codon downstream to the MUC1-TM initiation codon initiates an alternate reading frame thereby generating a novel protein, MUC1-ARF. MUC1-ARF, like its MUC1-TM 'parent' protein, contains a tandem repeat (VNTR) domain. However, the amino acid sequence of the MUC1-ARF tandem repeat as well as N- and C- sequences flanking it differ entirely from those of MUC1-TM. In vitro protein synthesis assays and extensive immunohistochemical as well as western blot analyses with MUC1-ARF specific monoclonal antibodies confirmed MUC1-ARF expression. Rather than being expressed at the cell membrane like MUC1-TM, immunostaining showed that MUC1-ARF protein localizes mainly in the nucleus: Immunohistochemical analyses of MUC1-expressing tissues demonstrated MUC1-ARF expression in the nuclei of secretory luminal epithelial cells. MUC1-ARF expression varies in different malignancies. While the malignant epithelial cells of pancreatic cancer show limited expression, in breast cancer tissue MUC1-ARF demonstrates strong nuclear expression. Proinflammatory cytokines upregulate expression of MUC1-ARF protein and co-immunoprecipitation analyses demonstrate association of MUC1-ARF with SH3 domain-containing proteins. Mass spectrometry performed on proteins coprecipitating with MUC1-ARF demonstrated Glucose-6-phosphate 1-dehydrogenase (G6PD) and Dynamin 2 (DNM2). These studies not only reveal that the MUC1 gene generates a previously unidentified MUC1-ARF protein, they also show that just like its 'parent' MUC1-TM protein, MUC1-ARF is apparently linked to signaling and malignancy, yet a definitive link to these processes and the roles it plays awaits a precise identification of its molecular functions. Comprising at least 524 amino acids, MUC1-ARF is, furthermore, the longest ARF protein heretofore described.
MUC1-ARF—A Novel MUC1 Protein That Resides in the Nucleus and Is Expressed by Alternate Reading Frame Translation of MUC1 mRNA

PubMed Central

Pichinuk, Edward; Garbar, Christian; Bensussan, Armand; Meeker, Alan; Ziv, Ravit; Zehavi, Tania; Smorodinsky, Nechama I.; Hilkens, John; Hanisch, Franz-Georg; Rubinstein, Daniel B.; Wreschner, Daniel H.

2016-01-01

Translation of mRNA in alternate reading frames (ARF) is a naturally occurring process heretofore underappreciated as a generator of protein diversity. The MUC1 gene encodes MUC1-TM, a signal-transducing trans-membrane protein highly expressed in human malignancies. Here we show that an AUG codon downstream to the MUC1-TM initiation codon initiates an alternate reading frame thereby generating a novel protein, MUC1-ARF. MUC1-ARF, like its MUC1-TM 'parent’ protein, contains a tandem repeat (VNTR) domain. However, the amino acid sequence of the MUC1-ARF tandem repeat as well as N- and C- sequences flanking it differ entirely from those of MUC1-TM. In vitro protein synthesis assays and extensive immunohistochemical as well as western blot analyses with MUC1-ARF specific monoclonal antibodies confirmed MUC1-ARF expression. Rather than being expressed at the cell membrane like MUC1-TM, immunostaining showed that MUC1-ARF protein localizes mainly in the nucleus: Immunohistochemical analyses of MUC1-expressing tissues demonstrated MUC1-ARF expression in the nuclei of secretory luminal epithelial cells. MUC1-ARF expression varies in different malignancies. While the malignant epithelial cells of pancreatic cancer show limited expression, in breast cancer tissue MUC1-ARF demonstrates strong nuclear expression. Proinflammatory cytokines upregulate expression of MUC1-ARF protein and co-immunoprecipitation analyses demonstrate association of MUC1-ARF with SH3 domain-containing proteins. Mass spectrometry performed on proteins coprecipitating with MUC1-ARF demonstrated Glucose-6-phosphate 1-dehydrogenase (G6PD) and Dynamin 2 (DNM2). These studies not only reveal that the MUC1 gene generates a previously unidentified MUC1-ARF protein, they also show that just like its ‘parent’ MUC1-TM protein, MUC1-ARF is apparently linked to signaling and malignancy, yet a definitive link to these processes and the roles it plays awaits a precise identification of its molecular functions. Comprising at least 524 amino acids, MUC1-ARF is, furthermore, the longest ARF protein heretofore described. PMID:27768738
Structural studies of the Sputnik virophage.

PubMed

Sun, Siyang; La Scola, Bernard; Bowman, Valorie D; Ryan, Christopher M; Whitelegge, Julian P; Raoult, Didier; Rossmann, Michael G

2010-01-01

The virophage Sputnik is a satellite virus of the giant mimivirus and is the only satellite virus reported to date whose propagation adversely affects its host virus' production. Genome sequence analysis showed that Sputnik has genes related to viruses infecting all three domains of life. Here, we report structural studies of Sputnik, which show that it is about 740 A in diameter, has a T=27 icosahedral capsid, and has a lipid membrane inside the protein shell. Structural analyses suggest that the major capsid protein of Sputnik is likely to have a double jelly-roll fold, although sequence alignments do not show any detectable similarity with other viral double jelly-roll capsid proteins. Hence, the origin of Sputnik's capsid might have been derived from other viruses prior to its association with mimivirus.
Structural Studies of the Sputnik Virophage▿

PubMed Central

Sun, Siyang; La Scola, Bernard; Bowman, Valorie D.; Ryan, Christopher M.; Whitelegge, Julian P.; Raoult, Didier; Rossmann, Michael G.

2010-01-01

The virophage Sputnik is a satellite virus of the giant mimivirus and is the only satellite virus reported to date whose propagation adversely affects its host virus' production. Genome sequence analysis showed that Sputnik has genes related to viruses infecting all three domains of life. Here, we report structural studies of Sputnik, which show that it is about 740 Å in diameter, has a T=27 icosahedral capsid, and has a lipid membrane inside the protein shell. Structural analyses suggest that the major capsid protein of Sputnik is likely to have a double jelly-roll fold, although sequence alignments do not show any detectable similarity with other viral double jelly-roll capsid proteins. Hence, the origin of Sputnik's capsid might have been derived from other viruses prior to its association with mimivirus. PMID:19889775
Identifying the Basal Angiosperm Node in Chloroplast GenomePhylogenies: Sampling One's Way Out of the Felsenstein Zone

DOE Office of Scientific and Technical Information (OSTI.GOV)

Leebens-Mack, Jim; Raubeson, Linda A.; Cui, Liying

2005-05-27

While there has been strong support for Amborella and Nymphaeales (water lilies) as branching from basal-most nodes in the angiosperm phylogeny, this hypothesis has recently been challenged by phylogenetic analyses of 61 protein-coding genes extracted from the chloroplast genome sequences of Amborella, Nymphaea and 12 other available land plant chloroplast genomes. These character-rich analyses placed the monocots, represented by three grasses (Poaceae), as sister to all other extant angiosperm lineages. We have extracted protein-coding regions from draft sequences for six additional chloroplast genomes to test whether this surprising result could be an artifact of long-branch attraction due to limited taxonmore » sampling. The added taxa include three monocots (Acorus, Yucca and Typha), a water lily (Nuphar), a ranunculid(Ranunculus), and a gymnosperm (Ginkgo). Phylogenetic analyses of the expanded DNA and protein datasets together with microstructural characters (indels) provided unambiguous support for Amborella and the Nymphaeales as branching from the basal-most nodes in the angiospermphylogeny. However, their relative positions proved to be dependent on method of analysis, with parsimony favoring Amborella as sister to all other angiosperms, and maximum likelihood and neighbor-joining methods favoring an Amborella + Nympheales clade as sister. The maximum likelihood phylogeny supported the later hypothesis, but the likelihood for the former hypothesis was not significantly different. Parametric bootstrap analysis, single gene phylogenies, estimated divergence dates and conflicting in del characters all help to illuminate the nature of the conflict in resolution of the most basal nodes in the angiospermphylogeny. Molecular dating analyses provided median age estimates of 161 mya for the most recent common ancestor of all extant angiosperms and 145 mya for the most recent common ancestor of monocots, magnoliids andeudicots. Whereas long sequences reduce variance in branch lengths and molecular dating estimates, the impact of improved taxon sampling on the rooting of the angiosperm phylogeny together with the results of parametric bootstrap analyses demonstrate how long-branch attraction can mislead genome-scale phylogenetic analyses.« less
Molecular and functional characterization of two drought-induced zinc finger proteins, ZmZnF1 and ZmZnF2 from maize kernels

USDA-ARS?s Scientific Manuscript database

We have isolated two cDNA clones encoding Zinc Finger proteins, designated as ZmZnF1 and ZmZnF2, from water-stressed maize kernels. Sequence analyses indicates that ZmZnF1 is homologous to the A20/AN1-type zinc finger protein and contains the zinc finger motif of Cx2–Cx10–CxCx4Cx2Hx5HxC. Whereas ZmZ...
Signatures of pleiotropy, economy and convergent evolution in a domain-resolved map of human-virus protein-protein interaction networks.

PubMed

Garamszegi, Sara; Franzosa, Eric A; Xia, Yu

2013-01-01

A central challenge in host-pathogen systems biology is the elucidation of general, systems-level principles that distinguish host-pathogen interactions from within-host interactions. Current analyses of host-pathogen and within-host protein-protein interaction networks are largely limited by their resolution, treating proteins as nodes and interactions as edges. Here, we construct a domain-resolved map of human-virus and within-human protein-protein interaction networks by annotating protein interactions with high-coverage, high-accuracy, domain-centric interaction mechanisms: (1) domain-domain interactions, in which a domain in one protein binds to a domain in a second protein, and (2) domain-motif interactions, in which a domain in one protein binds to a short, linear peptide motif in a second protein. Analysis of these domain-resolved networks reveals, for the first time, significant mechanistic differences between virus-human and within-human interactions at the resolution of single domains. While human proteins tend to compete with each other for domain binding sites by means of sequence similarity, viral proteins tend to compete with human proteins for domain binding sites in the absence of sequence similarity. Independent of their previously established preference for targeting human protein hubs, viral proteins also preferentially target human proteins containing linear motif-binding domains. Compared to human proteins, viral proteins participate in more domain-motif interactions, target more unique linear motif-binding domains per residue, and contain more unique linear motifs per residue. Together, these results suggest that viruses surmount genome size constraints by convergently evolving multiple short linear motifs in order to effectively mimic, hijack, and manipulate complex host processes for their survival. Our domain-resolved analyses reveal unique signatures of pleiotropy, economy, and convergent evolution in viral-host interactions that are otherwise hidden in the traditional binary network, highlighting the power and necessity of high-resolution approaches in host-pathogen systems biology.
Genome analysis of the platypus reveals unique signatures of evolution.

PubMed

Warren, Wesley C; Hillier, LaDeana W; Marshall Graves, Jennifer A; Birney, Ewan; Ponting, Chris P; Grützner, Frank; Belov, Katherine; Miller, Webb; Clarke, Laura; Chinwalla, Asif T; Yang, Shiaw-Pyng; Heger, Andreas; Locke, Devin P; Miethke, Pat; Waters, Paul D; Veyrunes, Frédéric; Fulton, Lucinda; Fulton, Bob; Graves, Tina; Wallis, John; Puente, Xose S; López-Otín, Carlos; Ordóñez, Gonzalo R; Eichler, Evan E; Chen, Lin; Cheng, Ze; Deakin, Janine E; Alsop, Amber; Thompson, Katherine; Kirby, Patrick; Papenfuss, Anthony T; Wakefield, Matthew J; Olender, Tsviya; Lancet, Doron; Huttley, Gavin A; Smit, Arian F A; Pask, Andrew; Temple-Smith, Peter; Batzer, Mark A; Walker, Jerilyn A; Konkel, Miriam K; Harris, Robert S; Whittington, Camilla M; Wong, Emily S W; Gemmell, Neil J; Buschiazzo, Emmanuel; Vargas Jentzsch, Iris M; Merkel, Angelika; Schmitz, Juergen; Zemann, Anja; Churakov, Gennady; Kriegs, Jan Ole; Brosius, Juergen; Murchison, Elizabeth P; Sachidanandam, Ravi; Smith, Carly; Hannon, Gregory J; Tsend-Ayush, Enkhjargal; McMillan, Daniel; Attenborough, Rosalind; Rens, Willem; Ferguson-Smith, Malcolm; Lefèvre, Christophe M; Sharp, Julie A; Nicholas, Kevin R; Ray, David A; Kube, Michael; Reinhardt, Richard; Pringle, Thomas H; Taylor, James; Jones, Russell C; Nixon, Brett; Dacheux, Jean-Louis; Niwa, Hitoshi; Sekita, Yoko; Huang, Xiaoqiu; Stark, Alexander; Kheradpour, Pouya; Kellis, Manolis; Flicek, Paul; Chen, Yuan; Webber, Caleb; Hardison, Ross; Nelson, Joanne; Hallsworth-Pepin, Kym; Delehaunty, Kim; Markovic, Chris; Minx, Pat; Feng, Yucheng; Kremitzki, Colin; Mitreva, Makedonka; Glasscock, Jarret; Wylie, Todd; Wohldmann, Patricia; Thiru, Prathapan; Nhan, Michael N; Pohl, Craig S; Smith, Scott M; Hou, Shunfeng; Nefedov, Mikhail; de Jong, Pieter J; Renfree, Marilyn B; Mardis, Elaine R; Wilson, Richard K

2008-05-08

We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation.
Genome analysis of the platypus reveals unique signatures of evolution

PubMed Central

Warren, Wesley C.; Hillier, LaDeana W.; Marshall Graves, Jennifer A.; Birney, Ewan; Ponting, Chris P.; Grützner, Frank; Belov, Katherine; Miller, Webb; Clarke, Laura; Chinwalla, Asif T.; Yang, Shiaw-Pyng; Heger, Andreas; Locke, Devin P.; Miethke, Pat; Waters, Paul D.; Veyrunes, Frédéric; Fulton, Lucinda; Fulton, Bob; Graves, Tina; Wallis, John; Puente, Xose S.; López-Otín, Carlos; Ordóñez, Gonzalo R.; Eichler, Evan E.; Chen, Lin; Cheng, Ze; Deakin, Janine E.; Alsop, Amber; Thompson, Katherine; Kirby, Patrick; Papenfuss, Anthony T.; Wakefield, Matthew J.; Olender, Tsviya; Lancet, Doron; Huttley, Gavin A.; Smit, Arian F. A.; Pask, Andrew; Temple-Smith, Peter; Batzer, Mark A.; Walker, Jerilyn A.; Konkel, Miriam K.; Harris, Robert S.; Whittington, Camilla M.; Wong, Emily S. W.; Gemmell, Neil J.; Buschiazzo, Emmanuel; Vargas Jentzsch, Iris M.; Merkel, Angelika; Schmitz, Juergen; Zemann, Anja; Churakov, Gennady; Kriegs, Jan Ole; Brosius, Juergen; Murchison, Elizabeth P.; Sachidanandam, Ravi; Smith, Carly; Hannon, Gregory J.; Tsend-Ayush, Enkhjargal; McMillan, Daniel; Attenborough, Rosalind; Rens, Willem; Ferguson-Smith, Malcolm; Lefèvre, Christophe M.; Sharp, Julie A.; Nicholas, Kevin R.; Ray, David A.; Kube, Michael; Reinhardt, Richard; Pringle, Thomas H.; Taylor, James; Jones, Russell C.; Nixon, Brett; Dacheux, Jean-Louis; Niwa, Hitoshi; Sekita, Yoko; Huang, Xiaoqiu; Stark, Alexander; Kheradpour, Pouya; Kellis, Manolis; Flicek, Paul; Chen, Yuan; Webber, Caleb; Hardison, Ross; Nelson, Joanne; Hallsworth-Pepin, Kym; Delehaunty, Kim; Markovic, Chris; Minx, Pat; Feng, Yucheng; Kremitzki, Colin; Mitreva, Makedonka; Glasscock, Jarret; Wylie, Todd; Wohldmann, Patricia; Thiru, Prathapan; Nhan, Michael N.; Pohl, Craig S.; Smith, Scott M.; Hou, Shunfeng; Renfree, Marilyn B.; Mardis, Elaine R.; Wilson, Richard K.

2009-01-01

We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation. PMID:18464734
Determination of the complete genomic sequence and analysis of the gene products of the virus of Spring Viremia of Carp, a fish rhabdovirus.

PubMed

Hoffmann, Bernd; Schütze, Heike; Mettenleiter, Thomas C

2002-03-20

The complete genome of spring viremia of carp virus (SVCV) was cloned and the sequence of 11019 nucleotides was determined. It contains five open reading frames (ORF's) encoding for the nucleoprotein N; phosphoprotein P; matrix protein M; glycoprotein G; and the viral RNA dependent RNA polymerase L. Genes are organised in the order typical for rhabdoviruses: 3'-N-P-M-G-L-5'. The short leader and trailer regions of SVCV exhibit inverse complementarity and are similar to the respective 3' and 5' ends of the genome of vesicular stomatitis virus. To verify the predicted open reading frames proteins were expressed in bacteria and analysed with a polyclonal anti-SVCV serum. Furthermore, monospecific antisera against the distinct viral proteins were generated. Comparison of genome and protein confirm the assignment of SVCV to the genus Vesiculovirus.
Deriving Heterospecific Self-Assembling Protein-Protein Interactions Using a Computational Interactome Screen.

PubMed

Crooks, Richard O; Baxter, Daniel; Panek, Anna S; Lubben, Anneke T; Mason, Jody M

2016-01-29

Interactions between naturally occurring proteins are highly specific, with protein-network imbalances associated with numerous diseases. For designed protein-protein interactions (PPIs), required specificity can be notoriously difficult to engineer. To accelerate this process, we have derived peptides that form heterospecific PPIs when combined. This is achieved using software that generates large virtual libraries of peptide sequences and searches within the resulting interactome for preferentially interacting peptides. To demonstrate feasibility, we have (i) generated 1536 peptide sequences based on the parallel dimeric coiled-coil motif and varied residues known to be important for stability and specificity, (ii) screened the 1,180,416 member interactome for predicted Tm values and (iii) used predicted Tm cutoff points to isolate eight peptides that form four heterospecific PPIs when combined. This required that all 32 hypothetical off-target interactions within the eight-peptide interactome be disfavoured and that the four desired interactions pair correctly. Lastly, we have verified the approach by characterising all 36 pairs within the interactome. In analysing the output, we hypothesised that several sequences are capable of adopting antiparallel orientations. We subsequently improved the software by removing sequences where doing so led to fully complementary electrostatic pairings. Our approach can be used to derive increasingly large and therefore complex sets of heterospecific PPIs with a wide range of potential downstream applications from disease modulation to the design of biomaterials and peptides in synthetic biology. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.
Hydrophobic cluster analysis of G protein-coupled receptors: a powerful tool to derive structural and functional information from 2D-representation of protein sequences.

PubMed

Lentes, K U; Mathieu, E; Bischoff, R; Rasmussen, U B; Pavirani, A

1993-01-01

Current methods for comparative analyses of protein sequences are 1D-alignments of amino acid sequences based on the maximization of amino acid identity (homology) and the prediction of secondary structure elements. This method has a major drawback once the amino acid identity drops below 20-25%, since maximization of a homology score does not take into account any structural information. A new technique called Hydrophobic Cluster Analysis (HCA) has been developed by Lemesle-Varloot et al. (Biochimie 72, 555-574), 1990). This consists of comparing several sequences simultaneously and combining homology detection with secondary structure analysis. HCA is primarily based on the detection and comparison of structural segments constituting the hydrophobic core of globular protein domains, with or without transmembrane domains. We have applied HCA to the analysis of different families of G-protein coupled receptors, such as catecholamine receptors as well as peptide hormone receptors. Utilizing HCA the thrombin receptor, a new and as yet unique member of the family of G-protein coupled receptors, can be clearly classified as being closely related to the family of neuropeptide receptors rather than to the catecholamine receptors for which the shape of the hydrophobic clusters and the length of their third cytoplasmic loop are very different. Furthermore, the potential of HCA to predict relationships between new putative and already characterized members of this family of receptors will be presented.

Identification and characterization of a novel zebrafish (Danio rerio) pentraxin-carbonic anhydrase.

PubMed

Patrikainen, Maarit S; Tolvanen, Martti E E; Aspatwar, Ashok; Barker, Harlan R; Ortutay, Csaba; Jänis, Janne; Laitaoja, Mikko; Hytönen, Vesa P; Azizi, Latifeh; Manandhar, Prajwol; Jáger, Edit; Vullo, Daniela; Kukkurainen, Sampo; Hilvo, Mika; Supuran, Claudiu T; Parkkila, Seppo

2017-01-01

Carbonic anhydrases (CAs) are ubiquitous, essential enzymes which catalyze the conversion of carbon dioxide and water to bicarbonate and H + ions. Vertebrate genomes generally contain gene loci for 15-21 different CA isoforms, three of which are enzymatically inactive. CA VI is the only secretory protein of the enzymatically active isoforms. We discovered that non-mammalian CA VI contains a C-terminal pentraxin (PTX) domain, a novel combination for both CAs and PTXs. We isolated and sequenced zebrafish ( Danio rerio ) CA VI cDNA, complete with the sequence coding for the PTX domain, and produced the recombinant CA VI-PTX protein. Enzymatic activity and kinetic parameters were measured with a stopped-flow instrument. Mass spectrometry, analytical gel filtration and dynamic light scattering were used for biophysical characterization. Sequence analyses and Bayesian phylogenetics were used in generating hypotheses of protein structure and CA VI gene evolution. A CA VI-PTX antiserum was produced, and the expression of CA VI protein was studied by immunohistochemistry. A knock-down zebrafish model was constructed, and larvae were observed up to five days post-fertilization (dpf). The expression of ca6 mRNA was quantitated by qRT-PCR in different developmental times in morphant and wild-type larvae and in different adult fish tissues. Finally, the swimming behavior of the morphant fish was compared to that of wild-type fish. The recombinant enzyme has a very high carbonate dehydratase activity. Sequencing confirms a 530-residue protein identical to one of the predicted proteins in the Ensembl database (ensembl.org). The protein is pentameric in solution, as studied by gel filtration and light scattering, presumably joined by the PTX domains. Mass spectrometry confirms the predicted signal peptide cleavage and disulfides, and N-glycosylation in two of the four observed glycosylation motifs. Molecular modeling of the pentamer is consistent with the modifications observed in mass spectrometry. Phylogenetics and sequence analyses provide a consistent hypothesis of the evolutionary history of domains associated with CA VI in mammals and non-mammals. Briefly, the evidence suggests that ancestral CA VI was a transmembrane protein, the exon coding for the cytoplasmic domain was replaced by one coding for PTX domain, and finally, in the therian lineage, the PTX-coding exon was lost. We knocked down CA VI expression in zebrafish embryos with antisense morpholino oligonucleotides, resulting in phenotype features of decreased buoyancy and swim bladder deflation in 4 dpf larvae. These findings provide novel insights into the evolution, structure, and function of this unique CA form.
Analysis of ER Resident Proteins in S. cerevisiae: Implementation of H/KDEL Retrieval Sequences

PubMed Central

Young, Carissa L.; Raden, David L.; Robinson, Anne S.

2013-01-01

An elaborate quality control system regulates ER homeostasis by ensuring the fidelity of protein synthesis and maturation. In budding yeast, genomic analyses and high-throughput proteomic studies have identified ER resident proteins that restore homeostasis following local perturbations. Yet, how these folding factors modulate stress has been largely unexplored. In this study, we designed a series of PCR-based modules including codon-optimized epitopes and FP variants complete with C-terminal H/KDEL retrieval motifs. These conserved sequences are inherent to most soluble ER resident proteins. To monitor multiple proteins simultaneously, H/KDEL cassettes are available with six different selection markers, providing optimal flexibility for live-cell imaging and multicolor labeling in vivo. A single pair of PCR primers can be used for the amplification of these 26 modules, enabling numerous combinations of tags and selection markers. The versatility of pCY H/KDEL cassettes was demonstrated by labeling BiP/Kar2p, Pdi1p, and Scj1p with all novel tags, thus providing a direct comparison among FP variants. Furthermore, to advance in vitro studies of yeast ER proteins, Strep-tag II was engineered with a C-terminal retrieval sequence. Here, an efficient purification strategy was established for BiP under physiological conditions. PMID:23324027
Proteomic analysis of the phytopathogenic soilborne fungus Verticillium dahliae reveals differential protein expression in isolates that differ in aggressiveness.

PubMed

El-Bebany, Ahmed F; Rampitsch, Christof; Daayf, Fouad

2010-01-01

Verticillium dahliae is a soilborne fungus that causes a vascular wilt disease of plants and losses in a broad range of economically important crops worldwide. In this study, we compared the proteomes of highly (Vd1396-9) and weakly (Vs06-14) aggressive isolates of V. dahliae to identify protein factors that may contribute to pathogenicity. Twenty-five protein spots were consistently observed as differential in the proteome profiles of the two isolates. The protein sequences in the spots were identified by LC-ESI-MS/MS and MASCOT database searches. Some of the identified sequences shared homology with fungal proteins that have roles in stress response, colonization, melanin biosynthesis, microsclerotia formation, antibiotic resistance, and fungal penetration. These are important functions for infection of the host and survival of the pathogen in soil. One protein found only in the highly aggressive isolate was identified as isochorismatase hydrolase, a potential plant-defense suppressor. This enzyme may inhibit the production of salicylic acid, which is important for plant defense response signaling. Other sequences corresponding to potential pathogenicity factors were identified in the highly aggressive isolate. This work indicates that, in combination with functional genomics, proteomics-based analyses can provide additional insights into pathogenesis and potential management strategies for this disease.
Signatures of Pleiotropy, Economy and Convergent Evolution in a Domain-Resolved Map of Human–Virus Protein–Protein Interaction Networks

PubMed Central

Garamszegi, Sara; Franzosa, Eric A.; Xia, Yu

2013-01-01

A central challenge in host-pathogen systems biology is the elucidation of general, systems-level principles that distinguish host-pathogen interactions from within-host interactions. Current analyses of host-pathogen and within-host protein-protein interaction networks are largely limited by their resolution, treating proteins as nodes and interactions as edges. Here, we construct a domain-resolved map of human-virus and within-human protein-protein interaction networks by annotating protein interactions with high-coverage, high-accuracy, domain-centric interaction mechanisms: (1) domain-domain interactions, in which a domain in one protein binds to a domain in a second protein, and (2) domain-motif interactions, in which a domain in one protein binds to a short, linear peptide motif in a second protein. Analysis of these domain-resolved networks reveals, for the first time, significant mechanistic differences between virus-human and within-human interactions at the resolution of single domains. While human proteins tend to compete with each other for domain binding sites by means of sequence similarity, viral proteins tend to compete with human proteins for domain binding sites in the absence of sequence similarity. Independent of their previously established preference for targeting human protein hubs, viral proteins also preferentially target human proteins containing linear motif-binding domains. Compared to human proteins, viral proteins participate in more domain-motif interactions, target more unique linear motif-binding domains per residue, and contain more unique linear motifs per residue. Together, these results suggest that viruses surmount genome size constraints by convergently evolving multiple short linear motifs in order to effectively mimic, hijack, and manipulate complex host processes for their survival. Our domain-resolved analyses reveal unique signatures of pleiotropy, economy, and convergent evolution in viral-host interactions that are otherwise hidden in the traditional binary network, highlighting the power and necessity of high-resolution approaches in host-pathogen systems biology. PMID:24339775
Applying the Concept of Peptide Uniqueness to Anti-Polio Vaccination

PubMed Central

Kanduc, Darja; Fasano, Candida; Capone, Giovanni; Pesce Delfino, Antonella; Calabrò, Michele; Polimeno, Lorenzo

2015-01-01

Background. Although rare, adverse events may associate with anti-poliovirus vaccination thus possibly hampering global polio eradication worldwide. Objective. To design peptide-based anti-polio vaccines exempt from potential cross-reactivity risks and possibly able to reduce rare potential adverse events such as the postvaccine paralytic poliomyelitis due to the tendency of the poliovirus genome to mutate. Methods. Proteins from poliovirus type 1, strain Mahoney, were analyzed for amino acid sequence identity to the human proteome at the pentapeptide level, searching for sequences that (1) have zero percent of identity to human proteins, (2) are potentially endowed with an immunologic potential, and (3) are highly conserved among poliovirus strains. Results. Sequence analyses produced a set of consensus epitopic peptides potentially able to generate specific anti-polio immune responses exempt from cross-reactivity with the human host. Conclusion. Peptide sequences unique to poliovirus proteins and conserved among polio strains might help formulate a specific and universal anti-polio vaccine able to react with multiple viral strains and exempt from the burden of possible cross-reactions with human proteins. As an additional advantage, using a peptide-based vaccine instead of current anti-polio DNA vaccines would eliminate the rare post-polio poliomyelitis cases and other disabling symptoms that may appear following vaccination. PMID:26568962
Computational analyses of mammalian lactate dehydrogenases: human, mouse, opossum and platypus LDHs.

PubMed

Holmes, Roger S; Goldberg, Erwin

2009-10-01

Computational methods were used to predict the amino acid sequences and gene locations for mammalian lactate dehydrogenase (LDH) genes and proteins using genome sequence databanks. Human LDHA, LDHC and LDH6A genes were located in tandem on chromosome 11, while LDH6B and LDH6C genes were on chromosomes 15 and 12, respectively. Opossum LDHC and LDH6B genes were located in tandem with the opossum LDHA gene on chromosome 5 and contained 7 (LDHA and LDHC) or 8 (LDH6B) exons. An amino acid sequence prediction for the opossum LDH6B subunit gave an extended N-terminal sequence, similar to the human and mouse LDH6B sequences, which may support the export of this enzyme into mitochondria. The platypus genome contained at least 3 LDH genes encoding LDHA, LDHB and LDH6B subunits. Phylogenetic studies and sequence analyses indicated that LDHA, LDHB and LDH6B genes are present in all mammalian genomes examined, including a monotreme species (platypus), whereas the LDHC gene may have arisen more recently in marsupial mammals.
Computational analyses of mammalian lactate dehydrogenases: human, mouse, opossum and platypus LDHs

PubMed Central

Holmes, Roger S; Goldberg, Erwin

2009-01-01

Computational methods were used to predict the amino acid sequences and gene locations for mammalian lactate dehydrogenase (LDH) genes and proteins using genome sequence databanks. Human LDHA, LDHC and LDH6A genes were located in tandem on chromosome 11, while LDH6B and LDH6C genes were on chromosomes 15 and 12, respectively. Opossum LDHC and LDH6B genes were located in tandem with the opossum LDHA gene on chromosome 5 and contained 7 (LDHA and LDHC) or 8 (LDH6B) exons. An amino acid sequence prediction for the opossum LDH6B subunit gave an extended N-terminal sequence, similar to the human and mouse LDH6B sequences, which may support the export of this enzyme into mitochondria. The platypus genome contained at least 3 LDH genes encoding LDHA, LDHB and LDH6B subunits. Phylogenetic studies and sequence analyses indicated that LDHA, LDHB and LDH6B genes are present in all mammalian genomes examined, including a monotreme species (platypus), whereas the LDHC gene may have arisen more recently in marsupial mammals. PMID:19679512
Adaptive Covariation between the Coat and Movement Proteins of Prunus Necrotic Ringspot Virus

PubMed Central

Codoñer, Francisco M.; Fares, Mario A.; Elena, Santiago F.

2006-01-01

The relative functional and/or structural importance of different amino acid sites in a protein can be assessed by evaluating the selective constraints to which they have been subjected during the course of evolution. Here we explore such constraints at the linear and three-dimensional levels for the movement protein (MP) and coat protein (CP) encoded by RNA 3 of prunus necrotic ringspot ilarvirus (PNRSV). By a maximum-parsimony approach, the nucleotide sequences from 46 isolates of PNRSV varying in symptomatology, host tree, and geographic origin have been analyzed and sites under different selective pressures have been identified in both proteins. We have also performed covariation analyses to explore whether changes in certain amino acid sites condition subsequent variation in other sites of the same protein or the other protein. These covariation analyses shed light on which particular amino acids should be involved in the physical and functional interaction between MP and CP. Finally, we discuss these findings in the light of what is already known about the implication of certain sites and domains in structure and protein-protein and RNA-protein interactions. PMID:16731922
Adaptive covariation between the coat and movement proteins of prunus necrotic ringspot virus.

PubMed

Codoñer, Francisco M; Fares, Mario A; Elena, Santiago F

2006-06-01

The relative functional and/or structural importance of different amino acid sites in a protein can be assessed by evaluating the selective constraints to which they have been subjected during the course of evolution. Here we explore such constraints at the linear and three-dimensional levels for the movement protein (MP) and coat protein (CP) encoded by RNA 3 of prunus necrotic ringspot ilarvirus (PNRSV). By a maximum-parsimony approach, the nucleotide sequences from 46 isolates of PNRSV varying in symptomatology, host tree, and geographic origin have been analyzed and sites under different selective pressures have been identified in both proteins. We have also performed covariation analyses to explore whether changes in certain amino acid sites condition subsequent variation in other sites of the same protein or the other protein. These covariation analyses shed light on which particular amino acids should be involved in the physical and functional interaction between MP and CP. Finally, we discuss these findings in the light of what is already known about the implication of certain sites and domains in structure and protein-protein and RNA-protein interactions.
The “Naked Coral” Hypothesis Revisited – Evidence for and Against Scleractinian Monophyly

PubMed Central

Forêt, Sylvain; Huttley, Gavin; Miller, David J.; Chen, Chaolun Allen

2014-01-01

The relationship between Scleractinia and Corallimorpharia, Orders within Anthozoa distinguished by the presence of an aragonite skeleton in the former, is controversial. Although classically considered distinct groups, some phylogenetic analyses have placed the Corallimorpharia within a larger Scleractinia/Corallimorpharia clade, leading to the suggestion that the Corallimorpharia are “naked corals” that arose via skeleton loss during the Cretaceous from a Scleractinian ancestor. Scleractinian paraphyly is, however, contradicted by a number of recent phylogenetic studies based on mt nucleotide (nt) sequence data. Whereas the “naked coral” hypothesis was based on analysis of the sequences of proteins encoded by a relatively small number of mt genomes, here a much-expanded dataset was used to reinvestigate hexacorallian phylogeny. The initial observation was that, whereas analyses based on nt data support scleractinian monophyly, those based on amino acid (aa) data support the “naked coral” hypothesis, irrespective of the method and with very strong support. To better understand the bases of these contrasting results, the effects of systematic errors were examined. Compared to other hexacorallians, the mt genomes of “Robust” corals have a higher (A+T) content, codon usage is far more constrained, and the proteins that they encode have a markedly higher phenylalanine content, leading us to suggest that mt DNA repair may be impaired in this lineage. Thus the “naked coral” topology could be caused by high levels of saturation in these mitochondrial sequences, long-branch effects or model violations. The equivocal results of these extensive analyses highlight the fundamental problems of basing coral phylogeny on mitochondrial sequence data. PMID:24740380
Cloning of Giardia lamblia heat shock protein HSP70 homologs: implications regarding origin of eukaryotic cells and of endoplasmic reticulum.

PubMed Central

Gupta, R S; Aitken, K; Falah, M; Singh, B

1994-01-01

The genes for two different 70-kDa heat shock protein (HSP70) homologs have been cloned and sequenced from the protozoan Giardia lamblia. On the basis of their sequence features, one of these genes corresponds to the cytoplasmic form of HSP70. The second gene, on the basis of its characteristic N-terminal hydrophobic signal sequence and C-terminal endoplasmic reticulum (ER) retention sequence (Lys-Asp-Glu-Leu), is the equivalent of ER-resident GRP78 or the Bip family of proteins. Phylogenetic trees based on HSP70 sequences show that G. lamblia homologs show the deepest divergence among eukaryotic species. The identification of a GRP78 or Bip homolog in G. lamblia strongly suggests the existence of ER in this ancient eukaryote. Detailed phylogenetic analyses of HSP70 sequences by boot-strap neighbor-joining and maximum-parsimony methods show that the cytoplasmic and ER homologs form distinct subfamilies that evolved from a common eukaryotic ancestor by gene duplication that occurred very early in the evolution of eukaryotic cells. It is postulated that because of the essential "molecular chaperone" function of these proteins in translocation of other proteins across membranes, duplication of their genes accompanied the evolution of ER or nucleus in the eukaryotic cell ancestor. The presence in all eukaryotic cytoplasmic HSP70 homologs (including the cognate, heat-induced, and ER forms) of a number of autapomorphic sequence signatures that are not present in any prokaryotic or organellar homologs provides strong evidence regarding the monophyletic nature of eukaryotic lineage. Further, all eukaryotic HSP70 homologs share in common with the Gram-negative group of eubacteria a number of sequence features that are not present in any archaebacterium or Gram-positive bacterium, indicating their evolution from this group of organisms. Some implications of these findings regarding the evolution of eukaryotic cells and ER are discussed. Images PMID:8159675
Phylogenomic analyses and molecular signatures for the class Halobacteria and its two major clades: a proposal for division of the class Halobacteria into an emended order Halobacteriales and two new orders, Haloferacales ord. nov. and Natrialbales ord. nov., containing the novel families Haloferacaceae fam. nov. and Natrialbaceae fam. nov.

PubMed

Gupta, Radhey S; Naushad, Sohail; Baker, Sheridan

2015-03-01

The Halobacteria constitute one of the largest groups within the Archaea. The hierarchical relationship among members of this large class, which comprises a single order and a single family, has proven difficult to determine based upon 16S rRNA gene trees and morphological and physiological characteristics. This work reports detailed phylogenetic and comparative genomic studies on >100 halobacterial (haloarchaeal) genomes containing representatives from 30 genera to investigate their evolutionary relationships. In phylogenetic trees reconstructed on the basis of 32 conserved proteins, using both neighbour-joining and maximum-likelihood methods, two major clades (clades A and B) encompassing nearly two-thirds of the sequenced haloarchaeal species were strongly supported. Clades grouping the same species/genera were also supported by the 16S rRNA gene trees and trees for several individual highly conserved proteins (RpoC, EF-Tu, UvrD, GyrA, EF-2/EF-G). In parallel, our comparative analyses of protein sequences from haloarchaeal genomes have identified numerous discrete molecular markers in the form of conserved signature indels (CSI) in protein sequences and conserved signature proteins (CSPs) that are found uniquely in specific groups of haloarchaea. Thirteen CSIs in proteins involved in diverse functions and 68 CSPs that are uniquely present in all or most genome-sequenced haloarchaea provide novel molecular means for distinguishing members of the class Halobacteria from all other prokaryotes. The members of clade A are distinguished from all other haloarchaea by the unique shared presence of two CSIs in the ribose operon protein and small GTP-binding protein and eight CSPs that are found specifically in members of this clade. Likewise, four CSIs in different proteins and five other CSPs are present uniquely in members of clade B and distinguish them from all other haloarchaea. Based upon their specific clustering in phylogenetic trees for different gene/protein sequences and the unique shared presence of large numbers of molecular signatures, members of clades A and B are indicated to be distinct from all other haloarchaea because of their uniquely shared evolutionary histories. Based upon these results, it is proposed that clades A and B be recognized as two new orders, Natrialbales ord. nov. and Haloferacales ord. nov., within the class Halobacteria, containing the novel families Natrialbaceae fam. nov. and Haloferacaceae fam. nov. Other members of the class Halobacteria that are not members of these two orders will remain part of the emended order Halobacteriales in an emended family Halobacteriaceae. © 2015 IUMS.
Diversity and Evolution of Bacterial Twin Arginine Translocase Protein, TatC, Reveals a Protein Secretion System That Is Evolving to Fit Its Environmental Niche

PubMed Central

Simone, Domenico; Bay, Denice C.; Leach, Thorin; Turner, Raymond J.

2013-01-01

Background The twin-arginine translocation (Tat) protein export system enables the transport of fully folded proteins across a membrane. This system is composed of two integral membrane proteins belonging to TatA and TatC protein families and in some systems a third component, TatB, a homolog of TatA. TatC participates in substrate protein recognition through its interaction with a twin arginine leader peptide sequence. Methodology/Principal Findings The aim of this study was to explore TatC diversity, evolution and sequence conservation in bacteria to identify how TatC is evolving and diversifying in various bacterial phyla. Surveying bacterial genomes revealed that 77% of all species possess one or more tatC loci and half of these classes possessed only tatC and tatA genes. Phylogenetic analysis of diverse TatC homologues showed that they were primarily inherited but identified a small subset of taxonomically unrelated bacteria that exhibited evidence supporting lateral gene transfer within an ecological niche. Examination of bacilli tatCd/tatCy isoform operons identified a number of known and potentially new Tat substrate genes based on their frequent association to tatC loci. Evolutionary analysis of these Bacilli isoforms determined that TatCy was the progenitor of TatCd. A bacterial TatC consensus sequence was determined and highlighted conserved and variable regions within a three dimensional model of the Escherichia coli TatC protein. Comparative analysis between the TatC consensus sequence and Bacilli TatCd/y isoform consensus sequences revealed unique sites that may contribute to isoform substrate specificity or make TatA specific contacts. Synonymous to non-synonymous nucleotide substitution analyses of bacterial tatC homologues determined that tatC sequence variation differs dramatically between various classes and suggests TatC specialization in these species. Conclusions/Significance TatC proteins appear to be diversifying within particular bacterial classes and its specialization may be driven by the substrates it transports and the environment of its host. PMID:24236045
Nonsyndromic cleft lip with or without cleft palate: Increased burden of rare variants within Gremlin-1, a component of the bone morphogenetic protein 4 pathway.

PubMed

Al Chawa, Taofik; Ludwig, Kerstin U; Fier, Heide; Pötzsch, Bernd; Reich, Rudolf H; Schmidt, Gül; Braumann, Bert; Daratsianos, Nikolaos; Böhmer, Anne C; Schuencke, Hannah; Alblas, Margrieta; Fricker, Nadine; Hoffmann, Per; Knapp, Michael; Lange, Christoph; Nöthen, Markus M; Mangold, Elisabeth

2014-06-01

The genes Gremlin-1 (GREM1) and Noggin (NOG) are components of the bone morphogenetic protein 4 pathway, which has been implicated in craniofacial development. Both genes map to recently identified susceptibility loci (chromosomal region 15q13, 17q22) for nonsyndromic cleft lip with or without cleft palate (nsCL/P). The aim of the present study was to determine whether rare variants in either gene are implicated in nsCL/P etiology. The complete coding regions, untranslated regions, and splice sites of GREM1 and NOG were sequenced in 96 nsCL/P patients and 96 controls of Central European ethnicity. Three burden and four nonburden tests were performed. Statistically significant results were followed up in a second case-control sample (n = 96, respectively). For rare variants observed in cases, segregation analyses were performed. In NOG, four rare sequence variants (minor allele frequency < 1%) were identified. Here, burden and nonburden analyses generated nonsignificant results. In GREM1, 33 variants were identified, 15 of which were rare. Of these, five were novel. Significant p-values were generated in three nonburden analyses. Segregation analyses revealed incomplete penetrance for all variants investigated. Our study did not provide support for NOG being the causal gene at 17q22. However, the observation of a significant excess of rare variants in GREM1 supports the hypothesis that this is the causal gene at chr. 15q13. Because no single causal variant was identified, future sequencing analyses of GREM1 should involve larger samples and the investigation of regulatory elements. © 2014 Wiley Periodicals, Inc.
GenoMycDB: a database for comparative analysis of mycobacterial genes and genomes.

PubMed

Catanho, Marcos; Mascarenhas, Daniel; Degrave, Wim; Miranda, Antonio Basílio de

2006-03-31

Several databases and computational tools have been created with the aim of organizing, integrating and analyzing the wealth of information generated by large-scale sequencing projects of mycobacterial genomes and those of other organisms. However, with very few exceptions, these databases and tools do not allow for massive and/or dynamic comparison of these data. GenoMycDB (http://www.dbbm.fiocruz.br/GenoMycDB) is a relational database built for large-scale comparative analyses of completely sequenced mycobacterial genomes, based on their predicted protein content. Its central structure is composed of the results obtained after pair-wise sequence alignments among all the predicted proteins coded by the genomes of six mycobacteria: Mycobacterium tuberculosis (strains H37Rv and CDC1551), M. bovis AF2122/97, M. avium subsp. paratuberculosis K10, M. leprae TN, and M. smegmatis MC2 155. The database stores the computed similarity parameters of every aligned pair, providing for each protein sequence the predicted subcellular localization, the assigned cluster of orthologous groups, the features of the corresponding gene, and links to several important databases. Tables containing pairs or groups of potential homologs between selected species/strains can be produced dynamically by user-defined criteria, based on one or multiple sequence similarity parameters. In addition, searches can be restricted according to the predicted subcellular localization of the protein, the DNA strand of the corresponding gene and/or the description of the protein. Massive data search and/or retrieval are available, and different ways of exporting the result are offered. GenoMycDB provides an on-line resource for the functional classification of mycobacterial proteins as well as for the analysis of genome structure, organization, and evolution.
Identification of the first nonsense CDSN mutation with expression of a truncated protein causing peeling skin syndrome type B.

PubMed

Mallet, A; Kypriotou, M; George, K; Leclerc, E; Rivero, D; Mazereeuw-Hautier, J; Serre, G; Huber, M; Jonca, N; Hohl, D

2013-12-01

Peeling skin disease (PSD), a generalized inflammatory form of peeling skin syndrome, is caused by autosomal recessive nonsense mutations in the corneodesmosin gene (CDSN). To investigate a novel mutation in CDSN. A 50-year-old white woman showed widespread peeling with erythema and elevated serum IgE. DNA sequencing, immunohistochemistry, Western blot and real-time polymerase chain reaction analyses of skin biopsies were performed in order to study the genetics and to characterize the molecular profile of the disease. Histology showed hyperkeratosis and acanthosis of the epidermis, and inflammatory infiltrates in the dermis. DNA sequencing revealed a homozygous mutation leading to a premature termination codon in CDSN: p.Gly142*. Protein analyses showed reduced expression of a 16-kDa corneodesmosin mutant in the upper epidermal layers, whereas the full-length protein was absent. These results are interesting regarding the genotype-phenotype correlations in diseases caused by CDSN mutations. The PSD-causing CDSN mutations identified heretofore result in total corneodesmosin loss, suggesting that PSD is due to full corneodesmosin deficiency. Here, we show for the first time that a mutant corneodesmosin can be stably expressed in some patients with PSD, and that this truncated protein is very probably nonfunctional. © 2013 British Association of Dermatologists.
The genome organisation and taxonomy of Sugarcane striate mosaic associated virus.

PubMed

Thompson, N; Randles, J W

2001-08-01

Sugarcane striate mosaic associated virus (SCSMaV) has slightly flexuous 950 nm x 15 nm filamentous particles and is associated with sugarcane striate mosaic disease in central Queensland, Australia. We report the full sequence of its RNA genome, which comprises 5 open reading frames representing the polymerase, movement function proteins encoded in a triple gene block and coat protein. Phylogenetic analyses based on either the full nucleotide sequence, the polymerase protein, or the coat protein all placed SCSMaV in an intermediate position between the genera Foveavirus and Carlavirus, but outside both genera. In addition, the absence of a sixth open reading frame excludes it from the genus Carlavirus, and the coat protein is approximately half the size of the type member for the genus Foveavirus. Although SCSMaV was most closely allied to Cherry green ring mottle virus by genome analysis, the two viruses are morphologically and biologically dissimilar. SCSMaV may therefore represent a new plant virus taxon.
Discovery of a novel Parvovirinae virus, porcine parvovirus 7, by metagenomic sequencing of porcine rectal swabs.

PubMed

Palinski, Rachel M; Mitra, Namita; Hause, Ben M

2016-08-01

Parvoviruses are a diverse group of viruses containing some of the smallest known species that are capable of infecting a wide range of animals. Metagenomic sequencing of pooled rectal swabs from adult pigs identified a 4103-bp contig consisting of two major open reading frames encoding proteins of 672 and 469 amino acids (aa) in length. BLASTP analysis of the 672-aa protein found 42.4 % identity to fruit bat (Eidolon helvum) parvovirus 2 (EhPV2) and 37.9 % to turkey parvovirus (TuPV) TP1-2012/HUN NS1 proteins. The 469-aa protein had no significant similarity to known proteins. Genetic and phylogenetic analyses suggest that PPV7, EhPV2, and TuPV represent a novel genus in the family Parvoviridae. Quantitative PCR screening of 182 porcine diagnostic samples found a total of 16 positives (8.6 %). Together, these data suggest that PPV7 is a highly divergent novel parvovirus prevalent within the US swine.
Genetic and structural analyses of cytochrome P450 hydroxylases in sex hormone biosynthesis: Sequential origin and subsequent coevolution.

PubMed

Goldstone, Jared V; Sundaramoorthy, Munirathinam; Zhao, Bin; Waterman, Michael R; Stegeman, John J; Lamb, David C

2016-01-01

Biosynthesis of steroid hormones in vertebrates involves three cytochrome P450 hydroxylases, CYP11A1, CYP17A1 and CYP19A1, which catalyze sequential steps in steroidogenesis. These enzymes are conserved in the vertebrates, but their origin and existence in other chordate subphyla (Tunicata and Cephalochordata) have not been clearly established. In this study, selected protein sequences of CYP11A1, CYP17A1 and CYP19A1 were compiled and analyzed using multiple sequence alignment and phylogenetic analysis. Our analyses show that cephalochordates have sequences orthologous to vertebrate CYP11A1, CYP17A1 or CYP19A1, and that echinoderms and hemichordates possess CYP11-like but not CYP19 genes. While the cephalochordate sequences have low identity with the vertebrate sequences, reflecting evolutionary distance, the data show apparent origin of CYP11 prior to the evolution of CYP19 and possibly CYP17, thus indicating a sequential origin of these functionally related steroidogenic CYPs. Co-occurrence of the three CYPs in early chordates suggests that the three genes may have coevolved thereafter, and that functional conservation should be reflected in functionally important residues in the proteins. CYP19A1 has the largest number of conserved residues while CYP11A1 sequences are less conserved. Structural analyses of human CYP11A1, CYP17A1 and CYP19A1 show that critical substrate binding site residues are highly conserved in each enzyme family. The results emphasize that the steroidogenic pathways producing glucocorticoids and reproductive steroids are several hundred million years old and that the catalytic structural elements of the enzymes have been conserved over the same period of time. Analysis of these elements may help to identify when precursor functions linked to these enzymes first arose. Copyright © 2015 Elsevier Inc. All rights reserved.
Distribution and cluster analysis of predicted intrinsically disordered protein Pfam domains

PubMed Central

Williams, Robert W; Xue, Bin; Uversky, Vladimir N; Dunker, A Keith

2013-01-01

The Pfam database groups regions of proteins by how well hidden Markov models (HMMs) can be trained to recognize similarities among them. Conservation pressure is probably in play here. The Pfam seed training set includes sequence and structure information, being drawn largely from the PDB. A long standing hypothesis among intrinsically disordered protein (IDP) investigators has held that conservation pressures are also at play in the evolution of different kinds of intrinsic disorder, but we find that predicted intrinsic disorder (PID) is not always conserved across Pfam domains. Here we analyze distributions and clusters of PID regions in 193024 members of the version 23.0 Pfam seed database. To include the maximum information available for proteins that remain unfolded in solution, we employ the 10 linearly independent Kidera factors1–3 for the amino acids, combined with PONDR4 predictions of disorder tendency, to transform the sequences of these Pfam members into an 11 column matrix where the number of rows is the length of each Pfam region. Cluster analyses of the set of all regions, including those that are folded, show 6 groupings of domains. Cluster analyses of domains with mean VSL2b scores greater than 0.5 (half predicted disorder or more) show at least 3 separated groups. It is hypothesized that grouping sets into shorter sequences with more uniform length will reveal more information about intrinsic disorder and lead to more finely structured and perhaps more accurate predictions. HMMs could be trained to include this information. PMID:28516017

Towards fully automated structure-based function prediction in structural genomics: a case study.

PubMed

Watson, James D; Sanderson, Steve; Ezersky, Alexandra; Savchenko, Alexei; Edwards, Aled; Orengo, Christine; Joachimiak, Andrzej; Laskowski, Roman A; Thornton, Janet M

2007-04-13

As the global Structural Genomics projects have picked up pace, the number of structures annotated in the Protein Data Bank as hypothetical protein or unknown function has grown significantly. A major challenge now involves the development of computational methods to assign functions to these proteins accurately and automatically. As part of the Midwest Center for Structural Genomics (MCSG) we have developed a fully automated functional analysis server, ProFunc, which performs a battery of analyses on a submitted structure. The analyses combine a number of sequence-based and structure-based methods to identify functional clues. After the first stage of the Protein Structure Initiative (PSI), we review the success of the pipeline and the importance of structure-based function prediction. As a dataset, we have chosen all structures solved by the MCSG during the 5 years of the first PSI. Our analysis suggests that two of the structure-based methods are particularly successful and provide examples of local similarity that is difficult to identify using current sequence-based methods. No one method is successful in all cases, so, through the use of a number of complementary sequence and structural approaches, the ProFunc server increases the chances that at least one method will find a significant hit that can help elucidate function. Manual assessment of the results is a time-consuming process and subject to individual interpretation and human error. We present a method based on the Gene Ontology (GO) schema using GO-slims that can allow the automated assessment of hits with a success rate approaching that of expert manual assessment.
Sequence-structure mapping errors in the PDB: OB-fold domains

PubMed Central

Venclovas, Česlovas; Ginalski, Krzysztof; Kang, Chulhee

2004-01-01

The Protein Data Bank (PDB) is the single most important repository of structural data for proteins and other biologically relevant molecules. Therefore, it is critically important to keep the PDB data, as much as possible, error-free. In this study, we have analyzed PDB crystal structures possessing oligonucleotide/oligosaccharide binding (OB)-fold, one of the highly populated folds, for the presence of sequence-structure mapping errors. Using energy-based structure quality assessment coupled with sequence analyses, we have found that there are at least five OB-structures in the PDB that have regions where sequences have been incorrectly mapped onto the structure. We have demonstrated that the combination of these computation techniques is effective not only in detecting sequence-structure mapping errors, but also in providing guidance to correct them. Namely, we have used results of computational analysis to direct a revision of X-ray data for one of the PDB entries containing a fairly inconspicuous sequence-structure mapping error. The revised structure has been deposited with the PDB. We suggest use of computational energy assessment and sequence analysis techniques to facilitate structure determination when homologs having known structure are available to use as a reference. Such computational analysis may be useful in either guiding the sequence-structure assignment process or verifying the sequence mapping within poorly defined regions. PMID:15133161
Studies of a biochemical factory: tomato trichome deep expressed sequence tag sequencing and proteomics.

PubMed

Schilmiller, Anthony L; Miner, Dennis P; Larson, Matthew; McDowell, Eric; Gang, David R; Wilkerson, Curtis; Last, Robert L

2010-07-01

Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces beta-caryophyllene and alpha-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells.
Studies of a Biochemical Factory: Tomato Trichome Deep Expressed Sequence Tag Sequencing and Proteomics1[W][OA

PubMed Central

Schilmiller, Anthony L.; Miner, Dennis P.; Larson, Matthew; McDowell, Eric; Gang, David R.; Wilkerson, Curtis; Last, Robert L.

2010-01-01

Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces β-caryophyllene and α-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells. PMID:20431087
Order within disorder: Aggrecan chondroitin sulphate-attachment region provides new structural insights into protein sequences classified as disordered

PubMed Central

Jowitt, Thomas A; Murdoch, Alan D; Baldock, Clair; Berry, Richard; Day, Joanna M; Hardingham, Timothy E

2010-01-01

Structural investigation of proteins containing large stretches of sequences without predicted secondary structure is the focus of much increased attention. Here, we have produced an unglycosylated 30 kDa peptide from the chondroitin sulphate (CS)-attachment region of human aggrecan (CS-peptide), which was predicted to be intrinsically disordered and compared its structure with the adjacent aggrecan G3 domain. Biophysical analyses, including analytical ultracentrifugation, light scattering, and circular dichroism showed that the CS-peptide had an elongated and stiffened conformation in contrast to the globular G3 domain. The results suggested that it contained significant secondary structure, which was sensitive to urea, and we propose that the CS-peptide forms an elongated wormlike molecule based on a dynamic range of energetically equivalent secondary structures stabilized by hydrogen bonds. The dimensions of the structure predicted from small-angle X-ray scattering analysis were compatible with EM images of fully glycosylated aggrecan and a partly glycosylated aggrecan CS2-G3 construct. The semiordered structure identified in CS-peptide was not predicted by common structural algorithms and identified a potentially distinct class of semiordered structure within sequences currently identified as disordered. Sequence comparisons suggested some evidence for comparable structures in proteins encoded by other genes (PRG4, MUC5B, and CBP). The function of these semiordered sequences may serve to spatially position attached folded modules and/or to present polypeptides for modification, such as glycosylation, and to provide templates for the multiple pleiotropic interactions proposed for disordered proteins. Proteins 2010. © 2010 Wiley-Liss, Inc. PMID:20806220
The Interaction Properties of the Human Rab GTPase Family – A Comparative Analysis Reveals Determinants of Molecular Binding Selectivity

PubMed Central

Stein, Matthias; Pilli, Manohar; Bernauer, Sabine; Habermann, Bianca H.; Zerial, Marino; Wade, Rebecca C.

2012-01-01

Background Rab GTPases constitute the largest subfamily of the Ras protein superfamily. Rab proteins regulate organelle biogenesis and transport, and display distinct binding preferences for effector and activator proteins, many of which have not been elucidated yet. The underlying molecular recognition motifs, binding partner preferences and selectivities are not well understood. Methodology/Principal Findings Comparative analysis of the amino acid sequences and the three-dimensional electrostatic and hydrophobic molecular interaction fields of 62 human Rab proteins revealed a wide range of binding properties with large differences between some Rab proteins. This analysis assists the functional annotation of Rab proteins 12, 14, 26, 37 and 41 and provided an explanation for the shared function of Rab3 and 27. Rab7a and 7b have very different electrostatic potentials, indicating that they may bind to different effector proteins and thus, exert different functions. The subfamily V Rab GTPases which are associated with endosome differ subtly in the interaction properties of their switch regions, and this may explain exchange factor specificity and exchange kinetics. Conclusions/Significance We have analysed conservation of sequence and of molecular interaction fields to cluster and annotate the human Rab proteins. The analysis of three dimensional molecular interaction fields provides detailed insight that is not available from a sequence-based approach alone. Based on our results, we predict novel functions for some Rab proteins and provide insights into their divergent functions and the determinants of their binding partner selectivity. PMID:22523562
Evolutionary Descent of Prion Genes from the ZIP Family of Metal Ion Transporters

PubMed Central

Schmitt-Ulms, Gerold; Ehsani, Sepehr; Watts, Joel C.; Westaway, David; Wille, Holger

2009-01-01

In the more than twenty years since its discovery, both the phylogenetic origin and cellular function of the prion protein (PrP) have remained enigmatic. Insights into a possible function of PrP may be obtained through the characterization of its molecular neighborhood in cells. Quantitative interactome data demonstrated the spatial proximity of two metal ion transporters of the ZIP family, ZIP6 and ZIP10, to mammalian prion proteins in vivo. A subsequent bioinformatic analysis revealed the unexpected presence of a PrP-like amino acid sequence within the N-terminal, extracellular domain of a distinct sub-branch of the ZIP protein family that includes ZIP5, ZIP6 and ZIP10. Additional structural threading and orthologous sequence alignment analyses argued that the prion gene family is phylogenetically derived from a ZIP-like ancestral molecule. The level of sequence homology and the presence of prion protein genes in most chordate species place the split from the ZIP-like ancestor gene at the base of the chordate lineage. This relationship explains structural and functional features found within mammalian prion proteins as elements of an ancient involvement in the transmembrane transport of divalent cations. The phylogenetic and spatial connection to ZIP proteins is expected to open new avenues of research to elucidate the biology of the prion protein in health and disease. PMID:19784368
GASP: Gapped Ancestral Sequence Prediction for proteins

PubMed Central

Edwards, Richard J; Shields, Denis C

2004-01-01

Background The prediction of ancestral protein sequences from multiple sequence alignments is useful for many bioinformatics analyses. Predicting ancestral sequences is not a simple procedure and relies on accurate alignments and phylogenies. Several algorithms exist based on Maximum Parsimony or Maximum Likelihood methods but many current implementations are unable to process residues with gaps, which may represent insertion/deletion (indel) events or sequence fragments. Results Here we present a new algorithm, GASP (Gapped Ancestral Sequence Prediction), for predicting ancestral sequences from phylogenetic trees and the corresponding multiple sequence alignments. Alignments may be of any size and contain gaps. GASP first assigns the positions of gaps in the phylogeny before using a likelihood-based approach centred on amino acid substitution matrices to assign ancestral amino acids. Important outgroup information is used by first working down from the tips of the tree to the root, using descendant data only to assign probabilities, and then working back up from the root to the tips using descendant and outgroup data to make predictions. GASP was tested on a number of simulated datasets based on real phylogenies. Prediction accuracy for ungapped data was similar to three alternative algorithms tested, with GASP performing better in some cases and worse in others. Adding simple insertions and deletions to the simulated data did not have a detrimental effect on GASP accuracy. Conclusions GASP (Gapped Ancestral Sequence Prediction) will predict ancestral sequences from multiple protein alignments of any size. Although not as accurate in all cases as some of the more sophisticated maximum likelihood approaches, it can process a wide range of input phylogenies and will predict ancestral sequences for gapped and ungapped residues alike. PMID:15350199
The complete mitochondrial genome of Koerneria sudhausi (Diplogasteromorpha: Nematoda) supports monophyly of Diplogasteromorpha within Rhabditomorpha.

PubMed

Kim, Taeho; Kim, Jiyeon; Nadler, Steven A; Park, Joong-Ki

2016-05-01

Testing hypotheses of monophyly for different nematode groups in the context of broad representation of nematode diversity is central to understanding the patterns and processes of nematode evolution. Herein sequence information from mitochondrial genomes is used to test the monophyly of diplogasterids, which includes an important nematode model organism. The complete mitochondrial genome sequence of Koerneria sudhausi, a representative of Diplogasteromorpha, was determined and used for phylogenetic analyses along with 60 other nematode species. The mtDNA of K. sudhausi is comprised of 16,005 bp that includes 36 genes (12 protein-coding genes, 2 ribosomal RNA genes and 22 transfer RNA genes) encoded in the same direction. Phylogenetic trees inferred from amino acid and nucleotide sequence data for the 12 protein-coding genes strongly supported the sister relationship of K. sudhausi with Pristionchus pacificus, supporting Diplogasteromorpha. The gene order of K. sudhausi is identical to that most commonly found in members of the Rhabditomorpha + Ascaridomorpha + Diplogasteromorpha clade, with an exception of some tRNA translocations. Both the gene order pattern and sequence-based phylogenetic analyses support a close relationship between the diplogasterid species and Rhabditomorpha. The nesting of the two diplogasteromorph species within Rhabditomorpha is consistent with most molecular phylogenies for the group, but inconsistent with certain morphology-based hypotheses that asserted phylogenetic affinity between diplogasteromorphs and tylenchomorphs. Phylogenetic analysis of mitochondrial genome sequences strongly supports monophyly of the diplogasteromorpha.
The complete chloroplast genome of Gentiana straminea (Gentianaceae), an endemic species to the Sino-Himalayan subregion.

PubMed

Ni, Lianghong; Zhao, Zhili; Xu, Hongxi; Chen, Shilin; Dorje, Gaawe

2016-02-15

Endemic to the Sino-Himalayan subregion, the medicinal alpine plant Gentiana straminea is a threatened species. The genetic and molecular data about it is deficient. Here we report the complete chloroplast (cp) genome sequence of G. straminea, as the first sequenced member of the family Gentianaceae. The cp genome is 148,991bp in length, including a large single copy (LSC) region of 81,240bp, a small single copy (SSC) region of 17,085bp and a pair of inverted repeats (IRs) of 25,333bp. It contains 112 unique genes, including 78 protein-coding genes, 30 tRNAs and 4 rRNAs. The rps16 gene lacks exon2 between trnK-UUU and trnQ-UUG, which is the first rps16 pseudogene found in the nonparasitic plants of Asterids clade. Sequence analysis revealed the presence of 13 forward repeats, 13 palindrome repeats and 39 simple sequence repeats (SSRs). An entire cp genome comparison study of G. straminea and four other species in Gentianales was carried out. Phylogenetic analyses using maximum likelihood (ML) and maximum parsimony (MP) were performed based on 69 protein-coding genes from 36 species of Asterids. The results strongly supported the position of Gentianaceae as one member of the order Gentianales. The complete chloroplast genome sequence will provide intragenic information for its conservation and contribute to research on the genetic and phylogenetic analyses of Gentianales and Asterids. Copyright © 2015 Elsevier B.V. All rights reserved.
Annotation of Alternatively Spliced Proteins and Transcripts with Protein-Folding Algorithms and Isoform-Level Functional Networks.

PubMed

Li, Hongdong; Zhang, Yang; Guan, Yuanfang; Menon, Rajasree; Omenn, Gilbert S

2017-01-01

Tens of thousands of splice isoforms of proteins have been catalogued as predicted sequences from transcripts in humans and other species. Relatively few have been characterized biochemically or structurally. With the extensive development of protein bioinformatics, the characterization and modeling of isoform features, isoform functions, and isoform-level networks have advanced notably. Here we present applications of the I-TASSER family of algorithms for folding and functional predictions and the IsoFunc, MIsoMine, and Hisonet data resources for isoform-level analyses of network and pathway-based functional predictions and protein-protein interactions. Hopefully, predictions and insights from protein bioinformatics will stimulate many experimental validation studies.
Lessons on RNA Silencing Mechanisms in Plants from Eukaryotic Argonaute Structures[W

PubMed Central

Poulsen, Christian; Vaucheret, Hervé; Brodersen, Peter

2013-01-01

RNA silencing refers to a collection of gene regulatory mechanisms that use small RNAs for sequence specific repression. These mechanisms rely on ARGONAUTE (AGO) proteins that directly bind small RNAs and thereby constitute the central component of the RNA-induced silencing complex (RISC). AGO protein function has been probed extensively by mutational analyses, particularly in plants where large allelic series of several AGO proteins have been isolated. Structures of entire human and yeast AGO proteins have only very recently been obtained, and they allow more precise analyses of functional consequences of mutations obtained by forward genetics. To a large extent, these analyses support current models of regions of particular functional importance of AGO proteins. Interestingly, they also identify previously unrecognized parts of AGO proteins with profound structural and functional importance and provide the first hints at structural elements that have important functions specific to individual AGO family members. A particularly important outcome of the analysis concerns the evidence for existence of Gly-Trp (GW) repeat interactors of AGO proteins acting in the plant microRNA pathway. The parallel analysis of AGO structures and plant AGO mutations also suggests that such interactions with GW proteins may be a determinant of whether an endonucleolytically competent RISC is formed. PMID:23303917
Lessons on RNA silencing mechanisms in plants from eukaryotic argonaute structures.

PubMed

Poulsen, Christian; Vaucheret, Hervé; Brodersen, Peter

2013-01-01

RNA silencing refers to a collection of gene regulatory mechanisms that use small RNAs for sequence specific repression. These mechanisms rely on ARGONAUTE (AGO) proteins that directly bind small RNAs and thereby constitute the central component of the RNA-induced silencing complex (RISC). AGO protein function has been probed extensively by mutational analyses, particularly in plants where large allelic series of several AGO proteins have been isolated. Structures of entire human and yeast AGO proteins have only very recently been obtained, and they allow more precise analyses of functional consequences of mutations obtained by forward genetics. To a large extent, these analyses support current models of regions of particular functional importance of AGO proteins. Interestingly, they also identify previously unrecognized parts of AGO proteins with profound structural and functional importance and provide the first hints at structural elements that have important functions specific to individual AGO family members. A particularly important outcome of the analysis concerns the evidence for existence of Gly-Trp (GW) repeat interactors of AGO proteins acting in the plant microRNA pathway. The parallel analysis of AGO structures and plant AGO mutations also suggests that such interactions with GW proteins may be a determinant of whether an endonucleolytically competent RISC is formed.
Targeted sequencing for high-resolution evolutionary analyses following genome duplication in salmonid fish: Proof of concept for key components of the insulin-like growth factor axis.

PubMed

Lappin, Fiona M; Shaw, Rebecca L; Macqueen, Daniel J

2016-12-01

High-throughput sequencing has revolutionised comparative and evolutionary genome biology. It has now become relatively commonplace to generate multiple genomes and/or transcriptomes to characterize the evolution of large taxonomic groups of interest. Nevertheless, such efforts may be unsuited to some research questions or remain beyond the scope of some research groups. Here we show that targeted high-throughput sequencing offers a viable alternative to study genome evolution across a vertebrate family of great scientific interest. Specifically, we exploited sequence capture and Illumina sequencing to characterize the evolution of key components from the insulin-like growth (IGF) signalling axis of salmonid fish at unprecedented phylogenetic resolution. The IGF axis represents a central governor of vertebrate growth and its core components were expanded by whole genome duplication in the salmonid ancestor ~95Ma. Using RNA baits synthesised to genes encoding the complete family of IGF binding proteins (IGFBP) and an IGF hormone (IGF2), we captured, sequenced and assembled orthologous and paralogous exons from species representing all ten salmonid genera. This approach generated 299 novel sequences, most as complete or near-complete protein-coding sequences. Phylogenetic analyses confirmed congruent evolutionary histories for all nineteen recognized salmonid IGFBP family members and identified novel salmonid-specific IGF2 paralogues. Moreover, we reconstructed the evolution of duplicated IGF axis paralogues across a replete salmonid phylogeny, revealing complex historic selection regimes - both ancestral to salmonids and lineage-restricted - that frequently involved asymmetric paralogue divergence under positive and/or relaxed purifying selection. Our findings add to an emerging literature highlighting diverse applications for targeted sequencing in comparative-evolutionary genomics. We also set out a viable approach to obtain large sets of nuclear genes for any member of the salmonid family, which should enable insights into the evolutionary role of whole genome duplication before additional nuclear genome sequences become available. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Detection and sequence/structure mapping of biophysical constraints to protein variation in saturated mutational libraries and protein sequence alignments with a dedicated server.

PubMed

Abriata, Luciano A; Bovigny, Christophe; Dal Peraro, Matteo

2016-06-17

Protein variability can now be studied by measuring high-resolution tolerance-to-substitution maps and fitness landscapes in saturated mutational libraries. But these rich and expensive datasets are typically interpreted coarsely, restricting detailed analyses to positions of extremely high or low variability or dubbed important beforehand based on existing knowledge about active sites, interaction surfaces, (de)stabilizing mutations, etc. Our new webserver PsychoProt (freely available without registration at http://psychoprot.epfl.ch or at http://lucianoabriata.altervista.org/psychoprot/index.html ) helps to detect, quantify, and sequence/structure map the biophysical and biochemical traits that shape amino acid preferences throughout a protein as determined by deep-sequencing of saturated mutational libraries or from large alignments of naturally occurring variants. We exemplify how PsychoProt helps to (i) unveil protein structure-function relationships from experiments and from alignments that are consistent with structures according to coevolution analysis, (ii) recall global information about structural and functional features and identify hitherto unknown constraints to variation in alignments, and (iii) point at different sources of variation among related experimental datasets or between experimental and alignment-based data. Remarkably, metabolic costs of the amino acids pose strong constraints to variability at protein surfaces in nature but not in the laboratory. This and other differences call for caution when extrapolating results from in vitro experiments to natural scenarios in, for example, studies of protein evolution. We show through examples how PsychoProt can be a useful tool for the broad communities of structural biology and molecular evolution, particularly for studies about protein modeling, evolution and design.
Genome analyses of the sunflower pathogen Plasmopara halstedii provide insights into effector evolution in downy mildews and Phytophthora.

PubMed

Sharma, Rahul; Xia, Xiaojuan; Cano, Liliana M; Evangelisti, Edouard; Kemen, Eric; Judelson, Howard; Oome, Stan; Sambles, Christine; van den Hoogen, D Johan; Kitner, Miloslav; Klein, Joël; Meijer, Harold J G; Spring, Otmar; Win, Joe; Zipper, Reinhard; Bode, Helge B; Govers, Francine; Kamoun, Sophien; Schornack, Sebastian; Studholme, David J; Van den Ackerveken, Guido; Thines, Marco

2015-10-05

Downy mildews are the most speciose group of oomycetes and affect crops of great economic importance. So far, there is only a single deeply-sequenced downy mildew genome available, from Hyaloperonospora arabidopsidis. Further genomic resources for downy mildews are required to study their evolution, including pathogenicity effector proteins, such as RxLR effectors. Plasmopara halstedii is a devastating pathogen of sunflower and a potential pathosystem model to study downy mildews, as several Avr-genes and R-genes have been predicted and unlike Arabidopsis downy mildew, large quantities of almost contamination-free material can be obtained easily. Here a high-quality draft genome of Plasmopara halstedii is reported and analysed with respect to various aspects, including genome organisation, secondary metabolism, effector proteins and comparative genomics with other sequenced oomycetes. Interestingly, the present analyses revealed further variation of the RxLR motif, suggesting an important role of the conservation of the dEER-motif. Orthology analyses revealed the conservation of 28 RxLR-like core effectors among Phytophthora species. Only six putative RxLR-like effectors were shared by the two sequenced downy mildews, highlighting the fast and largely independent evolution of two of the three major downy mildew lineages. This is seemingly supported by phylogenomic results, in which downy mildews did not appear to be monophyletic. The genome resource will be useful for developing markers for monitoring the pathogen population and might provide the basis for new approaches to fight Phytophthora and downy mildew pathogens by targeting core pathogenicity effectors.
Three reasons protein disorder analysis makes more sense in the light of collagen

PubMed Central

Oates, Matt E.; Tompa, Peter; Gough, Julian

2016-01-01

Abstract We have identified that the collagen helix has the potential to be disruptive to analyses of intrinsically disordered proteins. The collagen helix is an extended fibrous structure that is both promiscuous and repetitive. Whilst its sequence is predicted to be disordered, this type of protein structure is not typically considered as intrinsic disorder. Here, we show that collagen‐encoding proteins skew the distribution of exon lengths in genes. We find that previous results, demonstrating that exons encoding disordered regions are more likely to be symmetric, are due to the abundance of the collagen helix. Other related results, showing increased levels of alternative splicing in disorder‐encoding exons, still hold after considering collagen‐containing proteins. Aside from analyses of exons, we find that the set of proteins that contain collagen significantly alters the amino acid composition of regions predicted as disordered. We conclude that research in this area should be conducted in the light of the collagen helix. PMID:26941008
Analysis of the complete nucleotide sequence and functional organization of the genome of Streptococcus pneumoniae bacteriophage Cp-1.

PubMed

Martín, A C; López, R; García, P

1996-06-01

Cp-1, a bacteriophage infecting Streptococcus pneumoniae, has a linear double-stranded DNA genome, with a terminal protein covalently linked to its 5' ends, that replicates by the protein-priming mechanism. We describe here the complete DNA sequence and transcriptional map of the Cp-1 genome. These analyses have led to the firm assignment of 10 genes and the localization of 19 additional open reading frames in the 19,345-bp Cp-1 DNA. Striking similarities and differences between some of these proteins and those of the Bacillus subtilis phage phi 29, a system that also replicates its DNA by the protein-priming mechanism, have been revealed. The genes coding for structural proteins and assembly factors are located in the central part of the Cp-1 genome. Several proteins corresponding to the predicted gene products were identified by in vitro and in vivo expression of the cloned genes. Mature major head protein from the virion particles results from hydrolysis of the primary gene product at the His-49 residue, whereas the phage gene is expressed in Escherichia coli without modification. We have also identified two open reading frames coding for proteins that show high degrees of similarity to the N- and C-terminal regions, respectively, of the single tail protein identified in phi 29. Sequencing and primer extension analysis suggest transcription of a small RNA showing a secondary structure similar to that of the prohead RNA required for the ATP-dependent packaging of phi 29 DNA. On the basis of its temporal expression, transcription of the Cp-1 genome takes place in two stages, early and late. Combined Northern (RNA) blot and primer extension experiments allowed us to map the 5' initiation sites of the transcripts, and we found that only three genes were transcribed from right to left. These analyses reveal that there are also noticeable differences between Cp-l and phi 29 in transcriptional organization. Considered together, the observations reported here provide new tangible evidence on phylogenetic relationships between B. subtilis and S. pneumoniae.
Nucleotide sequence analysis of the 3' terminal region of a wasabi strain of crucifer tobamovirus genomic RNA: subgrouping of crucifer tobamoviruses.

PubMed

Shimamoto, I; Sonoda, S; Vazquez, P; Minaka, N; Nishiguchi, M

1998-01-01

The 3' terminal 2378 nucleotides of a wasabi strain of crucifer tobamovirus (CTMV-W) infectious to crucifer plants was determined. This includes the 3' non-coding region of 235 nucleotides, coat protein (CP) gene (468 nucleotides), movement protein (MP) gene (798 nucleotides) and C-terminal partial readthrough portion of 180 K protein gene (940 nucleotides). Comparison of the sequence with homologous regions of thirteen other tobamovirus genomes showed that it had much higher identity to those of four other crucifer tobamoviruses, 85.2% to cr-TMV and turnip vein-clearing virus (TVCV), 87.4% to oilseed rape mosaic virus (ORMV) and 87.1% to TMV-Cg, than to those of other tobamoviruses. Thus CTMV-W was most similar to ORMV and TMV-Cg in sequence, but only marginally so, whereas the location and size of its MP gene was the same as cr-TMV amd TVCV. These results, together with other analyses, show that CTMV-W is a new crucifer tobamovirus, that the five crucifer tobamoviruses can be classified into two subgroups based on MP gene organization, and that the rate of sequence change is not the same in all lineages.
Origin and diversification of leucine-rich repeat receptor-like protein kinase (LRR-RLK) genes in plants.

PubMed

Liu, Ping-Li; Du, Liang; Huang, Yuan; Gao, Shu-Min; Yu, Meng

2017-02-07

Leucine-rich repeat receptor-like protein kinases (LRR-RLKs) are the largest group of receptor-like kinases in plants and play crucial roles in development and stress responses. The evolutionary relationships among LRR-RLK genes have been investigated in flowering plants; however, no comprehensive studies have been performed for these genes in more ancestral groups. The subfamily classification of LRR-RLK genes in plants, the evolutionary history and driving force for the evolution of each LRR-RLK subfamily remain to be understood. We identified 119 LRR-RLK genes in the Physcomitrella patens moss genome, 67 LRR-RLK genes in the Selaginella moellendorffii lycophyte genome, and no LRR-RLK genes in five green algae genomes. Furthermore, these LRR-RLK sequences, along with previously reported LRR-RLK sequences from Arabidopsis thaliana and Oryza sativa, were subjected to evolutionary analyses. Phylogenetic analyses revealed that plant LRR-RLKs belong to 19 subfamilies, eighteen of which were established in early land plants, and one of which evolved in flowering plants. More importantly, we found that the basic structures of LRR-RLK genes for most subfamilies are established in early land plants and conserved within subfamilies and across different plant lineages, but divergent among subfamilies. In addition, most members of the same subfamily had common protein motif compositions, whereas members of different subfamilies showed variations in protein motif compositions. The unique gene structure and protein motif compositions of each subfamily differentiate the subfamily classifications and, more importantly, provide evidence for functional divergence among LRR-RLK subfamilies. Maximum likelihood analyses showed that some sites within four subfamilies were under positive selection. Much of the diversity of plant LRR-RLK genes was established in early land plants. Positive selection contributed to the evolution of a few LRR-RLK subfamilies.

Towards proteomic analysis of milk proteins in historical building materials

NASA Astrophysics Data System (ADS)

Kuckova, S.; Crhova, M.; Vankova, L.; Hnizda, A.; Hynek, R.; Kodicek, M.

2009-07-01

The addition of proteinaceous binders to mortars and plasters has a long tradition. The protein additions were identified in many sacral and secular historical buildings. For this method of peptide mass mapping, three model mortar samples with protein additives were prepared. These samples were analysed fresh (1-2 weeks old) and after 9 months of natural ageing. The optimal duration of tryptic cleavage (2 h) and the lowest amount of material needed for relevant analysis of fresh and weathered samples were found; the sufficient amounts of weathered and fresh mortars were set to 0.05 and 0.005 g. The list of main tryptic peptides coming from milk additives (bovine milk, curd, and whey), their relative intensities and theoretical amino acid sequences assignment is presented. Several sequences have been "de novo" confirmed by mass spectrometry.
[Convergent origin of repeats in genes coding for globular proteins. An analysis of the factors determining the presence of inverted and symmetrical repeats].

PubMed

Solov'ev, V V; Kel', A E; Kolchanov, N A

1989-01-01

The factors, determining the presence of inverted and symmetrical repeats in genes coding for globular proteins, have been analysed. An interesting property of genetical code has been revealed in the analysis of symmetrical repeats: the pairs of symmetrical codons corresponded to pairs of amino acids with mostly similar physical-chemical parameters. This property may explain the presence of symmetrical repeats and palindromes only in genes coding for beta-structural proteins-polypeptides, where amino acids with similar physical-chemical properties occupy symmetrical positions. A stochastic model of evolution of polynucleotide sequences has been used for analysis of inverted repeats. The modelling demonstrated that only limiting of sequences (uneven frequencies of used codons) is enough for arising of nonrandom inverted repeats in genes.
alpha-Crystallin A sequences of Alligator mississippiensis and the lizard Tupinambis teguixin: molecular evolution and reptilian phylogeny.

PubMed

de Jong, W W; Zweers, A; Versteeg, M; Dessauer, H C; Goodman, M

1985-11-01

The amino acid sequences of the eye lens protein alpha-crystallin A from many mammalian and avian species, two frog species, and a dogfish have provided detailed information about the molecular evolution of this protein and allowed some useful inferences about phylogenetic relationships among these species. We now have isolated and sequenced the alpha-crystallins of the American alligator and the common tegu lizard. The reptilian alpha A chains appear to have evolved as slowly as those of other vertebrates, i.e., at two to three amino acid replacements per 100 residues in 100 Myr. The lack of charged replacements and the general types and distribution of replacements also are similar to those in other vertebrate alpha A chains. Maximum-parsimony analyses of the total data set of 67 vertebrate alpha A sequences support the monophyletic origin of alligator, tegu, and birds and favor the grouping of crocodilians and birds as surviving sister groups in the subclass Archosauria.
Identification of Two Novel Amalgaviruses in the Common Eelgrass (Zostera marina) and in Silico Analysis of the Amalgavirus +1 Programmed Ribosomal Frameshifting Sites.

PubMed

Park, Dongbin; Goh, Chul Jun; Kim, Hyein; Hahn, Yoonsoo

2018-04-01

The genome sequences of two novel monopartite RNA viruses were identified in a common eelgrass ( Zostera marina ) transcriptome dataset. Sequence comparison and phylogenetic analyses revealed that these two novel viruses belong to the genus Amalgavirus in the family Amalgaviridae . They were named Zostera marina amalgavirus 1 (ZmAV1) and Zostera marina amalgavirus 2 (ZmAV2). Genomes of both ZmAV1 and ZmAV2 contain two overlapping open reading frames (ORFs). ORF1 encodes a putative replication factory matrix-like protein, while ORF2 encodes a RNA-dependent RNA polymerase (RdRp) domain. The fusion protein (ORF1+2) of ORF1 and ORF2, which mediates RNA replication, was produced using the +1 programmed ribosomal frameshifting (PRF) mechanism. The +1 PRF motif sequence, UUU_CGN, which is highly conserved among known amalgaviruses, was also found in ZmAV1 and ZmAV2. Multiple sequence alignment of the ORF1+2 fusion proteins from 24 amalgaviruses revealed that +1 PRF occurred only at three different positions within the 13-amino acid-long segment, which was surrounded by highly conserved regions on both sides. This suggested that the +1 PRF may be constrained by the structure of fusion proteins. Genome sequences of ZmAV1 and ZmAV2, which are the first viruses to be identified in common eelgrass, will serve as useful resources for studying evolution and diversity of amalgaviruses.
Identification of Two Novel Amalgaviruses in the Common Eelgrass (Zostera marina) and in Silico Analysis of the Amalgavirus +1 Programmed Ribosomal Frameshifting Sites

PubMed Central

Park, Dongbin; Goh, Chul Jun; Kim, Hyein; Hahn, Yoonsoo

2018-01-01

The genome sequences of two novel monopartite RNA viruses were identified in a common eelgrass (Zostera marina) transcriptome dataset. Sequence comparison and phylogenetic analyses revealed that these two novel viruses belong to the genus Amalgavirus in the family Amalgaviridae. They were named Zostera marina amalgavirus 1 (ZmAV1) and Zostera marina amalgavirus 2 (ZmAV2). Genomes of both ZmAV1 and ZmAV2 contain two overlapping open reading frames (ORFs). ORF1 encodes a putative replication factory matrix-like protein, while ORF2 encodes a RNA-dependent RNA polymerase (RdRp) domain. The fusion protein (ORF1+2) of ORF1 and ORF2, which mediates RNA replication, was produced using the +1 programmed ribosomal frameshifting (PRF) mechanism. The +1 PRF motif sequence, UUU_CGN, which is highly conserved among known amalgaviruses, was also found in ZmAV1 and ZmAV2. Multiple sequence alignment of the ORF1+2 fusion proteins from 24 amalgaviruses revealed that +1 PRF occurred only at three different positions within the 13-amino acid-long segment, which was surrounded by highly conserved regions on both sides. This suggested that the +1 PRF may be constrained by the structure of fusion proteins. Genome sequences of ZmAV1 and ZmAV2, which are the first viruses to be identified in common eelgrass, will serve as useful resources for studying evolution and diversity of amalgaviruses. PMID:29628822
Deciphering the shape and deformation of secondary structures through local conformation analysis

PubMed Central

2011-01-01

Background Protein deformation has been extensively analysed through global methods based on RMSD, torsion angles and Principal Components Analysis calculations. Here we use a local approach, able to distinguish among the different backbone conformations within loops, α-helices and β-strands, to address the question of secondary structures' shape variation within proteins and deformation at interface upon complexation. Results Using a structural alphabet, we translated the 3 D structures of large sets of protein-protein complexes into sequences of structural letters. The shape of the secondary structures can be assessed by the structural letters that modeled them in the structural sequences. The distribution analysis of the structural letters in the three protein compartments (surface, core and interface) reveals that secondary structures tend to adopt preferential conformations that differ among the compartments. The local description of secondary structures highlights that curved conformations are preferred on the surface while straight ones are preferred in the core. Interfaces display a mixture of local conformations either preferred in core or surface. The analysis of the structural letters transition occurring between protein-bound and unbound conformations shows that the deformation of secondary structure is tightly linked to the compartment preference of the local conformations. Conclusion The conformation of secondary structures can be further analysed and detailed thanks to a structural alphabet which allows a better description of protein surface, core and interface in terms of secondary structures' shape and deformation. Induced-fit modification tendencies described here should be valuable information to identify and characterize regions under strong structural constraints for functional reasons. PMID:21284872
Deciphering the shape and deformation of secondary structures through local conformation analysis.

PubMed

Baussand, Julie; Camproux, Anne-Claude

2011-02-01

Protein deformation has been extensively analysed through global methods based on RMSD, torsion angles and Principal Components Analysis calculations. Here we use a local approach, able to distinguish among the different backbone conformations within loops, α-helices and β-strands, to address the question of secondary structures' shape variation within proteins and deformation at interface upon complexation. Using a structural alphabet, we translated the 3 D structures of large sets of protein-protein complexes into sequences of structural letters. The shape of the secondary structures can be assessed by the structural letters that modeled them in the structural sequences. The distribution analysis of the structural letters in the three protein compartments (surface, core and interface) reveals that secondary structures tend to adopt preferential conformations that differ among the compartments. The local description of secondary structures highlights that curved conformations are preferred on the surface while straight ones are preferred in the core. Interfaces display a mixture of local conformations either preferred in core or surface. The analysis of the structural letters transition occurring between protein-bound and unbound conformations shows that the deformation of secondary structure is tightly linked to the compartment preference of the local conformations. The conformation of secondary structures can be further analysed and detailed thanks to a structural alphabet which allows a better description of protein surface, core and interface in terms of secondary structures' shape and deformation. Induced-fit modification tendencies described here should be valuable information to identify and characterize regions under strong structural constraints for functional reasons.
Friend and Moloney murine leukemia viruses specifically recombine with different endogenous retroviral sequences to generate mink cell focus-forming viruses.

PubMed

Evans, L H; Cloyd, M W

1985-01-01

A group of mink cell focus-forming (MCF) viruses was derived by inoculation of NFS/N mice with Moloney murine leukemia virus (Mo-MuLV 1387) and was compared to a similarly derived group of MCF viruses from mice inoculated with Friend MuLV (Fr-MuLV 57). Antigenic analyses using monoclonal antibodies specific for MCF virus and xenotropic MuLV envelope proteins and genomic structural analyses by RNase T1-resistant oligonucleotide finger-printing indicated that the Moloney and Friend MCF viruses arose by recombination of the respective ecotropic MuLVs with different endogenous retrovirus sequences of NFS mice.
Systematic analysis of protein identity between Zika virus and other arthropod-borne viruses.

PubMed

Chang, Hsiao-Han; Huber, Roland G; Bond, Peter J; Grad, Yonatan H; Camerini, David; Maurer-Stroh, Sebastian; Lipsitch, Marc

2017-07-01

To analyse the proportions of protein identity between Zika virus and dengue, Japanese encephalitis, yellow fever, West Nile and chikungunya viruses as well as polymorphism between different Zika virus strains. We used published protein sequences for the Zika virus and obtained protein sequences for the other viruses from the National Center for Biotechnology Information (NCBI) protein database or the NCBI virus variation resource. We used BLASTP to find regions of identity between viruses. We quantified the identity between the Zika virus and each of the other viruses, as well as within-Zika virus polymorphism for all amino acid k -mers across the proteome, with k ranging from 6 to 100. We assessed accessibility of protein fragments by calculating the solvent accessible surface area for the envelope and nonstructural-1 (NS1) proteins. In total, we identified 294 Zika virus protein fragments with both low proportion of identity with other viruses and low levels of polymorphisms among Zika virus strains. The list includes protein fragments from all Zika virus proteins, except NS3. NS4A has the highest number (190 k -mers) of protein fragments on the list. We provide a candidate list of protein fragments that could be used when developing a sensitive and specific serological test to detect previous Zika virus infections.
Sequencing and comparative analyses of the genomes of zoysiagrasses

PubMed Central

Tanaka, Hidenori; Hirakawa, Hideki; Kosugi, Shunichi; Nakayama, Shinobu; Ono, Akiko; Watanabe, Akiko; Hashiguchi, Masatsugu; Gondo, Takahiro; Ishigaki, Genki; Muguerza, Melody; Shimizu, Katsuya; Sawamura, Noriko; Inoue, Takayasu; Shigeki, Yuichi; Ohno, Naoki; Tabata, Satoshi; Akashi, Ryo; Sato, Shusei

2016-01-01

Zoysia is a warm-season turfgrass, which comprises 11 allotetraploid species (2n = 4x = 40), each possessing different morphological and physiological traits. To characterize the genetic systems of Zoysia plants and to analyse their structural and functional differences in individual species and accessions, we sequenced the genomes of Zoysia species using HiSeq and MiSeq platforms. As a reference sequence of Zoysia species, we generated a high-quality draft sequence of the genome of Z. japonica accession ‘Nagirizaki’ (334 Mb) in which 59,271 protein-coding genes were predicted. In parallel, draft genome sequences of Z. matrella ‘Wakaba’ and Z. pacifica ‘Zanpa’ were also generated for comparative analyses. To investigate the genetic diversity among the Zoysia species, genome sequence reads of three additional accessions, Z. japonica ‘Kyoto’, Z. japonica ‘Miyagi’ and Z. matrella ‘Chiba Fair Green’, were accumulated, and aligned against the reference genome of ‘Nagirizaki’ along with those from ‘Wakaba’ and ‘Zanpa’. As a result, we detected 7,424,163 single-nucleotide polymorphisms and 852,488 short indels among these species. The information obtained in this study will be valuable for basic studies on zoysiagrass evolution and genetics as well as for the breeding of zoysiagrasses, and is made available in the ‘Zoysia Genome Database’ at http://zoysia.kazusa.or.jp. PMID:26975196
Sequencing and comparative analyses of the genomes of zoysiagrasses.

PubMed

Tanaka, Hidenori; Hirakawa, Hideki; Kosugi, Shunichi; Nakayama, Shinobu; Ono, Akiko; Watanabe, Akiko; Hashiguchi, Masatsugu; Gondo, Takahiro; Ishigaki, Genki; Muguerza, Melody; Shimizu, Katsuya; Sawamura, Noriko; Inoue, Takayasu; Shigeki, Yuichi; Ohno, Naoki; Tabata, Satoshi; Akashi, Ryo; Sato, Shusei

2016-04-01

Zoysiais a warm-season turfgrass, which comprises 11 allotetraploid species (2n= 4x= 40), each possessing different morphological and physiological traits. To characterize the genetic systems of Zoysia plants and to analyse their structural and functional differences in individual species and accessions, we sequenced the genomes of Zoysia species using HiSeq and MiSeq platforms. As a reference sequence of Zoysia species, we generated a high-quality draft sequence of the genome of Z. japonica accession 'Nagirizaki' (334 Mb) in which 59,271 protein-coding genes were predicted. In parallel, draft genome sequences of Z. matrella 'Wakaba' and Z. pacifica 'Zanpa' were also generated for comparative analyses. To investigate the genetic diversity among the Zoysia species, genome sequence reads of three additional accessions, Z. japonica'Kyoto', Z. japonica'Miyagi' and Z. matrella'Chiba Fair Green', were accumulated, and aligned against the reference genome of 'Nagirizaki' along with those from 'Wakaba' and 'Zanpa'. As a result, we detected 7,424,163 single-nucleotide polymorphisms and 852,488 short indels among these species. The information obtained in this study will be valuable for basic studies on zoysiagrass evolution and genetics as well as for the breeding of zoysiagrasses, and is made available in the 'Zoysia Genome Database' at http://zoysia.kazusa.or.jp. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Comparison of the aggregation of homologous β2-microglobulin variants reveals protein solubility as a key determinant of amyloid formation

PubMed Central

Pashley, Clare L.; Hewitt, Eric W.; Radford, Sheena E.

2016-01-01

The mouse and human β2-microglobulin protein orthologs are 70 % identical in sequence and share 88 % sequence similarity. These proteins are predicted by various algorithms to have similar aggregation and amyloid propensities. However, whilst human β2m (hβ2m) forms amyloid-like fibrils in denaturing conditions (e.g. pH 2.5) in the absence of NaCl, mouse β2m (mβ2m) requires the addition of 0.3 M NaCl to cause fibrillation. Here, the factors which give rise to this difference in amyloid propensity are investigated. We utilise structural and mutational analyses, fibril growth kinetics and solubility measurements under a range of pH and salt conditions, to determine why these two proteins have different amyloid propensities. The results show that, although other factors influence the fibril growth kinetics, a striking difference in the solubility of the proteins is a key determinant of the different amyloidogenicity of hβ2m and mβ2m. The relationship between protein solubility and lag time of amyloid formation is not captured by current aggregation or amyloid prediction algorithms, indicating a need to better understand the role of solubility on the lag time of amyloid formation. The results demonstrate the key contribution of protein solubility in determining amyloid propensity and lag time of amyloid formation, highlighting how small differences in protein sequence can have dramatic effects on amyloid formation. PMID:26780548
Evolutionary distance from human homologs reflects allergenicity of animal food proteins.

PubMed

Jenkins, John A; Breiteneder, Heimo; Mills, E N Clare

2007-12-01

In silico analysis of allergens can identify putative relationships among protein sequence, structure, and allergenic properties. Such systematic analysis reveals that most plant food allergens belong to a restricted number of protein superfamilies, with pollen allergens behaving similarly. We have investigated the structural relationships of animal food allergens and their evolutionary relatedness to human homologs to define how closely a protein must resemble a human counterpart to lose its allergenic potential. Profile-based sequence homology methods were used to classify animal food allergens into Pfam families, and in silico analyses of their evolutionary and structural relationships were performed. Animal food allergens could be classified into 3 main families--tropomyosins, EF-hand proteins, and caseins--along with 14 minor families each composed of 1 to 3 allergens. The evolutionary relationships of each of these allergen superfamilies showed that in general, proteins with a sequence identity to a human homolog above approximately 62% were rarely allergenic. Single substitutions in otherwise highly conserved regions containing IgE epitopes in EF-hand parvalbumins may modulate allergenicity. These data support the premise that certain protein structures are more allergenic than others. Contrasting with plant food allergens, animal allergens, such as the highly conserved tropomyosins, challenge the capability of the human immune system to discriminate between foreign and self-proteins. Such immune responses run close to becoming autoimmune responses. Exploiting the closeness between animal allergens and their human homologs in the development of recombinant allergens for immunotherapy will need to consider the potential for developing unanticipated autoimmune responses.
Protein-Protein Interactions in a Crowded Environment: An Analysis via Cross-Docking Simulations and Evolutionary Information

PubMed Central

Lopes, Anne; Sacquin-Mora, Sophie; Dimitrova, Viktoriya; Laine, Elodie; Ponty, Yann; Carbone, Alessandra

2013-01-01

Large-scale analyses of protein-protein interactions based on coarse-grain molecular docking simulations and binding site predictions resulting from evolutionary sequence analysis, are possible and realizable on hundreds of proteins with variate structures and interfaces. We demonstrated this on the 168 proteins of the Mintseris Benchmark 2.0. On the one hand, we evaluated the quality of the interaction signal and the contribution of docking information compared to evolutionary information showing that the combination of the two improves partner identification. On the other hand, since protein interactions usually occur in crowded environments with several competing partners, we realized a thorough analysis of the interactions of proteins with true partners but also with non-partners to evaluate whether proteins in the environment, competing with the true partner, affect its identification. We found three populations of proteins: strongly competing, never competing, and interacting with different levels of strength. Populations and levels of strength are numerically characterized and provide a signature for the behavior of a protein in the crowded environment. We showed that partner identification, to some extent, does not depend on the competing partners present in the environment, that certain biochemical classes of proteins are intrinsically easier to analyze than others, and that small proteins are not more promiscuous than large ones. Our approach brings to light that the knowledge of the binding site can be used to reduce the high computational cost of docking simulations with no consequence in the quality of the results, demonstrating the possibility to apply coarse-grain docking to datasets made of thousands of proteins. Comparison with all available large-scale analyses aimed to partner predictions is realized. We release the complete decoys set issued by coarse-grain docking simulations of both true and false interacting partners, and their evolutionary sequence analysis leading to binding site predictions. Download site: http://www.lgm.upmc.fr/CCDMintseris/ PMID:24339765
Expression of the histone chaperone SET/TAF-Iβ during the strobilation process of Mesocestoides corti (Platyhelminthes, Cestoda).

PubMed

Costa, Caroline B; Monteiro, Karina M; Teichmann, Aline; da Silva, Edileuza D; Lorenzatto, Karina R; Cancela, Martín; Paes, Jéssica A; Benitz, André de N D; Castillo, Estela; Margis, Rogério; Zaha, Arnaldo; Ferreira, Henrique B

2015-08-01

The histone chaperone SET/TAF-Iβ is implicated in processes of chromatin remodelling and gene expression regulation. It has been associated with the control of developmental processes, but little is known about its function in helminth parasites. In Mesocestoides corti, a partial cDNA sequence related to SET/TAF-Iβ was isolated in a screening for genes differentially expressed in larvae (tetrathyridia) and adult worms. Here, the full-length coding sequence of the M. corti SET/TAF-Iβ gene was analysed and the encoded protein (McSET/TAF) was compared with orthologous sequences, showing that McSET/TAF can be regarded as a SET/TAF-Iβ family member, with a typical nucleosome-assembly protein (NAP) domain and an acidic tail. The expression patterns of the McSET/TAF gene and protein were investigated during the strobilation process by RT-qPCR, using a set of five reference genes, and by immunoblot and immunofluorescence, using monospecific polyclonal antibodies. A gradual increase in McSET/TAF transcripts and McSET/TAF protein was observed upon development induction by trypsin, demonstrating McSET/TAF differential expression during strobilation. These results provided the first evidence for the involvement of a protein from the NAP family of epigenetic effectors in the regulation of cestode development.
Type I allergy to elderberry (Sambucus nigra) is elicited by a 33.2 kDa allergen with significant homology to ribosomal inactivating proteins.

PubMed

Förster-Waldl, E; Marchetti, M; Schöll, I; Focke, M; Radauer, C; Kinaciyan, T; Nentwich, I; Jäger, S; Schmid, E R; Boltz-Nitulescu, G; Scheiner, O; Jensen-Jarolim, E

2003-12-01

Patients suffering from allergic rhinoconjunctivitis and dyspnoea during summer may exhibit these symptoms after contact with flowers or dietary products of the elderberry tree Sambucus nigra. Patients with a history of summer hayfever were tested in a routine setting for sensitization to elderberry. Nine patients having allergic symptoms due to elderberry and specific sensitization were investigated in detail. We studied the responsible allergens in extracts from elderberry pollen, flowers and berries, and investigated cross-reactivity with allergens from birch, grass and mugwort. Sera from patients were tested for IgE reactivity to elderberry proteins by one-dimensional (1D) and 2D electrophoresis/immunoblotting. Inhibition studies with defined allergens and elderberry-specific antibodies were used to evaluate cross-reactivity. The main elderberry allergen was purified by gel filtration and reversed-phase HPLC, and subjected to mass spectrometry. The in-gel-digested allergen was analysed by the MS/MS sequence analysis and peptide mapping. The N-terminal sequence of the predominant allergen was analysed. 0.6% of 3668 randomly tested patients showed positive skin prick test and/or RAST to elderberry. IgE in patients' sera detected a predominant allergen of 33.2 kDa in extracts from elderberry pollen, flowers and berries, with an isoelectric point at pH 7.0. Pre-incubation of sera with extracts from birch, mugwort or grass pollen rendered insignificant or no inhibition of IgE binding to blotted elderberry proteins. Specific mouse antisera reacted exclusively with proteins from elderberry. N-terminal sequence analysis, as well as MS/MS spectrometry of the purified elderberry allergen, indicated homology with ribosomal inactivating proteins (RIPs). We present evidence that the elderberry plant S. nigra harbours allergenic potency. Independent methodologies argue for a significant homology of the predominant 33.2 kDa elderberry allergen with homology to RIPs. We conclude that this protein is a candidate for a major elderberry allergen with designation Sam n 1.
No evidence for a role of modified live virus vaccines in the emergence of canine parvovirus.

PubMed

Truyen, U; Geissler, K; Parrish, C R; Hermanns, W; Siegl, G

1998-05-01

In this study the early evolution and potential origins of canine parvovirus (CPV) were examined. We cloned and sequenced the VP2 capsid protein genes of three German CPV strains isolated in 1979-1980, as well as two feline panleukopenia virus (FPV) vaccine viruses that were previously shown to have some restriction enzyme cleavage sites in common with CPV. Other partial VP2 gene sequences were obtained by amplifying CPV DNA from paraffin-embedded tissues of dogs which were early parvovirus disease cases in Germany in 1978-1979. Sequences were analysed with respect to their evolutionary relationships to other CPV and FPV isolates. Those analyses did not support the hypothesis that CPV emerged as a variant of an FPV vaccine virus. Neither did they reveal ancestral sequences among the very early CPV isolates examined. Other possible sources for the origin of CPV are examined, including the involvement of viruses from wild carnivores.
Complete genomic sequence of a Tobacco rattle virus isolate from Michigan-grown potatoes.

PubMed

Crosslin, James M; Hamm, Philip B; Kirk, William W; Hammond, Rosemarie W

2010-04-01

Tobacco rattle virus (TRV) causes stem mottle on potato leaves and necrotic arcs and rings in potato tubers, known as corky ringspot disease. Recently, TRV was reported in Michigan potato tubers cv. FL1879 exhibiting corky ringspot disease. Sequence analysis of the RNA-1-encoded 16-kDa gene of the Michigan isolate, designated MI-1, revealed homology to TRV isolates from Florida and Washington. Here, we report the complete genomic sequence of RNA-1 (6,791 nt) and RNA-2 (3,685 nt) of TRV MI-1. RNA-1 is predicted to contain four open reading frames, and the genome structure and phylogenetic analyses of the RNA-1 nucleotide sequence revealed significant homologies to the known sequences of other TRV-1 isolates. The relationships based on the full-length nucleotide sequence were different from than those based on the 16-kDa gene encoded on genomic RNA-1 and reflect sequence variation within a 20-25-aa residue region of the 16-kDa protein. MI-1 RNA-2 is predicted to contain three ORFs, encoding the coat protein (CP), a 37.6-kDa protein (ORF 2b), and a 33.6-kDa protein (ORF 2c). In addition, it contains a region of similarity to the 3' terminus of RNA-1, including a truncated portion of the 16-kDa cistron. Phylogenetic analysis of RNA-2, based on a comparison of nucleotide sequences with other members of the genus Tobravirus, indicates that TRV MI-1 and other North American isolates cluster as a distinct group. TRV M1-1 is only the second North American isolate for which there is a complete sequence of the genome, and it is distinct from the North American isolate TRV ORY. The relationship of the TRV MI-1 isolate to other tobravirus isolates is discussed.
Evidence for the Concerted Evolution between Short Linear Protein Motifs and Their Flanking Regions

PubMed Central

Chica, Claudia; Diella, Francesca; Gibson, Toby J.

2009-01-01

Background Linear motifs are short modules of protein sequences that play a crucial role in mediating and regulating many protein–protein interactions. The function of linear motifs strongly depends on the context, e.g. functional instances mainly occur inside flexible regions that are accessible for interaction. Sometimes linear motifs appear as isolated islands of conservation in multiple sequence alignments. However, they also occur in larger blocks of sequence conservation, suggesting an active role for the neighbouring amino acids. Results The evolution of regions flanking 116 functional linear motif instances was studied. The conservation of the amino acid sequence and order/disorder tendency of those regions was related to presence/absence of the instance. For the majority of the analysed instances, the pairs of sequences conserving the linear motif were also observed to maintain a similar local structural tendency and/or to have higher local sequence conservation when compared to pairs of sequences where one is missing the linear motif. Furthermore, those instances have a higher chance to co–evolve with the neighbouring residues in comparison to the distant ones. Those findings are supported by examples where the regulation of the linear motif–mediated interaction has been shown to depend on the modifications (e.g. phosphorylation) at neighbouring positions or is thought to benefit from the binding versatility of disordered regions. Conclusion The results suggest that flanking regions are relevant for linear motif–mediated interactions, both at the structural and sequence level. More interestingly, they indicate that the prediction of linear motif instances can be enriched with contextual information by performing a sequence analysis similar to the one presented here. This can facilitate the understanding of the role of these predicted instances in determining the protein function inside the broader context of the cellular network where they arise. PMID:19584925
Proteogenomic characterization of human colon and rectal cancer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Bing; Wang, Jing; Wang, Xiaojing

2014-09-18

We analyzed proteomes of colon and rectal tumors previously characterized by the Cancer Genome Atlas (TCGA) and performed integrated proteogenomic analyses. Protein sequence variants encoded by somatic genomic variations displayed reduced expression compared to protein variants encoded by germline variations. mRNA transcript abundance did not reliably predict protein expression differences between tumors. Proteomics identified five protein expression subtypes, two of which were associated with the TCGA "MSI/CIMP" transcriptional subtype, but had distinct mutation and methylation patterns and associated with different clinical outcomes. Although CNAs showed strong cis- and trans-effects on mRNA expression, relatively few of these extend to the proteinmore » level. Thus, proteomics data enabled prioritization of candidate driver genes. Our analyses identified HNF4A, a novel candidate driver gene in tumors with chromosome 20q amplifications. Integrated proteogenomic analysis provides functional context to interpret genomic abnormalities and affords novel insights into cancer biology.« less

Deducing protein function by forensic integrative cell biology.

PubMed

Earnshaw, William C

2013-12-01

Our ability to sequence genomes has provided us with near-complete lists of the proteins that compose cells, tissues, and organisms, but this is only the beginning of the process to discover the functions of cellular components. In the future, it's going to be crucial to develop computational analyses that can predict the biological functions of uncharacterised proteins. At the same time, we must not forget those fundamental experimental skills needed to confirm the predictions or send the analysts back to the drawing board to devise new ones.
Elucidation of the genome organization of tobacco mosaic virus.

PubMed Central

Zaitlin, M

1999-01-01

Proteins unique to tobacco mosaic virus (TMV)-infected plants were detected in the 1970s by electrophoretic analyses of extracts of virus-infected tissues, comparing their proteins to those generated in extracts of uninfected tissues. The genome organization of TMV was deduced principally from studies involving in vitro translation of proteins from the genomic and subgenomic messenger RNAs. The ultimate analysis of the TMV genome came in 1982 when P. Goelet and colleagues sequenced the entire genome. Studies leading to the elucidation of the TMV genome organization are described below. PMID:10212938
Phylogenomic analyses of malaria parasites and evolution of their exported proteins

PubMed Central

2011-01-01

Background Plasmodium falciparum is the most malignant agent of human malaria. It belongs to the taxon Laverania, which includes other ape-infecting Plasmodium species. The origin of the Laverania is still debated. P. falciparum exports pathogenicity-related proteins into the host cell using the Plasmodium export element (PEXEL). Predictions based on the presence of a PEXEL motif suggest that more than 300 proteins are exported by P. falciparum, while there are many fewer exported proteins in non-Laverania. Results A whole-genome approach was applied to resolve the phylogeny of eight Plasmodium species and four outgroup taxa. By using 218 orthologous proteins we received unanimous support for a sister group position of Laverania and avian malaria parasites. This observation was corroborated by the analyses of 28 exported proteins with orthologs present in all Plasmodium species. Most interestingly, several deviations from the P. falciparum PEXEL motif were found to be present in the orthologous sequences of non-Laverania. Conclusion Our phylogenomic analyses strongly support the hypotheses that the Laverania have been founded by a single Plasmodium species switching from birds to African great apes or vice versa. The deviations from the canonical PEXEL motif in orthologs may explain the comparably low number of exported proteins that have been predicted in non-Laverania. PMID:21676252
Integrative View of the Diversity and Evolution of SWEET and SemiSWEET Sugar Transporters

PubMed Central

Jia, Baolei; Zhu, Xiao Feng; Pu, Zhong Ji; Duan, Yu Xi; Hao, Lu Jiang; Zhang, Jie; Chen, Li-Qing; Jeon, Che Ok; Xuan, Yuan Hu

2017-01-01

Sugars Will Eventually be Exported Transporter (SWEET) and SemiSWEET are recently characterized families of sugar transporters in eukaryotes and prokaryotes, respectively. SemiSWEETs contain 3 transmembrane helices (TMHs), while SWEETs contain 7. Here, we performed sequence-based comprehensive analyses for SWEETs and SemiSWEETs across the biosphere. In total, 3,249 proteins were identified and ≈60% proteins were found in green plants and Oomycota, which include a number of important plant pathogens. Protein sequence similarity networks indicate that proteins from different organisms are significantly clustered. Of note, SemiSWEETs with 3 or 4 TMHs that may fuse to SWEET were identified in plant genomes. 7-TMH SWEETs were found in bacteria, implying that SemiSWEET can be fused directly in prokaryote. 15-TMH extraSWEET and 25-TMH superSWEET were also observed in wild rice and oomycetes, respectively. The transporters can be classified into 4, 2, 2, and 2 clades in plants, Metazoa, unicellular eukaryotes, and prokaryotes, respectively. The consensus and coevolution of amino acids in SWEETs were identified by multiple sequence alignments. The functions of the highly conserved residues were analyzed by molecular dynamics analysis. The 19 most highly conserved residues in the SWEETs were further confirmed by point mutagenesis using SWEET1 from Arabidopsis thaliana. The results proved that the conserved residues located in the extrafacial gate (Y57, G58, G131, and P191), the substrate binding pocket (N73, N192, and W176), and the intrafacial gate (P43, Y83, F87, P145, M161, P162, and Q202) play important roles for substrate recognition and transport processes. Taken together, our analyses provide a foundation for understanding the diversity, classification, and evolution of SWEETs and SemiSWEETs using large-scale sequence analysis and further show that gene duplication and gene fusion are important factors driving the evolution of SWEETs. PMID:29326750
Evolutionary and genomic analysis of the caleosin/peroxygenase (CLO/PXG) gene/protein families in the Viridiplantae

PubMed Central

Rahman, Farzana; Hassan, Mehedi; Rosli, Rozana; Almousally, Ibrahem; Hanano, Abdulsamie

2018-01-01

Bioinformatics analyses of caleosin/peroxygenases (CLO/PXG) demonstrated that these genes are present in the vast majority of Viridiplantae taxa for which sequence data are available. Functionally active CLO/PXG proteins with roles in abiotic stress tolerance and lipid droplet storage are present in some Trebouxiophycean and Chlorophycean green algae but are absent from the small number of sequenced Prasinophyceaen genomes. CLO/PXG-like genes are expressed during dehydration stress in Charophyte algae, a sister clade of the land plants (Embryophyta). CLO/PXG-like sequences are also present in all of the >300 sequenced Embryophyte genomes, where some species contain as many as 10–12 genes that have arisen via selective gene duplication. Angiosperm genomes harbour at least one copy each of two distinct CLO/PX isoforms, termed H (high) and L (low), where H-forms contain an additional C-terminal motif of about 30–50 residues that is absent from L-forms. In contrast, species in other Viridiplantae taxa, including green algae, non-vascular plants, ferns and gymnosperms, contain only one (or occasionally both) of these isoforms per genome. Transcriptome and biochemical data show that CLO/PXG-like genes have complex patterns of developmental and tissue-specific expression. CLO/PXG proteins can associate with cytosolic lipid droplets and/or bilayer membranes. Many of the analysed isoforms also have peroxygenase activity and are involved in oxylipin metabolism. The distribution of CLO/PXG-like genes is consistent with an origin >1 billion years ago in at least two of the earliest diverging groups of the Viridiplantae, namely the Chlorophyta and the Streptophyta, after the Viridiplantae had already diverged from other Archaeplastidal groups such as the Rhodophyta and Glaucophyta. While algal CLO/PXGs have roles in lipid packaging and stress responses, the Embryophyte proteins have a much wider spectrum of roles and may have been instrumental in the colonisation of terrestrial habitats and the subsequent diversification as the major land flora. PMID:29771926
Integrative View of the Diversity and Evolution of SWEET and SemiSWEET Sugar Transporters.

PubMed

Jia, Baolei; Zhu, Xiao Feng; Pu, Zhong Ji; Duan, Yu Xi; Hao, Lu Jiang; Zhang, Jie; Chen, Li-Qing; Jeon, Che Ok; Xuan, Yuan Hu

2017-01-01

Sugars Will Eventually be Exported Transporter (SWEET) and SemiSWEET are recently characterized families of sugar transporters in eukaryotes and prokaryotes, respectively. SemiSWEETs contain 3 transmembrane helices (TMHs), while SWEETs contain 7. Here, we performed sequence-based comprehensive analyses for SWEETs and SemiSWEETs across the biosphere. In total, 3,249 proteins were identified and ≈60% proteins were found in green plants and Oomycota, which include a number of important plant pathogens. Protein sequence similarity networks indicate that proteins from different organisms are significantly clustered. Of note, SemiSWEETs with 3 or 4 TMHs that may fuse to SWEET were identified in plant genomes. 7-TMH SWEETs were found in bacteria, implying that SemiSWEET can be fused directly in prokaryote. 15-TMH extraSWEET and 25-TMH superSWEET were also observed in wild rice and oomycetes, respectively. The transporters can be classified into 4, 2, 2, and 2 clades in plants, Metazoa, unicellular eukaryotes, and prokaryotes, respectively. The consensus and coevolution of amino acids in SWEETs were identified by multiple sequence alignments. The functions of the highly conserved residues were analyzed by molecular dynamics analysis. The 19 most highly conserved residues in the SWEETs were further confirmed by point mutagenesis using SWEET1 from Arabidopsis thaliana . The results proved that the conserved residues located in the extrafacial gate (Y57, G58, G131, and P191), the substrate binding pocket (N73, N192, and W176), and the intrafacial gate (P43, Y83, F87, P145, M161, P162, and Q202) play important roles for substrate recognition and transport processes. Taken together, our analyses provide a foundation for understanding the diversity, classification, and evolution of SWEETs and SemiSWEETs using large-scale sequence analysis and further show that gene duplication and gene fusion are important factors driving the evolution of SWEETs.
Evolutionary and genomic analysis of the caleosin/peroxygenase (CLO/PXG) gene/protein families in the Viridiplantae.

PubMed

Rahman, Farzana; Hassan, Mehedi; Rosli, Rozana; Almousally, Ibrahem; Hanano, Abdulsamie; Murphy, Denis J

2018-01-01

Bioinformatics analyses of caleosin/peroxygenases (CLO/PXG) demonstrated that these genes are present in the vast majority of Viridiplantae taxa for which sequence data are available. Functionally active CLO/PXG proteins with roles in abiotic stress tolerance and lipid droplet storage are present in some Trebouxiophycean and Chlorophycean green algae but are absent from the small number of sequenced Prasinophyceaen genomes. CLO/PXG-like genes are expressed during dehydration stress in Charophyte algae, a sister clade of the land plants (Embryophyta). CLO/PXG-like sequences are also present in all of the >300 sequenced Embryophyte genomes, where some species contain as many as 10-12 genes that have arisen via selective gene duplication. Angiosperm genomes harbour at least one copy each of two distinct CLO/PX isoforms, termed H (high) and L (low), where H-forms contain an additional C-terminal motif of about 30-50 residues that is absent from L-forms. In contrast, species in other Viridiplantae taxa, including green algae, non-vascular plants, ferns and gymnosperms, contain only one (or occasionally both) of these isoforms per genome. Transcriptome and biochemical data show that CLO/PXG-like genes have complex patterns of developmental and tissue-specific expression. CLO/PXG proteins can associate with cytosolic lipid droplets and/or bilayer membranes. Many of the analysed isoforms also have peroxygenase activity and are involved in oxylipin metabolism. The distribution of CLO/PXG-like genes is consistent with an origin >1 billion years ago in at least two of the earliest diverging groups of the Viridiplantae, namely the Chlorophyta and the Streptophyta, after the Viridiplantae had already diverged from other Archaeplastidal groups such as the Rhodophyta and Glaucophyta. While algal CLO/PXGs have roles in lipid packaging and stress responses, the Embryophyte proteins have a much wider spectrum of roles and may have been instrumental in the colonisation of terrestrial habitats and the subsequent diversification as the major land flora.
Massive Collection of Full-Length Complementary DNA Clones and Microarray Analyses:. Keys to Rice Transcriptome Analysis

NASA Astrophysics Data System (ADS)

Kikuchi, Shoshi

2009-02-01

Completion of the high-precision genome sequence analysis of rice led to the collection of about 35,000 full-length cDNA clones and the determination of their complete sequences. Mapping of these full-length cDNA sequences has given us information on (1) the number of genes expressed in the rice genome; (2) the start and end positions and exon-intron structures of rice genes; (3) alternative transcripts; (4) possible encoded proteins; (5) non-protein-coding (np) RNAs; (6) the density of gene localization on the chromosome; (7) setting the parameters of gene prediction programs; and (8) the construction of a microarray system that monitors global gene expression. Manual curation for rice gene annotation by using mapping information on full-length cDNA and EST assemblies has revealed about 32,000 expressed genes in the rice genome. Analysis of major gene families, such as those encoding membrane transport proteins (pumps, ion channels, and secondary transporters), along with the evolution from bacteria to higher animals and plants, reveals how gene numbers have increased through adaptation to circumstances. Family-based gene annotation also gives us a new way of comparing organisms. Massive amounts of data on gene expression under many kinds of physiological conditions are being accumulated in rice oligoarrays (22K and 44K) based on full-length cDNA sequences. Cluster analyses of genes that have the same promoter cis-elements, that have similar expression profiles, or that encode enzymes in the same metabolic pathways or signal transduction cascades give us clues to understanding the networks of gene expression in rice. As a tool for that purpose, we recently developed "RiCES", a tool for searching for cis-elements in the promoter regions of clustered genes.
Sequence Analysis of Leuconostoc mesenteroides Bacteriophage Φ1-A4 Isolated from an Industrial Vegetable Fermentation▿

PubMed Central

Lu, Z.; Altermann, E.; Breidt, F.; Kozyavkin, S.

2010-01-01

Vegetable fermentations rely on the proper succession of a variety of lactic acid bacteria (LAB). Leuconostoc mesenteroides initiates fermentation. As fermentation proceeds, L. mesenteroides dies off and other LAB complete the fermentation. Phages infecting L. mesenteroides may significantly influence the die-off of L. mesenteroides. However, no L. mesenteroides phages have been previously genetically characterized. Knowledge of more phage genome sequences may provide new insights into phage genomics, phage evolution, and phage-host interactions. We have determined the complete genome sequence of L. mesenteroides phage Φ1-A4, isolated from an industrial sauerkraut fermentation. The phage possesses a linear, double-stranded DNA genome consisting of 29,508 bp with a G+C content of 36%. Fifty open reading frames (ORFs) were predicted. Putative functions were assigned to 26 ORFs (52%), including 5 ORFs of structural proteins. The phage genome was modularly organized, containing DNA replication, DNA-packaging, head and tail morphogenesis, cell lysis, and DNA regulation/modification modules. In silico analyses showed that Φ1-A4 is a unique lytic phage with a large-scale genome inversion (∼30% of the genome). The genome inversion encompassed the lysis module, part of the structural protein module, and a cos site. The endolysin gene was flanked by two holin genes. The tail morphogenesis module was interspersed with cell lysis genes and other genes with unknown functions. The predicted amino acid sequences of the phage proteins showed little similarity to other phages, but functional analyses showed that Φ1-A4 clusters with several Lactococcus phages. To our knowledge, Φ1-A4 is the first genetically characterized L. mesenteroides phage. PMID:20118355
Demonstration of GTG as an endogenous initiation codon for a human mRNA transcript revealed by molecular cloning of the serpin endopin 2B.

PubMed

Hwang, Shin-Rong; Garza, Christina Z; Wegrzyn, Jill; Hook, Vivian Y H

2004-08-16

This study demonstrates utilization of the novel GTG initiation codon for translation of a human mRNA transcript that encodes the serpin endopin 2B, a protease inhibitor. Molecular cloning revealed the nucleotide sequence of the human endopin 2B cDNA. Its deduced primary sequence shows high homology to bovine endopin 2A that possesses cross-class protease inhibition of elastase and papain. Notably, the human endopin 2B cDNA sequence revealed GTG as the predicted translation initiation codon; the predicted translation product of 46 kDa endopin 2B was produced by in vitro translation of 35S-endopin 2B with mammalian (rabbit) protein translation components. Importantly, bioinformatic studies demonstrated the presence of the entire human endopin 2B cDNA sequence with GTG as initiation codon within the human genome on chromosome 14. Further evidence for GTG as a functional initiation codon was illustrated by GTG-mediated in vitro translation of the heterologous protein EGFP, and by GTG-mediated expression of EGFP in mammalian PC12 cells. Mutagenesis of GTG to GTC resulted in the absence of EGFP expression in PC12 cells, indicating the function of GTG as an initiation codon. In addition, it was apparent that the GTG initiation codon produces lower levels of translated protein compared to ATG as initiation codon. Significantly, GTG-mediated translation of endopin 2B demonstrates a functional human gene product not previously predicted from initial analyses of the human genome. Further analyses based on GTG as an alternative initiation codon may predict new candidate genes of the human genome.
Murine recessive hereditary spherocytosis, sph/sph, is caused by a mutation in the erythroid alpha-spectrin gene.

PubMed

Wandersee, N J; Birkenmeier, C S; Gifford, E J; Mohandas, N; Barker, J E

2000-01-01

Spectrin, a heterodimer of alpha- and beta-subunits, is the major protein component of the red blood cell membrane skeleton. The mouse mutation, sph, causes an alpha-spectrin-deficient hereditary spherocytosis with the severe phenotype typical of recessive hereditary spherocytosis in humans. The sph mutation maps to the erythroid alpha-spectrin locus, Spna1, on Chromosome 1. Scanning electron microscopy, osmotic gradient ektacytometry, cDNA cloning, RT-PCR, nucleic acid sequencing, and Northern blot analyses were used to characterize the wild type and sph alleles of the Spna1 locus. Our results confirm the spherocytic nature of sph/sph red blood cells and document a mild spherocytic transition in the +/sph heterozygotes. Sequencing of the full length coding region of the Spna1 wild type allele from the C57BL/6J strain of mice reveals a 2414 residue deduced amino acid sequence that shows the typical 106-amino-acid repeat structure previously described for other members of the spectrin protein family. Sequence analysis of RT-PCR clones from sph/sph alpha-spectrin mRNA identified a single base deletion in repeat 5 that would cause a frame shift and premature termination of the protein. This deletion was confirmed in sph/sph genomic DNA. Northern blot analyses of the distribution of Spna1 mRNA in non-erythroid tissues detects the expression of 8, 2.5 and 2.0 kb transcripts in adult heart. These results predict the heart as an additional site where alpha-spectrin mutations may produce a phenotype and raise the possibility that a novel functional class of small alpha-spectrin isoforms may exist.
Diversity and Evolutionary Analysis of Iron-Containing (Type-III) Alcohol Dehydrogenases in Eukaryotes

PubMed Central

Gaona-López, Carlos; Julián-Sánchez, Adriana

2016-01-01

Background Alcohol dehydrogenase (ADH) activity is widely distributed in the three domains of life. Currently, there are three non-homologous NAD(P)+-dependent ADH families reported: Type I ADH comprises Zn-dependent ADHs; type II ADH comprises short-chain ADHs described first in Drosophila; and, type III ADH comprises iron-containing ADHs (FeADHs). These three families arose independently throughout evolution and possess different structures and mechanisms of reaction. While types I and II ADHs have been extensively studied, analyses about the evolution and diversity of (type III) FeADHs have not been published yet. Therefore in this work, a phylogenetic analysis of FeADHs was performed to get insights into the evolution of this protein family, as well as explore the diversity of FeADHs in eukaryotes. Principal Findings Results showed that FeADHs from eukaryotes are distributed in thirteen protein subfamilies, eight of them possessing protein sequences distributed in the three domains of life. Interestingly, none of these protein subfamilies possess protein sequences found simultaneously in animals, plants and fungi. Many FeADHs are activated by or contain Fe2+, but many others bind to a variety of metals, or even lack of metal cofactor. Animal FeADHs are found in just one protein subfamily, the hydroxyacid-oxoacid transhydrogenase (HOT) subfamily, which includes protein sequences widely distributed in fungi, but not in plants), and in several taxa from lower eukaryotes, bacteria and archaea. Fungi FeADHs are found mainly in two subfamilies: HOT and maleylacetate reductase (MAR), but some can be found also in other three different protein subfamilies. Plant FeADHs are found only in chlorophyta but not in higher plants, and are distributed in three different protein subfamilies. Conclusions/Significance FeADHs are a diverse and ancient protein family that shares a common 3D scaffold with a patchy distribution in eukaryotes. The majority of sequenced FeADHs from eukaryotes are distributed in just two subfamilies, HOT and MAR (found mainly in animals and fungi). These two subfamilies comprise almost 85% of all sequenced FeADHs in eukaryotes. PMID:27893862
Properties of the recombinant TNF-binding proteins from variola, monkeypox, and cowpox viruses are different.

PubMed

Gileva, Irina P; Nepomnyashchikh, Tatiana S; Antonets, Denis V; Lebedev, Leonid R; Kochneva, Galina V; Grazhdantseva, Antonina V; Shchelkunov, Sergei N

2006-11-01

Tumor necrosis factor (TNF), a potent proinflammatory and antiviral cytokine, is a critical extracellular immune regulator targeted by poxviruses through the activity of virus-encoded family of TNF-binding proteins (CrmB, CrmC, CrmD, and CrmE). The only TNF-binding protein from variola virus (VARV), the causative agent of smallpox, infecting exclusively humans, is CrmB. Here we have aligned the amino acid sequences of CrmB proteins from 10 VARV, 14 cowpox virus (CPXV), and 22 monkeypox virus (MPXV) strains. Sequence analyses demonstrated a high homology of these proteins. The regions homologous to cd00185 domain of the TNF receptor family, determining the specificity of ligand-receptor binding, were found in the sequences of CrmB proteins. In addition, a comparative analysis of the C-terminal SECRET domain sequences of CrmB proteins was performed. The differences in the amino acid sequences of these domains characteristic of each particular orthopoxvirus species were detected. It was assumed that the species-specific distinctions between the CrmB proteins might underlie the differences in these physicochemical and biological properties. The individual recombinant proteins VARV-CrmB, MPXV-CrmB, and CPXV-CrmB were synthesized in a baculovirus expression system in insect cells and isolated. Purified VARV-CrmB was detectable as a dimer with a molecular weight of 90 kDa, while MPXV- and CPXV-CrmBs, as monomers when fractioned by non-reducing SDS-PAGE. The CrmB proteins of VARV, MPXV, and CPXV differed in the efficiencies of inhibition of the cytotoxic effects of human, mouse, or rabbit TNFs in L929 mouse fibroblast cell line. Testing of CrmBs in the experimental model of LPS-induced shock using SPF BALB/c mice detected a pronounced protective effect of VARV-CrmB. Thus, our data demonstrated the difference in anti-TNF activities of VARV-, MPXV-, and CPXV-CrmBs and efficiency of VARV-CrmB rather than CPXV- or MPXV-CrmBs against LPS-induced mortality in mice.
Ribosomal protein S14 transcripts are edited in Oenothera mitochondria.

PubMed Central

Schuster, W; Unseld, M; Wissinger, B; Brennicke, A

1990-01-01

The gene encoding ribosomal protein S14 (rps14) in Oenothera mitochondria is located upstream of the cytochrome b gene (cob). Sequence analysis of independently derived cDNA clones covering the entire rps14 coding region shows two nucleotides edited from the genomic DNA to the mRNA derived sequences by C to U modifications. A third editing event occurs four nucleotides upstream of the AUG initiation codon and improves a potential ribosome binding site. A CGG codon specifying arginine in a position conserved in evolution between chloroplasts and E. coli as a UGG tryptophan codon is not edited in any of the cDNAs analysed. An inverted repeat 3' of an unidentified open reading frame is located upstream of the rps14 gene. The inverted repeat sequence is highly conserved at analogous regions in other Oenothera mitochondrial loci. Images PMID:2326162
Diversity among Tacaribe serocomplex viruses (family Arenaviridae) naturally associated with the Mexican woodrat (Neotoma mexicana)

PubMed Central

Cajimat, Maria N. B.; Milazzo, Mary Louise; Borchert, Jeff N.; Abbott, Ken D.; Bradley, Robert D.; Fulhorst, Charles F.

2008-01-01

The results of analyses of glycoprotein precursor and nucleocapsid protein gene sequences indicated that an arenavirus isolated from a Mexican woodrat (Neotoma mexicana) captured in Arizona is a strain of a novel species (proposed name Skinner Tank virus) and that arenaviruses isolated from Mexican woodrats captured in Colorado, New Mexico, and Utah are strains of Whitewater Arroyo virus or species phylogenetically closely related to Whitewater Arroyo virus. Pairwise comparisons of glycoprotein precursor sequences and nucleocapsid protein sequences revealed a high level of divergence among the viruses isolated from the Mexican woodrats captured in Colorado, New Mexico, and Utah and the Whitewater Arroyo virus prototype strain AV 9310135, which originally was isolated from a white-throated woodrat (Neotoma albigula) captured in New Mexico. Conceptually, the viruses from Colorado, New Mexico, and Utah and strain AV 9310135 could be grouped together in a species complex in the family Arenaviridae, genus Arenavirus. PMID:18304671
Diversity of the luciferin binding protein gene in bioluminescent dinoflagellates--insights from a new gene in Noctiluca scintillans and sequences from gonyaulacoid genera.

PubMed

Valiadi, Martha; Iglesias-Rodriguez, Maria Debora

2014-01-01

Dinoflagellate bioluminescence systems operate with or without a luciferin binding protein, representing two distinct modes of light production. However, the distribution, diversity, and evolution of the luciferin binding protein gene within bioluminescent dinoflagellates are not well known. We used PCR to detect and partially sequence this gene from the heterotrophic dinoflagellate Noctiluca scintillans and a group of ecologically important gonyaulacoid species. We report an additional luciferin binding protein gene in N. scintillans which is not attached to luciferase, further to its typical combined bioluminescence gene. This supports the hypothesis that a profound re-organization of the bioluminescence system has taken place in this organism. We also show that the luciferin binding protein gene is present in the genera Ceratocorys, Gonyaulax, and Protoceratium, and is prevalent in bioluminescent species of Alexandrium. Therefore, this gene is an integral component of the standard molecular bioluminescence machinery in dinoflagellates. Nucleotide sequences showed high within-strain variation among gene copies, revealing a highly diverse gene family comprising multiple gene types in some organisms. Phylogenetic analyses showed that, in some species, the evolution of the luciferin binding protein gene was different from the organism's general phylogenies, highlighting the complex evolutionary history of dinoflagellate bioluminescence systems. © 2013 The Author(s) Journal of Eukaryotic Microbiology © 2013 International Society of Protistologists.
Integrating protein structural dynamics and evolutionary analysis with Bio3D.

PubMed

Skjærven, Lars; Yao, Xin-Qiu; Scarabelli, Guido; Grant, Barry J

2014-12-10

Popular bioinformatics approaches for studying protein functional dynamics include comparisons of crystallographic structures, molecular dynamics simulations and normal mode analysis. However, determining how observed displacements and predicted motions from these traditionally separate analyses relate to each other, as well as to the evolution of sequence, structure and function within large protein families, remains a considerable challenge. This is in part due to the general lack of tools that integrate information of molecular structure, dynamics and evolution. Here, we describe the integration of new methodologies for evolutionary sequence, structure and simulation analysis into the Bio3D package. This major update includes unique high-throughput normal mode analysis for examining and contrasting the dynamics of related proteins with non-identical sequences and structures, as well as new methods for quantifying dynamical couplings and their residue-wise dissection from correlation network analysis. These new methodologies are integrated with major biomolecular databases as well as established methods for evolutionary sequence and comparative structural analysis. New functionality for directly comparing results derived from normal modes, molecular dynamics and principal component analysis of heterogeneous experimental structure distributions is also included. We demonstrate these integrated capabilities with example applications to dihydrofolate reductase and heterotrimeric G-protein families along with a discussion of the mechanistic insight provided in each case. The integration of structural dynamics and evolutionary analysis in Bio3D enables researchers to go beyond a prediction of single protein dynamics to investigate dynamical features across large protein families. The Bio3D package is distributed with full source code and extensive documentation as a platform independent R package under a GPL2 license from http://thegrantlab.org/bio3d/ .
Comparative analyses of two Geraniaceae transcriptomes using next-generation sequencing.

PubMed

Zhang, Jin; Ruhlman, Tracey A; Mower, Jeffrey P; Jansen, Robert K

2013-12-29

Organelle genomes of Geraniaceae exhibit several unusual evolutionary phenomena compared to other angiosperm families including accelerated nucleotide substitution rates, widespread gene loss, reduced RNA editing, and extensive genomic rearrangements. Since most organelle-encoded proteins function in multi-subunit complexes that also contain nuclear-encoded proteins, it is likely that the atypical organellar phenomena affect the evolution of nuclear genes encoding organellar proteins. To begin to unravel the complex co-evolutionary interplay between organellar and nuclear genomes in this family, we sequenced nuclear transcriptomes of two species, Geranium maderense and Pelargonium x hortorum. Normalized cDNA libraries of G. maderense and P. x hortorum were used for transcriptome sequencing. Five assemblers (MIRA, Newbler, SOAPdenovo, SOAPdenovo-trans [SOAPtrans], Trinity) and two next-generation technologies (454 and Illumina) were compared to determine the optimal transcriptome sequencing approach. Trinity provided the highest quality assembly of Illumina data with the deepest transcriptome coverage. An analysis to determine the amount of sequencing needed for de novo assembly revealed diminishing returns of coverage and quality with data sets larger than sixty million Illumina paired end reads for both species. The G. maderense and P. x hortorum transcriptomes contained fewer transcripts encoding the PLS subclass of PPR proteins relative to other angiosperms, consistent with reduced mitochondrial RNA editing activity in Geraniaceae. In addition, transcripts for all six plastid targeted sigma factors were identified in both transcriptomes, suggesting that one of the highly divergent rpoA-like ORFs in the P. x hortorum plastid genome is functional. The findings support the use of the Illumina platform and assemblers optimized for transcriptome assembly, such as Trinity or SOAPtrans, to generate high-quality de novo transcriptomes with broad coverage. In addition, results indicated no major improvements in breadth of coverage with data sets larger than six billion nucleotides or when sampling RNA from four tissue types rather than from a single tissue. Finally, this work demonstrates the power of cross-compartmental genomic analyses to deepen our understanding of the correlated evolution of the nuclear, plastid, and mitochondrial genomes in plants.
Comparative analyses of two Geraniaceae transcriptomes using next-generation sequencing

PubMed Central

2013-01-01

Background Organelle genomes of Geraniaceae exhibit several unusual evolutionary phenomena compared to other angiosperm families including accelerated nucleotide substitution rates, widespread gene loss, reduced RNA editing, and extensive genomic rearrangements. Since most organelle-encoded proteins function in multi-subunit complexes that also contain nuclear-encoded proteins, it is likely that the atypical organellar phenomena affect the evolution of nuclear genes encoding organellar proteins. To begin to unravel the complex co-evolutionary interplay between organellar and nuclear genomes in this family, we sequenced nuclear transcriptomes of two species, Geranium maderense and Pelargonium x hortorum. Results Normalized cDNA libraries of G. maderense and P. x hortorum were used for transcriptome sequencing. Five assemblers (MIRA, Newbler, SOAPdenovo, SOAPdenovo-trans [SOAPtrans], Trinity) and two next-generation technologies (454 and Illumina) were compared to determine the optimal transcriptome sequencing approach. Trinity provided the highest quality assembly of Illumina data with the deepest transcriptome coverage. An analysis to determine the amount of sequencing needed for de novo assembly revealed diminishing returns of coverage and quality with data sets larger than sixty million Illumina paired end reads for both species. The G. maderense and P. x hortorum transcriptomes contained fewer transcripts encoding the PLS subclass of PPR proteins relative to other angiosperms, consistent with reduced mitochondrial RNA editing activity in Geraniaceae. In addition, transcripts for all six plastid targeted sigma factors were identified in both transcriptomes, suggesting that one of the highly divergent rpoA-like ORFs in the P. x hortorum plastid genome is functional. Conclusions The findings support the use of the Illumina platform and assemblers optimized for transcriptome assembly, such as Trinity or SOAPtrans, to generate high-quality de novo transcriptomes with broad coverage. In addition, results indicated no major improvements in breadth of coverage with data sets larger than six billion nucleotides or when sampling RNA from four tissue types rather than from a single tissue. Finally, this work demonstrates the power of cross-compartmental genomic analyses to deepen our understanding of the correlated evolution of the nuclear, plastid, and mitochondrial genomes in plants. PMID:24373163
Genetic clustering and polymorphism of the merozoite surface protein-3 of Plasmodium knowlesi clinical isolates from Peninsular Malaysia.

PubMed

De Silva, Jeremy Ryan; Lau, Yee Ling; Fong, Mun Yik

2017-01-03

The simian malaria parasite Plasmodium knowlesi has been reported to cause significant numbers of human infection in South East Asia. Its merozoite surface protein-3 (MSP3) is a protein that belongs to a multi-gene family of proteins first found in Plasmodium falciparum. Several studies have evaluated the potential of P. falciparum MSP3 as a potential vaccine candidate. However, to date no detailed studies have been carried out on P. knowlesi MSP3 gene (pkmsp3). The present study investigates the genetic diversity, and haplotypes groups of pkmsp3 in P. knowlesi clinical samples from Peninsular Malaysia. Blood samples were collected from P. knowlesi malaria patients within a period of 4 years (2008-2012). The pkmsp3 gene of the isolates was amplified via PCR, and subsequently cloned and sequenced. The full length pkmsp3 sequence was divided into Domain A and Domain B. Natural selection, genetic diversity, and haplotypes of pkmsp3 were analysed using MEGA6 and DnaSP ver. 5.10.00 programmes. From 23 samples, 48 pkmsp3 sequences were successfully obtained. At the nucleotide level, 101 synonymous and 238 non-synonymous mutations were observed. Tests of neutrality were not significant for the full length, Domain A or Domain B sequences. However, the dN/dS ratio of Domain B indicates purifying selection for this domain. Analysis of the deduced amino acid sequences revealed 42 different haplotypes. Neighbour Joining phylogenetic tree and haplotype network analyses revealed that the haplotypes clustered into two distinct groups. A moderate level of genetic diversity was observed in the pkmsp3 and only the C-terminal region (Domain B) appeared to be under purifying selection. The separation of the pkmsp3 into two haplotype groups provides further evidence of the existence of two distinct P. knowlesi types or lineages. Future studies should investigate the diversity of pkmsp3 among P. knowlesi isolates in North Borneo, where large numbers of human knowlesi malaria infection still occur.

Genome-based identification of spliceosomal proteins in the silk moth Bombyx mori.

PubMed

Somarelli, Jason A; Mesa, Annia; Fuller, Myron E; Torres, Jacqueline O; Rodriguez, Carol E; Ferrer, Christina M; Herrera, Rene J

2010-12-01

Pre-messenger RNA splicing is a highly conserved eukaryotic cellular function that takes place by way of a large, RNA-protein assembly known as the spliceosome. In the mammalian system, nearly 300 proteins associate with uridine-rich small nuclear (sn)RNAs to form this complex. Some of these splicing factors are ubiquitously present in the spliceosome, whereas others are involved only in the processing of specific transcripts. Several proteomics analyses have delineated the proteins of the spliceosome in several species. In this study, we mine multiple sequence data sets of the silk moth Bombyx mori in an attempt to identify the entire set of known spliceosomal proteins. Five data sets were utilized, including the 3X, 6X, and Build 2.0 genomic contigs as well as the expressed sequence tag and protein libraries. While homologs for 88% of vertebrate splicing factors were delineated in the Bombyx mori genome, there appear to be several spliceosomal polypeptides absent in Bombyx mori and seven additional insect species. This apparent increase in spliceosomal complexity in vertebrates may reflect the tissue-specific and developmental stage-specific alternative pre-mRNA splicing requirements in vertebrates. Phylogenetic analyses of 15 eukaryotic taxa using the core splicing factors suggest that the essential functional units of the pre-mRNA processing machinery have remained highly conserved from yeast to humans. The Sm and LSm proteins are the most conserved, whereas proteins of the U1 small nuclear ribonucleoprotein particle are the most divergent. These data highlight both the differential conservation and relative phylogenetic signals of the essential spliceosomal components throughout evolution. © 2010 Wiley Periodicals, Inc.
The Transcriptome Analysis of Strongyloides stercoralis L3i Larvae Reveals Targets for Intervention in a Neglected Disease

PubMed Central

Marcilla, Antonio; Garg, Gagan; Bernal, Dolores; Ranganathan, Shoba; Forment, Javier; Ortiz, Javier; Muñoz-Antolí, Carla; Dominguez, M. Victoria; Pedrola, Laia; Martinez-Blanch, Juan; Sotillo, Javier; Trelis, Maria; Toledo, Rafael; Esteban, J. Guillermo

2012-01-01

Background Strongyloidiasis is one of the most neglected diseases distributed worldwide with endemic areas in developed countries, where chronic infections are life threatening. Despite its impact, very little is known about the molecular biology of the parasite involved and its interplay with its hosts. Next generation sequencing technologies now provide unique opportunities to rapidly address these questions. Principal Findings Here we present the first transcriptome of the third larval stage of S. stercoralis using 454 sequencing coupled with semi-automated bioinformatic analyses. 253,266 raw sequence reads were assembled into 11,250 contiguous sequences, most of which were novel. 8037 putative proteins were characterized based on homology, gene ontology and/or biochemical pathways. Comparison of the transcriptome of S. strongyloides with those of other nematodes, including S. ratti, revealed similarities in transcription of molecules inferred to have key roles in parasite-host interactions. Enzymatic proteins, like kinases and proteases, were abundant. 1213 putative excretory/secretory proteins were compiled using a new pipeline which included non-classical secretory proteins. Potential drug targets were also identified. Conclusions Overall, the present dataset should provide a solid foundation for future fundamental genomic, proteomic and metabolomic explorations of S. stercoralis, as well as a basis for applied outcomes, such as the development of novel methods of intervention against this neglected parasite. PMID:22389732
Combinatory RNA-Sequencing Analyses Reveal a Dual Mode of Gene Regulation by ADAR1 in Gastric Cancer.

PubMed

Cho, Charles J; Jung, Jaeeun; Jiang, Lushang; Lee, Eun Ji; Kim, Dae-Soo; Kim, Byung Sik; Kim, Hee Sung; Jung, Hwoon-Yong; Song, Ho-June; Hwang, Sung Wook; Park, Yangsoon; Jung, Min Kyo; Pack, Chan Gi; Myung, Seung-Jae; Chang, Suhwan

2018-04-25

Adenosine deaminase acting on RNA 1 (ADAR1) is known to mediate deamination of adenosine-to-inosine through binding to double-stranded RNA, the phenomenon known as RNA editing. Currently, the function of ADAR1 in gastric cancer is unclear. This study was aimed at investigating RNA editing-dependent and editing-independent functions of ADAR1 in gastric cancer, especially focusing on its influence on editing of 3' untranslated regions (UTRs) and subsequent changes in expression of messenger RNAs (mRNAs) as well as microRNAs (miRNAs). RNA-sequencing and small RNA-sequencing were performed on AGS and MKN-45 cells with a stable ADAR1 knockdown. Changed frequencies of editing and mRNA and miRNA expression were then identified by bioinformatic analyses. Targets of RNA editing were further validated in patients' samples. In the Alu region of both gastric cell lines, editing was most commonly of the A-to-I type in 3'-UTR or intron. mRNA and protein levels of PHACTR4 increased in ADAR1 knockdown cells, because of the loss of seed sequences in 3'-UTR of PHACTR4 mRNA that are required for miRNA-196a-3p binding. Immunohistochemical analyses of tumor and paired normal samples from 16 gastric cancer patients showed that ADAR1 expression was higher in tumors than in normal tissues and inversely correlated with PHACTR4 staining. On the other hand, decreased miRNA-148a-3p expression in ADAR1 knockdown cells led to increased mRNA and protein expression of NFYA, demonstrating ADAR1's editing-independent function. ADAR1 regulates post-transcriptional gene expression in gastric cancer through both RNA editing-dependent and editing-independent mechanisms.
A comprehensive and scalable database search system for metaproteomics.

PubMed

Chatterjee, Sandip; Stupp, Gregory S; Park, Sung Kyu Robin; Ducom, Jean-Christophe; Yates, John R; Su, Andrew I; Wolan, Dennis W

2016-08-16

Mass spectrometry-based shotgun proteomics experiments rely on accurate matching of experimental spectra against a database of protein sequences. Existing computational analysis methods are limited in the size of their sequence databases, which severely restricts the proteomic sequencing depth and functional analysis of highly complex samples. The growing amount of public high-throughput sequencing data will only exacerbate this problem. We designed a broadly applicable metaproteomic analysis method (ComPIL) that addresses protein database size limitations. Our approach to overcome this significant limitation in metaproteomics was to design a scalable set of sequence databases assembled for optimal library querying speeds. ComPIL was integrated with a modified version of the search engine ProLuCID (termed "Blazmass") to permit rapid matching of experimental spectra. Proof-of-principle analysis of human HEK293 lysate with a ComPIL database derived from high-quality genomic libraries was able to detect nearly all of the same peptides as a search with a human database (~500x fewer peptides in the database), with a small reduction in sensitivity. We were also able to detect proteins from the adenovirus used to immortalize these cells. We applied our method to a set of healthy human gut microbiome proteomic samples and showed a substantial increase in the number of identified peptides and proteins compared to previous metaproteomic analyses, while retaining a high degree of protein identification accuracy and allowing for a more in-depth characterization of the functional landscape of the samples. The combination of ComPIL with Blazmass allows proteomic searches to be performed with database sizes much larger than previously possible. These large database searches can be applied to complex meta-samples with unknown composition or proteomic samples where unexpected proteins may be identified. The protein database, proteomic search engine, and the proteomic data files for the 5 microbiome samples characterized and discussed herein are open source and available for use and additional analysis.
Taxonomic distribution and origins of the extended LHC (light-harvesting complex) antenna protein superfamily

PubMed Central

2010-01-01

Background The extended light-harvesting complex (LHC) protein superfamily is a centerpiece of eukaryotic photosynthesis, comprising the LHC family and several families involved in photoprotection, like the LHC-like and the photosystem II subunit S (PSBS). The evolution of this complex superfamily has long remained elusive, partially due to previously missing families. Results In this study we present a meticulous search for LHC-like sequences in public genome and expressed sequence tag databases covering twelve representative photosynthetic eukaryotes from the three primary lineages of plants (Plantae): glaucophytes, red algae and green plants (Viridiplantae). By introducing a coherent classification of the different protein families based on both, hidden Markov model analyses and structural predictions, numerous new LHC-like sequences were identified and several new families were described, including the red lineage chlorophyll a/b-binding-like protein (RedCAP) family from red algae and diatoms. The test of alternative topologies of sequences of the highly conserved chlorophyll-binding core structure of LHC and PSBS proteins significantly supports the independent origins of LHC and PSBS families via two unrelated internal gene duplication events. This result was confirmed by the application of cluster likelihood mapping. Conclusions The independent evolution of LHC and PSBS families is supported by strong phylogenetic evidence. In addition, a possible origin of LHC and PSBS families from different homologous members of the stress-enhanced protein subfamily, a diverse and anciently paralogous group of two-helix proteins, seems likely. The new hypothesis for the evolution of the extended LHC protein superfamily proposed here is in agreement with the character evolution analysis that incorporates the distribution of families and subfamilies across taxonomic lineages. Intriguingly, stress-enhanced proteins, which are universally found in the genomes of green plants, red algae, glaucophytes and in diatoms with complex plastids, could represent an important and previously missing link in the evolution of the extended LHC protein superfamily. PMID:20673336
Purification and fractionation of membranes for proteomic analyses.

PubMed

Marmagne, Anne; Salvi, Daniel; Rolland, Norbert; Ephritikhine, Geneviève; Joyard, Jacques; Barbier-Brygoo, Hélène

2006-01-01

Proteomics is a very powerful approach to link the information contained in sequenced genomes, such as Arabidopsis, to the functional knowledge provided by studies of plant cell compartments. However, membrane proteomics remains a challenge. One way to bring into view the complex mixture of proteins present in a membrane is to develop proteomic analyses based on (1) the use of highly purified membrane fractions and (2) fractionation of membrane proteins to retrieve as many proteins as possible (from the most to the less hydrophobic ones). To illustrate such strategies, we choose two types of membranes, the plasma membrane and the chloroplast envelope membranes. Both types of membranes can be prepared in a reasonable degree of purity from different types of tissues: the plasma membrane from cultured cells and the chloroplast envelope membrane from whole plants. This article is restricted to the description of methods for the preparation of highly purified and characterized plant membrane fractions and the subsequent fractionation of these membrane proteins according to simple physicochemical criteria (i.e., chloroform/methanol extraction, alkaline or saline treatments) for further analyses using modern proteomic methodologies.
Complete cDNA sequence of SAP-like pentraxin from Limulus polyphemus: implications for pentraxin evolution.

PubMed

Tharia, Hazel A; Shrive, Annette K; Mills, John D; Arme, Chris; Williams, Gwyn T; Greenhough, Trevor J

2002-02-22

The serum amyloid P component (SAP)-like pentraxin Limulus polyphemus SAP is a recently discovered, distinct pentraxin species, of known structure, which does not bind phosphocholine and whose N-terminal sequence has been shown to differ markedly from the highly conserved N terminus of all other known horseshoe crab pentraxins. The complete cDNA sequence of Limulus SAP, and the derived amino acid sequence, the first invertebrate SAP-like pentraxin sequence, have been determined. Two sequences were identified that differed only in the length of the 3' untranslated region. Limulus SAP is synthesised as a precursor protein of 234 amino acid residues, the first 17 residues encoding a signal peptide that is absent from the mature protein. Phylogenetic analysis clusters Limulus SAP pentraxin with the horseshoe crab C-reactive proteins (CRPs) rather than the mammalian SAPs, which are clustered with mammalian CRPs. The deduced amino acid sequence shares 22% identity with both human SAP and CRP, which are 51% identical, and 31-35% with horseshoe crab CRPs. These analyses indicate that gene duplication of CRP (or SAP), followed by sequence divergence and the evolution of CRP and/or SAP function, occurred independently along the chordate and arthropod evolutionary lines rather than in a common ancestor. They further indicate that the CRP/SAP gene duplication event in Limulus occurred before both the emergence of the Limulus CRP variants and the mammalian CRP/SAP gene duplication. Limulus SAP, which does not exhibit the CRP characteristic of calcium-dependent binding to phosphocholine, is established as a pentraxin species distinct from all other known horseshoe crab pentraxins that exist in many variant forms sharing a high level of sequence homology. Copyright 2002 Elsevier Science Ltd.
Orthologs in Arabidopsis thaliana of the Hsp70 interacting protein Hip

PubMed Central

Webb, Mary Alice; Cavaletto, John M.; Klanrit, Preekamol; Thompson, Gary A.

2001-01-01

The Hsp70-interacting protein Hip binds to the adenosine triphosphatase domain of Hsp70, stabilizing it in the adenosine 5′-diphosphate–ligated conformation and promoting binding of target polypeptides. In mammalian cells, Hip is a component of the cytoplasmic chaperone heterocomplex that regulates signal transduction via interaction with hormone receptors and protein kinases. Analysis of the complete genome sequence of the model flowering plant Arabidopsis thaliana revealed 2 genes encoding Hip orthologs. The deduced sequence of AtHip-1 consists of 441 amino acid residues and is 42% identical to human Hip. AtHip-1 contains the same functional domains characterized in mammalian Hip, including an N-terminal dimerization domain, an acidic domain, 3 tetratricopeptide repeats flanked by a highly charged region, a series of degenerate GGMP repeats, and a C-terminal region similar to the Sti1/Hop/p60 protein. The deduced amino acid sequence of AtHip-2 consists of 380 amino acid residues. AtHip-2 consists of a truncated Hip-like domain that is 46% identical to human Hip, followed by a C-terminal domain related to thioredoxin. AtHip-2 is 63% identical to another Hip-thioredoxin protein recently identified in Vitis labrusca (grape). The truncated Hip domain in AtHip-2 includes the amino terminus, the acidic domain, and tetratricopeptide repeats with flanking charged region. Analyses of expressed sequence tag databases indicate that both AtHip-1 and AtHip-2 are expressed in A thaliana and that orthologs of Hip are also expressed widely in other plants. The similarity between AtHip-1 and its mammalian orthologs is consistent with a similar role in plant cells. The sequence of AtHip-2 suggests the possibility of additional unique chaperone functions. PMID:11599566
Purification and partial characterization of low molecular weight vicilin-like glycoprotein from the seeds of Citrullus lanatus.

PubMed

Yadav, Sushila; Tomar, Anil Kumar; Jithesh, O; Khan, Meraj Alam; Yadav, R N; Srinivasan, A; Singh, Tej P; Yadav, Savita

2011-12-01

The watermelon (Citrullus lanatus) seeds are highly nutritive and contain large amount of proteins and many beneficial minerals such as magnesium, calcium, potassium, iron, phosphorous, zinc etc. In various parts of the world, C. lanatus seed extracts are used to cure cancer, cardiovascular diseases, hypertension, and blood pressure. C. lanatus seed extracts are also used as home remedy for edema and urinary tract problems. In this study, we isolated protein fraction of C. lanatus seeds using various protein separation methods. We successfully purified a low molecular weight vicilin-like glycoprotein using chromatographic methods followed by SDS-PAGE and MALDI-TOF/MS identification. This is the first report of purification of a vicilin like polypeptide from C. lanatus seeds. In next step, we extracted mRNA from immature seeds and reverse transcribed it using suitable forward and reverse primers for purified glycoprotein. The PCR product was analysed on 1% agarose gel and was subsequently sequenced by Dideoxy DNA sequencing method. An amino acid translation of the gene is in agreement with amino acid sequences of the identified peptides.
Tools to evaluate the conformation of protein products.

PubMed

Manta, Bruno; Obal, Gonzalo; Ricciardi, Alejandro; Pritsch, Otto; Denicola, Ana

2011-06-01

Production of recombinant proteins is a process intensively used in the research laboratory. In addition, the main biotechnology market products are recombinant proteins and monoclonal antibodies. The biological (and clinical) properties of the protein product strongly depend on the conformation of the polypeptide. Therefore, assessment of the correct conformation of the produced protein is crucial. There is no single method to assess every aspect of protein structure or function. Depending on the protein, the methods of choice vary. There are general methods to evaluate not only mass and primary sequence of the protein, but also higher-order structure. This review outlines the principal techniques for determining the conformation of a protein from structural (biophysical methods) to functional (in vitro binding assays) analyses. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
The mitochondrial genome of the Arizona Snowfly Mesocapnia arizonensis (Plecoptera, Capniidae).

PubMed

Elbrecht, Vasco; Leese, Florian

2016-09-01

We assembled the mitochondrial genome of the capniid stonefly Mesocapnia arizonensis (Baumann & Gaufin, 1969) using Illumina HiSeq sequence data. The recovered mitogenome is 14,921 bp in length and includes 13 protein-coding genes, 2 ribosomal RNA genes and 22 transfer RNA genes. The control region could only be assembled partially. Gene order resembles that of basal arthropods. This is the first partial mitogenome sequence for the stonefly superfamily group Euholognatha and will be useful in future phylogenetic analyses.
Fungal proteomics: from identification to function.

PubMed

Doyle, Sean

2011-08-01

Some fungi cause disease in humans and plants, while others have demonstrable potential for the control of insect pests. In addition, fungi are also a rich reservoir of therapeutic metabolites and industrially useful enzymes. Detailed analysis of fungal biochemistry is now enabled by multiple technologies including protein mass spectrometry, genome and transcriptome sequencing and advances in bioinformatics. Yet, the assignment of function to fungal proteins, encoded either by in silico annotated, or unannotated genes, remains problematic. The purpose of this review is to describe the strategies used by many researchers to reveal protein function in fungi, and more importantly, to consolidate the nomenclature of 'unknown function protein' as opposed to 'hypothetical protein' - once any protein has been identified by protein mass spectrometry. A combination of approaches including comparative proteomics, pathogen-induced protein expression and immunoproteomics are outlined, which, when used in combination with a variety of other techniques (e.g. functional genomics, microarray analysis, immunochemical and infection model systems), appear to yield comprehensive and definitive information on protein function in fungi. The relative advantages of proteomic, as opposed to transcriptomic-only, analyses are also described. In the future, combined high-throughput, quantitative proteomics, allied to transcriptomic sequencing, are set to reveal much about protein function in fungi. © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.
Evolution and phylogeny of the mud shrimps (Crustacea: Decapoda) revealed from complete mitochondrial genomes.

PubMed

Lin, Feng-Jiau; Liu, Yuan; Sha, Zhongli; Tsang, Ling Ming; Chu, Ka Hou; Chan, Tin-Yam; Liu, Ruiyu; Cui, Zhaoxia

2012-11-16

The evolutionary history and relationships of the mud shrimps (Crustacea: Decapoda: Gebiidea and Axiidea) are contentious, with previous attempts revealing mixed results. The mud shrimps were once classified in the infraorder Thalassinidea. Recent molecular phylogenetic analyses, however, suggest separation of the group into two individual infraorders, Gebiidea and Axiidea. Mitochondrial (mt) genome sequence and structure can be especially powerful in resolving higher systematic relationships that may offer new insights into the phylogeny of the mud shrimps and the other decapod infraorders, and test the hypothesis of dividing the mud shrimps into two infraorders. We present the complete mitochondrial genome sequences of five mud shrimps, Austinogebia edulis, Upogebia major, Thalassina kelanang (Gebiidea), Nihonotrypaea thermophilus and Neaxius glyptocercus (Axiidea). All five genomes encode a standard set of 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes and a putative control region. Except for T. kelanang, mud shrimp mitochondrial genomes exhibited rearrangements and novel patterns compared to the pancrustacean ground pattern. Each of the two Gebiidea species (A. edulis and U. major) and two Axiidea species (N. glyptocercus and N. thermophiles) share unique gene order specific to their infraorders and analyses further suggest these two derived gene orders have evolved independently. Phylogenetic analyses based on the concatenated nucleotide and amino acid sequences of 13 protein-coding genes indicate the possible polyphyly of mud shrimps, supporting the division of the group into two infraorders. However, the infraordinal relationships among the Gebiidea and Axiidea, and other reptants are poorly resolved. The inclusion of mt genome from more taxa, in particular the reptant infraorders Polychelida and Glypheidea is required in further analysis. Phylogenetic analyses on the mt genome sequences and the distinct gene orders provide further evidences for the divergence between the two mud shrimp infraorders, Gebiidea and Axiidea, corroborating previous molecular phylogeny and justifying their infraordinal status. Mitochondrial genome sequences appear to be promising markers for resolving phylogenetic issues concerning decapod crustaceans that warrant further investigations and our present study has also provided further information concerning the mt genome evolution of the Decapoda.
Evolution and phylogeny of the mud shrimps (Crustacea: Decapoda) revealed from complete mitochondrial genomes

PubMed Central

2012-01-01

Background The evolutionary history and relationships of the mud shrimps (Crustacea: Decapoda: Gebiidea and Axiidea) are contentious, with previous attempts revealing mixed results. The mud shrimps were once classified in the infraorder Thalassinidea. Recent molecular phylogenetic analyses, however, suggest separation of the group into two individual infraorders, Gebiidea and Axiidea. Mitochondrial (mt) genome sequence and structure can be especially powerful in resolving higher systematic relationships that may offer new insights into the phylogeny of the mud shrimps and the other decapod infraorders, and test the hypothesis of dividing the mud shrimps into two infraorders. Results We present the complete mitochondrial genome sequences of five mud shrimps, Austinogebia edulis, Upogebia major, Thalassina kelanang (Gebiidea), Nihonotrypaea thermophilus and Neaxius glyptocercus (Axiidea). All five genomes encode a standard set of 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes and a putative control region. Except for T. kelanang, mud shrimp mitochondrial genomes exhibited rearrangements and novel patterns compared to the pancrustacean ground pattern. Each of the two Gebiidea species (A. edulis and U. major) and two Axiidea species (N. glyptocercus and N. thermophiles) share unique gene order specific to their infraorders and analyses further suggest these two derived gene orders have evolved independently. Phylogenetic analyses based on the concatenated nucleotide and amino acid sequences of 13 protein-coding genes indicate the possible polyphyly of mud shrimps, supporting the division of the group into two infraorders. However, the infraordinal relationships among the Gebiidea and Axiidea, and other reptants are poorly resolved. The inclusion of mt genome from more taxa, in particular the reptant infraorders Polychelida and Glypheidea is required in further analysis. Conclusions Phylogenetic analyses on the mt genome sequences and the distinct gene orders provide further evidences for the divergence between the two mud shrimp infraorders, Gebiidea and Axiidea, corroborating previous molecular phylogeny and justifying their infraordinal status. Mitochondrial genome sequences appear to be promising markers for resolving phylogenetic issues concerning decapod crustaceans that warrant further investigations and our present study has also provided further information concerning the mt genome evolution of the Decapoda. PMID:23153176
Regulation of expression of the ada gene controlling the adaptive response. Interactions with the ada promoter of the Ada protein and RNA polymerase.

PubMed

Sakumi, K; Sekiguchi, M

1989-01-20

The Ada protein of Escherichia coli catalyzes transfer of methyl groups from methylated DNA to its own molecule, and the methylated form of Ada protein promotes transcription of its own gene, ada. Using an in vitro reconstituted system, we found that both the sigma factor and the methylated Ada protein are required for transcription of the ada gene. To elucidate molecular mechanisms involved in the regulation of the ada transcription, we investigated interactions of the non-methylated and methylated forms of Ada protein and the RNA polymerase holo enzyme (the core enzyme and sigma factor) with a DNA fragment carrying the ada promoter region. Footprinting analyses revealed that the methylated Ada protein binds to a region from positions -63 to -31, which includes the ada regulatory sequence AAAGCGCA. No firm binding was observed with the non-methylated Ada protein, although some DNase I-hypersensitive sites were produced in the promoter by both types of Ada protein. RNA polymerase did bind to the promoter once the methylated Ada protein had bound to the upstream sequence. To correlate these phenomena with the process in vivo, we used the DNAs derived from promoter-defective mutants. No binding of Ada protein nor of RNA polymerase occurred with a mutant DNA having a C to G substitution at position -47 within the ada regulatory sequence. In the case of a -35 box mutant with a T to A change at position -34, the methylated Ada protein did bind to the ada regulatory sequence, yet there was no RNA polymerase binding. Thus, the binding of the methylated Ada protein to the upstream region apparently facilitates binding of the RNA polymerase to the proper region of the promoter. The Ada protein possesses two known methyl acceptor sites, Cys69 and Cys321. The role of methylation of each cysteine residue was investigated using mutant forms of the Ada protein. The Ada protein with the cysteine residue at position 69 replaced by alanine was incapable of binding to the ada promoter even when the cysteine residue at position 321 of the protein was methylated. When the Ada protein with alanine at position 321 was methylated, it acquired the potential to bind to the ada promoter. These results are compatible with the notion that methylation of the cysteine residue at position 69 causes a conformational change of the Ada protein, thereby facilitating binding of the protein to the upstream regulatory sequence.
Associations between transcriptional changes and protein phenotypes provide insights into immune regulation in corals.

PubMed

Fuess, Lauren E; Pinzόn C, Jorge H; Weil, Ernesto; Mydlarz, Laura D

2016-09-01

Disease outbreaks in marine ecosystems have driven worldwide declines of numerous taxa, including corals. Some corals, such as Orbicella faveolata, are particularly susceptible to disease. To explore the mechanisms contributing to susceptibility, colonies of O. faveolata were exposed to immune challenge with lipopolysaccharides. RNA sequencing and protein activity assays were used to characterize the response of corals to immune challenge. Differential expression analyses identified 17 immune-related transcripts that varied in expression post-immune challenge. Network analyses revealed several groups of transcripts correlated to immune protein activity. Several transcripts, which were annotated as positive regulators of immunity were included in these groups, and some were downregulated following immune challenge. Correlations between expression of these transcripts and protein activity results further supported the role of these transcripts in positive regulation of immunity. The observed pattern of gene expression and protein activity may elucidate the processes contributing to the disease susceptibility of species like O. faveolata. Copyright © 2016 Elsevier Ltd. All rights reserved.
One precursor, three apolipoproteins: the relationship between two crustacean lipoproteins, the large discoidal lipoprotein and the high density lipoprotein/β-glucan binding protein.

PubMed

Stieb, Stefanie; Roth, Ziv; Dal Magro, Christina; Fischer, Sabine; Butz, Eric; Sagi, Amir; Khalaila, Isam; Lieb, Bernhard; Schenk, Sven; Hoeger, Ulrich

2014-12-01

The novel discoidal lipoprotein (dLp) recently detected in the crayfish, differs from other crustacean lipoproteins in its large size, apoprotein composition and high lipid binding capacity, We identified the dLp sequence by transcriptome analyses of the hepatopancreas and mass spectrometry. Further de novo assembly of the NGS data followed by BLAST searches using the sequence of the high density lipoprotein/1-glucan binding protein (HDL-BGBP) of Astacus leptodactylus as query revealed a putative precursor molecule with an open reading frame of 14.7 kb and a deduced primary structure of 4889 amino acids. The presence of an N-terminal lipid bind- ing domain and a DUF 1943 domain suggests the relationship with the large lipid transfer proteins. Two-putative dibasic furin cleavage sites were identified bordering the sequence of the HDL-BGBP. When subjected to mass spectroscopic analyses, tryptic peptides of the large apoprotein of dLp matched the N-terminal part of the precursor, while the peptides obtained for its small apoprotein matched the C-terminal part. Repeating the analysis in the prawn Macrobrachium rosenbergii revealed a similar protein with identical domain architecture suggesting that our findings do not represent an isolated instance. Our results indicate that the above three apolipoproteins (i.e HDL-BGBP and both the large and the small subunit of dLp) are translated as a large precursor. Cleavage at the furin type sites releases two subunits forming a heterodimeric dLP particle, while the remaining part forms an HDL-BGBP whose relationship with other lipoproteins as well as specific functions are yet to be elucidated.
Virulence-Affecting Amino Acid Changes in the PA Protein of H7N9 Influenza A Viruses

PubMed Central

Yamayoshi, Seiya; Yamada, Shinya; Fukuyama, Satoshi; Murakami, Shin; Zhao, Dongming; Uraki, Ryuta; Watanabe, Tokiko; Tomita, Yuriko; Macken, Catherine; Neumann, Gabriele

2014-01-01

ABSTRACT Novel avian-origin influenza A(H7N9) viruses were first reported to infect humans in March 2013. To date, 143 human cases, including 45 deaths, have been recorded. By using sequence comparisons and phylogenetic and ancestral inference analyses, we identified several distinct amino acids in the A(H7N9) polymerase PA protein, some of which may be mammalian adapting. Mutant viruses possessing some of these amino acid changes, singly or in combination, were assessed for their polymerase activities and growth kinetics in mammalian and avian cells and for their virulence in mice. We identified several mutants that were slightly more virulent in mice than the wild-type A(H7N9) virus, A/Anhui/1/2013. These mutants also exhibited increased polymerase activity in human cells but not in avian cells. Our findings indicate that the PA protein of A(H7N9) viruses has several amino acid substitutions that are attenuating in mammals. IMPORTANCE Novel avian-origin influenza A(H7N9) viruses emerged in the spring of 2013. By using computational analyses of A(H7N9) viral sequences, we identified several amino acid changes in the polymerase PA protein, which we then assessed for their effects on viral replication in cultured cells and mice. We found that the PA proteins of A(H7N9) viruses possess several amino acid substitutions that cause attenuation in mammals. PMID:24371069
Role of DNA conformation & energetic insights in Msx-1-DNA recognition as revealed by molecular dynamics studies on specific and nonspecific complexes.

PubMed

Kachhap, Sangita; Singh, Balvinder

2015-01-01

In most of homeodomain-DNA complexes, glutamine or lysine is present at 50th position and interacts with 5th and 6th nucleotide of core recognition region. Molecular dynamics simulations of Msx-1-DNA complex (Q50-TG) and its variant complexes, that is specific (Q50K-CC), nonspecific (Q50-CC) having mutation in DNA and (Q50K-TG) in protein, have been carried out. Analysis of protein-DNA interactions and structure of DNA in specific and nonspecific complexes show that amino acid residues use sequence-dependent shape of DNA to interact. The binding free energies of all four complexes were analysed to define role of amino acid residue at 50th position in terms of binding strength considering the variation in DNA on stability of protein-DNA complexes. The order of stability of protein-DNA complexes shows that specific complexes are more stable than nonspecific ones. Decomposition analysis shows that N-terminal amino acid residues have been found to contribute maximally in binding free energy of protein-DNA complexes. Among specific protein-DNA complexes, K50 contributes more as compared to Q50 towards binding free energy in respective complexes. The sequence dependence of local conformation of DNA enables Q50/Q50K to make hydrogen bond with nucleotide(s) of DNA. The changes in amino acid sequence of protein are accommodated and stabilized around TAAT core region of DNA having variation in nucleotides.
Community and gene composition of a human dental plaque microbiota obtained by metagenomic sequencing

PubMed Central

Xie, G.; Chain, P.S.G.; Lo, C.; Liu, K-L.; Gans, J.; Merritt, J.; Qi, F.

2010-01-01

SUMMARY Human dental plaque is a complex microbial community containing an estimated 700 to 19,000 species/phylotypes. Despite numerous studies analysing species richness in healthy and diseased human subjects, the true genomic composition of the human dental plaque microbiota remains unknown. Here we report a metagenomic analysis of a healthy human plaque sample using a combination of second-generation sequencing platforms. A total of 860 million base pairs of non-human sequences were generated. Various analysis tools revealed the presence of 12 well-characterized phyla, members of the TM-7 and BRC1 clade, and sequences that could not be classified. Both pathogens and opportunistic pathogens were identified, supporting the ecological plaque hypothesis for oral diseases. Mapping the metagenomic reads to sequenced reference genomes demonstrated that 4% of the reads could be assigned to the sequenced species. Preliminary annotation identified genes belonging to all known functional categories. Interestingly, although 73% of the total assembled contig sequences were predicted to code for proteins, only 51% of them could be assigned a functional role. Furthermore, ~ 2.8% of the total predicted genes coded for proteins involved in resistance to antibiotics and toxic compounds, suggesting that the oral cavity is an important reservoir for antimicrobial resistance. PMID:21040513

Community and gene composition of a human dental plaque microbiota obtained by metagenomic sequencing.

PubMed

Xie, G; Chain, P S G; Lo, C-C; Liu, K-L; Gans, J; Merritt, J; Qi, F

2010-12-01

Human dental plaque is a complex microbial community containing an estimated 700 to 19,000 species/phylotypes. Despite numerous studies analysing species richness in healthy and diseased human subjects, the true genomic composition of the human dental plaque microbiota remains unknown. Here we report a metagenomic analysis of a healthy human plaque sample using a combination of second-generation sequencing platforms. A total of 860 million base pairs of non-human sequences were generated. Various analysis tools revealed the presence of 12 well-characterized phyla, members of the TM-7 and BRC1 clade, and sequences that could not be classified. Both pathogens and opportunistic pathogens were identified, supporting the ecological plaque hypothesis for oral diseases. Mapping the metagenomic reads to sequenced reference genomes demonstrated that 4% of the reads could be assigned to the sequenced species. Preliminary annotation identified genes belonging to all known functional categories. Interestingly, although 73% of the total assembled contig sequences were predicted to code for proteins, only 51% of them could be assigned a functional role. Furthermore, ~2.8% of the total predicted genes coded for proteins involved in resistance to antibiotics and toxic compounds, suggesting that the oral cavity is an important reservoir for antimicrobial resistance. © 2010 John Wiley & Sons A/S.
VKCDB: voltage-gated K+ channel database updated and upgraded.

PubMed

Gallin, Warren J; Boutet, Patrick A

2011-01-01

The Voltage-gated K(+) Channel DataBase (VKCDB) (http://vkcdb.biology.ualberta.ca) makes a comprehensive set of sequence data readily available for phylogenetic and comparative analysis. The current update contains 2063 entries for full-length or nearly full-length unique channel sequences from Bacteria (477), Archaea (18) and Eukaryotes (1568), an increase from 346 solely eukaryotic entries in the original release. In addition to protein sequences for channels, corresponding nucleotide sequences of the open reading frames corresponding to the amino acid sequences are now available and can be extracted in parallel with sets of protein sequences. Channels are categorized into subfamilies by phylogenetic analysis and by using hidden Markov model analyses. Although the raw database contains a number of fragmentary, duplicated, obsolete and non-channel sequences that were collected in early steps of data collection, the web interface will only return entries that have been validated as likely K(+) channels. The retrieval function of the web interface allows retrieval of entries that contain a substantial fraction of the core structural elements of VKCs, fragmentary entries, or both. The full database can be downloaded as either a MySQL dump or as an XML dump from the web site. We have now implemented automated updates at quarterly intervals.
Mapping HLA-A2, -A3 and -B7 supertype-restricted T-cell epitopes in the ebolavirus proteome.

PubMed

Lim, Wan Ching; Khan, Asif M

2018-01-19

Ebolavirus (EBOV) is responsible for one of the most fatal diseases encountered by mankind. Cellular T-cell responses have been implicated to be important in providing protection against the virus. Antigenic variation can result in viral escape from immune recognition. Mapping targets of immune responses among the sequence of viral proteins is, thus, an important first step towards understanding the immune responses to viral variants and can aid in the identification of vaccine targets. Herein, we performed a large-scale, proteome-wide mapping and diversity analyses of putative HLA supertype-restricted T-cell epitopes of Zaire ebolavirus (ZEBOV), the most pathogenic species among the EBOV family. All publicly available ZEBOV sequences (14,098) for each of the nine viral proteins were retrieved, removed of irrelevant and duplicate sequences, and aligned. The overall proteome diversity of the non-redundant sequences was studied by use of Shannon's entropy. The sequences were predicted, by use of the NetCTLpan server, for HLA-A2, -A3, and -B7 supertype-restricted epitopes, which are relevant to African and other ethnicities and provide for large (~86%) population coverage. The predicted epitopes were mapped to the alignment of each protein for analyses of antigenic sequence diversity and relevance to structure and function. The putative epitopes were validated by comparison with experimentally confirmed epitopes. ZEBOV proteome was generally conserved, with an average entropy of 0.16. The 185 HLA supertype-restricted T-cell epitopes predicted (82 (A2), 37 (A3) and 66 (B7)) mapped to 125 alignment positions and covered ~24% of the proteome length. Many of the epitopes showed a propensity to co-localize at select positions of the alignment. Thirty (30) of the mapped positions were completely conserved and may be attractive for vaccine design. The remaining (95) positions had one or more epitopes, with or without non-epitope variants. A significant number (24) of the putative epitopes matched reported experimentally validated HLA ligands/T-cell epitopes of A2, A3 and/or B7 supertype representative allele restrictions. The epitopes generally corresponded to functional motifs/domains and there was no correlation to localization on the protein 3D structure. These data and the epitope map provide important insights into the interaction between EBOV and the host immune system.
Phylogenetic place of guinea pigs: no support of the rodent-polyphyly hypothesis from maximum-likelihood analyses of multiple protein sequences.

PubMed

Cao, Y; Adachi, J; Yano, T; Hasegawa, M

1994-07-01

Graur et al.'s (1991) hypothesis that the guinea pig-like rodents have an evolutionary origin within mammals that is separate from that of other rodents (the rodent-polyphyly hypothesis) was reexamined by the maximum-likelihood method for protein phylogeny, as well as by the maximum-parsimony and neighbor-joining methods. The overall evidence does not support Graur et al.'s hypothesis, which radically contradicts the traditional view of rodent monophyly. This work demonstrates that we must be careful in choosing a proper method for phylogenetic inference and that an argument based on a small data set (with respect to the length of the sequence and especially the number of species) may be unstable.
Peptide sequences from the first Castoroides ohioensis skull and the utility of old museum collections for palaeoproteomics

PubMed Central

Schroeter, Elena R.; Feranec, Robert S.; Vashishth, Deepak

2016-01-01

Vertebrate fossils have been collected for hundreds of years and are stored in museum collections around the world. These remains provide a readily available resource to search for preserved proteins; however, the vast majority of palaeoproteomic studies have focused on relatively recently collected bones with a well-known handling history. Here, we characterize proteins from the nasal turbinates of the first Castoroides ohioensis skull ever discovered. Collected in 1845, this is the oldest museum-curated specimen characterized using palaeoproteomic tools. Our mass spectrometry analysis detected many collagen I peptides, a peptide from haemoglobin beta, and in vivo and diagenetic post-translational modifications. Additionally, the identified collagen I sequences provide enough resolution to place C. ohioensis within Rodentia. This study illustrates the utility of archived museum specimens for both the recovery of preserved proteins and phylogenetic analyses. PMID:27306052
In situ expression of eukaryotic ice-binding proteins in microbial communities of Arctic and Antarctic sea ice.

PubMed

Uhlig, Christiane; Kilpert, Fabian; Frickenhaus, Stephan; Kegel, Jessica U; Krell, Andreas; Mock, Thomas; Valentin, Klaus; Beszteri, Bánk

2015-11-01

Ice-binding proteins (IBPs) have been isolated from various sea-ice organisms. Their characterisation points to a crucial role in protecting the organisms in sub-zero environments. However, their in situ abundance and diversity in natural sea-ice microbial communities is largely unknown. In this study, we analysed the expression and phylogenetic diversity of eukaryotic IBP transcripts from microbial communities of Arctic and Antarctic sea ice. IBP transcripts were found in abundances similar to those of proteins involved in core cellular processes such as photosynthesis. Eighty-nine percent of the IBP transcripts grouped with known IBP sequences from diatoms, haptophytes and crustaceans, but the majority represented novel sequences not previously characterized in cultured organisms. The observed high eukaryotic IBP expression in natural eukaryotic sea ice communities underlines the essential role of IBPs for survival of many microorganisms in communities living under the extreme conditions of polar sea ice.
MACSIMS : multiple alignment of complete sequences information management system

PubMed Central

Thompson, Julie D; Muller, Arnaud; Waterhouse, Andrew; Procter, Jim; Barton, Geoffrey J; Plewniak, Frédéric; Poch, Olivier

2006-01-01

Background In the post-genomic era, systems-level studies are being performed that seek to explain complex biological systems by integrating diverse resources from fields such as genomics, proteomics or transcriptomics. New information management systems are now needed for the collection, validation and analysis of the vast amount of heterogeneous data available. Multiple alignments of complete sequences provide an ideal environment for the integration of this information in the context of the protein family. Results MACSIMS is a multiple alignment-based information management program that combines the advantages of both knowledge-based and ab initio sequence analysis methods. Structural and functional information is retrieved automatically from the public databases. In the multiple alignment, homologous regions are identified and the retrieved data is evaluated and propagated from known to unknown sequences with these reliable regions. In a large-scale evaluation, the specificity of the propagated sequence features is estimated to be >99%, i.e. very few false positive predictions are made. MACSIMS is then used to characterise mutations in a test set of 100 proteins that are known to be involved in human genetic diseases. The number of sequence features associated with these proteins was increased by 60%, compared to the features available in the public databases. An XML format output file allows automatic parsing of the MACSIM results, while a graphical display using the JalView program allows manual analysis. Conclusion MACSIMS is a new information management system that incorporates detailed analyses of protein families at the structural, functional and evolutionary levels. MACSIMS thus provides a unique environment that facilitates knowledge extraction and the presentation of the most pertinent information to the biologist. A web server and the source code are available at . PMID:16792820
Combining micro-RNA and protein sequencing to detect robust biomarkers for Graves' disease and orbitopathy.

PubMed

Zhang, Lei; Masetti, Giulia; Colucci, Giuseppe; Salvi, Mario; Covelli, Danila; Eckstein, Anja; Kaiser, Ulrike; Draman, Mohd Shazli; Muller, Ilaria; Ludgate, Marian; Lucini, Luigi; Biscarini, Filippo

2018-05-30

Graves' Disease (GD) is an autoimmune condition in which thyroid-stimulating antibodies (TRAB) mimic thyroid-stimulating hormone function causing hyperthyroidism. 5% of GD patients develop inflammatory Graves' orbitopathy (GO) characterized by proptosis and attendant sight problems. A major challenge is to identify which GD patients are most likely to develop GO and has relied on TRAB measurement. We screened sera/plasma from 14 GD, 19 GO and 13 healthy controls using high-throughput proteomics and miRNA sequencing (Illumina's HiSeq2000 and Agilent-6550 Funnel quadrupole-time-of-flight mass spectrometry) to identify potential biomarkers for diagnosis or prognosis evaluation. Euclidean distances and differential expression (DE) based on miRNA and protein quantification were analysed by multidimensional scaling (MDS) and multinomial regression respectively. We detected 3025 miRNAs and 1886 proteins and MDS revealed good separation of the 3 groups. Biomarkers were identified by combined DE and Lasso-penalized predictive models; accuracy of predictions was 0.86 (±0:18), and 5 miRNA and 20 proteins were found including Zonulin, Alpha-2 macroglobulin, Beta-2 glycoprotein 1 and Fibronectin. Functional analysis identified relevant metabolic pathways, including hippo signaling, bacterial invasion of epithelial cells and mRNA surveillance. Proteomic and miRNA analyses, combined with robust bioinformatics, identified circulating biomarkers applicable to diagnose GD, predict GO disease status and optimize patient management.
Characterization of a mixture of lobster digestive cysteine proteinases by ionspray mass spectrometry and tryptic mapping with LC--MS and LC--MS--MS

NASA Astrophysics Data System (ADS)

Thibault, P.; Pleasance, S.; Laycock, M. V.; Mackay, R. M.; Boyd, R. K.

1991-12-01

An inseparable mixture of two cysteine proteinases, isolated from the digestive tract of the American lobster, was investigated by ionspray mass spectrometry (ISP-MS), using a combination of infusion of intact proteins with on-line liquid chromatography--mass spectrometry (LC--MS) and LC--MS--MS analyses of tryptic digests. These data were interpreted by comparisons with predictions from results of molecular cloning of cysteine-proteinase-encoding messenger RNA sequences previously isolated from the lobster hepatopancreas. Investigations of the numbers of free thiol groups and of disulfide bonds were made by measuring the molecular weights of the alkylated proteins with and without prior reduction of disulfide bonds, and comparison with the corresponding data for the native proteins. Identification of tyrptic fragment peptides containing cysteine residues was facilitated by comparing LC--MS analyses of tryptic digests of denatured and of denatured and alkylated proteins, since such tryptic peptides are subject to shifts in both mass and retention time upon reduction and alkylation. Confirmation of amino acid sequences was obtained from fragment ion spectra of each tryptic peptide (alkylated or not) as it eluted from the column. Acquisition of such on-line LC--MS data was possible through use of the entire effluent from a standard 1 mm high performance liquid chromatography (HPLC) column by an IonsSpray® LC--MS interface (pneumatically assisted electrospray).
Dual targeting to mitochondria and plastids of AtBT1 and ZmBT1, two members of the mitochondrial carrier family.

PubMed

Bahaji, Abdellatif; Ovecka, Miroslav; Bárány, Ivett; Risueño, María Carmen; Muñoz, Francisco José; Baroja-Fernández, Edurne; Montero, Manuel; Li, Jun; Hidalgo, Maite; Sesma, María Teresa; Ezquer, Ignacio; Testillano, Pilar S; Pozueta-Romero, Javier

2011-04-01

Zea mays and Arabidopsis thaliana Brittle 1 (ZmBT1 and AtBT1, respectively) are members of the mitochondrial carrier family. Although they are presumed to be exclusively localized in the envelope membranes of plastids, confocal fluorescence microscopy analyses of potato, Arabidopsis and maize plants stably expressing green fluorescent protein (GFP) fusions of ZmBT1 and AtBT1 revealed that the two proteins have dual localization to plastids and mitochondria. The patterns of GFP fluorescence distribution observed in plants stably expressing GFP fusions of ZmBT1 and AtBT1 N-terminal extensions were fully congruent with that of plants expressing a plastidial marker fused to GFP. Furthermore, the patterns of GFP fluorescence distribution and motility observed in plants expressing the mature proteins fused to GFP were identical to those observed in plants expressing a mitochondrial marker fused to GFP. Electron microscopic immunocytochemical analyses of maize endosperms using anti-ZmBT1 antibodies further confirmed that ZmBT1 occurs in both plastids and mitochondria. The overall data showed that (i) ZmBT1 and AtBT1 are dually targeted to mitochondria and plastids; (ii) AtBT1 and ZmBT1 N-terminal extensions comprise targeting sequences exclusively recognized by the plastidial compartment; and (iii) targeting sequences to mitochondria are localized within the mature part of the BT1 proteins.
Overcoming Species Boundaries in Peptide Identification with Bayesian Information Criterion-driven Error-tolerant Peptide Search (BICEPS)*

PubMed Central

Renard, Bernhard Y.; Xu, Buote; Kirchner, Marc; Zickmann, Franziska; Winter, Dominic; Korten, Simone; Brattig, Norbert W.; Tzur, Amit; Hamprecht, Fred A.; Steen, Hanno

2012-01-01

Currently, the reliable identification of peptides and proteins is only feasible when thoroughly annotated sequence databases are available. Although sequencing capacities continue to grow, many organisms remain without reliable, fully annotated reference genomes required for proteomic analyses. Standard database search algorithms fail to identify peptides that are not exactly contained in a protein database. De novo searches are generally hindered by their restricted reliability, and current error-tolerant search strategies are limited by global, heuristic tradeoffs between database and spectral information. We propose a Bayesian information criterion-driven error-tolerant peptide search (BICEPS) and offer an open source implementation based on this statistical criterion to automatically balance the information of each single spectrum and the database, while limiting the run time. We show that BICEPS performs as well as current database search algorithms when such algorithms are applied to sequenced organisms, whereas BICEPS only uses a remotely related organism database. For instance, we use a chicken instead of a human database corresponding to an evolutionary distance of more than 300 million years (International Chicken Genome Sequencing Consortium (2004) Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432, 695–716). We demonstrate the successful application to cross-species proteomics with a 33% increase in the number of identified proteins for a filarial nematode sample of Litomosoides sigmodontis. PMID:22493179
Characterisation of divergent flavivirus NS3 and NS5 protein sequences detected in Rhipicephalus microplus ticks from Brazil

PubMed Central

Maruyama, Sandra Regina; Castro-Jorge, Luiza Antunes; Ribeiro, José Marcos Chaves; Gardinassi, Luiz Gustavo; Garcia, Gustavo Rocha; Brandão, Lucinda Giampietro; Rodrigues, Aline Rezende; Okada, Marcos Ituo; Abrão, Emiliana Pereira; Ferreira, Beatriz Rossetti; da Fonseca, Benedito Antonio Lopes; de Miranda-Santos, Isabel Kinney Ferreira

2013-01-01

Transcripts similar to those that encode the nonstructural (NS) proteins NS3 and NS5 from flaviviruses were found in a salivary gland (SG) complementary DNA (cDNA) library from the cattle tick Rhipicephalus microplus. Tick extracts were cultured with cells to enable the isolation of viruses capable of replicating in cultured invertebrate and vertebrate cells. Deep sequencing of the viral RNA isolated from culture supernatants provided the complete coding sequences for the NS3 and NS5 proteins and their molecular characterisation confirmed similarity with the NS3 and NS5 sequences from other flaviviruses. Despite this similarity, phylogenetic analyses revealed that this potentially novel virus may be a highly divergent member of the genus Flavivirus. Interestingly, we detected the divergent NS3 and NS5 sequences in ticks collected from several dairy farms widely distributed throughout three regions of Brazil. This is the first report of flavivirus-like transcripts in R. microplus ticks. This novel virus is a potential arbovirus because it replicated in arthropod and mammalian cells; furthermore, it was detected in a cDNA library from tick SGs and therefore may be present in tick saliva. It is important to determine whether and by what means this potential virus is transmissible and to monitor the virus as a potential emerging tick-borne zoonotic pathogen. PMID:24626302
A comparison of complete mitochondrial genomes of silver carp hypophthalmichthys molitrix and bighead carp hypophthalmichthys nobilis: Implications for their taxonomic relationship and phylogeny

USGS Publications Warehouse

Li, S.-F.; Xu, J.-W.; Yang, Q.-L.; Wang, C.H.; Chen, Q.; Chapman, D.C.; Lu, G.

2009-01-01

Based upon morphological characters, Silver carp Hypophthalmichthys molitrix and bighead carp Hypophthalmichthys nobilis (or Aristichthys nobilis) have been classified into either the same genus or two distinct genera. Consequently, the taxonomic relationship of the two species at the generic level remains equivocal. This issue is addressed by sequencing complete mitochondrial genomes of H. molitrix and H. nobilis, comparing their mitogenome organization, structure and sequence similarity, and conducting a comprehensive phylogenetic analysis of cyprinid species. As with other cyprinid fishes, the mitogenomes of the two species were structurally conserved, containing 37 genes including 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA (tRNAs) genes and a putative control region (D-loop). Sequence similarity between the two mitogenomes varied in different genes or regions, being highest in the tRNA genes (98??8%), lowest in the control region (89??4%) and intermediate in the protein-coding genes (94??2%). Analyses of the sequence comparison and phylogeny using concatenated protein sequences support the view that the two species belong to the genus Hypophthalmichthys. Further studies using nuclear markers and involving more closely related species, and the systematic combination of traditional biology and molecular biology are needed in order to confirm this conclusion. ?? 2009 The Fisheries Society of the British Isles.
Genome-wide identification of aquaporin encoding genes in Brassica oleracea and their phylogenetic sequence comparison to Brassica crops and Arabidopsis

PubMed Central

Diehn, Till A.; Pommerrenig, Benjamin; Bernhardt, Nadine; Hartmann, Anja; Bienert, Gerd P.

2015-01-01

Aquaporins (AQPs) are essential channel proteins that regulate plant water homeostasis and the uptake and distribution of uncharged solutes such as metalloids, urea, ammonia, and carbon dioxide. Despite their importance as crop plants, little is known about AQP gene and protein function in cabbage (Brassica oleracea) and other Brassica species. The recent releases of the genome sequences of B. oleracea and Brassica rapa allow comparative genomic studies in these species to investigate the evolution and features of Brassica genes and proteins. In this study, we identified all AQP genes in B. oleracea by a genome-wide survey. In total, 67 genes of four plant AQP subfamilies were identified. Their full-length gene sequences and locations on chromosomes and scaffolds were manually curated. The identification of six additional full-length AQP sequences in the B. rapa genome added to the recently published AQP protein family of this species. A phylogenetic analysis of AQPs of Arabidopsis thaliana, B. oleracea, B. rapa allowed us to follow AQP evolution in closely related species and to systematically classify and (re-) name these isoforms. Thirty-three groups of AQP-orthologous genes were identified between B. oleracea and Arabidopsis and their expression was analyzed in different organs. The two selectivity filters, gene structure and coding sequences were highly conserved within each AQP subfamily while sequence variations in some introns and untranslated regions were frequent. These data suggest a similar substrate selectivity and function of Brassica AQPs compared to Arabidopsis orthologs. The comparative analyses of all AQP subfamilies in three Brassicaceae species give initial insights into AQP evolution in these taxa. Based on the genome-wide AQP identification in B. oleracea and the sequence analysis and reprocessing of Brassica AQP information, our dataset provides a sequence resource for further investigations of the physiological and molecular functions of Brassica crop AQPs. PMID:25904922
d-Omix: a mixer of generic protein domain analysis tools.

PubMed

Wichadakul, Duangdao; Numnark, Somrak; Ingsriswang, Supawadee

2009-07-01

Domain combination provides important clues to the roles of protein domains in protein function, interaction and evolution. We have developed a web server d-Omix (a Mixer of Protein Domain Analysis Tools) aiming as a unified platform to analyze, compare and visualize protein data sets in various aspects of protein domain combinations. With InterProScan files for protein sets of interest provided by users, the server incorporates four services for domain analyses. First, it constructs protein phylogenetic tree based on a distance matrix calculated from protein domain architectures (DAs), allowing the comparison with a sequence-based tree. Second, it calculates and visualizes the versatility, abundance and co-presence of protein domains via a domain graph. Third, it compares the similarity of proteins based on DA alignment. Fourth, it builds a putative protein network derived from domain-domain interactions from DOMINE. Users may select a variety of input data files and flexibly choose domain search tools (e.g. hmmpfam, superfamily) for a specific analysis. Results from the d-Omix could be interactively explored and exported into various formats such as SVG, JPG, BMP and CSV. Users with only protein sequences could prepare an InterProScan file using a service provided by the server as well. The d-Omix web server is freely available at http://www.biotec.or.th/isl/Domix.
Biochemical Characterization of α‐Fetoprotein and Other Serum Proteins Produced by a Uterine Endometrial Adenocarcinoma

PubMed Central

Taketa, Kazuhisa; Azuma, Masaki; Yamamoto, Ritsu; Fujimoto, Seiichiro; Nishi, Shinzo

1996-01-01

A high serum α‐fetoprotein (AFP) level was found in a patient with endometrial adenocarcinoma of the uterus, which appeared to be hepatoid on histological examination. The AFP of this unusual patient was purified hy immunoaffinity chromatography and characterized. The electrophoretic profiles on sodium dodecyl sulfate‐polyacrylamide gel electrophoresis both before and after glycopeptidase F treatment were indistinguishable from those of a hepatoma AFP. This indicates that the patient's AFP was also composed of a single polypeptide chain of Mr 67,000 and an N‐linked sugar chain of Mr 3,000. Amino acid sequence analyses of this AFP, and of AFP from hepatoma and umbilical cord serum indicated that the N‐terminal sequences were essentially the same. The sequence, Arg‐Thr‐Leu‐His‐Arg‐Asn‐Glu‐Tyr‐Gly‐Ile, was slightly different from previous reports, but matched that deduced from the cDNA sequence. AFP isoforms due to microheterogeneity of the sugar chain were analyzed by lectin affinity electrophoresis using a series of lectins. The AFP isoform profiles were distinct from those of proteins derived from cord serum, hepatoma, yolk sac tumor and gastric cancer. The reverse‐transcription of RNA from the tumor tissue followed by a polymerase chain reaction using primers with AFP‐specific sequences gave a product of the size and nucleotide sequence expected for AFP. mRNAs possessing the requisite sequences for albumin and transferrin syntheses were also detected in the tumor. The expression of these hepatocyte‐specific proteins supported the hepatoid nature of this tumor. PMID:8766525
The complete mitochondrial genome of the dwarf tapeworm Hymenolepis nana--a neglected zoonotic helminth.

PubMed

Cheng, Tian; Liu, Guo-Hua; Song, Hui-Qun; Lin, Rui-Qing; Zhu, Xing-Quan

2016-03-01

Hymenolepis nana, commonly known as the dwarf tapeworm, is one of the most common tapeworms of humans and rodents and can cause hymenolepiasis. Although this zoonotic tapeworm is of socio-economic significance in many countries of the world, its genetics, systematics, epidemiology, and biology are poorly understood. In the present study, we sequenced and characterized the complete mitochondrial (mt) genome of H. nana. The mt genome is 13,764 bp in size and encodes 36 genes, including 12 protein-coding genes, 2 ribosomal RNA, and 22 transfer RNA genes. All genes are transcribed in the same direction. The gene order and genome content are completely identical with their congener Hymenolepis diminuta. Phylogenetic analyses based on concatenated amino acid sequences of 12 protein-coding genes by Bayesian inference, Maximum likelihood, and Maximum parsimony showed the division of class Cestoda into two orders, supported the monophylies of both the orders Cyclophyllidea and Pseudophyllidea. Analyses of mt genome sequences also support the monophylies of the three families Taeniidae, Hymenolepididae, and Diphyllobothriidae. This novel mt genome provides a useful genetic marker for studying the molecular epidemiology, systematics, and population genetics of the dwarf tapeworm and should have implications for the diagnosis, prevention, and control of hymenolepiasis in humans.
Prevalence of Tobacco mosaic virus in Iran and Evolutionary Analyses of the Coat Protein Gene

PubMed Central

Alishiri, Athar; Rakhshandehroo, Farshad; Zamanizadeh, Hamid-Reza; Palukaitis, Peter

2013-01-01

The incidence and distribution of Tobacco mosaic virus (TMV) and related tobamoviruses was determined using an enzyme-linked immunosorbent assay on 1,926 symptomatic horticultural crops and 107 asymptomatic weed samples collected from 78 highly infected fields in the major horticultural crop-producing areas in 17 provinces throughout Iran. The results were confirmed by host range studies and reverse transcription-polymerase chain reaction. The overall incidence of infection by these viruses in symptomatic plants was 11.3%. The coat protein (CP) gene sequences of a number of isolates were determined and disclosed to be a high identity (up to 100%) among the Iranian isolates. Phylogenetic analysis of all known TMV CP genes showed three clades on the basis of nucleotide sequences with all Iranian isolates distinctly clustered in clade II. Analysis using the complete CP amino acid sequence showed one clade with two subgroups, IA and IB, with Iranian isolates in both subgroups. The nucleotide diversity within each sub-group was very low, but higher between the two clades. No correlation was found between genetic distance and geographical origin or host species of isolation. Statistical analyses suggested a negative selection and demonstrated the occurrence of gene flow from the isolates in other clades to the Iranian population. PMID:25288953
Characterization of a gene coding for a type IIo bacterial IgG-binding protein.

PubMed

Boyle, M D; Weber-Heynemann, J; Raeder, R; Podbielski, A

1995-06-01

Two antigenic classes of non-immune IgG-binding proteins can be expressed by group A streptococci. One antigenic group of proteins is recognized by an antibody prepared against the product of a cloned fcrA gene (anti-FcRA). In this study, the immunogen used to prepare the antibody that defines the second antigenic class was shown to be the product of the emm-like (emmL) gene of M serotype 55 group A isolate, A928. The emmL55 gene expressed in E. coli produced an M(r) approximately 58,000 molecule which bound human IgG1, IgG2, IgG3 and IgG4, as well as horse, rabbit and pig IgG in a non-immune fashion. These properties are characteristic of the previously described type IIo IgG-binding protein isolated from this strain. In addition, the recombinant protein was reactive with human serum albumin and fibrinogen. The emmL 55 gene sequence was analysed and found to have the organization and sequence characteristics of a typical class I emm-like gene.
Analysis of Sir2E in the cellular slime mold Dictyostelium discoideum: cellular localization, spatial expression and overexpression.

PubMed

Katayama, Takahiro; Yasukawa, Hiro

2008-10-01

It has been reported that Dictyostelium discoideum encodes four silent information regulator 2 (Sir2) proteins (Sir2A-D) showing sequence similarity to human homologues of Sir2 (SIRT1-3). Further screening in a database revealed that D. discoideum encodes an additional Sir2 homologue (Sir2E). The amino acid sequence of Sir2E is not similar to those of SIRTs but is similar to those of proteins encoded by Giardia lamblia, Cryptosporidium hominis and Cryptosporidium parvum. Fluorescence of Sir2E-green fluorescent protein fusion protein was detected in the D. discoideum nucleus, indicating that Sir2E is a nuclear localizing protein. Reverse transcription-polymerase chain reaction and whole-mount in situ hybridization analyses showed that D. discoideum expressed sir2E in amoebae in the growth phase and in prestalk cells in the developmental phase. D. discoideum overexpressing sir2E grew faster than the wild type. These results indicate that Sir2E plays important roles both in the growth phase and developmental phase of D. discoideum.

Bioinformatic Analyses of Unique (Orphan) Core Genes of the Genus Acidithiobacillus: Functional Inferences and Use As Molecular Probes for Genomic and Metagenomic/Transcriptomic Interrogation

PubMed Central

González, Carolina; Lazcano, Marcelo; Valdés, Jorge; Holmes, David S.

2016-01-01

Using phylogenomic and gene compositional analyses, five highly conserved gene families have been detected in the core genome of the phylogenetically coherent genus Acidithiobacillus of the class Acidithiobacillia. These core gene families are absent in the closest extant genus Thermithiobacillus tepidarius that subtends the Acidithiobacillus genus and roots the deepest in this class. The predicted proteins encoded by these core gene families are not detected by a BLAST search in the NCBI non-redundant database of more than 90 million proteins using a relaxed cut-off of 1.0e−5. None of the five families has a clear functional prediction. However, bioinformatic scrutiny, using pI prediction, motif/domain searches, cellular location predictions, genomic context analyses, and chromosome topology studies together with previously published transcriptomic and proteomic data, suggests that some may have functions associated with membrane remodeling during cell division perhaps in response to pH stress. Despite the high level of amino acid sequence conservation within each family, there is sufficient nucleotide variation of the respective genes to permit the use of the DNA sequences to distinguish different species of Acidithiobacillus, making them useful additions to the armamentarium of tools for phylogenetic analysis. Since the protein families are unique to the Acidithiobacillus genus, they can also be leveraged as probes to detect the genus in environmental metagenomes and metatranscriptomes, including industrial biomining operations, and acid mine drainage (AMD). PMID:28082953
Bioinformatic Analyses of Unique (Orphan) Core Genes of the Genus Acidithiobacillus: Functional Inferences and Use As Molecular Probes for Genomic and Metagenomic/Transcriptomic Interrogation.

PubMed

González, Carolina; Lazcano, Marcelo; Valdés, Jorge; Holmes, David S

2016-01-01

Using phylogenomic and gene compositional analyses, five highly conserved gene families have been detected in the core genome of the phylogenetically coherent genus Acidithiobacillus of the class Acidithiobacillia . These core gene families are absent in the closest extant genus Thermithiobacillus tepidarius that subtends the Acidithiobacillus genus and roots the deepest in this class. The predicted proteins encoded by these core gene families are not detected by a BLAST search in the NCBI non-redundant database of more than 90 million proteins using a relaxed cut-off of 1.0e -5 . None of the five families has a clear functional prediction. However, bioinformatic scrutiny, using pI prediction, motif/domain searches, cellular location predictions, genomic context analyses, and chromosome topology studies together with previously published transcriptomic and proteomic data, suggests that some may have functions associated with membrane remodeling during cell division perhaps in response to pH stress. Despite the high level of amino acid sequence conservation within each family, there is sufficient nucleotide variation of the respective genes to permit the use of the DNA sequences to distinguish different species of Acidithiobacillus , making them useful additions to the armamentarium of tools for phylogenetic analysis. Since the protein families are unique to the Acidithiobacillus genus, they can also be leveraged as probes to detect the genus in environmental metagenomes and metatranscriptomes, including industrial biomining operations, and acid mine drainage (AMD).
Needles in the EST Haystack: Large-Scale Identification and Analysis of Excretory-Secretory (ES) Proteins in Parasitic Nematodes Using Expressed Sequence Tags (ESTs)

PubMed Central

Nagaraj, Shivashankar H.; Gasser, Robin B.; Ranganathan, Shoba

2008-01-01

Background Parasitic nematodes of humans, other animals and plants continue to impose a significant public health and economic burden worldwide, due to the diseases they cause. Promising antiparasitic drug and vaccine candidates have been discovered from excreted or secreted (ES) proteins released from the parasite and exposed to the immune system of the host. Mining the entire expressed sequence tag (EST) data available from parasitic nematodes represents an approach to discover such ES targets. Methods and Findings In this study, we predicted, using EST2Secretome, a novel, high-throughput, computational workflow system, 4,710 ES proteins from 452,134 ESTs derived from 39 different species of nematodes, parasitic in animals (including humans) or plants. In total, 2,632, 786, and 1,292 ES proteins were predicted for animal-, human-, and plant-parasitic nematodes. Subsequently, we systematically analysed ES proteins using computational methods. Of these 4,710 proteins, 2,490 (52.8%) had orthologues in Caenorhabditis elegans, whereas 621 (13.8%) appeared to be novel, currently having no significant match to any molecule available in public databases. Of the C. elegans homologues, 267 had strong “loss-of-function” phenotypes by RNA interference (RNAi) in this nematode. We could functionally classify 1,948 (41.3%) sequences using the Gene Ontology (GO) terms, establish pathway associations for 573 (12.2%) sequences using Kyoto Encyclopaedia of Genes and Genomes (KEGG), and identify protein interaction partners for 1,774 (37.6%) molecules. We also mapped 758 (16.1%) proteins to protein domains including the nematode-specific protein family “transthyretin-like” and “chromadorea ALT,” considered as vaccine candidates against filariasis in humans. Conclusions We report the large-scale analysis of ES proteins inferred from EST data for a range of parasitic nematodes. This set of ES proteins provides an inventory of known and novel members of ES proteins as a foundation for studies focused on understanding the biology of parasitic nematodes and their interactions with their hosts, as well as for the development of novel drugs or vaccines for parasite intervention and control. PMID:18820748
[MALDI-TOF mass spectrometry in the investigation of large high-molecular biological compounds].

PubMed

Porubl'ova, L V; Rebriiev, A V; Hromovyĭ, T Iu; Minia, I I; Obolens'ka, M Iu

2009-01-01

MALDI-TOF (Matrix-Assisted Laser Desorption/Ionization Time-of-Flight) mass spectrometry has become, in the recent years, a tool of choice for analyses of biological polymers. The wide mass range, high accuracy, informativity and sensitivity make it a superior method for analysis of all kinds of high-molecular biological compounds including proteins, nucleic acids and lipids. MALDI-TOF-MS is particularly suitable for the identification of proteins by mass fingerprint or microsequencing. Therefore it has become an important technique of proteomics. Furthermore, the method allows making a detailed analysis of post-translational protein modifications, protein-protein and protein-nucleic acid interactions. Recently, the method was also successfully applied to nucleic acid sequencing as well as screening for mutations.
New species of Bordetella, Bordetella ansorpii sp. nov., isolated from the purulent exudate of an epidermal cyst.

PubMed

Ko, Kwan Soo; Peck, Kyong Ran; Oh, Won Sup; Lee, Nam Yong; Lee, Jang Ho; Song, Jae-Hoon

2005-05-01

A gram-negative bacillus, SMC-8986(T), which was isolated from the purulent exudate of an epidermal cyst but could not be identified by a conventional microbiologic method, was characterized by a variety of phenotypic and genotypic analyses. Sequences of the 16S rRNA gene revealed that this bacterium belongs to the genus Bordetella but diverged distinctly from previously described Bordetella species. Analyses of cellular fatty acid composition and performance of biochemical tests confirmed that this bacterium is distinct from other Bordetella species. Furthermore, the results of comparative sequence analyses of two protein-coding genes (risA and ompA) also showed that this strain represents a new species within the genus Bordetella. Based on the evaluated phenotypic and genotypic characteristics, it is proposed that SMC-8986(T) should be classified as a new species, namely Bordetella ansorpii sp. nov.
Comparison of the aggregation of homologous β2-microglobulin variants reveals protein solubility as a key determinant of amyloid formation.

PubMed

Pashley, Clare L; Hewitt, Eric W; Radford, Sheena E

2016-02-13

The mouse and human β2-microglobulin protein orthologs are 70% identical in sequence and share 88% sequence similarity. These proteins are predicted by various algorithms to have similar aggregation and amyloid propensities. However, whilst human β2m (hβ2m) forms amyloid-like fibrils in denaturing conditions (e.g. pH2.5) in the absence of NaCl, mouse β2m (mβ2m) requires the addition of 0.3M NaCl to cause fibrillation. Here, the factors which give rise to this difference in amyloid propensity are investigated. We utilise structural and mutational analyses, fibril growth kinetics and solubility measurements under a range of pH and salt conditions, to determine why these two proteins have different amyloid propensities. The results show that, although other factors influence the fibril growth kinetics, a striking difference in the solubility of the proteins is a key determinant of the different amyloidogenicity of hβ2m and mβ2m. The relationship between protein solubility and lag time of amyloid formation is not captured by current aggregation or amyloid prediction algorithms, indicating a need to better understand the role of solubility on the lag time of amyloid formation. The results demonstrate the key contribution of protein solubility in determining amyloid propensity and lag time of amyloid formation, highlighting how small differences in protein sequence can have dramatic effects on amyloid formation. Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.
The Reference Genome of the Halophytic Plant Eutrema salsugineum

PubMed Central

Yang, Ruolin; Jarvis, David E.; Chen, Hao; Beilstein, Mark A.; Grimwood, Jane; Jenkins, Jerry; Shu, ShengQiang; Prochnik, Simon; Xin, Mingming; Ma, Chuang; Schmutz, Jeremy; Wing, Rod A.; Mitchell-Olds, Thomas; Schumaker, Karen S.; Wang, Xiangfeng

2013-01-01

Halophytes are plants that can naturally tolerate high concentrations of salt in the soil, and their tolerance to salt stress may occur through various evolutionary and molecular mechanisms. Eutrema salsugineum is a halophytic species in the Brassicaceae that can naturally tolerate multiple types of abiotic stresses that typically limit crop productivity, including extreme salinity and cold. It has been widely used as a laboratorial model for stress biology research in plants. Here, we present the reference genome sequence (241 Mb) of E. salsugineum at 8× coverage sequenced using the traditional Sanger sequencing-based approach with comparison to its close relative Arabidopsis thaliana. The E. salsugineum genome contains 26,531 protein-coding genes and 51.4% of its genome is composed of repetitive sequences that mostly reside in pericentromeric regions. Comparative analyses of the genome structures, protein-coding genes, microRNAs, stress-related pathways, and estimated translation efficiency of proteins between E. salsugineum and A. thaliana suggest that halophyte adaptation to environmental stresses may occur via a global network adjustment of multiple regulatory mechanisms. The E. salsugineum genome provides a resource to identify naturally occurring genetic alterations contributing to the adaptation of halophytic plants to salinity and that might be bioengineered in related crop species. PMID:23518688
Classifying Membrane Proteins in the Proteome by Using Artificial Neural Networks Based on the Preferential Parameters of Amino Acids

NASA Astrophysics Data System (ADS)

Bose, Subrata K.; Browne, Antony; Kazemian, Hassan; White, Kenneth

Membrane proteins (MPs) are large set of biological macromolecules that play a fundamental role in physiology and pathophysiology for survival. From a pharma-economical perspective, though it is the fact that MPs constitute ˜75% of possible targets for novel drugs but MPs are one of the most understudied groups of proteins in biochemical research. This is mainly because of the technical difficulties of obtaining structural information about trans-membrane regions (these are small sequences that crossways the bilayer lipid membrane). It is quite useful to predict the location of transmembrane segments down the sequence, since these are the elementary structural building blocks defining their topology. There have been several attempts over the last 20 years to develop tools for predicting membrane-spanning regions but current tools are far away from achieving a considerable reliability in prediction. This study aims to exploit the knowledge and current understanding in the field of artificial neural networks (ANNs) in particular data representation through the development of a system to identify and predict membrane-spanning regions by analysing primary amino acids sequence. In this paper we present a novel neural network (NNs) architecture and algorithms for predicting membrane spanning regions from primary amino acids sequences by using their preference parameters.
The sequence and structure of snake gourd (Trichosanthes anguina) seed lectin, a three-chain nontoxic homologue of type II RIPs.

PubMed

Sharma, Alok; Pohlentz, Gottfried; Bobbili, Kishore Babu; Jeyaprakash, A Arockia; Chandran, Thyageshwar; Mormann, Michael; Swamy, Musti J; Vijayan, M

2013-08-01

The sequence and structure of snake gourd seed lectin (SGSL), a nontoxic homologue of type II ribosome-inactivating proteins (RIPs), have been determined by mass spectrometry and X-ray crystallography, respectively. As in type II RIPs, the molecule consists of a lectin chain made up of two β-trefoil domains. The catalytic chain, which is connected through a disulfide bridge to the lectin chain in type II RIPs, is cleaved into two in SGSL. However, the integrity of the three-dimensional structure of the catalytic component of the molecule is preserved. This is the first time that a three-chain RIP or RIP homologue has been observed. A thorough examination of the sequence and structure of the protein and of its interactions with the bound methyl-α-galactose indicate that the nontoxicity of SGSL results from a combination of changes in the catalytic and the carbohydrate-binding sites. Detailed analyses of the sequences of type II RIPs of known structure and their homologues with unknown structure provide valuable insights into the evolution of this class of proteins. They also indicate some variability in carbohydrate-binding sites, which appears to contribute to the different levels of toxicity exhibited by lectins from various sources.
Using the TIGR gene index databases for biological discovery.

PubMed

Lee, Yuandan; Quackenbush, John

2003-11-01

The TIGR Gene Index web pages provide access to analyses of ESTs and gene sequences for nearly 60 species, as well as a number of resources derived from these. Each species-specific database is presented using a common format with a homepage. A variety of methods exist that allow users to search each species-specific database. Methods implemented currently include nucleotide or protein sequence queries using WU-BLAST, text-based searches using various sequence identifiers, searches by gene, tissue and library name, and searches using functional classes through Gene Ontology assignments. This protocol provides guidance for using the Gene Index Databases to extract information.
Expansion of divergent SEA domains in cell surface proteins and nucleoporin 54.

PubMed

Pei, Jimin; Grishin, Nick V

2017-03-01

SEA (sea urchin sperm protein, enterokinase, agrin) domains, many of which possess autoproteolysis activity, have been found in a number of cell surface and secreted proteins. Despite high sequence divergence, SEA domains were also proposed to be present in dystroglycan based on a conserved autoproteolysis motif and receptor-type protein phosphatase IA-2 based on structural similarity. The presence of a SEA domain adjacent to the transmembrane segment appears to be a recurring theme in quite a number of type I transmembrane proteins on the cell surface, such as MUC1, dystroglycan, IA-2, and Notch receptors. By comparative sequence and structural analyses, we identified dystroglycan-like proteins with SEA domains in Capsaspora owczarzaki of the Filasterea group, one of the closest single-cell relatives of metazoans. We also detected novel and divergent SEA domains in a variety of cell surface proteins such as EpCAM, α/ε-sarcoglycan, PTPRR, collectrin/Tmem27, amnionless, CD34, KIAA0319, fibrocystin-like protein, and a number of cadherins. While these proteins are mostly from metazoans or their single cell relatives such as choanoflagellates and Filasterea, fibrocystin-like proteins with SEA domains were found in several other eukaryotic lineages including green algae, Alveolata, Euglenozoa, and Haptophyta, suggesting an ancient evolutionary origin. In addition, the intracellular protein Nucleoporin 54 (Nup54) acquired a divergent SEA domain in choanoflagellates and metazoans. © 2016 The Protein Society.
Structural genomics analysis of uncharacterized protein families overrepresented in human gut bacteria identifies a novel glycoside hydrolase

PubMed Central

2014-01-01

Background Bacteroides spp. form a significant part of our gut microbiome and are well known for optimized metabolism of diverse polysaccharides. Initial analysis of the archetypal Bacteroides thetaiotaomicron genome identified 172 glycosyl hydrolases and a large number of uncharacterized proteins associated with polysaccharide metabolism. Results BT_1012 from Bacteroides thetaiotaomicron VPI-5482 is a protein of unknown function and a member of a large protein family consisting entirely of uncharacterized proteins. Initial sequence analysis predicted that this protein has two domains, one on the N- and one on the C-terminal. A PSI-BLAST search found over 150 full length and over 90 half size homologs consisting only of the N-terminal domain. The experimentally determined three-dimensional structure of the BT_1012 protein confirms its two-domain architecture and structural analysis of both domains suggests their specific functions. The N-terminal domain is a putative catalytic domain with significant similarity to known glycoside hydrolases, the C-terminal domain has a beta-sandwich fold typically found in C-terminal domains of other glycosyl hydrolases, however these domains are typically involved in substrate binding. We describe the structure of the BT_1012 protein and discuss its sequence-structure relationship and their possible functional implications. Conclusions Structural and sequence analyses of the BT_1012 protein identifies it as a glycosyl hydrolase, expanding an already impressive catalog of enzymes involved in polysaccharide metabolism in Bacteroides spp. Based on this we have renamed the Pfam families representing the two domains found in the BT_1012 protein, PF13204 and PF12904, as putative glycoside hydrolase and glycoside hydrolase-associated C-terminal domain respectively. PMID:24742328
First Transcriptome and Digital Gene Expression Analysis in Neuroptera with an Emphasis on Chemoreception Genes in Chrysopa pallens (Rambur).

PubMed

Li, Zhao-Qun; Zhang, Shuai; Ma, Yan; Luo, Jun-Yu; Wang, Chun-Yi; Lv, Li-Min; Dong, Shuang-Lin; Cui, Jin-Jie

2013-01-01

Chrysopa pallens (Rambur) are the most important natural enemies and predators of various agricultural pests. Understanding the sophisticated olfactory system in insect antennae is crucial for studying the physiological bases of olfaction and also could lead to effective applications of C. pallens in integrated pest management. However no transcriptome information is available for Neuroptera, and sequence data for C. pallens are scarce, so obtaining more sequence data is a priority for researchers on this species. To facilitate identifying sets of genes involved in olfaction, a normalized transcriptome of C. pallens was sequenced. A total of 104,603 contigs were obtained and assembled into 10,662 clusters and 39,734 singletons; 20,524 were annotated based on BLASTX analyses. A large number of candidate chemosensory genes were identified, including 14 odorant-binding proteins (OBPs), 22 chemosensory proteins (CSPs), 16 ionotropic receptors, 14 odorant receptors, and genes potentially involved in olfactory modulation. To better understand the OBPs, CSPs and cytochrome P450s, phylogenetic trees were constructed. In addition, 10 digital gene expression libraries of different tissues were constructed and gene expression profiles were compared among different tissues in males and females. Our results provide a basis for exploring the mechanisms of chemoreception in C. pallens, as well as other insects. The evolutionary analyses in our study provide new insights into the differentiation and evolution of insect OBPs and CSPs. Our study provided large-scale sequence information for further studies in C. pallens.
Defining Electron Bifurcation in the Electron-Transferring Flavoprotein Family.

PubMed

Garcia Costas, Amaya M; Poudel, Saroj; Miller, Anne-Frances; Schut, Gerrit J; Ledbetter, Rhesa N; Fixen, Kathryn R; Seefeldt, Lance C; Adams, Michael W W; Harwood, Caroline S; Boyd, Eric S; Peters, John W

2017-11-01

Electron bifurcation is the coupling of exergonic and endergonic redox reactions to simultaneously generate (or utilize) low- and high-potential electrons. It is the third recognized form of energy conservation in biology and was recently described for select electron-transferring flavoproteins (Etfs). Etfs are flavin-containing heterodimers best known for donating electrons derived from fatty acid and amino acid oxidation to an electron transfer respiratory chain via Etf-quinone oxidoreductase. Canonical examples contain a flavin adenine dinucleotide (FAD) that is involved in electron transfer, as well as a non-redox-active AMP. However, Etfs demonstrated to bifurcate electrons contain a second FAD in place of the AMP. To expand our understanding of the functional variety and metabolic significance of Etfs and to identify amino acid sequence motifs that potentially enable electron bifurcation, we compiled 1,314 Etf protein sequences from genome sequence databases and subjected them to informatic and structural analyses. Etfs were identified in diverse archaea and bacteria, and they clustered into five distinct well-supported groups, based on their amino acid sequences. Gene neighborhood analyses indicated that these Etf group designations largely correspond to putative differences in functionality. Etfs with the demonstrated ability to bifurcate were found to form one group, suggesting that distinct conserved amino acid sequence motifs enable this capability. Indeed, structural modeling and sequence alignments revealed that identifying residues occur in the NADH- and FAD-binding regions of bifurcating Etfs. Collectively, a new classification scheme for Etf proteins that delineates putative bifurcating versus nonbifurcating members is presented and suggests that Etf-mediated bifurcation is associated with surprisingly diverse enzymes. IMPORTANCE Electron bifurcation has recently been recognized as an electron transfer mechanism used by microorganisms to maximize energy conservation. Bifurcating enzymes couple thermodynamically unfavorable reactions with thermodynamically favorable reactions in an overall spontaneous process. Here we show that the electron-transferring flavoprotein (Etf) enzyme family exhibits far greater diversity than previously recognized, and we provide a phylogenetic analysis that clearly delineates bifurcating versus nonbifurcating members of this family. Structural modeling of proteins within these groups reveals key differences between the bifurcating and nonbifurcating Etfs. Copyright © 2017 American Society for Microbiology.
Defining Electron Bifurcation in the Electron-Transferring Flavoprotein Family

PubMed Central

Garcia Costas, Amaya M.; Poudel, Saroj; Miller, Anne-Frances; Schut, Gerrit J.; Ledbetter, Rhesa N.; Seefeldt, Lance C.; Adams, Michael W. W.

2017-01-01

ABSTRACT Electron bifurcation is the coupling of exergonic and endergonic redox reactions to simultaneously generate (or utilize) low- and high-potential electrons. It is the third recognized form of energy conservation in biology and was recently described for select electron-transferring flavoproteins (Etfs). Etfs are flavin-containing heterodimers best known for donating electrons derived from fatty acid and amino acid oxidation to an electron transfer respiratory chain via Etf-quinone oxidoreductase. Canonical examples contain a flavin adenine dinucleotide (FAD) that is involved in electron transfer, as well as a non-redox-active AMP. However, Etfs demonstrated to bifurcate electrons contain a second FAD in place of the AMP. To expand our understanding of the functional variety and metabolic significance of Etfs and to identify amino acid sequence motifs that potentially enable electron bifurcation, we compiled 1,314 Etf protein sequences from genome sequence databases and subjected them to informatic and structural analyses. Etfs were identified in diverse archaea and bacteria, and they clustered into five distinct well-supported groups, based on their amino acid sequences. Gene neighborhood analyses indicated that these Etf group designations largely correspond to putative differences in functionality. Etfs with the demonstrated ability to bifurcate were found to form one group, suggesting that distinct conserved amino acid sequence motifs enable this capability. Indeed, structural modeling and sequence alignments revealed that identifying residues occur in the NADH- and FAD-binding regions of bifurcating Etfs. Collectively, a new classification scheme for Etf proteins that delineates putative bifurcating versus nonbifurcating members is presented and suggests that Etf-mediated bifurcation is associated with surprisingly diverse enzymes. IMPORTANCE Electron bifurcation has recently been recognized as an electron transfer mechanism used by microorganisms to maximize energy conservation. Bifurcating enzymes couple thermodynamically unfavorable reactions with thermodynamically favorable reactions in an overall spontaneous process. Here we show that the electron-transferring flavoprotein (Etf) enzyme family exhibits far greater diversity than previously recognized, and we provide a phylogenetic analysis that clearly delineates bifurcating versus nonbifurcating members of this family. Structural modeling of proteins within these groups reveals key differences between the bifurcating and nonbifurcating Etfs. PMID:28808132
MaxAlign: maximizing usable data in an alignment.

PubMed

Gouveia-Oliveira, Rodrigo; Sackett, Peter W; Pedersen, Anders G

2007-08-28

The presence of gaps in an alignment of nucleotide or protein sequences is often an inconvenience for bioinformatical studies. In phylogenetic and other analyses, for instance, gapped columns are often discarded entirely from the alignment. MaxAlign is a program that optimizes the alignment prior to such analyses. Specifically, it maximizes the number of nucleotide (or amino acid) symbols that are present in gap-free columns - the alignment area - by selecting the optimal subset of sequences to exclude from the alignment. MaxAlign can be used prior to phylogenetic and bioinformatical analyses as well as in other situations where this form of alignment improvement is useful. In this work we test MaxAlign's performance in these tasks and compare the accuracy of phylogenetic estimates including and excluding gapped columns from the analysis, with and without processing with MaxAlign. In this paper we also introduce a new simple measure of tree similarity, Normalized Symmetric Similarity (NSS) that we consider useful for comparing tree topologies. We demonstrate how MaxAlign is helpful in detecting misaligned or defective sequences without requiring manual inspection. We also show that it is not advisable to exclude gapped columns from phylogenetic analyses unless MaxAlign is used first. Finally, we find that the sequences removed by MaxAlign from an alignment tend to be those that would otherwise be associated with low phylogenetic accuracy, and that the presence of gaps in any given sequence does not seem to disturb the phylogenetic estimates of other sequences. The MaxAlign web-server is freely available online at http://www.cbs.dtu.dk/services/MaxAlign where supplementary information can also be found. The program is also freely available as a Perl stand-alone package.
Topological frustration in βα-repeat proteins: sequence diversity modulates the conserved folding mechanisms of α/β/α sandwich proteins

PubMed Central

Hills, Ronald D.; Kathuria, Sagar V.; Wallace, Louise A.; Day, Iain J.; Brooks, Charles L.; Matthews, C. Robert

2010-01-01

The thermodynamic hypothesis of Anfinsen postulates that structures and stabilities of globular proteins are determined by their amino acid sequences. Chain topology, however, is known to influence the folding reaction, in that motifs with a preponderance of local interactions typically fold more rapidly than those with a larger fraction of non-local interactions. Together, the topology and sequence can modulate the energy landscape and influence the rate at which the protein folds to the native conformation. To explore the relationship of sequence and topology in the folding of βα–repeat proteins, which are dominated by local interactions, a combined experimental and simulation analysis was performed on two members of the flavodoxin-like, α/β/α sandwich fold. Spo0F and the N-terminal receiver domain of NtrC (NT-NtrC) have similar topologies but low sequence identity, enabling a test of the effects of sequence on folding. Experimental results demonstrated that both response-regulator proteins fold via parallel channels through highly structured sub-millisecond intermediates before accessing their cis prolyl peptide bond-containing native conformations. Global analysis of the experimental results preferentially places these intermediates off the productive folding pathway. Sequence-sensitive Gō-model simulations conclude that frustration in the folding in Spo0F, corresponding to the appearance of the off-pathway intermediate, reflects competition for intra-subdomain van der Waals contacts between its N- and C-terminal subdomains. The extent of transient, premature structure appears to correlate with the number of isoleucine, leucine and valine (ILV) side-chains that form a large sequence-local cluster involving the central β-sheet and helices α2, α3 and α4. The failure to detect the off-pathway species in the simulations of NT-NtrC may reflect the reduced number of ILV side-chains in its corresponding hydrophobic cluster. The location of the hydrophobic clusters in the structure may also be related to the differing functional properties of these response regulators. Comparison with the results of previous experimental and simulation analyses on the homologous CheY argues that prematurely-folded unproductive intermediates are a common property of the βα-repeat motif. PMID:20226790
Molecular Evolution of Slow and Quick Anion Channels (SLACs and QUACs/ALMTs)

PubMed Central

Dreyer, Ingo; Gomez-Porras, Judith Lucia; Riaño-Pachón, Diego Mauricio; Hedrich, Rainer; Geiger, Dietmar

2012-01-01

Electrophysiological analyses conducted about 25 years ago detected two types of anion channels in the plasma membrane of guard cells. One type of channel responds slowly to changes in membrane voltage while the other responds quickly. Consequently, they were named SLAC, for SLow Anion Channel, and QUAC, for QUick Anion Channel. Recently, genes SLAC1 and QUAC1/ALMT12, underlying the two different anion current components, could be identified in the model plant Arabidopsis thaliana. Expression of the gene products in Xenopus oocytes confirmed the quick and slow current kinetics. In this study we provide an overview on our current knowledge on slow and quick anion channels in plants and analyze the molecular evolution of ALMT/QUAC-like and SLAC-like channels. We discovered fingerprints that allow screening databases for these channel types and were able to identify 192 (177 non-redundant) SLAC-like and 422 (402 non-redundant) ALMT/QUAC-like proteins in the fully sequenced genomes of 32 plant species. Phylogenetic analyses provided new insights into the molecular evolution of these channel types. We also combined sequence alignment and clustering with predictions of protein features, leading to the identification of known conserved phosphorylation sites in SLAC1-like channels along with potential sites that have not been yet experimentally confirmed. Using a similar strategy to analyze the hydropathicity of ALMT/QUAC-like channels, we propose a modified topology with additional transmembrane regions that integrates structure and function of these membrane proteins. Our results suggest that cross-referencing phylogenetic analyses with position-specific protein properties and functional data could be a very powerful tool for genome research approaches in general. PMID:23226151
Molecular Evolution of Slow and Quick Anion Channels (SLACs and QUACs/ALMTs).

PubMed

Dreyer, Ingo; Gomez-Porras, Judith Lucia; Riaño-Pachón, Diego Mauricio; Hedrich, Rainer; Geiger, Dietmar

2012-01-01

Electrophysiological analyses conducted about 25 years ago detected two types of anion channels in the plasma membrane of guard cells. One type of channel responds slowly to changes in membrane voltage while the other responds quickly. Consequently, they were named SLAC, for SLow Anion Channel, and QUAC, for QUick Anion Channel. Recently, genes SLAC1 and QUAC1/ALMT12, underlying the two different anion current components, could be identified in the model plant Arabidopsis thaliana. Expression of the gene products in Xenopus oocytes confirmed the quick and slow current kinetics. In this study we provide an overview on our current knowledge on slow and quick anion channels in plants and analyze the molecular evolution of ALMT/QUAC-like and SLAC-like channels. We discovered fingerprints that allow screening databases for these channel types and were able to identify 192 (177 non-redundant) SLAC-like and 422 (402 non-redundant) ALMT/QUAC-like proteins in the fully sequenced genomes of 32 plant species. Phylogenetic analyses provided new insights into the molecular evolution of these channel types. We also combined sequence alignment and clustering with predictions of protein features, leading to the identification of known conserved phosphorylation sites in SLAC1-like channels along with potential sites that have not been yet experimentally confirmed. Using a similar strategy to analyze the hydropathicity of ALMT/QUAC-like channels, we propose a modified topology with additional transmembrane regions that integrates structure and function of these membrane proteins. Our results suggest that cross-referencing phylogenetic analyses with position-specific protein properties and functional data could be a very powerful tool for genome research approaches in general.
Genome Sequence and Analysis of Escherichia coli MRE600, a Colicinogenic, Nonmotile Strain that Lacks RNase I and the Type I Methyltransferase, EcoKI

PubMed Central

Kurylo, Chad M.; Alexander, Noah; Dass, Randall A.; Parks, Matthew M.; Altman, Roger A.; Vincent, C. Theresa; Mason, Christopher E.; Blanchard, Scott C.

2016-01-01

Escherichia coli strain MRE600 was originally identified for its low RNase I activity and has therefore been widely adopted by the biomedical research community as a preferred source for the expression and purification of transfer RNAs and ribosomes. Despite its widespread use, surprisingly little information about its genome or genetic content exists. Here, we present the first de novo assembly and description of the MRE600 genome and epigenome. To provide context to these studies of MRE600, we include comparative analyses with E. coli K-12 MG1655 (K12). Pacific Biosciences Single Molecule, Real-Time sequencing reads were assembled into one large chromosome (4.83 Mb) and three smaller plasmids (89.1, 56.9, and 7.1 kb). Interestingly, the 7.1-kb plasmid possesses genes encoding a colicin E1 protein and its associated immunity protein. The MRE600 genome has a G + C content of 50.8% and contains a total of 5,181 genes, including 4,913 protein-encoding genes and 268 RNA genes. We identified 41,469 modified DNA bases (0.83% of total) and found that MRE600 lacks the gene for type I methyltransferase, EcoKI. Phylogenetic, taxonomic, and genetic analyses demonstrate that MRE600 is a divergent E. coli strain that displays features of the closely related genus, Shigella. Nevertheless, comparative analyses between MRE600 and E. coli K12 show that these two strains exhibit nearly identical ribosomal proteins, ribosomal RNAs, and highly homologous tRNA species. Substantiating prior suggestions that MRE600 lacks RNase I activity, the RNase I-encoding gene, rna, contains a single premature stop codon early in its open-reading frame. PMID:26802429

Single-molecule nanopore enzymology

PubMed Central

Wloka, Carsten; Maglia, Giovanni

2017-01-01

Biological nanopores are a class of membrane proteins that open nanoscale water-conduits in biological membranes. When they are reconstituted in artificial membranes and a bias voltage is applied across the membrane, the ionic current passing through individual nanopores can be used to monitor chemical reactions, to recognize individual molecules and, of most interest, to sequence DNA. More recently, proteins and enzymes have started being analysed with nanopores. Monitoring enzymatic reactions with nanopores, i.e. nanopore enzymology, has the unique advantage that it allows long-timescale observations of native proteins at the single-molecule level. Here we describe the approaches and challenges in nanopore enzymology. PMID:28630164
Arabidopsis ASYMMETRIC LEAVES2 protein required for leaf morphogenesis consistently forms speckles during mitosis of tobacco BY-2 cells via signals in its specific sequence.

PubMed

Luo, Lilan; Ando, Sayuri; Sasabe, Michiko; Machida, Chiyoko; Kurihara, Daisuke; Higashiyama, Tetsuya; Machida, Yasunori

2012-09-01

Leaf primordia with high division and developmental competencies are generated around the periphery of stem cells at the shoot apex. Arabidopsis ASYMMETRIC-LEAVES2 (AS2) protein plays a key role in the regulation of many genes responsible for flat symmetric leaf formation. The AS2 gene, expressed in leaf primordia, encodes a plant-specific nuclear protein containing an AS2/LOB domain with cysteine repeats (C-motif). AS2 proteins are present in speckles in and around the nucleoli, and in the nucleoplasm of some leaf epidermal cells. We used the tobacco cultured cell line BY-2 expressing the AS2-fused yellow fluorescent protein to examine subnuclear localization of AS2 in dividing cells. AS2 mainly localized to speckles (designated AS2 bodies) in cells undergoing mitosis and distributed in a pairwise manner during the separation of sets of daughter chromosomes. Few interphase cells contained AS2 bodies. Deletion analyses showed that a short stretch of the AS2 amino-terminal sequence and the C-motif play negative and positive roles, respectively, in localizing AS2 to the bodies. These results suggest that AS2 bodies function to properly distribute AS2 to daughter cells during cell division in leaf primordia; and this process is controlled at least partially by signals encoded by the AS2 sequence itself.
2-DE analysis indicates that Acinetobacter baumannii displays a robust and versatile metabolism

PubMed Central

Soares, Nelson C; Cabral, Maria P; Parreira, José R; Gayoso, Carmen; Barba, Maria J; Bou, Germán

2009-01-01

Background Acinetobacter baumannii is a nosocomial pathogen that has been associated with outbreak infections in hospitals. Despite increasing awareness about this bacterium, its proteome remains poorly characterised, however recently the complete genome of A. baumannii reference strain ATCC 17978 has been sequenced. Here, we have used 2-DE and MALDI-TOF/TOF approach to characterise the proteome of this strain. Results The membrane and cytoplasmatic protein extracts were analysed separately, these analyses revealed the reproducible presence of 239 and 511 membrane and cytoplamatic protein spots, respectively. MALDI-TOF/TOF characterisation identified a total of 192 protein spots (37 membrane and 155 cytoplasmatic) and revealed that the identified membrane proteins were mainly transport-related proteins, whereas the cytoplasmatic proteins were of diverse nature, although mainly related to metabolic processes. Conclusion This work indicates that A. baumannii has a versatile and robust metabolism and also reveal a number of proteins that may play a key role in the mechanism of drug resistance and virulence. The data obtained complements earlier reports of A. baumannii proteome and provides new tools to increase our knowledge on the protein expression profile of this pathogen. PMID:19785748
Guanine nucleotide-binding protein α subunit hypofunction in children with short stature and disproportionate shortening of the 4th and 5th metacarpals.

PubMed

Inta, Ioana Monica; Choukair, Daniela; Bender, Sebastian; Kneppo, Carolin; Knauer-Fischer, Sabine; Meyenburg, Kahina; Ivandic, Boris; Pfister, Stefan M; Bettendorf, Markus

2014-01-01

GNAS encodes the α subunit of the stimulatory G protein (Gsα). Maternal inherited Gsα mutations cause pseudohypoparathyroidism type Ia (PHP-Ia), associated with shortening of the 4th and 5th metacarpals. Here we investigated the Gsα pathway in short patients with distinct shortening of the 4th and 5th metacarpals. In 571 children with short stature and 4 patients with PHP-Ia metacarpal bone lengths were measured. In identified patients we analysed the Gsα protein function in platelets, performed GNAS sequencing, and epigenetic analysis of four significant differentially methylated regions. In 51 patients (8.9%) shortening of the 4th and 5th metacarpals was more pronounced than their height deficit. No GNAS coding mutations were identified in 20 analysed patients, except in 2 PHP-Ia patients. Gsα activity was reduced in all PHP-Ia patients and in 25% of the analysed patients. No significant methylation changes were identified. Our findings suggest that patients with short stature and distinct metacarpal bone shortening could be part of the wide variety of PHP/PPHP, therefore it was worthwhile analysing the Gsα protein function and GNAS gene in these patients in order to further elucidate the phenotype and genotype of Gsα dysfunction.
Amino acid pair- and triplet-wise groupings in the interior of α-helical segments in proteins.

PubMed

de Sousa, Miguel M; Munteanu, Cristian R; Pazos, Alejandro; Fonseca, Nuno A; Camacho, Rui; Magalhães, A L

2011-02-21

A statistical approach has been applied to analyse primary structure patterns at inner positions of α-helices in proteins. A systematic survey was carried out in a recent sample of non-redundant proteins selected from the Protein Data Bank, which were used to analyse α-helix structures for amino acid pairing patterns. Only residues more than three positions apart from both termini of the α-helix were considered as inner. Amino acid pairings i, i+k (k=1, 2, 3, 4, 5), were analysed and the corresponding 20×20 matrices of relative global propensities were constructed. An analysis of (i, i+4, i+8) and (i, i+3, i+4) triplet patterns was also performed. These analysis yielded information on a series of amino acid patterns (pairings and triplets) showing either high or low preference for α-helical motifs and suggested a novel approach to protein alphabet reduction. In addition, it has been shown that the individual amino acid propensities are not enough to define the statistical distribution of these patterns. Global pair propensities also depend on the type of pattern, its composition and orientation in the protein sequence. The data presented should prove useful to obtain and refine useful predictive rules which can further the development and fine-tuning of protein structure prediction algorithms and tools. Copyright Â© 2010 Elsevier Ltd. All rights reserved.
Opposite Stereoselectivities of Dirigent Proteins in Arabidopsis and Schizandra Species

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kim, Kye-Won; Moinuddin, Syed G. A.; Atwell, Kathleen M.

2012-08-01

How stereoselective monolignol-derived phenoxy radical-radical coupling reactions are differentially biochemically orchestrated in planta, whereby for example they afford (+)- and (-)-pinoresinols, respectively, is both a fascinating mechanistic and evolutionary question. In earlier work, biochemical control of (+)-pinoresinol formation had been established to be engendered by a (+)-pinoresinol-forming dirigent protein in Forsythia intermedia, whereas the presence of a (-)-pinoresinol-forming dirigent protein was indirectly deduced based on the enantiospecificity of downstream pinoresinol reductases (AtPrRs) in Arabidopsis thaliana root tissue. In this study of 16 putative dirigent protein homologs in Arabidopsis, AtDIR6, AtDIR10, and AtDIR13 were established to be root-specific using a β-glucuronidasemore » reporter gene strategy. Of these three, in vitro analyses established that only recombinant AtDIR6 was a (-)-pinoresinol-forming dirigent protein, whose physiological role was further confirmed using overexpression and RNAi strategies in vivo. Interestingly, its closest homolog, AtDIR5, was also established to be a (-)-pinoresinol-forming dirigent protein based on in vitro biochemical analyses. Both of these were compared in terms of properties with a (+)-pinoresinol-forming dirigent protein from Schizandra chinensis. In this context, sequence analyses, site-directed mutagenesis, and region swapping resulted in identification of putative substrate binding sites/regions and candidate residues controlling distinct stereoselectivities of coupling modes.« less
A stoichiometry driven universal spatial organization of backbones of folded proteins: are there Chargaff's rules for protein folding?

PubMed

Mittal, A; Jayaram, B; Shenoy, Sandhya; Bawa, Tejdeep Singh

2010-10-01

Protein folding is at least a six decade old problem, since the times of Pauling and Anfinsen. However, rules of protein folding remain elusive till date. In this work, rigorous analyses of several thousand crystal structures of folded proteins reveal a surprisingly simple unifying principle of backbone organization in protein folding. We find that protein folding is a direct consequence of a narrow band of stoichiometric occurrences of amino-acids in primary sequences, regardless of the size and the fold of a protein. We observe that "preferential interactions" between amino-acids do not drive protein folding, contrary to all prevalent views. We dedicate our discovery to the seminal contribution of Chargaff which was one of the major keys to elucidation of the stoichiometry-driven spatially organized double helical structure of DNA.
Data mining of enzymes using specific peptides

PubMed Central

2009-01-01

Background Predicting the function of a protein from its sequence is a long-standing challenge of bioinformatic research, typically addressed using either sequence-similarity or sequence-motifs. We employ the novel motif method that consists of Specific Peptides (SPs) that are unique to specific branches of the Enzyme Commission (EC) functional classification. We devise the Data Mining of Enzymes (DME) methodology that allows for searching SPs on arbitrary proteins, determining from its sequence whether a protein is an enzyme and what the enzyme's EC classification is. Results We extract novel SP sets from Swiss-Prot enzyme data. Using a training set of July 2006, and test sets of July 2008, we find that the predictive power of SPs, both for true-positives (enzymes) and true-negatives (non-enzymes), depends on the coverage length of all SP matches (the number of amino-acids matched on the protein sequence). DME is quite different from BLAST. Comparing the two on an enzyme test set of July 2008, we find that DME has lower recall. On the other hand, DME can provide predictions for proteins regarded by BLAST as having low homologies with known enzymes, thus supplying complementary information. We test our method on a set of proteins belonging to 10 bacteria, dated July 2008, establishing the usefulness of the coverage-length cutoff to determine true-negatives. Moreover, sifting through our predictions we find that some of them have been substantiated by Swiss-Prot annotations by July 2009. Finally we extract, for production purposes, a novel SP set trained on all Swiss-Prot enzymes as of July 2009. This new set increases considerably the recall of DME. The new SP set is being applied to three metagenomes: Sargasso Sea with over 1,000,000 proteins, producing predictions of over 220,000 enzymes, and two human gut metagenomes. The outcome of these analyses can be characterized by the enzymatic profile of the metagenomes, describing the relative numbers of enzymes observed for different EC categories. Conclusions Employing SPs for predicting enzymatic activity of proteins works well once one utilizes coverage-length criteria. In our analysis, L ≥ 7 has led to highly accurate results. PMID:20034383
Mitogenome Sequencing in the Genus Camelus Reveals Evidence for Purifying Selection and Long-term Divergence between Wild and Domestic Bactrian Camels.

PubMed

Mohandesan, Elmira; Fitak, Robert R; Corander, Jukka; Yadamsuren, Adiya; Chuluunbat, Battsetseg; Abdelhadi, Omer; Raziq, Abdul; Nagy, Peter; Stalder, Gabrielle; Walzer, Chris; Faye, Bernard; Burger, Pamela A

2017-08-30

The genus Camelus is an interesting model to study adaptive evolution in the mitochondrial genome, as the three extant Old World camel species inhabit hot and low-altitude as well as cold and high-altitude deserts. We sequenced 24 camel mitogenomes and combined them with three previously published sequences to study the role of natural selection under different environmental pressure, and to advance our understanding of the evolutionary history of the genus Camelus. We confirmed the heterogeneity of divergence across different components of the electron transport system. Lineage-specific analysis of mitochondrial protein evolution revealed a significant effect of purifying selection in the concatenated protein-coding genes in domestic Bactrian camels. The estimated dN/dS < 1 in the concatenated protein-coding genes suggested purifying selection as driving force for shaping mitogenome diversity in camels. Additional analyses of the functional divergence in amino acid changes between species-specific lineages indicated fixed substitutions in various genes, with radical effects on the physicochemical properties of the protein products. The evolutionary time estimates revealed a divergence between domestic and wild Bactrian camels around 1.1 [0.58-1.8] million years ago (mya). This has major implications for the conservation and management of the critically endangered wild species, Camelus ferus.
Evolutionary implications of phylogenetic analyses of the gene transfer agent (GTA) of Rhodobacter capsulatus.

PubMed

Lang, Andrew S; Taylor, Terumi A; Beatty, J Thomas

2002-11-01

The gene transfer agent (GTA) of the a-proteobacterium Rhodobacter capsulatus is a cell-controlled genetic exchange vector. Genes that encode the GTA structure are clustered in a 15-kb region of the R. capsulatus chromosome, and some of these genes show sequence similarity to known bacteriophage head and tail genes. However, the production of GTA is controlled at the level of transcription by a cellular two-component signal transduction system. This paper describes homologues of both the GTA structural gene cluster and the GTA regulatory genes in the a-proteobacteria Rhodopseudomonas palustris, Rhodobacter sphaeroides, Caulobacter crescentus, Agrobacterium tumefaciens and Brucella melitensis. These sequences were used in a phylogenetic tree approach to examine the evolutionary relationships of selected GTA proteins to these homologues and (pro)phage proteins, which was compared to a 16S rRNA tree. The data indicate that a GTA-like element was present in a single progenitor of the extant species that contain both GTA structural cluster and regulatory gene homologues. The evolutionary relationships of GTA structural proteins to (pro)phage proteins indicated by the phylogenetic tree patterns suggest a predominantly vertical descent of GTA-like sequences in the a-proteobacteria and little past gene exchange with (pro)phages.
A novel BLAST-Based Relative Distance (BBRD) method can effectively group members of protein arginine methyltransferases and suggest their evolutionary relationship.

PubMed

Wang, Yi-Chun; Wang, Jing-Doo; Chen, Chin-Han; Chen, Yi-Wen; Li, Chuan

2015-03-01

We developed a novel BLAST-Based Relative Distance (BBRD) method by Pearson's correlation coefficient to avoid the problems of tedious multiple sequence alignment and complicated outgroup selection. We showed its application on reconstructing reliable phylogeny for nucleotide and protein sequences as exemplified by the fmr-1 gene and dihydrolipoamide dehydrogenase, respectively. We then used BBRD to resolve 124 protein arginine methyltransferases (PRMTs) that are homologues of nine mammalian PRMTs. The tree placed the uncharacterized PRMT9 with PRMT7 in the same clade, outside of all the Type I PRMTs including PRMT1 and its vertebrate paralogue PRMT8, PRMT3, PRMT6, PRMT2 and PRMT4. The PRMT7/9 branch then connects with the type II PRMT5. Some non-vertebrates contain different PRMTs without high sequence homology with the mammalian PRMTs. For example, in the case of Drosophila arginine methyltransferase (DART) and Trypanosoma brucei methyltransferases (TbPRMTs) in the analyses, the BBRD program grouped them with specific clades and thus suggested their evolutionary relationships. The BBRD method thus provided a great tool to construct a reliable tree for members of protein families through evolution. Copyright © 2015 Elsevier Inc. All rights reserved.
GC-rich coding sequences reduce transposon-like, small RNA-mediated transgene silencing.

PubMed

Sidorenko, Lyudmila V; Lee, Tzuu-Fen; Woosley, Aaron; Moskal, William A; Bevan, Scott A; Merlo, P Ann Owens; Walsh, Terence A; Wang, Xiujuan; Weaver, Staci; Glancy, Todd P; Wang, PoHao; Yang, Xiaozeng; Sriram, Shreedharan; Meyers, Blake C

2017-11-01

The molecular basis of transgene susceptibility to silencing is poorly characterized in plants; thus, we evaluated several transgene design parameters as means to reduce heritable transgene silencing. Analyses of Arabidopsis plants with transgenes encoding a microalgal polyunsaturated fatty acid (PUFA) synthase revealed that small RNA (sRNA)-mediated silencing, combined with the use of repetitive regulatory elements, led to aggressive transposon-like silencing of canola-biased PUFA synthase transgenes. Diversifying regulatory sequences and using native microalgal coding sequences (CDSs) with higher GC content improved transgene expression and resulted in a remarkable trans-generational stability via reduced accumulation of sRNAs and DNA methylation. Further experiments in maize with transgenes individually expressing three crystal (Cry) proteins from Bacillus thuringiensis (Bt) tested the impact of CDS recoding using different codon bias tables. Transgenes with higher GC content exhibited increased transcript and protein accumulation. These results demonstrate that the sequence composition of transgene CDSs can directly impact silencing, providing design strategies for increasing transgene expression levels and reducing risks of heritable loss of transgene expression.
An integrated metagenome and -proteome analysis of the microbial community residing in a biogas production plant.

PubMed

Ortseifen, Vera; Stolze, Yvonne; Maus, Irena; Sczyrba, Alexander; Bremges, Andreas; Albaum, Stefan P; Jaenicke, Sebastian; Fracowiak, Jochen; Pühler, Alfred; Schlüter, Andreas

2016-08-10

To study the metaproteome of a biogas-producing microbial community, fermentation samples were taken from an agricultural biogas plant for microbial cell and protein extraction and corresponding metagenome analyses. Based on metagenome sequence data, taxonomic community profiling was performed to elucidate the composition of bacterial and archaeal sub-communities. The community's cytosolic metaproteome was represented in a 2D-PAGE approach. Metaproteome databases for protein identification were compiled based on the assembled metagenome sequence dataset for the biogas plant analyzed and non-corresponding biogas metagenomes. Protein identification results revealed that the corresponding biogas protein database facilitated the highest identification rate followed by other biogas-specific databases, whereas common public databases yielded insufficient identification rates. Proteins of the biogas microbiome identified as highly abundant were assigned to the pathways involved in methanogenesis, transport and carbon metabolism. Moreover, the integrated metagenome/-proteome approach enabled the examination of genetic-context information for genes encoding identified proteins by studying neighboring genes on the corresponding contig. Exemplarily, this approach led to the identification of a Methanoculleus sp. contig encoding 16 methanogenesis-related gene products, three of which were also detected as abundant proteins within the community's metaproteome. Thus, metagenome contigs provide additional information on the genetic environment of identified abundant proteins. Copyright © 2016 Elsevier B.V. All rights reserved.
Mouse Vk gene classification by nucleic acid sequence similarity.

PubMed

Strohal, R; Helmberg, A; Kroemer, G; Kofler, R

1989-01-01

Analyses of immunoglobulin (Ig) variable (V) region gene usage in the immune response, estimates of V gene germline complexity, and other nucleic acid hybridization-based studies depend on the extent to which such genes are related (i.e., sequence similarity) and their organization in gene families. While mouse Igh heavy chain V region (VH) gene families are relatively well-established, a corresponding systematic classification of Igk light chain V region (Vk) genes has not been reported. The present analysis, in the course of which we reviewed the known extent of the Vk germline gene repertoire and Vk gene usage in a variety of responses to foreign and self antigens, provides a classification of mouse Vk genes in gene families composed of members with greater than 80% overall nucleic acid sequence similarity. This classification differed in several aspects from that of VH genes: only some Vk gene families were as clearly separated (by greater than 25% sequence dissimilarity) as typical VH gene families; most Vk gene families were closely related and, in several instances, members from different families were very similar (greater than 80%) over large sequence portions; frequently, classification by nucleic acid sequence similarity diverged from existing classifications based on amino-terminal protein sequence similarity. Our data have implications for Vk gene analyses by nucleic acid hybridization and describe potentially important differences in sequence organization between VH and Vk genes.
Molecular and immunological characterisation of the glycosylated orange allergen Cit s 1

PubMed Central

Pöltl, Gerald; Ahrazem, Oussama; Paschinger, Katharina; Ibañez, M. Dolores; Salcedo, Gabriel; Wilson, Iain B. H.

2010-01-01

The IgE of sera from patients with a history of allergy to oranges (Citrus sinensis) bind a number of proteins in orange extract, including Cit s 1, a germin-like protein. In the present study, we have analysed its immunological cross-reactivity and its molecular nature. Sera from many of the patients examined recognise a range of glycoproteins and neoglycoconjugates containing β1,2-xylose and core α1,3-fucose on their N-glycans. These reagents also inhibited the interaction of Cit s 1 with patients’ sera, thus underlining the critical role of glycosylation in the recognition of this protein by patients’ IgE and extending previous data showing that deglycosylated Cit s 1 does not possess IgE epitopes. In parallel, we examined the peptide sequence and glycan structure of Cit s 1 using mass spectrometric techniques. Indeed, we achieved complete sequence coverage of the mature protein as compared to the translation of an expressed sequence tag cDNA clone and demonstrated that the single N-glycosylation site of this protein carries oligosaccharides with xylose and fucose residues. Due to the presumed requirement for multivalency for in vivo allergenicity, our molecular data showing that Cit s 1 is monovalent as regards glycosylation and that the single N-glycan is the target of the IgE response to this protein, therefore, explain the immunological cross-reactive properties of Cit s 1 as well as its equivocal nature as a clinically-relevant allergen. PMID:17095532
Analyses of Mitogenome Sequences Revealed that Asian Citrus Psyllids (Diaphorina citri) from California Were Related to Those from Florida.

PubMed

Wu, Fengnian; Kumagai, Luci; Cen, Yijing; Chen, Jianchi; Wallis, Christopher M; Polek, MaryLou; Jiang, Hongyan; Zheng, Zheng; Liang, Guangwen; Deng, Xiaoling

2017-08-31

Asian citrus psyllid (ACP, Diaphorina citri Kuwayama) transmits "Candidatus Liberibacter asiaticus" (CLas), an unculturable alpha-proteobacterium associated with citrus Huanglongbing (HLB). CLas has recently been found in California. Understanding ACP population diversity is necessary for HLB regulatory practices aimed at reducing CLas spread. In this study, two circular ACP mitogenome sequences from California (mt-CApsy, ~15,027 bp) and Florida (mt-FLpsy, ~15,012 bp), USA, were acquired. Each mitogenome contained 13 protein coding genes, 2 ribosomal RNA and 22 transfer RNA genes, and a control region varying in sizes. The Californian mt-CApsy was identical to the Floridian mt-FLpsy, but different from the mitogenome (mt-GDpsy) of Guangdong, China, in 50 single nucleotide polymorphisms (SNPs). Further analyses were performed on sequences in cox1 and trnAsn regions with 100 ACPs, SNPs in nad1-nad4-nad5 locus through PCR with 252 ACP samples. All results showed the presence of a Chinese ACP cluster (CAC) and an American ACP cluster (AAC). We proposed that ACP in California was likely not introduced from China based on our current ACP collection but somewhere in America. However, more studies with ACP samples from around the world are needed. ACP mitogenome sequence analyses will facilitate ACP population research.
PCR and RFLP analyses based on the ribosomal protein operon

USDA-ARS?s Scientific Manuscript database

Differentiation and classification of phytoplasmas have been primarily based on the highly conserved 16Sr RNA gene. RFLP analysis of 16Sr RNA gene sequences has identified 31 16Sr RNA (16Sr) groups and more than 100 16Sr subgroups. Classification of phytoplasma strains can however, become more refin...
Amblyomma imitator Ticks as Vectors of Rickettsia rickettsii, Mexico

PubMed Central

Oliveira, Karla A.; Pinter, Adriano; Medina-Sanchez, Aaron; Boppana, Venkata D.; Wikel, Stephen K.; Saito, Tais B.; Shelite, Thomas; Blanton, Lucas; Popov, Vsevolod; Teel, Pete D.; Walker, David H.; Galvao, Marcio A.M.; Mafra, Claudio

2010-01-01

Real-time PCR of Amblyomma imitator tick egg masses obtained in Nuevo Leon State, Mexico, identified a Rickettsia species. Sequence analyses of 17-kD common antigen and outer membrane protein A and B gene fragments showed to it to be R. rickettsii, which suggested a potential new vector for this bacterium. PMID:20678325
Why Choose This One? Factors in Scientists' Selection of Bioinformatics Tools

ERIC Educational Resources Information Center

Bartlett, Joan C.; Ishimura, Yusuke; Kloda, Lorie A.

2011-01-01

Purpose: The objective was to identify and understand the factors involved in scientists' selection of preferred bioinformatics tools, such as databases of gene or protein sequence information (e.g., GenBank) or programs that manipulate and analyse biological data (e.g., BLAST). Methods: Eight scientists maintained research diaries for a two-week…
Occurrence of hemocyanin in ostracod crustaceans.

PubMed

Marxen, Julia C; Pick, Christian; Oakley, Todd H; Burmester, Thorsten

2014-08-01

Hemocyanin is a copper-containing protein that transports O2 in the hemolymph of many arthropod species. Within the crustaceans, hemocyanin appeared to be restricted to Malacostraca but has recently been identified in Remipedia. Here, we report the occurrence of hemocyanin in ostracods, indicating that this respiratory protein is more widespread within crustaceans than previously thought. By analyses of expressed sequence tags and by RT-PCR, we obtained four full length and nine partial hemocyanin sequences from six of ten investigated ostracod species. Hemocyanin was identified in Myodocopida (Actinoseta jonesi, Cypridininae sp., Euphilomedes morini, Skogsbergia lerneri, Vargula tsujii) and Platycopida (Cytherelloidea californica) but not in Podocopida. We found no evidence for the presence of hemoglobin in any of these ostracod species. Like in other arthropods, we identified multiple hemocyanin subunits (up to six) to occur in a single ostracod species. Bayesian phylogenetic analyses showed that ostracod hemocyanin subunit diversity evolved independently from that of other crustaceans. Ostracod hemocyanin subunits were found paraphyletic, with myodocopid and platycopid subunits forming distinct clades within those of the crustaceans. This pattern suggests that ostracod hemocyanins originated from distinct subunits in the pancrustacean stemline.

Investigations on the ORF 167L of lymphocystis disease virus (Iridoviridae).

PubMed

Essbauer, Sandra; Fischer, Uwe; Bergmann, Sven; Ahne, Winfried

2004-01-01

The predicted open reading frame 167L (ORF 167L) of lymphocystis disease virus (LCDV, Iridoviridae ) isolated from plaice, dab and flounder was investigated. The ORF 167L corresponding genes of the three LCDV isolates were amplified, cloned and sequenced. A comparison of the LCDV strains showed that the nucleotide sequence of ORF 167L and its deduced amino acid sequence were highly conserved in the genus lymphocystivirus (a homology of 80% in dab and flounder/plaice, 97% in plaice and flounder). The N-terminus protein predicted from the ORF 167L suggests similarities to the tumor necrosis factor receptor (TNFR)-family, and to TNFR-like proteins, which play an important role in various poxvirus species. Further, homology to the CUB-domain was shown at the C-terminus of the LCDV protein. Phylogenetic analyses of partial LCDV protein sequences identified two clusters: one cluster containing the flounder and plaice LCDV isolate (LCDV-1), and another cluster, containing the dab LCDV isolate (LCDV-2). The ORF 167L of plaice LCDV was expressed in Escherichia coli, and in fish cells. The expressed ORF resulted in a 30-kDa cytoplasmic protein lacking a signal peptide. An established monoclonal antibody (mAb 18) was used to detect LCDV proteins in skin explants of flounders and cryosections of dab skin. Specific fluorescence was found in the cytoplasm of intact epitheloid cells of the lymphocystis capsule and in the epidermis skin covering the lymphocystic nodules. LCDV-specific labelling of mAb 18 was also shown in spleen and liver tissue of LCDV-positive flounders. The ORF 167L protein seemed not to have the extracellular receptor function predicted from the usual cellular TNFR. The myxomavirus M-T2 protein, a poxviral TNFR homologue, was also shown not to have TNFR-like functions but to be involved in the apoptosis signal cascade.
Epizootiology of Tacaribe serocomplex viruses (Arenaviridae) associated with neotomine rodents (Cricetidae, Neotominae) in southern California.

PubMed

Milazzo, Mary Louise; Cajimat, Maria N B; Mauldin, Matthew R; Bennett, Stephen G; Hess, Barry D; Rood, Michael P; Conlan, Christopher A; Nguyen, Kiet; Wekesa, J Wakoli; Ramos, Ronald D; Bradley, Robert D; Fulhorst, Charles F

2015-02-01

The objective of this study was to advance our knowledge of the epizootiology of Bear Canyon virus and other Tacaribe serocomplex viruses (Arenaviridae) associated with wild rodents in California. Antibody (immunoglobulin G [IgG]) to a Tacaribe serocomplex virus was found in 145 (3.6%) of 3977 neotomine rodents (Cricetidae: Neotominae) captured in six counties in southern California. The majority (122 or 84.1%) of the 145 antibody-positive rodents were big-eared woodrats (Neotoma macrotis) or California mice (Peromyscus californicus). The 23 other antibody-positive rodents included a white-throated woodrat (N. albigula), desert woodrat (N. lepida), Bryant's woodrats (N. bryanti), brush mice (P. boylii), cactus mice (P. eremicus), and deer mice (P. maniculatus). Analyses of viral nucleocapsid protein gene sequence data indicated that Bear Canyon virus is associated with N. macrotis and/or P. californicus in Santa Barbara County, Los Angeles County, Orange County, and western Riverside County. Together, analyses of field data and antibody prevalence data indicated that N. macrotis is the principal host of Bear Canyon virus. Last, the analyses of viral nucleocapsid protein gene sequence data suggested that the Tacaribe serocomplex virus associated with N. albigula and N. lepida in eastern Riverside County represents a novel species (tentatively named "Palo Verde virus") in the genus Arenavirus.
A comparative study of cold- and warm-adapted Endonucleases A using sequence analyses and molecular dynamics simulations.

PubMed

Michetti, Davide; Brandsdal, Bjørn Olav; Bon, Davide; Isaksen, Geir Villy; Tiberti, Matteo; Papaleo, Elena

2017-01-01

The psychrophilic and mesophilic endonucleases A (EndA) from Aliivibrio salmonicida (VsEndA) and Vibrio cholera (VcEndA) have been studied experimentally in terms of the biophysical properties related to thermal adaptation. The analyses of their static X-ray structures was no sufficient to rationalize the determinants of their adaptive traits at the molecular level. Thus, we used Molecular Dynamics (MD) simulations to compare the two proteins and unveil their structural and dynamical differences. Our simulations did not show a substantial increase in flexibility in the cold-adapted variant on the nanosecond time scale. The only exception is a more rigid C-terminal region in VcEndA, which is ascribable to a cluster of electrostatic interactions and hydrogen bonds, as also supported by MD simulations of the VsEndA mutant variant where the cluster of interactions was introduced. Moreover, we identified three additional amino acidic substitutions through multiple sequence alignment and the analyses of MD-based protein structure networks. In particular, T120V occurs in the proximity of the catalytic residue H80 and alters the interaction with the residue Y43, which belongs to the second coordination sphere of the Mg2+ ion. This makes T120V an amenable candidate for future experimental mutagenesis.
Proteins with an Euonymus lectin-like domain are ubiquitous in Embryophyta

PubMed Central

2009-01-01

Background Cloning of the Euonymus lectin led to the discovery of a novel domain that also occurs in some stress-induced plant proteins. The distribution and the diversity of proteins with an Euonymus lectin (EUL) domain were investigated using detailed analysis of sequences in publicly accessible genome and transcriptome databases. Results Comprehensive in silico analyses indicate that the recently identified Euonymus europaeus lectin domain represents a conserved structural unit of a novel family of putative carbohydrate-binding proteins, which will further be referred to as the Euonymus lectin (EUL) family. The EUL domain is widespread among plants. Analysis of retrieved sequences revealed that some sequences consist of a single EUL domain linked to an unrelated N-terminal domain whereas others comprise two in tandem arrayed EUL domains. A new classification system for these lectins is proposed based on the overall domain architecture. Evolutionary relationships among the sequences with EUL domains are discussed. Conclusion The identification of the EUL family provides the first evidence for the occurrence in terrestrial plants of a highly conserved plant specific domain. The widespread distribution of the EUL domain strikingly contrasts the more limited or even narrow distribution of most other lectin domains found in plants. The apparent omnipresence of the EUL domain is indicative for a universal role of this lectin domain in plants. Although there is unambiguous evidence that several EUL domains possess carbohydrate-binding activity further research is required to corroborate the carbohydrate-binding properties of different members of the EUL family. PMID:19930663
An Evolutionarily Young Polar Bear (Ursus maritimus) Endogenous Retrovirus Identified from Next Generation Sequence Data.

PubMed

Tsangaras, Kyriakos; Mayer, Jens; Alquezar-Planas, David E; Greenwood, Alex D

2015-11-24

Transcriptome analysis of polar bear (Ursus maritimus) tissues identified sequences with similarity to Porcine Endogenous Retroviruses (PERV). Based on these sequences, four proviral copies and 15 solo long terminal repeats (LTRs) of a newly described endogenous retrovirus were characterized from the polar bear draft genome sequence. Closely related sequences were identified by PCR analysis of brown bear (Ursus arctos) and black bear (Ursus americanus) but were absent in non-Ursinae bear species. The virus was therefore designated UrsusERV. Two distinct groups of LTRs were observed including a recombinant ERV that contained one LTR belonging to each group indicating that genomic invasions by at least two UrsusERV variants have recently occurred. Age estimates based on proviral LTR divergence and conservation of integration sites among ursids suggest the viral group is only a few million years old. The youngest provirus was polar bear specific, had intact open reading frames (ORFs) and could potentially encode functional proteins. Phylogenetic analyses of UrsusERV consensus protein sequences suggest that it is part of a pig, gibbon and koala retrovirus clade. The young age estimates and lineage specificity of the virus suggests UrsusERV is a recent cross species transmission from an unknown reservoir and places the viral group among the youngest of ERVs identified in mammals.
Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes).

PubMed

Dessimoz, Christophe; Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro

2011-09-01

Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references.
Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes)

PubMed Central

Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro

2011-01-01

Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references. PMID:21712341
An Evolutionarily Young Polar Bear (Ursus maritimus) Endogenous Retrovirus Identified from Next Generation Sequence Data

PubMed Central

Tsangaras, Kyriakos; Mayer, Jens; Alquezar-Planas, David E.; Greenwood, Alex D.

2015-01-01

Transcriptome analysis of polar bear (Ursus maritimus) tissues identified sequences with similarity to Porcine Endogenous Retroviruses (PERV). Based on these sequences, four proviral copies and 15 solo long terminal repeats (LTRs) of a newly described endogenous retrovirus were characterized from the polar bear draft genome sequence. Closely related sequences were identified by PCR analysis of brown bear (Ursus arctos) and black bear (Ursus americanus) but were absent in non-Ursinae bear species. The virus was therefore designated UrsusERV. Two distinct groups of LTRs were observed including a recombinant ERV that contained one LTR belonging to each group indicating that genomic invasions by at least two UrsusERV variants have recently occurred. Age estimates based on proviral LTR divergence and conservation of integration sites among ursids suggest the viral group is only a few million years old. The youngest provirus was polar bear specific, had intact open reading frames (ORFs) and could potentially encode functional proteins. Phylogenetic analyses of UrsusERV consensus protein sequences suggest that it is part of a pig, gibbon and koala retrovirus clade. The young age estimates and lineage specificity of the virus suggests UrsusERV is a recent cross species transmission from an unknown reservoir and places the viral group among the youngest of ERVs identified in mammals. PMID:26610552
Properties of Protein Drug Target Classes

PubMed Central

Bull, Simon C.; Doig, Andrew J.

2015-01-01

Accurate identification of drug targets is a crucial part of any drug development program. We mined the human proteome to discover properties of proteins that may be important in determining their suitability for pharmaceutical modulation. Data was gathered concerning each protein’s sequence, post-translational modifications, secondary structure, germline variants, expression profile and drug target status. The data was then analysed to determine features for which the target and non-target proteins had significantly different values. This analysis was repeated for subsets of the proteome consisting of all G-protein coupled receptors, ion channels, kinases and proteases, as well as proteins that are implicated in cancer. Machine learning was used to quantify the proteins in each dataset in terms of their potential to serve as a drug target. This was accomplished by first inducing a random forest that could distinguish between its targets and non-targets, and then using the random forest to quantify the drug target likeness of the non-targets. The properties that can best differentiate targets from non-targets were primarily those that are directly related to a protein’s sequence (e.g. secondary structure). Germline variants, expression levels and interactions between proteins had minimal discriminative power. Overall, the best indicators of drug target likeness were found to be the proteins’ hydrophobicities, in vivo half-lives, propensity for being membrane bound and the fraction of non-polar amino acids in their sequences. In terms of predicting potential targets, datasets of proteases, ion channels and cancer proteins were able to induce random forests that were highly capable of distinguishing between targets and non-targets. The non-target proteins predicted to be targets by these random forests comprise the set of the most suitable potential future drug targets, and should therefore be prioritised when building a drug development programme. PMID:25822509
The complete chloroplast genome sequences of Lychnis wilfordii and Silene capitata and comparative analyses with other Caryophyllaceae genomes.

PubMed

Kang, Jong-Soo; Lee, Byoung Yoon; Kwak, Myounghai

2017-01-01

The complete chloroplast genomes of Lychnis wilfordii and Silene capitata were determined and compared with ten previously reported Caryophyllaceae chloroplast genomes. The chloroplast genome sequences of L. wilfordii and S. capitata contain 152,320 bp and 150,224 bp, respectively. The gene contents and orders among 12 Caryophyllaceae species are consistent, but several microstructural changes have occurred. Expansion of the inverted repeat (IR) regions at the large single copy (LSC)/IRb and small single copy (SSC)/IR boundaries led to partial or entire gene duplications. Additionally, rearrangements of the LSC region were caused by gene inversions and/or transpositions. The 18 kb inversions, which occurred three times in different lineages of tribe Sileneae, were thought to be facilitated by the intermolecular duplicated sequences. Sequence analyses of the L. wilfordii and S. capitata genomes revealed 39 and 43 repeats, respectively, including forward, palindromic, and reverse repeats. In addition, a total of 67 and 56 simple sequence repeats were discovered in the L. wilfordii and S. capitata chloroplast genomes, respectively. Finally, we constructed phylogenetic trees of the 12 Caryophyllaceae species and two Amaranthaceae species based on 73 protein-coding genes using both maximum parsimony and likelihood methods.
PHYSICO2: an UNIX based standalone procedure for computation of physicochemical, window-dependent and substitution based evolutionary properties of protein sequences along with automated block preparation tool, version 2.

PubMed

Banerjee, Shyamashree; Gupta, Parth Sarthi Sen; Nayek, Arnab; Das, Sunit; Sur, Vishma Pratap; Seth, Pratyay; Islam, Rifat Nawaz Ul; Bandyopadhyay, Amal K

2015-01-01

Automated genome sequencing procedure is enriching the sequence database very fast. To achieve a balance between the entry of sequences in the database and their analyses, efficient software is required. In this end PHYSICO2, compare to earlier PHYSICO and other public domain tools, is most efficient in that it i] extracts physicochemical, window-dependent and homologousposition-based-substitution (PWS) properties including positional and BLOCK-specific diversity and conservation, ii] provides users with optional-flexibility in setting relevant input-parameters, iii] helps users to prepare BLOCK-FASTA-file by the use of Automated Block Preparation Tool of the program, iv] performs fast, accurate and user-friendly analyses and v] redirects itemized outputs in excel format along with detailed methodology. The program package contains documentation describing application of methods. Overall the program acts as efficient PWS-analyzer and finds application in sequence-bioinformatics. PHYSICO2: is freely available at http://sourceforge.net/projects/physico2/ along with its documentation at https://sourceforge.net/projects/physico2/files/Documentation.pdf/download for all users.
PHYSICO2: an UNIX based standalone procedure for computation of physicochemical, window-dependent and substitution based evolutionary properties of protein sequences along with automated block preparation tool, version 2

PubMed Central

Banerjee, Shyamashree; Gupta, Parth Sarthi Sen; Nayek, Arnab; Das, Sunit; Sur, Vishma Pratap; Seth, Pratyay; Islam, Rifat Nawaz Ul; Bandyopadhyay, Amal K

2015-01-01

Automated genome sequencing procedure is enriching the sequence database very fast. To achieve a balance between the entry of sequences in the database and their analyses, efficient software is required. In this end PHYSICO2, compare to earlier PHYSICO and other public domain tools, is most efficient in that it i] extracts physicochemical, window-dependent and homologousposition-based-substitution (PWS) properties including positional and BLOCK-specific diversity and conservation, ii] provides users with optional-flexibility in setting relevant input-parameters, iii] helps users to prepare BLOCK-FASTA-file by the use of Automated Block Preparation Tool of the program, iv] performs fast, accurate and user-friendly analyses and v] redirects itemized outputs in excel format along with detailed methodology. The program package contains documentation describing application of methods. Overall the program acts as efficient PWS-analyzer and finds application in sequence-bioinformatics. Availability PHYSICO2: is freely available at http://sourceforge.net/projects/physico2/ along with its documentation at https://sourceforge.net/projects/physico2/files/Documentation.pdf/download for all users. PMID:26339154
Pro-Inflammatory Flagellin Proteins of Prevalent Motile Commensal Bacteria Are Variably Abundant in the Intestinal Microbiome of Elderly Humans

PubMed Central

Neville, B. Anne; Sheridan, Paul O.; Harris, Hugh M. B.; Coughlan, Simone; Flint, Harry J.; Duncan, Sylvia H.; Jeffery, Ian B.; Claesson, Marcus J.; Ross, R. Paul; Scott, Karen P.; O'Toole, Paul W.

2013-01-01

Some Eubacterium and Roseburia species are among the most prevalent motile bacteria present in the intestinal microbiota of healthy adults. These flagellate species contribute “cell motility” category genes to the intestinal microbiome and flagellin proteins to the intestinal proteome. We reviewed and revised the annotation of motility genes in the genomes of six Eubacterium and Roseburia species that occur in the human intestinal microbiota and examined their respective locus organization by comparative genomics. Motility gene order was generally conserved across these loci. Five of these species harbored multiple genes for predicted flagellins. Flagellin proteins were isolated from R. inulinivorans strain A2-194 and from E. rectale strains A1-86 and M104/1. The amino-termini sequences of the R. inulinivorans and E. rectale A1-86 proteins were almost identical. These protein preparations stimulated secretion of interleukin-8 (IL-8) from human intestinal epithelial cell lines, suggesting that these flagellins were pro-inflammatory. Flagellins from the other four species were predicted to be pro-inflammatory on the basis of alignment to the consensus sequence of pro-inflammatory flagellins from the β- and γ- proteobacteria. Many fliC genes were deduced to be under the control of σ28. The relative abundance of the target Eubacterium and Roseburia species varied across shotgun metagenomes from 27 elderly individuals. Genes involved in the flagellum biogenesis pathways of these species were variably abundant in these metagenomes, suggesting that the current depth of coverage used for metagenomic sequencing (3.13–4.79 Gb total sequence in our study) insufficiently captures the functional diversity of genomes present at low (≤1%) relative abundance. E. rectale and R. inulinivorans thus appear to synthesize complex flagella composed of flagellin proteins that stimulate IL-8 production. A greater depth of sequencing, improved evenness of sequencing and improved metagenome assembly from short reads will be required to facilitate in silico analyses of complete complex biochemical pathways for low-abundance target species from shotgun metagenomes. PMID:23935906
Origin and Diversification of Basic-Helix-Loop-Helix Proteins in Plants

PubMed Central

Pires, Nuno; Dolan, Liam

2010-01-01

Basic helix-loop-helix (bHLH) proteins are a class of transcription factors found throughout eukaryotic organisms. Classification of the complete sets of bHLH proteins in the sequenced genomes of Arabidopsis thaliana and Oryza sativa (rice) has defined the diversity of these proteins among flowering plants. However, the evolutionary relationships of different plant bHLH groups and the diversity of bHLH proteins in more ancestral groups of plants are currently unknown. In this study, we use whole-genome sequences from nine species of land plants and algae to define the relationships between these proteins in plants. We show that few (less than 5) bHLH proteins are encoded in the genomes of chlorophytes and red algae. In contrast, many bHLH proteins (100–170) are encoded in the genomes of land plants (embryophytes). Phylogenetic analyses suggest that plant bHLH proteins are monophyletic and constitute 26 subfamilies. Twenty of these subfamilies existed in the common ancestors of extant mosses and vascular plants, whereas six further subfamilies evolved among the vascular plants. In addition to the conserved bHLH domains, most subfamilies are characterized by the presence of highly conserved short amino acid motifs. We conclude that much of the diversity of plant bHLH proteins was established in early land plants, over 440 million years ago. PMID:19942615
Identification of a Unique Fe-S Cluster Binding Site in a Glycyl-Radical Type Microcompartment Shell Protein

PubMed Central

Thompson, Michael C.; Wheatley, Nicole M.; Jorda, Julien; Sawaya, Michael R.; Gidaniyan, Soheil D.; Ahmed, Hoda; Yang, Zhongyu; McCarty, Krystal N.; Whitelegge, Julian P.; Yeates, Todd O.

2014-01-01

Recently, progress has been made toward understanding the functional diversity of bacterial microcompartment (MCP) systems, which serve as protein-based metabolic organelles in diverse microbes. New types of MCPs have been identified, including the glycyl-radical propanediol (Grp) MCP. Within these elaborate protein complexes, BMC-domain shell proteins assemble to form a polyhedral barrier that encapsulates the enzymatic contents of the MCP. Interestingly, the Grp MCP contains a number of shell proteins with unusual sequence features. GrpU is one such shell protein, whose amino acid sequence is particularly divergent from other members of the BMC-domain superfamily of proteins that effectively defines all MCPs. Expression, purification, and subsequent characterization of the protein showed, unexpectedly, that it binds an iron-sulfur cluster. We determined X-ray crystal structures of two GrpU orthologs, providing the first structural insight into the homohexameric BMC-domain shell proteins of the Grp system. The X-ray structures of GrpU, both obtained in the apo form, combined with spectroscopic analyses and computational modeling, show that the metal cluster resides in the central pore of the BMC shell protein at a position of broken 6-fold symmetry. The result is a structurally polymorphic iron-sulfur cluster binding site that appears to be unique among metalloproteins studied to date. PMID:25102080
Probing sequence dependence of folding pathway of α-helix bundle proteins through free energy landscape analysis.

PubMed

Shao, Qiang

2014-06-05

A comparative study on the folding of multiple three-α-helix bundle proteins including α3D, α3W, and the B domain of protein A (BdpA) is presented. The use of integrated-tempering-sampling molecular dynamics simulations achieves reversible folding and unfolding events in individual short trajectories, which thus provides an efficient approach to sufficiently sample the configuration space of protein and delineate the folding pathway of α-helix bundle. The detailed free energy landscape analyses indicate that the folding mechanism of α-helix bundle is not uniform but sequence dependent. A simple model is then proposed to predict folding mechanism of α-helix bundle on the basis of amino acid composition: α-helical proteins containing higher percentage of hydrophobic residues than charged ones fold via nucleation-condensation mechanism (e.g., α3D and BdpA) whereas proteins having opposite tendency in amino acid composition more likely fold via the framework mechanism (e.g., α3W). The model is tested on various α-helix bundle proteins, and the predicted mechanism is similar to the most approved one for each protein. In addition, the common features in the folding pathway of α-helix bundle protein are also deduced. In summary, the present study provides comprehensive, atomic-level picture of the folding of α-helix bundle proteins.
Structural analyses of the CRISPR protein Csc2 reveal the RNA-binding interface of the type I-D Cas7 family.

PubMed

Hrle, Ajla; Maier, Lisa-Katharina; Sharma, Kundan; Ebert, Judith; Basquin, Claire; Urlaub, Henning; Marchfelder, Anita; Conti, Elena

2014-01-01

Upon pathogen invasion, bacteria and archaea activate an RNA-interference-like mechanism termed CRISPR (clustered regularly interspaced short palindromic repeats). A large family of Cas (CRISPR-associated) proteins mediates the different stages of this sophisticated immune response. Bioinformatic studies have classified the Cas proteins into families, according to their sequences and respective functions. These range from the insertion of the foreign genetic elements into the host genome to the activation of the interference machinery as well as target degradation upon attack. Cas7 family proteins are central to the type I and type III interference machineries as they constitute the backbone of the large interference complexes. Here we report the crystal structure of Thermofilum pendens Csc2, a Cas7 family protein of type I-D. We found that Csc2 forms a core RRM-like domain, flanked by three peripheral insertion domains: a lid domain, a Zinc-binding domain and a helical domain. Comparison with other Cas7 family proteins reveals a set of similar structural features both in the core and in the peripheral domains, despite the absence of significant sequence similarity. T. pendens Csc2 binds single-stranded RNA in vitro in a sequence-independent manner. Using a crosslinking - mass-spectrometry approach, we mapped the RNA-binding surface to a positively charged surface patch on T. pendens Csc2. Thus our analysis of the key structural and functional features of T. pendens Csc2 highlights recurring themes and evolutionary relationships in type I and type III Cas proteins.
MitoNuc: a database of nuclear genes coding for mitochondrial proteins. Update 2002.

PubMed

Attimonelli, Marcella; Catalano, Domenico; Gissi, Carmela; Grillo, Giorgio; Licciulli, Flavio; Liuni, Sabino; Santamaria, Monica; Pesole, Graziano; Saccone, Cecilia

2002-01-01

Mitochondria, besides their central role in energy metabolism, have recently been found to be involved in a number of basic processes of cell life and to contribute to the pathogenesis of many degenerative diseases. All functions of mitochondria depend on the interaction of nuclear and organelle genomes. Mitochondrial genomes have been extensively sequenced and analysed and data have been collected in several specialised databases. In order to collect information on nuclear coded mitochondrial proteins we developed MitoNuc, a database containing detailed information on sequenced nuclear genes coding for mitochondrial proteins in Metazoa. The MitoNuc database can be retrieved through SRS and is available via the web site http://bighost.area.ba.cnr.it/mitochondriome where other mitochondrial databases developed by our group, the complete list of the sequenced mitochondrial genomes, links to other mitochondrial sites and related information, are available. The MitoAln database, related to MitoNuc in the previous release, reporting the multiple alignments of the relevant homologous protein coding regions, is no longer supported in the present release. In order to keep the links among entries in MitoNuc from homologous proteins, a new field in the database has been defined: the cluster identifier, an alpha numeric code used to identify each cluster of homologous proteins. A comment field derived from the corresponding SWISS-PROT entry has been introduced; this reports clinical data related to dysfunction of the protein. The logic scheme of MitoNuc database has been implemented in the ORACLE DBMS. This will allow the end-users to retrieve data through a friendly interface that will be soon implemented.
RNA-Seq analysis and transcriptome assembly for blackberry (Rubus sp. Var. Lochness) fruit.

PubMed

Garcia-Seco, Daniel; Zhang, Yang; Gutierrez-Mañero, Francisco J; Martin, Cathie; Ramos-Solano, Beatriz

2015-01-22

There is an increasing interest in berries, especially blackberries in the diet, because of recent reports of their health benefits due to their high content of flavonoids. A broad range of genomic tools are available for other Rosaceae species but these tools are still lacking in the Rubus genus, thus limiting gene discovery and the breeding of improved varieties. De novo RNA-seq of ripe blackberries grown under field conditions was performed using Illumina Hiseq 2000. Almost 9 billion nucleotide bases were sequenced in total. Following assembly, 42,062 consensus sequences were detected. For functional annotation, 33,040 (NR), 32,762 (NT), 21,932 (Swiss-Prot), 20,134 (KEGG), 13,676 (COG), 24,168 (GO) consensus sequences were annotated using different databases; in total 34,552 annotated sequences were identified. For protein prediction analysis, the number of coding DNA sequences (CDS) that mapped to the protein database was 32,540. Non redundant (NR), annotation showed that 25,418 genes (73.5%) has the highest similarity with Fragaria vesca subspecies vesca. Reanalysis was undertaken by aligning the reads with this reference genome for a deeper analysis of the transcriptome. We demonstrated that de novo assembly, using Trinity and later annotation with Blast using different databases, were complementary to alignment to the reference sequence using SOAPaligner/SOAP2. The Fragaria reference genome belongs to a species in the same family as blackberry (Rosaceae) but to a different genus. Since blackberries are tetraploids, the possibility of artefactual gene chimeras resulting from mis-assembly was tested with one of the genes sequenced by RNAseq, Chalcone Synthase (CHS). cDNAs encoding this protein were cloned and sequenced. Primers designed to the assembled sequences accurately distinguished different contigs, at least for chalcone synthase genes. We prepared and analysed transcriptome data from ripe blackberries, for which prior genomic information was limited. This new sequence information will improve the knowledge of this important and healthy fruit, providing an invaluable new tool for biological research.
Structure and sequence analyses of Bacteroides proteins BVU_4064 and BF1687 reveal presence of two novel predominantly-beta domains, predicted to be involved in lipid and cell surface interactions

DOE PAGES

Natarajan, Padmaja; Punta, Marco; Kumar, Abhinav; ...

2015-01-16

N-terminal domains of BVU_4064 and BF1687 proteins from Bacteroides vulgatus and Bacteroides fragilis respectively are members of the Pfam family PF12985 (DUF3869). Proteins containing a domain from this family can be found in most Bacteroides species and, in large numbers, in all human gut microbiome samples. Both BVU_4064 and BF1687 proteins have a consensus lipobox motif implying they are anchored to the membrane, but their functions are otherwise unknown. The C-terminal half of BVU_4064 is assigned to protein family PF12986 (DUF3870); the equivalent part of BF1687 was unclassified.

Coupling MALDI-TOF mass spectrometry protein and specialized metabolite analyses to rapidly discriminate bacterial function

PubMed Central

Clark, Chase M.; Costa, Maria S.

2018-01-01

For decades, researchers have lacked the ability to rapidly correlate microbial identity with bacterial metabolism. Since specialized metabolites are critical to bacterial function and survival in the environment, we designed a data acquisition and bioinformatics technique (IDBac) that utilizes in situ matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) to analyze protein and specialized metabolite spectra recorded from single bacterial colonies picked from agar plates. We demonstrated the power of our approach by discriminating between two Bacillus subtilis strains in <30 min solely on the basis of their differential ability to produce cyclic peptide antibiotics surfactin and plipastatin, caused by a single frameshift mutation. Next, we used IDBac to detect subtle intraspecies differences in the production of metal scavenging acyl-desferrioxamines in a group of eight freshwater Micromonospora isolates that share >99% sequence similarity in the 16S rRNA gene. Finally, we used IDBac to simultaneously extract protein and specialized metabolite MS profiles from unidentified Lake Michigan sponge-associated bacteria isolated from an agar plate. In just 3 h, we created hierarchical protein MS groupings of 11 environmental isolates (10 MS replicates each, for a total of 110 spectra) that accurately mirrored phylogenetic groupings. We further distinguished isolates within these groupings, which share nearly identical 16S rRNA gene sequence identity, based on interspecies and intraspecies differences in specialized metabolite production. IDBac is an attempt to couple in situ MS analyses of protein content and specialized metabolite production to allow for facile discrimination of closely related bacterial colonies. PMID:29686101
Identification of Abundantly Expressed Novel and Conserved Genes from the Infective Larval Stage of Toxocara canis by an Expressed Sequence Tag Strategy

PubMed Central

Tetteh, Kevin K. A.; Loukas, Alex; Tripp, Cindy; Maizels, Rick M.

1999-01-01

Larvae of Toxocara canis, a nematode parasite of dogs, infect humans, causing visceral and ocular larva migrans. In noncanid hosts, larvae neither grow nor differentiate but endure in a state of arrested development. Reasoning that parasite protein production is orientated to immune evasion, we undertook a random sequencing project from a larval cDNA library to characterize the most highly expressed transcripts. In all, 266 clones were sequenced, most from both 3′ and 5′ ends, and similarity searches against GenBank protein and dbEST nucleotide databases were conducted. Cluster analyses showed that 128 distinct gene products had been found, all but 3 of which represented newly identified genes. Ninety-five genes were represented by a single clone, but seven transcripts were present at high frequencies, each composing >2% of all clones sequenced. These high-abundance transcripts include a mucin and a C-type lectin, which are both major excretory-secretory antigens released by parasites. Four highly expressed novel gene transcripts, termed ant (abundant novel transcript) genes, were found. Together, these four genes comprised 18% of all cDNA clones isolated, but no similar sequences occur in the Caenorhabditis elegans genome. While the coding regions of the four genes are dissimilar, their 3′ untranslated tracts have significant homology in nucleotide sequence. The discovery of these abundant, parasite-specific genes of newly identified lectins and mucins, as well as a range of conserved and novel proteins, provides defined candidates for future analysis of the molecular basis of immune evasion by T. canis. PMID:10456930
Molecular genetic studies of DMT1 on 12q in French-Canadian restless legs syndrome patients and families.

PubMed

Xiong, Lan; Dion, Patrick; Montplaisir, Jacques; Levchenko, Anastasia; Thibodeau, Pascale; Karemera, Liliane; Rivière, Jean-Baptiste; St-Onge, Judith; Gaspar, Claudia; Dubé, Marie-Pierre; Desautels, Alex; Turecki, Gustavo; Rouleau, Guy A

2007-10-05

Converging evidence from clinical observations, brain imaging and pathological findings strongly indicate impaired brain iron regulation in restless legs syndrome (RLS). Animal models with mutation in (DMT1) divalent metal transporter 1 gene, an important brain iron transporter, demonstrate a similar iron deficiency profile as found in RLS brain. The human DMT1 gene, mapped to chromosome 12q near the RLS1 locus, qualifies as an excellent functional and possible positional candidate for RLS. DMT1 protein levels were assessed in lymphoblastoid cell lines from RLS patients and controls. Linkage analyses were carried out with markers flanking and within the DMT1 gene. Selected patient samples from RLS families with compatible linkage to the RLS1 locus on 12q were fully sequenced in both the coding regions and the long stretches of UTR sequences. Finally, selected sequence variants were further studied in case/control and family-based association tests. A clinical association of anemia and RLS was further confirmed in this study. There was no detectable difference in DMT1 protein levels between RLS patient lymphoblastoid cell lines and normal controls. Non-parametric linkage analyses failed to identify any significant linkage signals within the DMT1 gene region. Sequencing of selected patients did not detect any sequence variant(s) compatible with DMT1 harboring RLS causative mutation(s). Further studies did not find any association between ten SNPs, spanning the whole DMT1 gene region, and RLS affection status. Finally, two DMT1 intronic SNPs showed positive association with RLS in patients with a history of anemia, when compared to RLS patients without anemia. (c) 2007 Wiley-Liss, Inc.
Transcriptomic and Proteomic Analyses of Resistant Host Responses in Arachis diogoi Challenged with Late Leaf Spot Pathogen, Phaeoisariopsis personata

PubMed Central

Kumar, Dilip; Kirti, Pulugurtha Bharadwaja

2015-01-01

Late leaf spot is a serious disease of peanut caused by the imperfect fungus, Phaeoisariopsis personata. Wild diploid species, Arachis diogoi. is reported to be highly resistant to this disease and asymptomatic. The objective of this study is to investigate the molecular responses of the wild peanut challenged with the late leaf spot pathogen using cDNA-AFLP and 2D proteomic study. A total of 233 reliable, differentially expressed genes were identified in Arachis diogoi. About one third of the TDFs exhibit no significant similarity with the known sequences in the data bases. Expressed sequence tag data showed that the characterized genes are involved in conferring resistance in the wild peanut to the pathogen challenge. Several genes for proteins involved in cell wall strengthening, hypersensitive cell death and resistance related proteins have been identified. Genes identified for other proteins appear to function in metabolism, signal transduction and defence. Nineteen TDFs based on the homology analysis of genes associated with defence, signal transduction and metabolism were further validated by quantitative real time PCR (qRT-PCR) analyses in resistant wild species in comparison with a susceptible peanut genotype in time course experiments. The proteins corresponding to six TDFs were differentially expressed at protein level also. Differentially expressed TDFs and proteins in wild peanut indicate its defence mechanism upon pathogen challenge and provide initial breakthrough of genes possibly involved in recognition events and early signalling responses to combat the pathogen through subsequent development of resistivity. This is the first attempt to elucidate the molecular basis of the response of the resistant genotype to the late leaf spot pathogen, and its defence mechanism. PMID:25646800
An efficient transgenic system by TA cloning vectors and RNAi for C. elegans

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gengyo-Ando, Keiko; CREST, JST, 4-1-8 Hon-cho, Kawaguchi, Saitama 332-0012; Yoshina, Sawako

2006-11-03

In the nematode, transgenic analyses have been performed by microinjection of DNA from various sources into the syncytium gonad. To expedite these transgenic analyses, we solved two potential problems in this work. First, we constructed an efficient TA-cloning vector system which is useful for any promoter. By amplifying the genomic DNA fragments which contain regulatory sequences with or without the coding region, we could easily construct plasmids expressing fluorescent protein fusion without considering restriction sites. We could dissect motor neurons with three colors in a single animal. Second, we used feeding RNAi to isolate transgenic strains which express lag-2::venus fusionmore » gene. We found that the fusion protein is toxic when ectopically expressed in embryos but is functional to rescue a loss of function mutant in the lag-2 gene. Thus, the transgenic system described here should be useful to examine the protein function in the nematode.« less
Transcriptome sequencing and whole genome expression profiling of chrysanthemum under dehydration stress

PubMed Central

2013-01-01

Background Chrysanthemum is one of the most important ornamental crops in the world and drought stress seriously limits its production and distribution. In order to generate a functional genomics resource and obtain a deeper understanding of the molecular mechanisms regarding chrysanthemum responses to dehydration stress, we performed large-scale transcriptome sequencing of chrysanthemum plants under dehydration stress using the Illumina sequencing technology. Results Two cDNA libraries constructed from mRNAs of control and dehydration-treated seedlings were sequenced by Illumina technology. A total of more than 100 million reads were generated and de novo assembled into 98,180 unique transcripts which were further extensively annotated by comparing their sequencing to different protein databases. Biochemical pathways were predicted from these transcript sequences. Furthermore, we performed gene expression profiling analysis upon dehydration treatment in chrysanthemum and identified 8,558 dehydration-responsive unique transcripts, including 307 transcription factors and 229 protein kinases and many well-known stress responsive genes. Gene ontology (GO) term enrichment and biochemical pathway analyses showed that dehydration stress caused changes in hormone response, secondary and amino acid metabolism, and light and photoperiod response. These findings suggest that drought tolerance of chrysanthemum plants may be related to the regulation of hormone biosynthesis and signaling, reduction of oxidative damage, stabilization of cell proteins and structures, and maintenance of energy and carbon supply. Conclusions Our transcriptome sequences can provide a valuable resource for chrysanthemum breeding and research and novel insights into chrysanthemum responses to dehydration stress and offer candidate genes or markers that can be used to guide future studies attempting to breed drought tolerant chrysanthemum cultivars. PMID:24074255
Global sequence variation in the histidine-rich proteins 2 and 3 of Plasmodium falciparum: implications for the performance of malaria rapid diagnostic tests

PubMed Central

2010-01-01

Background Accurate diagnosis is essential for prompt and appropriate treatment of malaria. While rapid diagnostic tests (RDTs) offer great potential to improve malaria diagnosis, the sensitivity of RDTs has been reported to be highly variable. One possible factor contributing to variable test performance is the diversity of parasite antigens. This is of particular concern for Plasmodium falciparum histidine-rich protein 2 (PfHRP2)-detecting RDTs since PfHRP2 has been reported to be highly variable in isolates of the Asia-Pacific region. Methods The pfhrp2 exon 2 fragment from 458 isolates of P. falciparum collected from 38 countries was amplified and sequenced. For a subset of 80 isolates, the exon 2 fragment of histidine-rich protein 3 (pfhrp3) was also amplified and sequenced. DNA sequence and statistical analysis of the variation observed in these genes was conducted. The potential impact of the pfhrp2 variation on RDT detection rates was examined by analysing the relationship between sequence characteristics of this gene and the results of the WHO product testing of malaria RDTs: Round 1 (2008), for 34 PfHRP2-detecting RDTs. Results Sequence analysis revealed extensive variations in the number and arrangement of various repeats encoded by the genes in parasite populations world-wide. However, no statistically robust correlation between gene structure and RDT detection rate for P. falciparum parasites at 200 parasites per microlitre was identified. Conclusions The results suggest that despite extreme sequence variation, diversity of PfHRP2 does not appear to be a major cause of RDT sensitivity variation. PMID:20470441
Distinct profiles of expressed sequence tags during intestinal regeneration in the sea cucumber Holothuria glaberrima

PubMed Central

Rojas-Cartagena, Carmencita; Ortíz-Pineda, Pablo; Ramírez-Gómez, Francisco; Suárez-Castillo, Edna C.; Matos-Cruz, Vanessa; Rodríguez, Carlos; Ortíz-Zuazaga, Humberto; García-Arrarás, José E.

2010-01-01

Repair and regeneration are key processes for tissue maintenance, and their disruption may lead to disease states. Little is known about the molecular mechanisms that underline the repair and regeneration of the digestive tract. The sea cucumber Holothuria glaberrima represents an excellent model to dissect and characterize the molecular events during intestinal regeneration. To study the gene expression profile, cDNA libraries were constructed from normal, 3-day, and 7-day regenerating intestines of H. glaberrima. Clones were randomly sequenced and queried against the nonredundant protein database at the National Center for Biotechnology Information. RT-PCR analyses were made of several genes to determine their expression profile during intestinal regeneration. A total of 5,173 sequences from three cDNA libraries were obtained. About 46.2, 35.6, and 26.2% of the sequences for the normal, 3-days, and 7-days cDNA libraries, respectively, shared significant similarity with known sequences in the protein database of GenBank but only present 10% of similarity among them. Analysis of the libraries in terms of functional processes, protein domains, and most common sequences suggests that a differential expression profile is taking place during the regeneration process. Further examination of the expressed sequence tag dataset revealed that 12 putative genes are differentially expressed at significant level (R > 6). Experimental validation by RT-PCR analysis reveals that at least three genes (unknown C-4677-1, melanotransferrin, and centaurin) present a differential expression during regeneration. These findings strongly suggest that the gene expression profile varies among regeneration stages and provide evidence for the existence of differential gene expression. PMID:17579180
PHYSICO: An UNIX based Standalone Procedure for Computation of Individual and Group Properties of Protein Sequences.

PubMed

Gupta, Parth Sarthi Sen; Banerjee, Shyamashree; Islam, Rifat Nawaz Ul; Mondal, Sudipta; Mondal, Buddhadev; Bandyopadhyay, Amal K

2014-01-01

In the genomic and proteomic era, efficient and automated analyses of sequence properties of protein have become an important task in bioinformatics. There are general public licensed (GPL) software tools to perform a part of the job. However, computations of mean properties of large number of orthologous sequences are not possible from the above mentioned GPL sets. Further, there is no GPL software or server which can calculate window dependent sequence properties for a large number of sequences in a single run. With a view to overcome above limitations, we have developed a standalone procedure i.e. PHYSICO, which performs various stages of computation in a single run based on the type of input provided either in RAW-FASTA or BLOCK-FASTA format and makes excel output for: a) Composition, Class composition, Mean molecular weight, Isoelectic point, Aliphatic index and GRAVY, b) column based compositions, variability and difference matrix, c) 25 kinds of window dependent sequence properties. The program is fast, efficient, error free and user friendly. Calculation of mean and standard deviation of homologous sequences sets, for comparison purpose when relevant, is another attribute of the program; a property seldom seen in existing GPL softwares. PHYSICO is freely available for non-commercial/academic user in formal request to the corresponding author akbanerjee@biotech.buruniv.ac.in.
PHYSICO: An UNIX based Standalone Procedure for Computation of Individual and Group Properties of Protein Sequences

PubMed Central

Gupta, Parth Sarthi Sen; Banerjee, Shyamashree; Islam, Rifat Nawaz Ul; Mondal, Sudipta; Mondal, Buddhadev; Bandyopadhyay, Amal K

2014-01-01

In the genomic and proteomic era, efficient and automated analyses of sequence properties of protein have become an important task in bioinformatics. There are general public licensed (GPL) software tools to perform a part of the job. However, computations of mean properties of large number of orthologous sequences are not possible from the above mentioned GPL sets. Further, there is no GPL software or server which can calculate window dependent sequence properties for a large number of sequences in a single run. With a view to overcome above limitations, we have developed a standalone procedure i.e. PHYSICO, which performs various stages of computation in a single run based on the type of input provided either in RAW-FASTA or BLOCK-FASTA format and makes excel output for: a) Composition, Class composition, Mean molecular weight, Isoelectic point, Aliphatic index and GRAVY, b) column based compositions, variability and difference matrix, c) 25 kinds of window dependent sequence properties. The program is fast, efficient, error free and user friendly. Calculation of mean and standard deviation of homologous sequences sets, for comparison purpose when relevant, is another attribute of the program; a property seldom seen in existing GPL softwares. Availability PHYSICO is freely available for non-commercial/academic user in formal request to the corresponding author akbanerjee@biotech.buruniv.ac.in PMID:24616564
How to Choose the Suitable Template for Homology Modelling of GPCRs: 5-HT7 Receptor as a Test Case.

PubMed

Shahaf, Nir; Pappalardo, Matteo; Basile, Livia; Guccione, Salvatore; Rayan, Anwar

2016-09-01

G protein-coupled receptors (GPCRs) are a super-family of membrane proteins that attract great pharmaceutical interest due to their involvement in almost every physiological activity, including extracellular stimuli, neurotransmission, and hormone regulation. Currently, structural information on many GPCRs is mainly obtained by the techniques of computer modelling in general and by homology modelling in particular. Based on a quantitative analysis of eighteen antagonist-bound, resolved structures of rhodopsin family "A" receptors - also used as templates to build 153 homology models - it was concluded that a higher sequence identity between two receptors does not guarantee a lower RMSD between their structures, especially when their pair-wise sequence identity (within trans-membrane domain and/or in binding pocket) lies between 25 % and 40 %. This study suggests that we should consider all template receptors having a sequence identity ≤50 % with the query receptor. In fact, most of the GPCRs, compared to the currently available resolved structures of GPCRs, fall within this range and lack a correlation between structure and sequence. When testing suitability for structure-based drug design, it was found that choosing as a template the most similar resolved protein, based on sequence resemblance only, led to unsound results in many cases. Molecular docking analyses were carried out, and enrichment factors as well as attrition rates were utilized as criteria for assessing suitability for structure-based drug design. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Characterisation and In Silico Analysis of Interleukin-4 cDNA of Nilgai (Boselaphus tragocamelus) and Indian Buffalo (Bubalus bubalis)

PubMed Central

Saini, M.; Palai, T. K.; Das, D. K.; Hatle, K. M.; Gupta, P. K.

2013-01-01

Interleukin-4 (IL-4) produced from Th2 cells modulates both innate and adaptive immune responses. It is a common belief that wild animals possess better immunity against diseases than domestic and laboratory animals; however, the immune system of wild animals is not fully explored yet. Therefore, a comparative study was designed to explore the wildlife immunity through characterisation of IL-4 cDNA of nilgai, a wild ruminant, and Indian buffalo, a domestic ruminant. Total RNA was extracted from peripheral blood mononuclear cells of nilgai and Indian buffalo and reverse transcribed into cDNA. Respective cDNA was further cloned and sequenced. Sequences were analysed in silico and compared with their homologues available at GenBank. The deduced 135 amino acid protein of nilgai IL-4 is 95.6% similar to that of Indian buffalo. N-linked glycosylation sequence, leader sequence, Cysteine residues in the signal peptide region, and 3′ UTR of IL-4 were found to be conserved across species. Six nonsynonymous nucleotide substitutions were found in Indian buffalo compared to nilgai amino acid sequence. Tertiary structure of this protein in both species was modeled, and it was found that this protein falls under 4-helical cytokines superfamily and short chain cytokine family. Phylogenetic analysis revealed a single cluster of ruminants including both nilgai and Indian buffalo that was placed distinct from other nonruminant mammals. PMID:24348167
1,003 reference genomes of bacterial and archaeal isolates expand coverage of the tree of life

DOE PAGES

Mukherjee, Supratim; Seshadri, Rekha; Varghese, Neha J.; ...

2017-06-12

We present 1,003 reference genomes that were sequenced as part of the Genomic Encyclopedia of Bacteria and Archaea (GEBA) initiative, selected to maximize sequence coverage of phylogenetic space. These genomes double the number of existing type strains and expand their overall phylogenetic diversity by 25%. Comparative analyses with previously available finished and draft genomes reveal a 10.5% increase in novel protein families as a function of phylogenetic diversity. The GEBA genomes recruit 25 million previously unassigned metagenomic proteins from 4,650 samples, improving their phylogenetic and functional interpretation. We identify numerous biosynthetic clusters and experimentally validate a divergent phenazine cluster withmore » potential new chemical structure and antimicrobial activity. This Resource is the largest single release of reference genomes to date. Bacterial and archaeal isolate sequence space is still far from saturated, and future endeavors in this direction will continue to be a valuable resource for scientific discovery.« less
1,003 reference genomes of bacterial and archaeal isolates expand coverage of the tree of life

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mukherjee, Supratim; Seshadri, Rekha; Varghese, Neha J.

We present 1,003 reference genomes that were sequenced as part of the Genomic Encyclopedia of Bacteria and Archaea (GEBA) initiative, selected to maximize sequence coverage of phylogenetic space. These genomes double the number of existing type strains and expand their overall phylogenetic diversity by 25%. Comparative analyses with previously available finished and draft genomes reveal a 10.5% increase in novel protein families as a function of phylogenetic diversity. The GEBA genomes recruit 25 million previously unassigned metagenomic proteins from 4,650 samples, improving their phylogenetic and functional interpretation. We identify numerous biosynthetic clusters and experimentally validate a divergent phenazine cluster withmore » potential new chemical structure and antimicrobial activity. This Resource is the largest single release of reference genomes to date. Bacterial and archaeal isolate sequence space is still far from saturated, and future endeavors in this direction will continue to be a valuable resource for scientific discovery.« less
The complete mitochondrial genome of Papilio glaucus and its phylogenetic implications.

PubMed

Shen, Jinhui; Cong, Qian; Grishin, Nick V

2015-09-01

Due to the intriguing morphology, lifecycle, and diversity of butterflies and moths, Lepidoptera are emerging as model organisms for the study of genetics, evolution and speciation. The progress of these studies relies on decoding Lepidoptera genomes, both nuclear and mitochondrial. Here we describe a protocol to obtain mitogenomes from Next Generation Sequencing reads performed for whole-genome sequencing and report the complete mitogenome of Papilio (Pterourus) glaucus. The circular mitogenome is 15,306 bp in length and rich in A and T. It contains 13 protein-coding genes (PCGs), 22 transfer-RNA-coding genes (tRNA), and 2 ribosomal-RNA-coding genes (rRNA), with a gene order typical for mitogenomes of Lepidoptera. We performed phylogenetic analyses based on PCG and RNA-coding genes or protein sequences using Bayesian Inference and Maximum Likelihood methods. The phylogenetic trees consistently show that among species with available mitogenomes Papilio glaucus is the closest to Papilio (Agehana) maraho from Asia.
Integrative functional analyses using rainbow trout selected for tolerance to plant diets reveal nutrigenomic signatures for soy utilization without the concurrence of enteritis.

PubMed

Abernathy, Jason; Brezas, Andreas; Snekvik, Kevin R; Hardy, Ronald W; Overturf, Ken

2017-01-01

Finding suitable alternative protein sources for diets of carnivorous fish species remains a major concern for sustainable aquaculture. Through genetic selection, we created a strain of rainbow trout that outperforms parental lines in utilizing an all-plant protein diet and does not develop enteritis in the distal intestine, as is typical with salmonids on long-term plant protein-based feeds. By incorporating this strain into functional analyses, we set out to determine which genes are critical to plant protein utilization in the absence of gut inflammation. After a 12-week feeding trial with our selected strain and a control trout strain fed either a fishmeal-based diet or an all-plant protein diet, high-throughput RNA sequencing was completed on both liver and muscle tissues. Differential gene expression analyses, weighted correlation network analyses and further functional characterization were performed. A strain-by-diet design revealed differential expression ranging from a few dozen to over one thousand genes among the various comparisons and tissues. Major gene ontology groups identified between comparisons included those encompassing central, intermediary and foreign molecule metabolism, associated biosynthetic pathways as well as immunity. A systems approach indicated that genes involved in purine metabolism were highly perturbed. Systems analysis among the tissues tested further suggests the interplay between selection for growth, dietary utilization and protein tolerance may also have implications for nonspecific immunity. By combining data from differential gene expression and co-expression networks using selected trout, along with ontology and pathway analyses, a set of 63 candidate genes for plant diet tolerance was found. Risk loci in human inflammatory bowel diseases were also found in our datasets, indicating rainbow trout selected for plant-diet tolerance may have added utility as a potential biomedical model.
Genome-wide proteomics analysis on longissimus muscles in Qinchuan beef cattle.

PubMed

He, Hua; Chen, Si; Liang, Wei; Liu, Xiaolin

2017-04-01

To gain further insight into the molecular mechanism of bovine muscle development, we combined mass spectrometry characterization of proteins with Illumina deep sequencing of RNAs obtained from bovine longissimus muscle (LD) at prenatal and postnatal stages. For the proteomic study, each group of LD proteins was extracted and labeled using isobaric tags for relative and absolute quantitation (iTRAQ) method. Among the 1321 proteins identified from six samples, 390 proteins were differentially expressed in embryos at day 135 post-fertilization (Emb135d) vs. 30-month-old adult cattle (Emb135d vs. 30M) samples. Gene Ontology, Cluster of Orthologous Groups and Kyoto Encyclopedia of Genes and Genomes analyses were further conducted to better understand the different functions. Furthermore, we analyzed the relationship between transcript and protein regulation between samples by direct comparison of expression levels from transcriptomic and iTRAQ-based proteomics. Association results indicated that 1295 of 1321 proteins could be mapped to transcriptome sequencing data. This study provides the most comprehensive, targeted survey of bovine LD proteins to date and has shown the power of combining transcriptomic and proteomic approaches to provide molecular insights for understanding the developmental characteristics in bovine muscle, and even in other mammals. © 2016 Stichting International Foundation for Animal Genetics.
Cloning and characterization of a mouse gene with homology to the human von Hippel-Lindau disease tumor suppressor gene: implications for the potential organization of the human von Hippel-Lindau disease gene.

PubMed

Gao, J; Naglich, J G; Laidlaw, J; Whaley, J M; Seizinger, B R; Kley, N

1995-02-15

The human von Hippel-Lindau disease (VHL) gene has recently been identified and, based on the nucleotide sequence of a partial cDNA clone, has been predicted to encode a novel protein with as yet unknown functions [F. Latif et al., Science (Washington DC), 260: 1317-1320, 1993]. The length of the encoded protein and the characteristics of the cellular expressed protein are as yet unclear. Here we report the cloning and characterization of a mouse gene (mVHLh1) that is widely expressed in different mouse tissues and shares high homology with the human VHL gene. It predicts a protein 181 residues long (and/or 162 amino acids, considering a potential alternative start codon), which across a core region of approximately 140 residues displays a high degree of sequence identity (98%) to the predicted human VHL protein. High stringency DNA and RNA hybridization experiments and protein expression analyses indicate that this gene is the most highly VHL-related mouse gene, suggesting that it represents the mouse VHL gene homologue rather than a related gene sharing a conserved functional domain. These findings provide new insights into the potential organization of the VHL gene and nature of its encoded protein.
Evolution of the Translocation and Assembly Module (TAM)

PubMed Central

Heinz, Eva; Selkrig, Joel; Belousoff, Matthew J.; Lithgow, Trevor

2015-01-01

Bacterial outer membrane proteins require the beta-barrel assembly machinery (BAM) for their correct folding and function. The central component of this machinery is BamA, an Omp85 protein that is essential and found in all Gram-negative bacteria. An additional feature of the BAM is the translocation and assembly module (TAM), comprised TamA (an Omp85 family protein) and TamB. We report that TamA and a closely related protein TamL are confined almost exclusively to Proteobacteria and Bacteroidetes/Chlorobi respectively, whereas TamB is widely distributed across the majority of Gram-negative bacterial lineages. A comprehensive phylogenetic and secondary structure analysis of the TamB protein family revealed that TamB was present very early in the evolution of bacteria. Several sequence characteristics were discovered to define the TamB protein family: A signal-anchor linkage to the inner membrane, beta-helical structure, conserved domain architecture and a C-terminal region that mimics outer membrane protein beta-strands. Taken together, the structural and phylogenetic analyses suggest that the TAM likely evolved from an original combination of BamA and TamB, with a later gene duplication event of BamA, giving rise to an additional Omp85 sequence that evolved to be TamA in Proteobacteria and TamL in Bacteroidetes/Chlorobi. PMID:25994932
Bone morphogenetic protein-binding endothelial regulator of liver sinusoidal endothelial cells induces iron overload in a fatty liver mouse model.

PubMed

Hasebe, Takumu; Tanaka, Hiroki; Sawada, Koji; Nakajima, Shunsuke; Ohtake, Takaaki; Fujiya, Mikihiro; Kohgo, Yutaka

2017-03-01

Non-alcoholic fatty liver disease (NAFLD) is frequently accompanied by iron overload. However, because of the complex hepcidin-regulating molecules, the molecular mechanism underlying iron overload remains unknown. To identify the key molecule involved in NAFLD-associated iron dysregulation, we performed whole-RNA sequencing on the livers of obese mice. Male C57BL/6 mice were fed a regular or high-fat diet for 16 or 48 weeks. Internal iron was evaluated by plasma iron, ferritin or hepatic iron content. Whole-RNA sequencing was performed by transcriptome analysis using semiconductor high-throughput sequencer. Mouse liver tissues or isolated hepatocytes and sinusoidal endothelial cells were used to assess the expression of iron-regulating molecules. Mice fed a high-fat diet for 16 weeks showed excess iron accumulation. Longer exposure to a high-fat diet increased hepatic fibrosis and intrahepatic iron accumulation. A pathway analysis of the sequencing data showed that several inflammatory pathways, including bone morphogenetic protein (BMP)-SMAD signaling, were significantly affected. Sequencing analysis showed 2314 altered genes, including decreased mRNA expression of the hepcidin-coding gene Hamp. Hepcidin protein expression and SMAD phosphorylation, which induces Hamp, were found to be reduced. The expression of BMP-binding endothelial regulator (BMPER), which inhibits BMP-SMAD signaling by binding BMP extracellularly, was up-regulated in fatty livers. In addition, immunohistochemical and cell isolation analyses showed that BMPER was primarily expressed in the liver sinusoidal endothelial cells (LSECs) rather than hepatocytes. BMPER secretion by LSECs inhibits BMP-SMAD signaling in hepatocytes and further reduces hepcidin protein expression. These intrahepatic molecular interactions suggest a novel molecular basis of iron overload in NAFLD.

Comprehensive analysis of gene expression patterns in Friedreich's ataxia fibroblasts by RNA sequencing reveals altered levels of protein synthesis factors and solute carriers

PubMed Central

Li, Yanjie; Lu, Yue; Lin, Kevin; Hauser, Lauren A.; Lynch, David R.

2017-01-01

ABSTRACT Friedreich's ataxia (FRDA) is an autosomal recessive neurodegenerative disease usually caused by large homozygous expansions of GAA repeat sequences in intron 1 of the frataxin (FXN) gene. FRDA patients homozygous for GAA expansions have low FXN mRNA and protein levels when compared with heterozygous carriers or healthy controls. Frataxin is a mitochondrial protein involved in iron–sulfur cluster synthesis, and many FRDA phenotypes result from deficiencies in cellular metabolism due to lowered expression of FXN. Presently, there is no effective treatment for FRDA, and biomarkers to measure therapeutic trial outcomes and/or to gauge disease progression are lacking. Peripheral tissues, including blood cells, buccal cells and skin fibroblasts, can readily be isolated from FRDA patients and used to define molecular hallmarks of disease pathogenesis. For instance, FXN mRNA and protein levels as well as FXN GAA-repeat tract lengths are routinely determined using all of these cell types. However, because these tissues are not directly involved in disease pathogenesis, their relevance as models of the molecular aspects of the disease is yet to be decided. Herein, we conducted unbiased RNA sequencing to profile the transcriptomes of fibroblast cell lines derived from 18 FRDA patients and 17 unaffected control individuals. Bioinformatic analyses revealed significantly upregulated expression of genes encoding plasma membrane solute carrier proteins in FRDA fibroblasts. Conversely, the expression of genes encoding accessory factors and enzymes involved in cytoplasmic and mitochondrial protein synthesis was consistently decreased in FRDA fibroblasts. Finally, comparison of genes differentially expressed in FRDA fibroblasts to three previously published gene expression signatures defined for FRDA blood cells showed substantial overlap between the independent datasets, including correspondingly deficient expression of antioxidant defense genes. Together, these results indicate that gene expression profiling of cells derived from peripheral tissues can, in fact, consistently reveal novel molecular pathways of the disease. When performed on statistically meaningful sample group sizes, unbiased global profiling analyses utilizing peripheral tissues are critical for the discovery and validation of FRDA disease biomarkers. PMID:29125828
Global genetic diversity of the Plasmodium vivax transmission-blocking vaccine candidate Pvs48/45.

PubMed

Vallejo, Andres F; Martinez, Nora L; Tobon, Alejandra; Alger, Jackeline; Lacerda, Marcus V; Kajava, Andrey V; Arévalo-Herrera, Myriam; Herrera, Sócrates

2016-04-12

Plasmodium vivax 48/45 protein is expressed on the surface of gametocytes/gametes and plays a key role in gamete fusion during fertilization. This protein was recently expressed in Escherichia coli host as a recombinant product that was highly immunogenic in mice and monkeys and induced antibodies with high transmission-blocking activity, suggesting its potential as a P. vivax transmission-blocking vaccine candidate. To determine sequence polymorphism of natural parasite isolates and its potential influence on the protein structure, all pvs48/45 sequences reported in databases from around the world as well as those from low-transmission settings of Latin America were compared. Plasmodium vivax parasite isolates from malaria-endemic regions of Colombia, Brazil and Honduras (n = 60) were used to sequence the Pvs48/45 gene, and compared to those previously reported to GenBank and PlasmoDB (n = 222). Pvs48/45 gene haplotypes were analysed to determine the functional significance of genetic variation in protein structure and vaccine potential. Nine non-synonymous substitutions (E35K, Y196H, H211N, K250N, D335Y, E353Q, A376T, K390T, K418R) and three synonymous substitutions (I73, T149, C156) that define seven different haplotypes were found among the 282 isolates from nine countries when compared with the Sal I reference sequence. Nucleotide diversity (π) was 0.00173 for worldwide samples (range 0.00033-0.00216), resulting in relatively high diversity in Myanmar and Colombia, and low diversity in Mexico, Peru and South Korea. The two most frequent substitutions (E353Q: 41.9 %, K250N: 39.5 %) were predicted to be located in antigenic regions without affecting putative B cell epitopes or the tertiary protein structure. There is limited sequence polymorphism in pvs48/45 with noted geographical clustering among Asian and American isolates. The low genetic diversity of the protein does not influence the predicted antigenicity or protein structure and, therefore, supports its further development as transmission-blocking vaccine candidate.
Transcript and proteomic analysis of developing white lupin (Lupinus albus L.) roots

PubMed Central

Tian, Li; Peel, Gregory J; Lei, Zhentian; Aziz, Naveed; Dai, Xinbin; He, Ji; Watson, Bonnie; Zhao, Patrick X; Sumner, Lloyd W; Dixon, Richard A

2009-01-01

Background White lupin (Lupinus albus L.) roots efficiently take up and accumulate (heavy) metals, adapt to phosphate deficiency by forming cluster roots, and secrete antimicrobial prenylated isoflavones during development. Genomic and proteomic approaches were applied to identify candidate genes and proteins involved in antimicrobial defense and (heavy) metal uptake and translocation. Results A cDNA library was constructed from roots of white lupin seedlings. Eight thousand clones were randomly sequenced and assembled into 2,455 unigenes, which were annotated based on homologous matches in the NCBInr protein database. A reference map of developing white lupin root proteins was established through 2-D gel electrophoresis and peptide mass fingerprinting. High quality peptide mass spectra were obtained for 170 proteins. Microsomal membrane proteins were separated by 1-D gel electrophoresis and identified by LC-MS/MS. A total of 74 proteins were putatively identified by the peptide mass fingerprinting and the LC-MS/MS methods. Genomic and proteomic analyses identified candidate genes and proteins encoding metal binding and/or transport proteins, transcription factors, ABC transporters and phenylpropanoid biosynthetic enzymes. Conclusion The combined EST and protein datasets will facilitate the understanding of white lupin's response to biotic and abiotic stresses and its utility for phytoremediation. The root ESTs provided 82 perfect simple sequence repeat (SSR) markers with potential utility in breeding white lupin for enhanced agronomic traits. PMID:19123941
Genome-wide identification, classification, and expression analysis of the arabinogalactan protein gene family in rice (Oryza sativa L.)

PubMed Central

Zhao, Jie

2010-01-01

Arabinogalactan proteins (AGPs) comprise a family of hydroxyproline-rich glycoproteins that are implicated in plant growth and development. In this study, 69 AGPs are identified from the rice genome, including 13 classical AGPs, 15 arabinogalactan (AG) peptides, three non-classical AGPs, three early nodulin-like AGPs (eNod-like AGPs), eight non-specific lipid transfer protein-like AGPs (nsLTP-like AGPs), and 27 fasciclin-like AGPs (FLAs). The results from expressed sequence tags, microarrays, and massively parallel signature sequencing tags are used to analyse the expression of AGP-encoding genes, which is confirmed by real-time PCR. The results reveal that several rice AGP-encoding genes are predominantly expressed in anthers and display differential expression patterns in response to abscisic acid, gibberellic acid, and abiotic stresses. Based on the results obtained from this analysis, an attempt has been made to link the protein structures and expression patterns of rice AGP-encoding genes to their functions. Taken together, the genome-wide identification and expression analysis of the rice AGP gene family might facilitate further functional studies of rice AGPs. PMID:20423940
Algorithm, applications and evaluation for protein comparison by Ramanujan Fourier transform.

PubMed

Zhao, Jian; Wang, Jiasong; Hua, Wei; Ouyang, Pingkai

2015-12-01

The amino acid sequence of a protein determines its chemical properties, chain conformation and biological functions. Protein sequence comparison is of great importance to identify similarities of protein structures and infer their functions. Many properties of a protein correspond to the low-frequency signals within the sequence. Low frequency modes in protein sequences are linked to the secondary structures, membrane protein types, and sub-cellular localizations of the proteins. In this paper, we present Ramanujan Fourier transform (RFT) with a fast algorithm to analyze the low-frequency signals of protein sequences. The RFT method is applied to similarity analysis of protein sequences with the Resonant Recognition Model (RRM). The results show that the proposed fast RFT method on protein comparison is more efficient than commonly used discrete Fourier transform (DFT). RFT can detect common frequencies as significant feature for specific protein families, and the RFT spectrum heat-map of protein sequences demonstrates the information conservation in the sequence comparison. The proposed method offers a new tool for pattern recognition, feature extraction and structural analysis on protein sequences. Copyright © 2015 Elsevier Ltd. All rights reserved.
Transcriptional Profiles of Mating-Responsive Genes from Testes and Male Accessory Glands of the Mediterranean Fruit Fly, Ceratitis capitata

PubMed Central

Scolari, Francesca; Gomulski, Ludvik M.; Ribeiro, José M. C.; Siciliano, Paolo; Meraldi, Alice; Falchetto, Marco; Bonomi, Angelica; Manni, Mosè; Gabrieli, Paolo; Malovini, Alberto; Bellazzi, Riccardo; Aksoy, Serap; Gasperi, Giuliano; Malacrida, Anna R.

2012-01-01

Background Insect seminal fluid is a complex mixture of proteins, carbohydrates and lipids, produced in the male reproductive tract. This seminal fluid is transferred together with the spermatozoa during mating and induces post-mating changes in the female. Molecular characterization of seminal fluid proteins in the Mediterranean fruit fly, Ceratitis capitata, is limited, although studies suggest that some of these proteins are biologically active. Methodology/Principal Findings We report on the functional annotation of 5914 high quality expressed sequence tags (ESTs) from the testes and male accessory glands, to identify transcripts encoding putative secreted peptides that might elicit post-mating responses in females. The ESTs were assembled into 3344 contigs, of which over 33% produced no hits against the nr database, and thus may represent novel or rapidly evolving sequences. Extraction of the coding sequences resulted in a total of 3371 putative peptides. The annotated dataset is available as a hyperlinked spreadsheet. Four hundred peptides were identified with putative secretory activity, including odorant binding proteins, protease inhibitor domain-containing peptides, antigen 5 proteins, mucins, and immunity-related sequences. Quantitative RT-PCR-based analyses of a subset of putative secretory protein-encoding transcripts from accessory glands indicated changes in their abundance after one or more copulations when compared to virgin males of the same age. These changes in abundance, particularly evident after the third mating, may be related to the requirement to replenish proteins to be transferred to the female. Conclusions/Significance We have developed the first large-scale dataset for novel studies on functions and processes associated with the reproductive biology of Ceratitis capitata. The identified genes may help study genome evolution, in light of the high adaptive potential of the medfly. In addition, studies of male recovery dynamics in terms of accessory gland gene expression profiles and correlated remating inhibition mechanisms may permit the improvement of pest management approaches. PMID:23071645
Comprehensive in silico allergenicity assessment of novel protein engineered chimeric Cry proteins for safe deployment in crops.

PubMed

Rathinam, Maniraj; Singh, Shweta; Pattanayak, Debasis; Sreevathsa, Rohini

2017-08-02

Development of chimeric Cry toxins by protein engineering of known and validated proteins is imperative for enhancing the efficacy and broadening the insecticidal spectrum of these genes. Expression of novel Cry proteins in food crops has however created apprehensions with respect to the safety aspects. To clarify this, premarket evaluation consisting of an array of analyses to evaluate the unintended effects is a prerequisite to provide safety assurance to the consumers. Additionally, series of bioinformatic tools as in silico aids are being used to evaluate the likely allergenic reaction of the proteins based on sequence and epitope similarity with known allergens. In the present study, chimeric Cry toxins developed through protein engineering were evaluated for allergenic potential using various in silico algorithms. Major emphasis was on the validation of allergenic potential on three aspects of paramount significance viz., sequence-based homology between allergenic proteins, validation of conformational epitopes towards identification of food allergens and physico-chemical properties of amino acids. Additionally, in vitro analysis pertaining to heat stability of two of the eight chimeric proteins and pepsin digestibility further demonstrated the non-allergenic potential of these chimeric toxins. The study revealed for the first time an all-encompassing evaluation that the recombinant Cry proteins did not show any potential similarity with any known allergens with respect to the parameters generally considered for a protein to be designated as an allergen. These novel chimeric proteins hence can be considered safe to be introgressed into plants.
The rearrangement of motif F in the flavivirus RNA-directed RNA polymerase.

PubMed

Potapova, Ulyana; Feranchuk, Sergey; Leonova, Galina; Belikov, Sergei

2018-03-01

In the flavivirus genus, the non-structural protein NS5 plays a central role in RNA viral replication and constitutes a major target for drug discovery. One of the prime challenges in the study of NS5 protein is to investigate the interplay between the two protein domains, namely, the RNA-dependent RNA polymerase (RdRp) domain and the methyltransferase (MTase) domain. These investigations could clarify the multiple roles of NS5 protein in the virus life cycle. Here we present the results of sequence analyses and structural bioinformatics studies of NS5 protein, which suggest that the conserved motif F in the NS5 protein could act as a lock which controls the rearrangement of the domains and as a switch in the protein enzymatic activity. Copyright © 2017 Elsevier B.V. All rights reserved.
PARPs database: A LIMS systems for protein-protein interaction data mining or laboratory information management system

PubMed Central

Droit, Arnaud; Hunter, Joanna M; Rouleau, Michèle; Ethier, Chantal; Picard-Cloutier, Aude; Bourgais, David; Poirier, Guy G

2007-01-01

Background In the "post-genome" era, mass spectrometry (MS) has become an important method for the analysis of proteins and the rapid advancement of this technique, in combination with other proteomics methods, results in an increasing amount of proteome data. This data must be archived and analysed using specialized bioinformatics tools. Description We herein describe "PARPs database," a data analysis and management pipeline for liquid chromatography tandem mass spectrometry (LC-MS/MS) proteomics. PARPs database is a web-based tool whose features include experiment annotation, protein database searching, protein sequence management, as well as data-mining of the peptides and proteins identified. Conclusion Using this pipeline, we have successfully identified several interactions of biological significance between PARP-1 and other proteins, namely RFC-1, 2, 3, 4 and 5. PMID:18093328
Genome of Rhodnius prolixus, an insect vector of Chagas disease, reveals unique adaptations to hematophagy and parasite infection

PubMed Central

Mesquita, Rafael D.; Vionette-Amaral, Raquel J.; Lowenberger, Carl; Rivera-Pomar, Rolando; Monteiro, Fernando A.; Minx, Patrick; Spieth, John; Carvalho, A. Bernardo; Panzera, Francisco; Lawson, Daniel; Torres, André Q.; Ribeiro, Jose M. C.; Sorgine, Marcos H. F.; Waterhouse, Robert M.; Abad-Franch, Fernando; Alves-Bezerra, Michele; Amaral, Laurence R.; Araujo, Helena M.; Aravind, L.; Atella, Georgia C.; Azambuja, Patricia; Berni, Mateus; Bittencourt-Cunha, Paula R.; Braz, Gloria R. C.; Calderón-Fernández, Gustavo; Carareto, Claudia M. A.; Christensen, Mikkel B.; Costa, Igor R.; Costa, Samara G.; Dansa, Marilvia; Daumas-Filho, Carlos R. O.; De-Paula, Iron F.; Dias, Felipe A.; Dimopoulos, George; Emrich, Scott J.; Esponda-Behrens, Natalia; Fampa, Patricia; Fernandez-Medina, Rita D.; da Fonseca, Rodrigo N.; Fontenele, Marcio; Fronick, Catrina; Fulton, Lucinda A.; Gandara, Ana Caroline; Garcia, Eloi S.; Genta, Fernando A.; Giraldo-Calderón, Gloria I.; Gomes, Bruno; Gondim, Katia C.; Granzotto, Adriana; Guarneri, Alessandra A.; Guigó, Roderic; Harry, Myriam; Hughes, Daniel S. T.; Jablonka, Willy; Jacquin-Joly, Emmanuelle; Juárez, M. Patricia; Koerich, Leonardo B.; Lange, Angela B.; Latorre-Estivalis, José Manuel; Lavore, Andrés; Lawrence, Gena G.; Lazoski, Cristiano; Lazzari, Claudio R.; Lopes, Raphael R.; Lorenzo, Marcelo G.; Lugon, Magda D.; Marcet, Paula L.; Mariotti, Marco; Masuda, Hatisaburo; Megy, Karine; Missirlis, Fanis; Mota, Theo; Noriega, Fernando G.; Nouzova, Marcela; Nunes, Rodrigo D.; Oliveira, Raquel L. L.; Oliveira-Silveira, Gilbert; Ons, Sheila; Orchard, Ian; Pagola, Lucia; Paiva-Silva, Gabriela O.; Pascual, Agustina; Pavan, Marcio G.; Pedrini, Nicolás; Peixoto, Alexandre A.; Pereira, Marcos H.; Pike, Andrew; Polycarpo, Carla; Prosdocimi, Francisco; Ribeiro-Rodrigues, Rodrigo; Robertson, Hugh M.; Salerno, Ana Paula; Salmon, Didier; Santesmasses, Didac; Schama, Renata; Seabra-Junior, Eloy S.; Silva-Cardoso, Livia; Silva-Neto, Mario A. C.; Souza-Gomes, Matheus; Sterkel, Marcos; Taracena, Mabel L.; Tojo, Marta; Tu, Zhijian Jake; Tubio, Jose M. C.; Ursic-Bedoya, Raul; Venancio, Thiago M.; Walter-Nuno, Ana Beatriz; Wilson, Derek; Warren, Wesley C.; Wilson, Richard K.; Huebner, Erwin; Dotson, Ellen M.; Oliveira, Pedro L.

2015-01-01

Rhodnius prolixus not only has served as a model organism for the study of insect physiology, but also is a major vector of Chagas disease, an illness that affects approximately seven million people worldwide. We sequenced the genome of R. prolixus, generated assembled sequences covering 95% of the genome (∼702 Mb), including 15,456 putative protein-coding genes, and completed comprehensive genomic analyses of this obligate blood-feeding insect. Although immune-deficiency (IMD)-mediated immune responses were observed, R. prolixus putatively lacks key components of the IMD pathway, suggesting a reorganization of the canonical immune signaling network. Although both Toll and IMD effectors controlled intestinal microbiota, neither affected Trypanosoma cruzi, the causal agent of Chagas disease, implying the existence of evasion or tolerance mechanisms. R. prolixus has experienced an extensive loss of selenoprotein genes, with its repertoire reduced to only two proteins, one of which is a selenocysteine-based glutathione peroxidase, the first found in insects. The genome contained actively transcribed, horizontally transferred genes from Wolbachia sp., which showed evidence of codon use evolution toward the insect use pattern. Comparative protein analyses revealed many lineage-specific expansions and putative gene absences in R. prolixus, including tandem expansions of genes related to chemoreception, feeding, and digestion that possibly contributed to the evolution of a blood-feeding lifestyle. The genome assembly and these associated analyses provide critical information on the physiology and evolution of this important vector species and should be instrumental for the development of innovative disease control methods. PMID:26627243
Genome of Rhodnius prolixus, an insect vector of Chagas disease, reveals unique adaptations to hematophagy and parasite infection.

PubMed

Mesquita, Rafael D; Vionette-Amaral, Raquel J; Lowenberger, Carl; Rivera-Pomar, Rolando; Monteiro, Fernando A; Minx, Patrick; Spieth, John; Carvalho, A Bernardo; Panzera, Francisco; Lawson, Daniel; Torres, André Q; Ribeiro, Jose M C; Sorgine, Marcos H F; Waterhouse, Robert M; Montague, Michael J; Abad-Franch, Fernando; Alves-Bezerra, Michele; Amaral, Laurence R; Araujo, Helena M; Araujo, Ricardo N; Aravind, L; Atella, Georgia C; Azambuja, Patricia; Berni, Mateus; Bittencourt-Cunha, Paula R; Braz, Gloria R C; Calderón-Fernández, Gustavo; Carareto, Claudia M A; Christensen, Mikkel B; Costa, Igor R; Costa, Samara G; Dansa, Marilvia; Daumas-Filho, Carlos R O; De-Paula, Iron F; Dias, Felipe A; Dimopoulos, George; Emrich, Scott J; Esponda-Behrens, Natalia; Fampa, Patricia; Fernandez-Medina, Rita D; da Fonseca, Rodrigo N; Fontenele, Marcio; Fronick, Catrina; Fulton, Lucinda A; Gandara, Ana Caroline; Garcia, Eloi S; Genta, Fernando A; Giraldo-Calderón, Gloria I; Gomes, Bruno; Gondim, Katia C; Granzotto, Adriana; Guarneri, Alessandra A; Guigó, Roderic; Harry, Myriam; Hughes, Daniel S T; Jablonka, Willy; Jacquin-Joly, Emmanuelle; Juárez, M Patricia; Koerich, Leonardo B; Lange, Angela B; Latorre-Estivalis, José Manuel; Lavore, Andrés; Lawrence, Gena G; Lazoski, Cristiano; Lazzari, Claudio R; Lopes, Raphael R; Lorenzo, Marcelo G; Lugon, Magda D; Majerowicz, David; Marcet, Paula L; Mariotti, Marco; Masuda, Hatisaburo; Megy, Karine; Melo, Ana C A; Missirlis, Fanis; Mota, Theo; Noriega, Fernando G; Nouzova, Marcela; Nunes, Rodrigo D; Oliveira, Raquel L L; Oliveira-Silveira, Gilbert; Ons, Sheila; Orchard, Ian; Pagola, Lucia; Paiva-Silva, Gabriela O; Pascual, Agustina; Pavan, Marcio G; Pedrini, Nicolás; Peixoto, Alexandre A; Pereira, Marcos H; Pike, Andrew; Polycarpo, Carla; Prosdocimi, Francisco; Ribeiro-Rodrigues, Rodrigo; Robertson, Hugh M; Salerno, Ana Paula; Salmon, Didier; Santesmasses, Didac; Schama, Renata; Seabra-Junior, Eloy S; Silva-Cardoso, Livia; Silva-Neto, Mario A C; Souza-Gomes, Matheus; Sterkel, Marcos; Taracena, Mabel L; Tojo, Marta; Tu, Zhijian Jake; Tubio, Jose M C; Ursic-Bedoya, Raul; Venancio, Thiago M; Walter-Nuno, Ana Beatriz; Wilson, Derek; Warren, Wesley C; Wilson, Richard K; Huebner, Erwin; Dotson, Ellen M; Oliveira, Pedro L

2015-12-01

Rhodnius prolixus not only has served as a model organism for the study of insect physiology, but also is a major vector of Chagas disease, an illness that affects approximately seven million people worldwide. We sequenced the genome of R. prolixus, generated assembled sequences covering 95% of the genome (∼ 702 Mb), including 15,456 putative protein-coding genes, and completed comprehensive genomic analyses of this obligate blood-feeding insect. Although immune-deficiency (IMD)-mediated immune responses were observed, R. prolixus putatively lacks key components of the IMD pathway, suggesting a reorganization of the canonical immune signaling network. Although both Toll and IMD effectors controlled intestinal microbiota, neither affected Trypanosoma cruzi, the causal agent of Chagas disease, implying the existence of evasion or tolerance mechanisms. R. prolixus has experienced an extensive loss of selenoprotein genes, with its repertoire reduced to only two proteins, one of which is a selenocysteine-based glutathione peroxidase, the first found in insects. The genome contained actively transcribed, horizontally transferred genes from Wolbachia sp., which showed evidence of codon use evolution toward the insect use pattern. Comparative protein analyses revealed many lineage-specific expansions and putative gene absences in R. prolixus, including tandem expansions of genes related to chemoreception, feeding, and digestion that possibly contributed to the evolution of a blood-feeding lifestyle. The genome assembly and these associated analyses provide critical information on the physiology and evolution of this important vector species and should be instrumental for the development of innovative disease control methods.
Structural and functional partitioning of bread wheat chromosome 3B.

PubMed

Choulet, Frédéric; Alberti, Adriana; Theil, Sébastien; Glover, Natasha; Barbe, Valérie; Daron, Josquin; Pingault, Lise; Sourdille, Pierre; Couloux, Arnaud; Paux, Etienne; Leroy, Philippe; Mangenot, Sophie; Guilhot, Nicolas; Le Gouis, Jacques; Balfourier, Francois; Alaux, Michael; Jamilloux, Véronique; Poulain, Julie; Durand, Céline; Bellec, Arnaud; Gaspin, Christine; Safar, Jan; Dolezel, Jaroslav; Rogers, Jane; Vandepoele, Klaas; Aury, Jean-Marc; Mayer, Klaus; Berges, Hélène; Quesneville, Hadi; Wincker, Patrick; Feuillet, Catherine

2014-07-18

We produced a reference sequence of the 1-gigabase chromosome 3B of hexaploid bread wheat. By sequencing 8452 bacterial artificial chromosomes in pools, we assembled a sequence of 774 megabases carrying 5326 protein-coding genes, 1938 pseudogenes, and 85% of transposable elements. The distribution of structural and functional features along the chromosome revealed partitioning correlated with meiotic recombination. Comparative analyses indicated high wheat-specific inter- and intrachromosomal gene duplication activities that are potential sources of variability for adaption. In addition to providing a better understanding of the organization, function, and evolution of a large and polyploid genome, the availability of a high-quality sequence anchored to genetic maps will accelerate the identification of genes underlying important agronomic traits. Copyright © 2014, American Association for the Advancement of Science.
The twin-arginine translocation pathway of Mycobacterium smegmatis is functional and required for the export of mycobacterial beta-lactamases.

PubMed

McDonough, Justin A; Hacker, Kari E; Flores, Anthony R; Pavelka, Martin S; Braunstein, Miriam

2005-11-01

The twin-arginine translocation (Tat) pathway exports folded proteins across the bacterial cytoplasmic membrane and is responsible for the proper extracytoplasmic localization of proteins involved in a variety of cellular functions, including pathogenesis. The Mycobacterium tuberculosis and Mycobacterium smegmatis genomes contain open reading frames with homology to components of the Tat export system (TatABC) as well as potential Tat-exported proteins possessing N-terminal signal sequences with the characteristic twin-arginine motif. Due to the importance of exported virulence factors in the pathogenesis of M. tuberculosis and the limited understanding of mycobacterial protein export systems, we sought to determine the functional nature of the Tat export pathway in mycobacteria. Here we describe phenotypic analyses of DeltatatA and DeltatatC deletion mutants of M. smegmatis, which demonstrated that tatA and tatC encode components of a functional Tat system capable of exporting characteristic Tat substrates. Both mutants displayed a growth defect on agar medium and hypersensitivity to sodium dodecyl sulfate. The mutants were also defective in the export of active beta-lactamases of M. smegmatis (BlaS) and M. tuberculosis (BlaC), both of which possess twin-arginine signal sequences. The Tat-dependent nature of BlaC was further revealed by mutation of the twin-arginine motif. Finally, we demonstrated that replacement of the native signal sequence of BlaC with the predicted Tat signal sequences of M. tuberculosis phospholipase C proteins (PlcA and PlcB) resulted in the Tat-dependent export of an enzymatically active 'BlaC. Thus, 'BlaC can be used as a genetic reporter for Tat-dependent export in mycobacteria.
Molecular characterization of KGH, the first human isolate of rabies virus in Korea.

PubMed

Park, Jun-Sun; Kim, Chi-Kyeong; Kim, Su Yeon; Ju, Young Ran

2013-04-01

The complete genome sequence of the KGH strain of the first human rabies virus, which was isolated from a skin biopsy of a patient with rabies, whose symptoms developed due to bites from a raccoon dog in 2001. The size of the KGH strain genome was determined to be 11,928 nucleotides (nt) with a leader sequence of 58 nt, nucleoprotein gene of 1,353 nt, phosphoprotein gene of 894 nt, matrix protein gene of 609 nt, glycoprotein gene of 1,575 nt, RNA-dependent RNA polymerase gene of 6,384 nt, and trailer region of 69 nt. Sequence similarity was compared with 39 fully sequenced rabies virus genomes currently available, and the result showed 70.6-91.6 % at the nucleotide level, and 82.8-97.9 % at the amino acid level. The deduced amino acids in the viral protein were compared with those of other rabies viruses, and various functional regions were investigated. As a result, we found that the KGH strain only had a unique amino acid substitution that was identified to be associated either with host immune response and pathogenicity in the N protein, or with a related region regulating STAT1 in the P protein, and related to pathogenicity in G protein. Based on phylogenetic analyses using the complete genome of 39 rabies viruses, the KGH strain was determined to be closely related with the NNV-RAB-H strain and transplant rabies virus serotype 1, which are Indian isolates, and was confirmed to belong to the Arctic-like 2 clade. The KGH strain was most closely related to the SKRRD0204HC and SKRRD0205HC strain when compared with Korean animal isolates, which was separated around the same time and place, and belonged to the Gangwon III subgroup.
Common Data Analysis Pipeline | Office of Cancer Clinical Proteomics Research

Cancer.gov

CPTAC supports analyses of the mass spectrometry raw data (mapping of spectra to peptide sequences and protein identification) for the public using a Common Data Analysis Pipeline (CDAP). The data types available on the public portal are described below. A general overview of this pipeline can be downloaded here. Mass Spectrometry Data Formats RAW (Vendor) Format
Phylogenetic selection of target species in Amaryllidaceae tribe Haemantheae for acetylcholinesterase inhibition and affinity to the serotonin reuptake transport protein

USDA-ARS?s Scientific Manuscript database

We present phylogenetic analyses of 37 taxa of Amaryllidaceae, tribe Haemantheae and Amaryllis belladonna L. as an outgroup, in order to provide a phylogenetic framework for the selection of candidate plants for lead discoveries in relation to Alzheimer´s disease and depression. DNA sequences from t...
P-type ATPase superfamily: evidence for critical roles for kingdom evolution.

PubMed

Okamura, Hideyuki; Denawa, Masatsugu; Ohniwa, Ryosuke; Takeyasu, Kunio

2003-04-01

The P-type ATPase has become a protein superfamily. On the basis of sequence similarities, the phylogenetic analyses, and substrate specificities, this superfamily can be classified into 5 families and 11 subfamilies. A comparative phylogenetic analysis demonstrates the relationship between the molecular evolution of these subfamilies and the establishment of the kingdoms of living things.
The maturation inhibitor bevirimat (PA-457) can be active in patients carrying HIV type-1 non-B subtypes and recombinants.

PubMed

Yebra, Gonzalo; Holguín, Africa

2008-01-01

Bevirimat (PA-457) is the first candidate of a new family of antiretroviral drugs, the maturation inhibitors. Its action is based on disruption of the protease cleavage of the Gag precursor region. Six resistance mutations have been described and analysed in virus from both treatment-naive and protease inhibitor (PI)-experienced patients, but only in the subtype B of HIV type-1 (HIV-1) virus. Thus, genotypic resistance in non-B subtypes still requires analysis. HIV-1 sequences of different subtypes (54 B, 81 non-B and recombinants) were analysed for the presence of resistance mutations to bevirimat, located within the capsid (CA) protein and spacer peptide 1 (SP1) cleavage site. No resistance mutations were found, although polymorphisms appeared in some CA-SP1 residues. The C-terminal CA protein and the N-terminal SP1 presented high conservation, whereas C-terminal SP1 was highly variable in sequence and length, with unknown influence in resistance acquisition. The results of the present study confirm an absolute conservation of the residues involved in bevirimat in vitro resistance in a large panel of HIV-1 subtypes and recombinants from both treatment-naive and PI-experienced patients. Treatment alone seemed to increase the polymorphisms account in CRF02_AG recombinant sequences; however, the influence of natural polymorphisms needs to be explored.
Spatial and temporal plasticity of chromatin during programmed DNA-reorganization in Stylonychia macronuclear development

PubMed Central

Postberg, Jan; Heyse, Katharina; Cremer, Marion; Cremer, Thomas; Lipps, Hans J

2008-01-01

Background: In this study we exploit the unique genome organization of ciliates to characterize the biological function of histone modification patterns and chromatin plasticity for the processing of specific DNA sequences during a nuclear differentiation process. Ciliates are single-cell eukaryotes containing two morphologically and functionally specialized types of nuclei, the somatic macronucleus and the germline micronucleus. In the course of sexual reproduction a new macronucleus develops from a micronuclear derivative. During this process specific DNA sequences are eliminated from the genome, while sequences that will be transcribed in the mature macronucleus are retained. Results: We show by immunofluorescence microscopy, Western analyses and chromatin immunoprecipitation (ChIP) experiments that each nuclear type establishes its specific histone modification signature. Our analyses reveal that the early macronuclear anlage adopts a permissive chromatin state immediately after the fusion of two heterochromatic germline micronuclei. As macronuclear development progresses, repressive histone modifications that specify sequences to be eliminated are introduced de novo. ChIP analyses demonstrate that permissive histone modifications are associated with sequences that will be retained in the new macronucleus. Furthermore, our data support the hypothesis that a PIWI-family protein is involved in a transnuclear cross-talk and in the RNAi-dependent control of developmental chromatin reorganization. Conclusion: Based on these data we present a comprehensive analysis of the spatial and temporal pattern of histone modifications during this nuclear differentiation process. Results obtained in this study may also be relevant for our understanding of chromatin plasticity during metazoan embryogenesis. PMID:19014664
Microdissection of black widow spider silk-producing glands.

PubMed

Jeffery, Felicia; La Mattina, Coby; Tuton-Blasingame, Tiffany; Hsia, Yang; Gnesa, Eric; Zhao, Liang; Franz, Andreas; Vierra, Craig

2011-01-11

Modern spiders spin high-performance silk fibers with a broad range of biological functions, including locomotion, prey capture and protection of developing offspring. Spiders accomplish these tasks by spinning several distinct fiber types that have diverse mechanical properties. Such specialization of fiber types has occurred through the evolution of different silk-producing glands, which function as small biofactories. These biofactories manufacture and store large quantities of silk proteins for fiber production. Through a complex series of biochemical events, these silk proteins are converted from a liquid into a solid material upon extrusion. Mechanical studies have demonstrated that spider silks are stronger than high-tensile steel. Analyses to understand the relationship between the structure and function of spider silk threads have revealed that spider silk consists largely of proteins, or fibroins, that have block repeats within their protein sequences. Common molecular signatures that contribute to the incredible tensile strength and extensibility of spider silks are being unraveled through the analyses of translated silk cDNAs. Given the extraordinary material properties of spider silks, research labs across the globe are racing to understand and mimic the spinning process to produce synthetic silk fibers for commercial, military and industrial applications. One of the main challenges to spinning artificial spider silk in the research lab involves a complete understanding of the biochemical processes that occur during extrusion of the fibers from the silk-producing glands. Here we present a method for the isolation of the seven different silk-producing glands from the cobweaving black widow spider, which includes the major and minor ampullate glands [manufactures dragline and scaffolding silk], tubuliform [synthesizes egg case silk], flagelliform [unknown function in cob-weavers], aggregate [makes glue silk], aciniform [synthesizes prey wrapping and egg case threads] and pyriform [produces attachment disc silk]. This approach is based upon anesthetizing the spider with carbon dioxide gas, subsequent separation of the cephalothorax from the abdomen, and microdissection of the abdomen to obtain the silk-producing glands. Following the separation of the different silk-producing glands, these tissues can be used to retrieve different macromolecules for distinct biochemical analyses, including quantitative real-time PCR, northern- and western blotting, mass spectrometry (MS or MS/MS) analyses to identify new silk protein sequences, search for proteins that participate in the silk assembly pathway, or use the intact tissue for cell culture or histological experiments.

Reassessment of the taxonomic position of Burkholderia andropogonis and description of Robbsia andropogonis gen. nov., comb. nov.

PubMed

Lopes-Santos, Lucilene; Castro, Daniel Bedo Assumpção; Ferreira-Tonin, Mariana; Corrêa, Daniele Bussioli Alves; Weir, Bevan Simon; Park, Duckchul; Ottoboni, Laura Maria Mariscal; Neto, Júlio Rodrigues; Destéfano, Suzete Aparecida Lanza

2017-06-01

The phylogenetic classification of the species Burkholderia andropogonis within the Burkholderia genus was reassessed using 16S rRNA gene phylogenetic analysis and multilocus sequence analysis (MLSA). Both phylogenetic trees revealed two main groups, named A and B, strongly supported by high bootstrap values (100%). Group A encompassed all of the Burkholderia species complex, whi.le Group B only comprised B. andropogonis species, with low percentage similarities with other species of the genus, from 92 to 95% for 16S rRNA gene sequences and 83% for conserved gene sequences. Average nucleotide identity (ANI), tetranucleotide signature frequency, and percentage of conserved proteins POCP analyses were also carried out, and in the three analyses B. andropogonis showed lower values when compared to the other Burkholderia species complex, near 71% for ANI, from 0.484 to 0.724 for tetranucleotide signature frequency, and around 50% for POCP, reinforcing the distance observed in the phylogenetic analyses. Our findings provide an important insight into the taxonomy of B. andropogonis. It is clear from the results that this bacterial species exhibits genotypic differences and represents a new genus described herein as Robbsia andropogonis gen. nov., comb. nov.
Whale song analyses using bioinformatics sequence analysis approaches

NASA Astrophysics Data System (ADS)

Chen, Yian A.; Almeida, Jonas S.; Chou, Lien-Siang

2005-04-01

Animal songs are frequently analyzed using discrete hierarchical units, such as units, themes and songs. Because animal songs and bio-sequences may be understood as analogous, bioinformatics analysis tools DNA/protein sequence alignment and alignment-free methods are proposed to quantify the theme similarities of the songs of false killer whales recorded off northeast Taiwan. The eighteen themes with discrete units that were identified in an earlier study [Y. A. Chen, masters thesis, University of Charleston, 2001] were compared quantitatively using several distance metrics. These metrics included the scores calculated using the Smith-Waterman algorithm with the repeated procedure; the standardized Euclidian distance and the angle metrics based on word frequencies. The theme classifications based on different metrics were summarized and compared in dendrograms using cluster analyses. The results agree with earlier classifications derived by human observation qualitatively. These methods further quantify the similarities among themes. These methods could be applied to the analyses of other animal songs on a larger scale. For instance, these techniques could be used to investigate song evolution and cultural transmission quantifying the dissimilarities of humpback whale songs across different seasons, years, populations, and geographic regions. [Work supported by SC Sea Grant, and Ilan County Government, Taiwan.
The Extracellular Heme-binding Protein HbpS from the Soil Bacterium Streptomyces reticuli Is an Aquo-cobalamin Binder*

PubMed Central

Ortiz de Orué Lucana, Darío; Fedosov, Sergey N.; Wedderhoff, Ina; Che, Edith N.; Torda, Andrew E.

2014-01-01

The extracellular protein HbpS from Streptomyces reticuli interacts with iron ions and heme. It also acts in concert with the two-component sensing system SenS-SenR in response to oxidative stress. Sequence comparisons suggested that the protein may bind a cobalamin. UV-visible spectroscopy confirmed binding (Kd = 34 μm) to aquo-cobalamin (H2OCbl+) but not to other cobalamins. Competition experiments with the H2OCbl+-coordinating ligand CN− and comparison of mutants identified a histidine residue (His-156) that coordinates the cobalt ion of H2OCbl+ and substitutes for water. HbpS·Cobalamin lacks the Asp-X-His-X-X-Gly motif seen in some cobalamin binding enzymes. Preliminary tests showed that a related HbpS protein from a different species also binds H2OCbl+. Furthermore, analyses of HbpS-heme binding kinetics are consistent with the role of HbpS as a heme-sensor and suggested a role in heme transport. Given the high occurrence of HbpS-like sequences among Gram-positive and Gram-negative bacteria, our findings suggest a great functional versatility among these proteins. PMID:25342754
Folding and Stabilization of Native-Sequence-Reversed Proteins

PubMed Central

Zhang, Yuanzhao; Weber, Jeffrey K; Zhou, Ruhong

2016-01-01

Though the problem of sequence-reversed protein folding is largely unexplored, one might speculate that reversed native protein sequences should be significantly more foldable than purely random heteropolymer sequences. In this article, we investigate how the reverse-sequences of native proteins might fold by examining a series of small proteins of increasing structural complexity (α-helix, β-hairpin, α-helix bundle, and α/β-protein). Employing a tandem protein structure prediction algorithmic and molecular dynamics simulation approach, we find that the ability of reverse sequences to adopt native-like folds is strongly influenced by protein size and the flexibility of the native hydrophobic core. For β-hairpins with reverse-sequences that fail to fold, we employ a simple mutational strategy for guiding stable hairpin formation that involves the insertion of amino acids into the β-turn region. This systematic look at reverse sequence duality sheds new light on the problem of protein sequence-structure mapping and may serve to inspire new protein design and protein structure prediction protocols. PMID:27113844
Folding and Stabilization of Native-Sequence-Reversed Proteins

NASA Astrophysics Data System (ADS)

Zhang, Yuanzhao; Weber, Jeffrey K.; Zhou, Ruhong

2016-04-01

Though the problem of sequence-reversed protein folding is largely unexplored, one might speculate that reversed native protein sequences should be significantly more foldable than purely random heteropolymer sequences. In this article, we investigate how the reverse-sequences of native proteins might fold by examining a series of small proteins of increasing structural complexity (α-helix, β-hairpin, α-helix bundle, and α/β-protein). Employing a tandem protein structure prediction algorithmic and molecular dynamics simulation approach, we find that the ability of reverse sequences to adopt native-like folds is strongly influenced by protein size and the flexibility of the native hydrophobic core. For β-hairpins with reverse-sequences that fail to fold, we employ a simple mutational strategy for guiding stable hairpin formation that involves the insertion of amino acids into the β-turn region. This systematic look at reverse sequence duality sheds new light on the problem of protein sequence-structure mapping and may serve to inspire new protein design and protein structure prediction protocols.
Production of a reference transcriptome and transcriptomic database (EdwardsiellaBase) for the lined sea anemone, Edwardsiella lineata, a parasitic cnidarian

PubMed Central

2014-01-01

Background The lined sea anemone Edwardsiella lineata is an informative model system for evolutionary-developmental studies of parasitism. In this species, it is possible to compare alternate developmental pathways leading from a larva to either a free-living polyp or a vermiform parasite that inhabits the mesoglea of a ctenophore host. Additionally, E. lineata is confamilial with the model cnidarian Nematostella vectensis, providing an opportunity for comparative genomic, molecular and organismal studies. Description We generated a reference transcriptome for E. lineata via high-throughput sequencing of RNA isolated from five developmental stages (parasite; parasite-to-larva transition; larva; larva-to-adult transition; adult). The transcriptome comprises 90,440 contigs assembled from >15 billion nucleotides of DNA sequence. Using a molecular clock approach, we estimated the divergence between E. lineata and N. vectensis at 215–364 million years ago. Based on gene ontology and metabolic pathway analyses and gene family surveys (bHLH-PAS, deiodinases, Fox genes, LIM homeodomains, minicollagens, nuclear receptors, Sox genes, and Wnts), the transcriptome of E. lineata is comparable in depth and completeness to N. vectensis. Analyses of protein motifs and revealed extensive conservation between the proteins of these two edwardsiid anemones, although we show the NF-κB protein of E. lineata reflects the ancestral structure, while the NF-κB protein of N. vectensis has undergone a split that separates the DNA-binding domain from the inhibitory domain. All contigs have been deposited in a public database (EdwardsiellaBase), where they may be searched according to contig ID, gene ontology, protein family motif (Pfam), enzyme commission number, and BLAST. The alignment of the raw reads to the contigs can also be visualized via JBrowse. Conclusions The transcriptomic data and database described here provide a platform for studying the evolutionary developmental genomics of a derived parasitic life cycle. In addition, these data from E. lineata will aid in the interpretation of evolutionary novelties in gene sequence or structure that have been reported for the model cnidarian N. vectensis (e.g., the split NF-κB locus). Finally, we include custom computational tools to facilitate the annotation of a transcriptome based on high-throughput sequencing data obtained from a “non-model system.” PMID:24467778
Production of a reference transcriptome and transcriptomic database (EdwardsiellaBase) for the lined sea anemone, Edwardsiella lineata, a parasitic cnidarian.

PubMed

Stefanik, Derek J; Lubinski, Tristan J; Granger, Brian R; Byrd, Allyson L; Reitzel, Adam M; DeFilippo, Lukas; Lorenc, Allison; Finnerty, John R

2014-01-28

The lined sea anemone Edwardsiella lineata is an informative model system for evolutionary-developmental studies of parasitism. In this species, it is possible to compare alternate developmental pathways leading from a larva to either a free-living polyp or a vermiform parasite that inhabits the mesoglea of a ctenophore host. Additionally, E. lineata is confamilial with the model cnidarian Nematostella vectensis, providing an opportunity for comparative genomic, molecular and organismal studies. We generated a reference transcriptome for E. lineata via high-throughput sequencing of RNA isolated from five developmental stages (parasite; parasite-to-larva transition; larva; larva-to-adult transition; adult). The transcriptome comprises 90,440 contigs assembled from >15 billion nucleotides of DNA sequence. Using a molecular clock approach, we estimated the divergence between E. lineata and N. vectensis at 215-364 million years ago. Based on gene ontology and metabolic pathway analyses and gene family surveys (bHLH-PAS, deiodinases, Fox genes, LIM homeodomains, minicollagens, nuclear receptors, Sox genes, and Wnts), the transcriptome of E. lineata is comparable in depth and completeness to N. vectensis. Analyses of protein motifs and revealed extensive conservation between the proteins of these two edwardsiid anemones, although we show the NF-κB protein of E. lineata reflects the ancestral structure, while the NF-κB protein of N. vectensis has undergone a split that separates the DNA-binding domain from the inhibitory domain. All contigs have been deposited in a public database (EdwardsiellaBase), where they may be searched according to contig ID, gene ontology, protein family motif (Pfam), enzyme commission number, and BLAST. The alignment of the raw reads to the contigs can also be visualized via JBrowse. The transcriptomic data and database described here provide a platform for studying the evolutionary developmental genomics of a derived parasitic life cycle. In addition, these data from E. lineata will aid in the interpretation of evolutionary novelties in gene sequence or structure that have been reported for the model cnidarian N. vectensis (e.g., the split NF-κB locus). Finally, we include custom computational tools to facilitate the annotation of a transcriptome based on high-throughput sequencing data obtained from a "non-model system."
First Transcriptome and Digital Gene Expression Analysis in Neuroptera with an Emphasis on Chemoreception Genes in Chrysopa pallens (Rambur)

PubMed Central

Li, Zhao-Qun; Zhang, Shuai; Ma, Yan; Luo, Jun-Yu; Wang, Chun-Yi; Lv, Li-Min; Dong, Shuang-Lin; Cui, Jin-Jie

2013-01-01

Background Chrysopa pallens (Rambur) are the most important natural enemies and predators of various agricultural pests. Understanding the sophisticated olfactory system in insect antennae is crucial for studying the physiological bases of olfaction and also could lead to effective applications of C. pallens in integrated pest management. However no transcriptome information is available for Neuroptera, and sequence data for C. pallens are scarce, so obtaining more sequence data is a priority for researchers on this species. Results To facilitate identifying sets of genes involved in olfaction, a normalized transcriptome of C. pallens was sequenced. A total of 104,603 contigs were obtained and assembled into 10,662 clusters and 39,734 singletons; 20,524 were annotated based on BLASTX analyses. A large number of candidate chemosensory genes were identified, including 14 odorant-binding proteins (OBPs), 22 chemosensory proteins (CSPs), 16 ionotropic receptors, 14 odorant receptors, and genes potentially involved in olfactory modulation. To better understand the OBPs, CSPs and cytochrome P450s, phylogenetic trees were constructed. In addition, 10 digital gene expression libraries of different tissues were constructed and gene expression profiles were compared among different tissues in males and females. Conclusions Our results provide a basis for exploring the mechanisms of chemoreception in C. pallens, as well as other insects. The evolutionary analyses in our study provide new insights into the differentiation and evolution of insect OBPs and CSPs. Our study provided large-scale sequence information for further studies in C. pallens. PMID:23826220
FragIdent--automatic identification and characterisation of cDNA-fragments.

PubMed

Seelow, Dominik; Goehler, Heike; Hoffmann, Katrin

2009-03-02

Many genetic studies and functional assays are based on cDNA fragments. After the generation of cDNA fragments from an mRNA sample, their content is at first unknown and must be assigned by sequencing reactions or hybridisation experiments. Even in characterised libraries, a considerable number of clones are wrongly annotated. Furthermore, mix-ups can happen in the laboratory. It is therefore essential to the relevance of experimental results to confirm or determine the identity of the employed cDNA fragments. However, the manual approach for the characterisation of these fragments using BLAST web interfaces is not suited for larger number of sequences and so far, no user-friendly software is publicly available. Here we present the development of FragIdent, an application for the automatic identification of open reading frames (ORFs) within cDNA-fragments. The software performs BLAST analyses to identify the genes represented by the sequences and suggests primers to complete the sequencing of the whole insert. Gene-specific information as well as the protein domains encoded by the cDNA fragment are retrieved from Internet-based databases and included in the output. The application features an intuitive graphical interface and is designed for researchers without any bioinformatics skills. It is suited for projects comprising up to several hundred different clones. We used FragIdent to identify 84 cDNA clones from a yeast two-hybrid experiment. Furthermore, we identified 131 protein domains within our analysed clones. The source code is freely available from our homepage at http://compbio.charite.de/genetik/FragIdent/.
Comparative oncology DNA sequencing of canine T cell lymphoma via human hotspot panel

PubMed Central

Beheshti, Afshin; Pilichowska, Monika; Burgess, Kristine; Ricks-Santi, Luisel; McNiel, Elizabeth; London, Cheryl B.; Ravi, Dashnamoorthy; Evens, Andrew M.

2018-01-01

T-cell lymphoma (TCL) is an uncommon and aggressive form of human cancer. Lymphoma is the most common hematopoietic tumor in canines (companion animals), with TCL representing approximately 30% of diagnoses. Collectively, the canine is an appealing model for cancer research given the spontaneous occurrence of cancer, intact immune system, and phytogenetic proximity to humans. We sought to establish mutational congruence of the canine with known human TCL mutations in order to identify potential actionable oncogenic pathways. Following pathologic confirmation, DNA was sequenced in 16 canine TCL (cTCL) cases using a custom Human Cancer Hotspot Panel of 68 genes commonly mutated in human TCL. Sequencing identified 4,527,638 total reads with average length of 229 bases containing 346 unique variants and 1,474 total variants; each sample had an average of 92 variants. Among these, there were 258 germline and 32 somatic variants. Among the 32 somatic variants there were 8 missense variants, 1 splice junction variant and the remaining were intron or synonymous variants. A frequency of 4 somatic mutations per sample were noted with >7 mutations detected in MET, KDR, STK11 and BRAF. Expression of these associated proteins were also detected via Western blot analyses. In addition, Sanger sequencing confirmed three variants of high quality (MYC, MET, and TP53 missense mutation). Taken together, the mutational spectrum and protein analyses showed mutations in signaling pathways similar to human TCL and also identified novel mutations that may serve as drug targets as well as potential biomarkers. PMID:29854308
Shedding new light on opsin evolution

PubMed Central

Porter, Megan L.; Blasic, Joseph R.; Bok, Michael J.; Cameron, Evan G.; Pringle, Thomas; Cronin, Thomas W.; Robinson, Phyllis R.

2012-01-01

Opsin proteins are essential molecules in mediating the ability of animals to detect and use light for diverse biological functions. Therefore, understanding the evolutionary history of opsins is key to understanding the evolution of light detection and photoreception in animals. As genomic data have appeared and rapidly expanded in quantity, it has become possible to analyse opsins that functionally and histologically are less well characterized, and thus to examine opsin evolution strictly from a genetic perspective. We have incorporated these new data into a large-scale, genome-based analysis of opsin evolution. We use an extensive phylogeny of currently known opsin sequence diversity as a foundation for examining the evolutionary distributions of key functional features within the opsin clade. This new analysis illustrates the lability of opsin protein-expression patterns, site-specific functionality (i.e. counterion position) and G-protein binding interactions. Further, it demonstrates the limitations of current model organisms, and highlights the need for further characterization of many of the opsin sequence groups with unknown function. PMID:22012981
Genome sequences and comparative genomics of two Lactobacillus ruminis strains from the bovine and human intestinal tracts

PubMed Central

2011-01-01

Background The genus Lactobacillus is characterized by an extraordinary degree of phenotypic and genotypic diversity, which recent genomic analyses have further highlighted. However, the choice of species for sequencing has been non-random and unequal in distribution, with only a single representative genome from the L. salivarius clade available to date. Furthermore, there is no data to facilitate a functional genomic analysis of motility in the lactobacilli, a trait that is restricted to the L. salivarius clade. Results The 2.06 Mb genome of the bovine isolate Lactobacillus ruminis ATCC 27782 comprises a single circular chromosome, and has a G+C content of 44.4%. In silico analysis identified 1901 coding sequences, including genes for a pediocin-like bacteriocin, a single large exopolysaccharide-related cluster, two sortase enzymes, two CRISPR loci and numerous IS elements and pseudogenes. A cluster of genes related to a putative pilin was identified, and shown to be transcribed in vitro. A high quality draft assembly of the genome of a second L. ruminis strain, ATCC 25644 isolated from humans, suggested a slightly larger genome of 2.138 Mb, that exhibited a high degree of synteny with the ATCC 27782 genome. In contrast, comparative analysis of L. ruminis and L. salivarius identified a lack of long-range synteny between these closely related species. Comparison of the L. salivarius clade core proteins with those of nine other Lactobacillus species distributed across 4 major phylogenetic groups identified the set of shared proteins, and proteins unique to each group. Conclusions The genome of L. ruminis provides a comparative tool for directing functional analyses of other members of the L. salivarius clade, and it increases understanding of the divergence of this distinct Lactobacillus lineage from other commensal lactobacilli. The genome sequence provides a definitive resource to facilitate investigation of the genetics, biochemistry and host interactions of these motile intestinal lactobacilli. PMID:21995554
Contribution of the first K-homology domain of poly(C)-binding protein 1 to its affinity and specificity for C-rich oligonucleotides

PubMed Central

Yoga, Yano M. K.; Traore, Daouda A. K.; Sidiqi, Mahjooba; Szeto, Chris; Pendini, Nicole R.; Barker, Andrew; Leedman, Peter J.; Wilce, Jacqueline A.; Wilce, Matthew C. J.

2012-01-01

Poly-C-binding proteins are triple KH (hnRNP K homology) domain proteins with specificity for single stranded C-rich RNA and DNA. They play diverse roles in the regulation of protein expression at both transcriptional and translational levels. Here, we analyse the contributions of individual αCP1 KH domains to binding C-rich oligonucleotides using biophysical and structural methods. Using surface plasmon resonance (SPR), we demonstrate that KH1 makes the most stable interactions with both RNA and DNA, KH3 binds with intermediate affinity and KH2 only interacts detectibly with DNA. The crystal structure of KH1 bound to a 5′-CCCTCCCT-3′ DNA sequence shows a 2:1 protein:DNA stoichiometry and demonstrates a molecular arrangement of KH domains bound to immediately adjacent oligonucleotide target sites. SPR experiments, with a series of poly-C-sequences reveals that cytosine is preferred at all four positions in the oligonucleotide binding cleft and that a C-tetrad binds KH1 with 10 times higher affinity than a C-triplet. The basis for this high affinity interaction is finally detailed with the structure determination of a KH1.W.C54S mutant bound to 5′-ACCCCA-3′ DNA sequence. Together, these data establish the lead role of KH1 in oligonucleotide binding by αCP1 and reveal the molecular basis of its specificity for a C-rich tetrad. PMID:22344691
Contribution of the first K-homology domain of poly(C)-binding protein 1 to its affinity and specificity for C-rich oligonucleotides.

PubMed

Yoga, Yano M K; Traore, Daouda A K; Sidiqi, Mahjooba; Szeto, Chris; Pendini, Nicole R; Barker, Andrew; Leedman, Peter J; Wilce, Jacqueline A; Wilce, Matthew C J

2012-06-01

Poly-C-binding proteins are triple KH (hnRNP K homology) domain proteins with specificity for single stranded C-rich RNA and DNA. They play diverse roles in the regulation of protein expression at both transcriptional and translational levels. Here, we analyse the contributions of individual αCP1 KH domains to binding C-rich oligonucleotides using biophysical and structural methods. Using surface plasmon resonance (SPR), we demonstrate that KH1 makes the most stable interactions with both RNA and DNA, KH3 binds with intermediate affinity and KH2 only interacts detectibly with DNA. The crystal structure of KH1 bound to a 5'-CCCTCCCT-3' DNA sequence shows a 2:1 protein:DNA stoichiometry and demonstrates a molecular arrangement of KH domains bound to immediately adjacent oligonucleotide target sites. SPR experiments, with a series of poly-C-sequences reveals that cytosine is preferred at all four positions in the oligonucleotide binding cleft and that a C-tetrad binds KH1 with 10 times higher affinity than a C-triplet. The basis for this high affinity interaction is finally detailed with the structure determination of a KH1.W.C54S mutant bound to 5'-ACCCCA-3' DNA sequence. Together, these data establish the lead role of KH1 in oligonucleotide binding by αCP1 and reveal the molecular basis of its specificity for a C-rich tetrad.
An experimental phylogeny to benchmark ancestral sequence reconstruction

PubMed Central

Randall, Ryan N.; Radford, Caelan E.; Roof, Kelsey A.; Natarajan, Divya K.; Gaucher, Eric A.

2016-01-01

Ancestral sequence reconstruction (ASR) is a still-burgeoning method that has revealed many key mechanisms of molecular evolution. One criticism of the approach is an inability to validate its algorithms within a biological context as opposed to a computer simulation. Here we build an experimental phylogeny using the gene of a single red fluorescent protein to address this criticism. The evolved phylogeny consists of 19 operational taxonomic units (leaves) and 17 ancestral bifurcations (nodes) that display a wide variety of fluorescent phenotypes. The 19 leaves then serve as ‘modern' sequences that we subject to ASR analyses using various algorithms and to benchmark against the known ancestral genotypes and ancestral phenotypes. We confirm computer simulations that show all algorithms infer ancient sequences with high accuracy, yet we also reveal wide variation in the phenotypes encoded by incorrectly inferred sequences. Specifically, Bayesian methods incorporating rate variation significantly outperform the maximum parsimony criterion in phenotypic accuracy. Subsampling of extant sequences had minor effect on the inference of ancestral sequences. PMID:27628687
Organizing, exploring, and analyzing antibody sequence data: the case for relational-database managers.

PubMed

Owens, John

2009-01-01

Technological advances in the acquisition of DNA and protein sequence information and the resulting onrush of data can quickly overwhelm the scientist unprepared for the volume of information that must be evaluated and carefully dissected to discover its significance. Few laboratories have the luxury of dedicated personnel to organize, analyze, or consistently record a mix of arriving sequence data. A methodology based on a modern relational-database manager is presented that is both a natural storage vessel for antibody sequence information and a conduit for organizing and exploring sequence data and accompanying annotation text. The expertise necessary to implement such a plan is equal to that required by electronic word processors or spreadsheet applications. Antibody sequence projects maintained as independent databases are selectively unified by the relational-database manager into larger database families that contribute to local analyses, reports, interactive HTML pages, or exported to facilities dedicated to sophisticated sequence analysis techniques. Database files are transposable among current versions of Microsoft, Macintosh, and UNIX operating systems.
Database-independent Protein Sequencing (DiPS) Enables Full-length de Novo Protein and Antibody Sequence Determination.

PubMed

Savidor, Alon; Barzilay, Rotem; Elinger, Dalia; Yarden, Yosef; Lindzen, Moshit; Gabashvili, Alexandra; Adiv Tal, Ophir; Levin, Yishai

2017-06-01

Traditional "bottom-up" proteomic approaches use proteolytic digestion, LC-MS/MS, and database searching to elucidate peptide identities and their parent proteins. Protein sequences absent from the database cannot be identified, and even if present in the database, complete sequence coverage is rarely achieved even for the most abundant proteins in the sample. Thus, sequencing of unknown proteins such as antibodies or constituents of metaproteomes remains a challenging problem. To date, there is no available method for full-length protein sequencing, independent of a reference database, in high throughput. Here, we present Database-independent Protein Sequencing, a method for unambiguous, rapid, database-independent, full-length protein sequencing. The method is a novel combination of non-enzymatic, semi-random cleavage of the protein, LC-MS/MS analysis, peptide de novo sequencing, extraction of peptide tags, and their assembly into a consensus sequence using an algorithm named "Peptide Tag Assembler." As proof-of-concept, the method was applied to samples of three known proteins representing three size classes and to a previously un-sequenced, clinically relevant monoclonal antibody. Excluding leucine/isoleucine and glutamic acid/deamidated glutamine ambiguities, end-to-end full-length de novo sequencing was achieved with 99-100% accuracy for all benchmarking proteins and the antibody light chain. Accuracy of the sequenced antibody heavy chain, including the entire variable region, was also 100%, but there was a 23-residue gap in the constant region sequence. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
An asparagine residue at the N-terminus affects the maturation process of low molecular weight glutenin subunits of wheat endosperm

PubMed Central

2014-01-01

Background Wheat glutenin polymers are made up of two main subunit types, the high- (HMW-GS) and low- (LMW-GS) molecular weight subunits. These latter are represented by heterogeneous proteins. The most common, based on the first amino acid of the mature sequence, are known as LMW-m and LMW-s types. The mature sequences differ as a consequence of three extra amino acids (MET-) at the N-terminus of LMW-m types. The nucleotide sequences of their encoding genes are, however, nearly identical, so that the relationship between gene and protein sequences is difficult to ascertain. It has been hypothesized that the presence of an asparagine residue in position 23 of the complete coding sequence for the LMW-s type might account for the observed three-residue shortened sequence, as a consequence of cleavage at the asparagine by an asparaginyl endopeptidase. Results We performed site-directed mutagenesis of a LMW-s gene to replace asparagine at position 23 with threonine and thus convert it to a candidate LMW-m type gene. Similarly, a candidate LMW-m type gene was mutated at position 23 to replace threonine with asparagine. Next, we produced transgenic durum wheat (cultivar Svevo) lines by introducing the mutated versions of the LMW-m and LMW-s genes, along with the wild type counterpart of the LMW-m gene. Proteomic comparisons between the transgenic and null segregant plants enabled identification of transgenic proteins by mass spectrometry analyses and Edman N-terminal sequencing. Conclusions Our results show that the formation of LMW-s type relies on the presence of an asparagine residue close to the N-terminus generated by signal peptide cleavage, and that LMW-GS can be quantitatively processed most likely by vacuolar asparaginyl endoproteases, suggesting that those accumulated in the vacuole are not sequestered into stable aggregates that would hinder the action of proteolytic enzymes. Rather, whatever is the mechanism of glutenin polymer transport to the vacuole, the proteins remain available for proteolytic processing, and can be converted to the mature form by the removal of a short N-terminal sequence. PMID:24629124
Bioinformatics and the allergy assessment of agricultural biotechnology products: industry practices and recommendations.

PubMed

Ladics, Gregory S; Cressman, Robert F; Herouet-Guicheney, Corinne; Herman, Rod A; Privalle, Laura; Song, Ping; Ward, Jason M; McClain, Scott

2011-06-01

Bioinformatic tools are being increasingly utilized to evaluate the degree of similarity between a novel protein and known allergens within the context of a larger allergy safety assessment process. Importantly, bioinformatics is not a predictive analysis that can determine if a novel protein will ''become" an allergen, but rather a tool to assess whether the protein is a known allergen or is potentially cross-reactive with an existing allergen. Bioinformatic tools are key components of the 2009 CodexAlimentarius Commission's weight-of-evidence approach, which encompasses a variety of experimental approaches for an overall assessment of the allergenic potential of a novel protein. Bioinformatic search comparisons between novel protein sequences, as well as potential novel fusion sequences derived from the genome and transgene, and known allergens are required by all regulatory agencies that assess the safety of genetically modified (GM) products. The objective of this paper is to identify opportunities for consensus in the methods of applying bioinformatics and to outline differences that impact a consistent and reliable allergy safety assessment. The bioinformatic comparison process has some critical features, which are outlined in this paper. One of them is a curated, publicly available and well-managed database with known allergenic sequences. In this paper, the best practices, scientific value, and food safety implications of bioinformatic analyses, as they are applied to GM food crops are discussed. Recommendations for conducting bioinformatic analysis on novel food proteins for potential cross-reactivity to known allergens are also put forth. Copyright © 2011 Elsevier Inc. All rights reserved.
Predicting helix–helix interactions from residue contacts in membrane proteins

PubMed Central

Lo, Allan; Chiu, Yi-Yuan; Rødland, Einar Andreas; Lyu, Ping-Chiang; Sung, Ting-Yi; Hsu, Wen-Lian

2009-01-01

Motivation: Helix–helix interactions play a critical role in the structure assembly, stability and function of membrane proteins. On the molecular level, the interactions are mediated by one or more residue contacts. Although previous studies focused on helix-packing patterns and sequence motifs, few of them developed methods specifically for contact prediction. Results: We present a new hierarchical framework for contact prediction, with an application in membrane proteins. The hierarchical scheme consists of two levels: in the first level, contact residues are predicted from the sequence and their pairing relationships are further predicted in the second level. Statistical analyses on contact propensities are combined with other sequence and structural information for training the support vector machine classifiers. Evaluated on 52 protein chains using leave-one-out cross validation (LOOCV) and an independent test set of 14 protein chains, the two-level approach consistently improves the conventional direct approach in prediction accuracy, with 80% reduction of input for prediction. Furthermore, the predicted contacts are then used to infer interactions between pairs of helices. When at least three predicted contacts are required for an inferred interaction, the accuracy, sensitivity and specificity are 56%, 40% and 89%, respectively. Our results demonstrate that a hierarchical framework can be applied to eliminate false positives (FP) while reducing computational complexity in predicting contacts. Together with the estimated contact propensities, this method can be used to gain insights into helix-packing in membrane proteins. Availability: http://bio-cluster.iis.sinica.edu.tw/TMhit/ Contact: tsung@iis.sinica.edu.tw Supplementary information:Supplementary data are available at Bioinformatics online. PMID:19244388

Exploring synonymous codon usage preferences of disulfide-bonded and non-disulfide bonded cysteines in the E. coli genome.

PubMed

Song, Jiangning; Wang, Minglei; Burrage, Kevin

2006-07-21

High-quality data about protein structures and their gene sequences are essential to the understanding of the relationship between protein folding and protein coding sequences. Firstly we constructed the EcoPDB database, which is a high-quality database of Escherichia coli genes and their corresponding PDB structures. Based on EcoPDB, we presented a novel approach based on information theory to investigate the correlation between cysteine synonymous codon usages and local amino acids flanking cysteines, the correlation between cysteine synonymous codon usages and synonymous codon usages of local amino acids flanking cysteines, as well as the correlation between cysteine synonymous codon usages and the disulfide bonding states of cysteines in the E. coli genome. The results indicate that the nearest neighboring residues and their synonymous codons of the C-terminus have the greatest influence on the usages of the synonymous codons of cysteines and the usage of the synonymous codons has a specific correlation with the disulfide bond formation of cysteines in proteins. The correlations may result from the regulation mechanism of protein structures at gene sequence level and reflect the biological function restriction that cysteines pair to form disulfide bonds. The results may also be helpful in identifying residues that are important for synonymous codon selection of cysteines to introduce disulfide bridges in protein engineering and molecular biology. The approach presented in this paper can also be utilized as a complementary computational method and be applicable to analyse the synonymous codon usages in other model organisms.
A 250 plastome phylogeny of the grass family (Poaceae): topological support under different data partitions

PubMed Central

Burke, Sean V.; Wysocki, William P.; Clark, Lynn G.

2018-01-01

The systematics of grasses has advanced through applications of plastome phylogenomics, although studies have been largely limited to subfamilies or other subgroups of Poaceae. Here we present a plastome phylogenomic analysis of 250 complete plastomes (179 genera) sampled from 44 of the 52 tribes of Poaceae. Plastome sequences were determined from high throughput sequencing libraries and the assemblies represent over 28.7 Mbases of sequence data. Phylogenetic signal was characterized in 14 partitions, including (1) complete plastomes; (2) protein coding regions; (3) noncoding regions; and (4) three loci commonly used in single and multi-gene studies of grasses. Each of the four main partitions was further refined, alternatively including or excluding positively selected codons and also the gaps introduced by the alignment. All 76 protein coding plastome loci were found to be predominantly under purifying selection, but specific codons were found to be under positive selection in 65 loci. The loci that have been widely used in multi-gene phylogenetic studies had among the highest proportions of positively selected codons, suggesting caution in the interpretation of these earlier results. Plastome phylogenomic analyses confirmed the backbone topology for Poaceae with maximum bootstrap support (BP). Among the 14 analyses, 82 clades out of 309 resolved were maximally supported in all trees. Analyses of newly sequenced plastomes were in agreement with current classifications. Five of seven partitions in which alignment gaps were removed retrieved Panicoideae as sister to the remaining PACMAD subfamilies. Alternative topologies were recovered in trees from partitions that included alignment gaps. This suggests that ambiguities in aligning these uncertain regions might introduce a false signal. Resolution of these and other critical branch points in the phylogeny of Poaceae will help to better understand the selective forces that drove the radiation of the BOP and PACMAD clades comprising more than 99.9% of grass diversity. PMID:29416954
A transcriptome resource for the koala (Phascolarctos cinereus): insights into koala retrovirus transcription and sequence diversity.

PubMed

Hobbs, Matthew; Pavasovic, Ana; King, Andrew G; Prentis, Peter J; Eldridge, Mark D B; Chen, Zhiliang; Colgan, Donald J; Polkinghorne, Adam; Wilkins, Marc R; Flanagan, Cheyne; Gillett, Amber; Hanger, Jon; Johnson, Rebecca N; Timms, Peter

2014-09-11

The koala, Phascolarctos cinereus, is a biologically unique and evolutionarily distinct Australian arboreal marsupial. The goal of this study was to sequence the transcriptome from several tissues of two geographically separate koalas, and to create the first comprehensive catalog of annotated transcripts for this species, enabling detailed analysis of the unique attributes of this threatened native marsupial, including infection by the koala retrovirus. RNA-Seq data was generated from a range of tissues from one male and one female koala and assembled de novo into transcripts using Velvet-Oases. Transcript abundance in each tissue was estimated. Transcripts were searched for likely protein-coding regions and a non-redundant set of 117,563 putative protein sequences was produced. In similarity searches there were 84,907 (72%) sequences that aligned to at least one sequence in the NCBI nr protein database. The best alignments were to sequences from other marsupials. After applying a reciprocal best hit requirement of koala sequences to those from tammar wallaby, Tasmanian devil and the gray short-tailed opossum, we estimate that our transcriptome dataset represents approximately 15,000 koala genes. The marsupial alignment information was used to look for potential gene duplications and we report evidence for copy number expansion of the alpha amylase gene, and of an aldehyde reductase gene.Koala retrovirus (KoRV) transcripts were detected in the transcriptomes. These were analysed in detail and the structure of the spliced envelope gene transcript was determined. There was appreciable sequence diversity within KoRV, with 233 sites in the KoRV genome showing small insertions/deletions or single nucleotide polymorphisms. Both koalas had sequences from the KoRV-A subtype, but the male koala transcriptome has, in addition, sequences more closely related to the KoRV-B subtype. This is the first report of a KoRV-B-like sequence in a wild population. This transcriptomic dataset is a useful resource for molecular genetic studies of the koala, for evolutionary genetic studies of marsupials, for validation and annotation of the koala genome sequence, and for investigation of koala retrovirus. Annotated transcripts can be browsed and queried at http://koalagenome.org.
Ser/Thr Motifs in Transmembrane Proteins: Conservation Patterns and Effects on Local Protein Structure and Dynamics

PubMed Central

del Val, Coral; White, Stephen H.

2014-01-01

We combined systematic bioinformatics analyses and molecular dynamics simulations to assess the conservation patterns of Ser and Thr motifs in membrane proteins, and the effect of such motifs on the structure and dynamics of α-helical transmembrane (TM) segments. We find that Ser/Thr motifs are often present in β-barrel TM proteins. At least one Ser/Thr motif is present in almost half of the sequences of α-helical proteins analyzed here. The extensive bioinformatics analyses and inspection of protein structures led to the identification of molecular transporters with noticeable numbers of Ser/Thr motifs within the TM region. Given the energetic penalty for burying multiple Ser/Thr groups in the membrane hydrophobic core, the observation of transporters with multiple membrane-embedded Ser/Thr is intriguing and raises the question of how the presence of multiple Ser/Thr affects protein local structure and dynamics. Molecular dynamics simulations of four different Ser-containing model TM peptides indicate that backbone hydrogen bonding of membrane-buried Ser/Thr hydroxyl groups can significantly change the local structure and dynamics of the helix. Ser groups located close to the membrane interface can hydrogen bond to solvent water instead of protein backbone, leading to an enhanced local solvation of the peptide. PMID:22836667
Venom gland transcriptomic and venom proteomic analyses of the scorpion Megacormus gertschi Díaz-Najera, 1966 (Scorpiones: Euscorpiidae: Megacorminae).

PubMed

Santibáñez-López, Carlos E; Cid-Uribe, Jimena I; Zamudio, Fernando Z; Batista, Cesar V F; Ortiz, Ernesto; Possani, Lourival D

2017-07-01

The soluble venom from the Mexican scorpion Megacormus gertschi of the family Euscorpiidae was obtained and its biological effects were tested in several animal models. This venom is not toxic to mice at doses of 100 μg per 20 g of mouse weight, while being lethal to arthropods (insects and crustaceans), at doses of 20 μg (for crickets) and 100 μg (for shrimps) per animal. Samples of the venom were separated by high performance liquid chromatography and circa 80 distinct chromatographic fractions were obtained from which 67 components have had their molecular weights determined by mass spectrometry analysis. The N-terminal amino acid sequence of seven protein/peptides were obtained by Edman degradation and are reported. Among the high molecular weight components there are enzymes with experimentally-confirmed phospholipase activity. A pair of telsons from this scorpion species was dissected, from which total RNA was extracted and used for cDNA library construction. Massive sequencing by the Illumina protocol, followed by de novo assembly, resulted in a total of 110,528 transcripts. From those, we were able to annotate 182, which putatively code for peptides/proteins with sequence similarity to previously-reported venom components available from different protein databases. Transcripts seemingly coding for enzymes showed the richest diversity, with 52 sequences putatively coding for proteases, 20 for phospholipases, 8 for lipases and 5 for hyaluronidases. The number of different transcripts potentially coding for peptides with sequence similarity to those that affect ion channels was 19, for putative antimicrobial peptides 19, and for protease inhibitor-like peptides, 18. Transcripts seemingly coding for other venom components were identified and described. The LC/MS analysis of a trypsin-digested venom aliquot resulted in 23 matches with the translated transcriptome database, which validates the transcriptome. The proteomic and transcriptomic analyses reported here constitute the first approach to study the venom components from a scorpion species belonging to the family Euscorpiidae. The data certainly show that this venom is different from all the ones described thus far in the literature. Copyright © 2017 Elsevier Ltd. All rights reserved.
Identification and characterization of a fruit-specific, thaumatin-like protein that accumulates at very high levels in conjunction with the onset of sugar accumulation and berry softening in grapes.

PubMed Central

Tattersall, D B; van Heeswijck, R; Høj, P B

1997-01-01

The protein composition of the grape (Vitis vinifera cv Muscat of Alexandria) berry was examined from flowering to ripeness by gel electrophoresis. A protein with an apparent molecular mass of 24 kD, which was one of the most abundant proteins in extracts of mature berries, was purified and identified by amino acid sequence to be a thaumatin-like protein. Combined cDNA sequence analysis and electrospray mass spectrometry revealed that this protein, VVTL1 (for V. vinifera thaumatin-like protein 1), is synthesized with a transient signal peptide as seen for apoplastic preproteins. Apart from the removal of the targeting signal and the formation of eight disulfide bonds, VVTL1 undergoes no other posttranslational modification. Southern, northern, and western analyses revealed that VVTL1 is found in the berry only and is encoded by a single gene that is expressed in conjunction with the onset of sugar accumulation and softening. The exact role of VVTL1 is unknown, but the timing of its accumulation correlates with the inability of the fungal pathogen powdery mildew (Uncinula necator) to initiate new infections of the berry. Western analysis revealed that the presence of thaumatin-like proteins in ripening fruit might be a widespread phenomenon. PMID:9232867
Sequence Complexity of Amyloidogenic Regions in Intrinsically Disordered Human Proteins

PubMed Central

Das, Swagata; Pal, Uttam; Das, Supriya; Bagga, Khyati; Roy, Anupam; Mrigwani, Arpita; Maiti, Nakul C.

2014-01-01

An amyloidogenic region (AR) in a protein sequence plays a significant role in protein aggregation and amyloid formation. We have investigated the sequence complexity of AR that is present in intrinsically disordered human proteins. More than 80% human proteins in the disordered protein databases (DisProt+IDEAL) contained one or more ARs. With decrease of protein disorder, AR content in the protein sequence was decreased. A probability density distribution analysis and discrete analysis of AR sequences showed that ∼8% residue in a protein sequence was in AR and the region was in average 8 residues long. The residues in the AR were high in sequence complexity and it seldom overlapped with low complexity regions (LCR), which was largely abundant in disorder proteins. The sequences in the AR showed mixed conformational adaptability towards α-helix, β-sheet/strand and coil conformations. PMID:24594841
Proteome Studies of Filamentous Fungi

DOE Office of Scientific and Technical Information (OSTI.GOV)

Baker, Scott E.; Panisko, Ellen A.

2011-04-20

The continued fast pace of fungal genome sequence generation has enabled proteomic analysis of a wide breadth of organisms that span the breadth of the Kingdom Fungi. There is some phylogenetic bias to the current catalog of fungi with reasonable DNA sequence databases (genomic or EST) that could be analyzed at a global proteomic level. However, the rapid development of next generation sequencing platforms has lowered the cost of genome sequencing such that in the near future, having a genome sequence will no longer be a time or cost bottleneck for downstream proteomic (and transcriptomic) analyses. High throughput, non-gel basedmore » proteomics offers a snapshot of proteins present in a given sample at a single point in time. There are a number of different variations on the general method and technologies for identifying peptides in a given sample. We present a method that can serve as a “baseline” for proteomic studies of fungi.« less
Proteome studies of filamentous fungi.

PubMed

Baker, Scott E; Panisko, Ellen A

2011-01-01

The continued fast pace of fungal genome sequence generation has enabled proteomic analysis of a wide variety of organisms that span the breadth of the Kingdom Fungi. There is some phylogenetic bias to the current catalog of fungi with reasonable DNA sequence databases (genomic or EST) that could be analyzed at a global proteomic level. However, the rapid development of next generation sequencing platforms has lowered the cost of genome sequencing such that in the near future, having a genome sequence will no longer be a time or cost bottleneck for downstream proteomic (and transcriptomic) analyses. High throughput, nongel-based proteomics offers a snapshot of proteins present in a given sample at a single point in time. There are a number of variations on the general methods and technologies for identifying peptides in a given sample. We present a method that can serve as a "baseline" for proteomic studies of fungi.
Differential principal component analysis of ChIP-seq.

PubMed

Ji, Hongkai; Li, Xia; Wang, Qian-fei; Ning, Yang

2013-04-23

We propose differential principal component analysis (dPCA) for analyzing multiple ChIP-sequencing datasets to identify differential protein-DNA interactions between two biological conditions. dPCA integrates unsupervised pattern discovery, dimension reduction, and statistical inference into a single framework. It uses a small number of principal components to summarize concisely the major multiprotein synergistic differential patterns between the two conditions. For each pattern, it detects and prioritizes differential genomic loci by comparing the between-condition differences with the within-condition variation among replicate samples. dPCA provides a unique tool for efficiently analyzing large amounts of ChIP-sequencing data to study dynamic changes of gene regulation across different biological conditions. We demonstrate this approach through analyses of differential chromatin patterns at transcription factor binding sites and promoters as well as allele-specific protein-DNA interactions.
How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis.

PubMed

Tian, Pengfei; Best, Robert B

2017-10-17

Quantifying the relationship between protein sequence and structure is key to understanding the protein universe. A fundamental measure of this relationship is the total number of amino acid sequences that can fold to a target protein structure, known as the "sequence capacity," which has been suggested as a proxy for how designable a given protein fold is. Although sequence capacity has been extensively studied using lattice models and theory, numerical estimates for real protein structures are currently lacking. In this work, we have quantitatively estimated the sequence capacity of 10 proteins with a variety of different structures using a statistical model based on residue-residue co-evolution to capture the variation of sequences from the same protein family. Remarkably, we find that even for the smallest protein folds, such as the WW domain, the number of foldable sequences is extremely large, exceeding the Avogadro constant. In agreement with earlier theoretical work, the calculated sequence capacity is positively correlated with the size of the protein, or better, the density of contacts. This allows the absolute sequence capacity of a given protein to be approximately predicted from its structure. On the other hand, the relative sequence capacity, i.e., normalized by the total number of possible sequences, is an extremely tiny number and is strongly anti-correlated with the protein length. Thus, although there may be more foldable sequences for larger proteins, it will be much harder to find them. Lastly, we have correlated the evolutionary age of proteins in the CATH database with their sequence capacity as predicted by our model. The results suggest a trade-off between the opposing requirements of high designability and the likelihood of a novel fold emerging by chance. Published by Elsevier Inc.
P41IDENTIFICATION OF GLIOMA SPECIFIC APTAMER TARGETS

PubMed Central

Arora, Mohit; Alder, Jane; Lawrence, Clare; Davis, Charles; Dawson, Tim; Hall, Greg; Shaw, Lisa

2014-01-01

INTRODUCTION: Aptamers are in vitro generated DNA and RNA sequences which are randomly created as a library, with multiple permutations and combinations. These are then exposed to the target structure against which we want an aptamer ‘selected’ using Sequential Enumeration of Ligands by Exponential enrichment (SELEX). METHOD: Commercially available glioma and glial cell lines and in-house generated primary glioma cultures were used. Modified aptamers based on published sequences against glioma cell lines and newly generated sequences were used in the project to identify their binding targets. Cy3 or biotin- conjugated aptamers were incubated with live glioma cell cultures and imaged using confocal or light microscopy.To determine the target ligand, aptamers were then reacted with glial cell lysate and subjected to precipitation using streptavidin agarose beads and SDS polyacrylamide electrophoresis. Proteins were analysed by mass spectroscopy. RESULTS: Known and unknown aptamer protein ligands were co-precipitated. Ku70, Ku80 were precipitated along with nucleolin and related proteins. CONCLUSION: The aptamer has shown preferential binding to glioma cells and could act as a delivery system for therapeutic payloads. The aptamer targets Ku70 and Ku80, which are known to be over expressed in other forms of cancer but their role in gliomagenesis has not been fully elucidated. Other novel proteins have also been identified. Thus the aptamer co-precipitation technique has identified potential glioma biomarkers that may be of clinical significance.
A specific indel marker for the Philippines Schistosoma japonicum revealed by analysis of mitochondrial genome sequences.

PubMed

Li, Juan; Chen, Fen; Sugiyama, Hiromu; Blair, David; Lin, Rui-Qing; Zhu, Xing-Quan

2015-07-01

In the present study, near-complete mitochondrial (mt) genome sequences for Schistosoma japonicum from different regions in the Philippines and Japan were amplified and sequenced. Comparisons among S. japonicum from the Philippines, Japan, and China revealed a geographically based length difference in mt genomes, but the mt genomic organization and gene arrangement were the same. Sequence differences among samples from the Philippines and all samples from the three endemic areas were 0.57-2.12 and 0.76-3.85 %, respectively. The most variable part of the mt genome was the non-coding region. In the coding portion of the genome, protein-coding genes varied more than rRNA genes and tRNAs. The near-complete mt genome sequences for Philippine specimens were identical in length (14,091 bp) which was 4 bp longer than those of S. japonicum samples from Japan and China. This indel provides a unique genetic marker for S. japonicum samples from the Philippines. Phylogenetic analyses based on the concatenated amino acids of 12 protein-coding genes showed that samples of S. japonicum clustered according to their geographical origins. The identified mitochondrial indel marker will be useful for tracing the source of S. japonicum infection in humans and animals in Southeast Asia.
Interpreting the biological relevance of bioinformatic analyses with T-DNA sequence for protein allergenicity.

PubMed

Harper, B; McClain, S; Ganko, E W

2012-08-01

Global regulatory agencies require bioinformatic sequence analysis as part of their safety evaluation for transgenic crops. Analysis typically focuses on encoded proteins and adjacent endogenous flanking sequences. Recently, regulatory expectations have expanded to include all reading frames of the inserted DNA. The intent is to provide biologically relevant results that can be used in the overall assessment of safety. This paper evaluates the relevance of assessing the allergenic potential of all DNA reading frames found in common food genes using methods considered for the analysis of T-DNA sequences used in transgenic crops. FASTA and BLASTX algorithms were used to compare genes from maize, rice, soybean, cucumber, melon, watermelon, and tomato using international regulatory guidance. Results show that BLASTX for maize yielded 7254 alignments that exceeded allergen similarity thresholds and 210,772 alignments that matched eight or more consecutive amino acids with an allergen; other crops produced similar results. This analysis suggests that each nontransgenic crop has a much greater potential for allergenic risk than what has been observed clinically. We demonstrate that a meaningful safety assessment is unlikely to be provided by using methods with inherently high frequencies of false positive alignments when broadly applied to all reading frames of DNA sequence. Copyright © 2012 Elsevier Inc. All rights reserved.
The sequence of camelpox virus shows it is most closely related to variola virus, the cause of smallpox.

PubMed

Gubser, Caroline; Smith, Geoffrey L

2002-04-01

Camelpox virus (CMPV) and variola virus (VAR) are orthopoxviruses (OPVs) that share several biological features and cause high mortality and morbidity in their single host species. The sequence of a virulent CMPV strain was determined; it is 202182 bp long, with inverted terminal repeats (ITRs) of 6045 bp and has 206 predicted open reading frames (ORFs). As for other poxviruses, the genes are tightly packed with little non-coding sequence. Most genes within 25 kb of each terminus are transcribed outwards towards the terminus, whereas genes within the centre of the genome are transcribed from either DNA strand. The central region of the genome contains genes that are highly conserved in other OPVs and 87 of these are conserved in all sequenced chordopoxviruses. In contrast, genes towards either terminus are more variable and encode proteins involved in host range, virulence or immunomodulation. In some cases, these are broken versions of genes found in other OPVs. The relationship of CMPV to other OPVs was analysed by comparisons of DNA and predicted protein sequences, repeats within the ITRs and arrangement of ORFs within the terminal regions. Each comparison gave the same conclusion: CMPV is the closest known virus to variola virus, the cause of smallpox.
Integration of Structural Dynamics and Molecular Evolution via Protein Interaction Networks: A New Era in Genomic Medicine

PubMed Central

Kumar, Avishek; Butler, Brandon M.; Kumar, Sudhir; Ozkan, S. Banu

2016-01-01

Summary Sequencing technologies are revealing many new non-synonymous single nucleotide variants (nsSNVs) in each personal exome. To assess their functional impacts, comparative genomics is frequently employed to predict if they are benign or not. However, evolutionary analysis alone is insufficient, because it misdiagnoses many disease-associated nsSNVs, such as those at positions involved in protein interfaces, and because evolutionary predictions do not provide mechanistic insights into functional change or loss. Structural analyses can aid in overcoming both of these problems by incorporating conformational dynamics and allostery in nSNV diagnosis. Finally, protein-protein interaction networks using systems-level methodologies shed light onto disease etiology and pathogenesis. Bridging these network approaches with structurally resolved protein interactions and dynamics will advance genomic medicine. PMID:26684487
Reversible Heat-Induced Inactivation of Chimeric β-Glucuronidase in Transgenic Plants1

PubMed Central

Almoguera, Concepción; Rojas, Anabel; Jordano, Juan

2002-01-01

We compared the expression patterns in transgenic tobacco (Nicotiana tabacum) of two chimeric genes: a translational fusion to β-glucuronidase (GUS) and a transcriptional fusion, both with the same promoter and 5′-flanking sequences of Ha hsp17.7 G4, a small heat shock protein (sHSP) gene from sunflower (Helianthus annuus). We found that immediately after heat shock, the induced expression from the two fusions in seedlings was similar, considering chimeric mRNA or GUS protein accumulation. Surprisingly, we discovered that the chimeric GUS protein encoded by the translational fusion was mostly inactive in such conditions. We also found that this inactivation was fully reversible. Thus, after returning to control temperature, the GUS activity was fully recovered without substantial changes in GUS protein accumulation. In contrast, we did not find differences in the in vitro heat inactivation of the respective GUS proteins. Insolubilization of the chimeric GUS protein correlated with its inactivation, as indicated by immunoprecipitation analyses. The inclusion in another chimeric gene of the 21 amino-terminal amino acids from a different sHSP lead to a comparable reversible inactivation. That effect not only illustrates unexpected post-translational problems, but may also point to sequences involved in interactions specific to sHSPs and in vivo heat stress conditions. PMID:12011363
Characterization of aromatic residue-controlled protein retention in the endoplasmic reticulum of Saccharomyces cerevisiae.

PubMed

Mei, Meng; Zhai, Chao; Li, Xinzhi; Zhou, Yu; Peng, Wenfang; Ma, Lixin; Wang, Qinhong; Iverson, Brent L; Zhang, Guimin; Yi, Li

2017-12-15

An endoplasmic reticulum (ER) retention sequence (ERS) is a characteristic short sequence that mediates protein retention in the ER of eukaryotic cells. However, little is known about the detailed molecular mechanism involved in ERS-mediated protein ER retention. Using a new surface display-based fluorescence technique that effectively quantifies ERS-promoted protein ER retention within Saccharomyces cerevisiae cells, we performed comprehensive ERS analyses. We found that the length, type of amino acid residue, and additional residues at positions -5 and -6 of the C-terminal HDEL motif all determined the retention of ERS in the yeast ER. Moreover, the biochemical results guided by structure simulation revealed that aromatic residues (Phe-54, Trp-56, and other aromatic residues facing the ER lumen) in both the ERS (at positions -6 and -4) and its receptor, Erd2, jointly determined their interaction with each other. Our studies also revealed that this aromatic residue interaction might lead to the discriminative recognition of HDEL or KDEL as ERS in yeast or human cells, respectively. Our findings expand the understanding of ERS-mediated residence of proteins in the ER and may guide future research into protein folding, modification, and translocation affected by ER retention. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
Phosphorylation of Ser136 is critical for potent bone sialoprotein-mediated nucleation of hydroxyapatite crystals.

PubMed

Baht, Gurpreet S; O'Young, Jason; Borovina, Antonia; Chen, Hong; Tye, Coralee E; Karttunen, Mikko; Lajoie, Gilles A; Hunter, Graeme K; Goldberg, Harvey A

2010-05-27

Acidic phosphoproteins of mineralized tissues such as bone and dentin are believed to play important roles in HA (hydroxyapatite) nucleation and growth. BSP (bone sialoprotein) is the most potent known nucleator of HA, an activity that is thought to be dependent on phosphorylation of the protein. The present study identifies the role phosphate groups play in mineral formation. Recombinant BSP and peptides corresponding to residues 1-100 and 133-205 of the rat sequence were phosphorylated with CK2 (protein kinase CK2). Phosphorylation increased the nucleating activity of BSP and BSP-(133-205), but not BSP-(1-100). MS analysis revealed that the major site phosphorylated within BSP-(133-205) was Ser136, a site adjacent to the series of contiguous glutamate residues previously implicated in HA nucleation. The critical role of phosphorylated Ser136 in HA nucleation was confirmed by site-directed mutagenesis and functional analyses. Furthermore, peptides corresponding to the 133-148 sequence of rat BSP were synthesized with or without a phosphate group on Ser136. As expected, the phosphopeptide was a more potent nucleator. The mechanism of nucleation was investigated using molecular-dynamics simulations analysing BSP-(133-148) interacting with the {100} crystal face of HA. Both phosphorylated and non-phosphorylated sequences adsorbed to HA in extended conformations with alternating residues in contact with and facing away from the crystal face. However, this alternating-residue pattern was more pronounced when Ser136 was phosphorylated. These studies demonstrate a critical role for Ser136 phosphorylation in BSP-mediated HA nucleation and identify a unique mode of interaction between the nucleating site of the protein and the {100} face of HA.
[Genetic Diversity and Evolution of the M Gene of Human Influenza A Viruses from 2009 to 2013 in Hangzhou, China].

PubMed

Shao, Tiejuan; Li, Jun; Pu, Xiaoying; Yu, Xinfen; Kou, Yu; Zhou, Yinyan; Qian, Xin

2015-03-01

We investigated the genetic diversity and evolution of the M gene of human influenza A viruses in Hangzhou (Zhejiang province, China) from 2009 to 2013, including subtypes of A(H1N1) pdm09 strains and seasonal A(H3N2) strains. Subtypes of analyzed viruses were identified by cell culture and real-time reverse transcription-polymerase chain reaction, followed by cloning, sequencing and phylogenetic analyses of the M gene. Assessment of 5675 throat swabs revealed a positive rate for the influenza virus of 20.46%, and 827 cases were diagnosed as. infections due to influenza A viruses. Seventy-six influenza-A strains were selected randomly from nine stages during six phases of a virus epidemic. Sequences of the M gene showed high homology among six epidemics with identities of amino-acid sequences of 98.98-100%. All strains contained the adamantine-resistant mutation S31N in its M2 protein. Two of the A(H1N1)pdm09 strains had double mutants of V27A/S31N or V271/S31N. One of the seasonal A(H3N2) viruses had another form of double-mutant R45H/S31N. Evolutionary rate of the M gene was much lower than that of the HA gene and NA gene. Compared with A(H3N2) strains, higher positive pressure on the M1 and M2 proteins of A(H1N1) pdm09 viruses was observed. Separate analyses of M1 and M2 proteins revealed very different selection pressures. Knowledge of the genetic diversity and evolution of the M gene of human influenza-A viruses will be valuable for the control and prevention of diseases.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.