Zhou, Carol L Ecale
2015-01-01
In order to better define regions of similarity among related protein structures, it is useful to identify the residue-residue correspondences among proteins. Few codes exist for constructing a one-to-many multiple sequence alignment derived from a set of structure or sequence alignments, and a need was evident for creating such a tool for combining pairwise structure alignments that would allow for insertion of gaps in the reference structure. This report describes a new Python code, CombAlign, which takes as input a set of pairwise sequence alignments (which may be structure based) and generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA). The use and utility of CombAlign was demonstrated by generating gapped MSSAs using sets of pairwise structure-based sequence alignments between structure models of the matrix protein (VP40) and pre-small/secreted glycoprotein (sGP) of Reston Ebolavirus and the corresponding proteins of several other filoviruses. The gapped MSSAs revealed structure-based residue-residue correspondences, which enabled identification of structurally similar versus differing regions in the Reston proteins compared to each of the other corresponding proteins. CombAlign is a new Python code that generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA) given a set of pairwise sequence alignments (which may be structure based). CombAlign has utility in assisting the user in distinguishing structurally conserved versus divergent regions on a reference protein structure relative to other closely related proteins. CombAlign was developed in Python 2.6, and the source code is available for download from the GitHub code repository.
The Purine Bias of Coding Sequences is Determined by Physicochemical Constraints on Proteins.
Ponce de Leon, Miguel; de Miranda, Antonio Basilio; Alvarez-Valin, Fernando; Carels, Nicolas
2014-01-01
For this report, we analyzed protein secondary structures in relation to the statistics of three nucleotide codon positions. The purpose of this investigation was to find which properties of the ribosome, tRNA or protein level, could explain the purine bias (Rrr) as it is observed in coding DNA. We found that the Rrr pattern is the consequence of a regularity (the codon structure) resulting from physicochemical constraints on proteins and thermodynamic constraints on ribosomal machinery. The physicochemical constraints on proteins mainly come from the hydropathy and molecular weight (MW) of secondary structures as well as the energy cost of amino acid synthesis. These constraints appear through a network of statistical correlations, such as (i) the cost of amino acid synthesis, which is in favor of a higher level of guanine in the first codon position, (ii) the constructive contribution of hydropathy alternation in proteins, (iii) the spatial organization of secondary structure in proteins according to solvent accessibility, (iv) the spatial organization of secondary structure according to amino acid hydropathy, (v) the statistical correlation of MW with protein secondary structures and their overall hydropathy, (vi) the statistical correlation of thymine in the second codon position with hydropathy and the energy cost of amino acid synthesis, and (vii) the statistical correlation of adenine in the second codon position with amino acid complexity and the MW of secondary protein structures. Amino acid physicochemical properties and functional constraints on proteins constitute a code that is translated into a purine bias within the coding DNA via tRNAs. In that sense, the Rrr pattern within coding DNA is the effect of information transfer on nucleotide composition from protein to DNA by selection according to the codon positions. Thus, coding DNA structure and ribosomal machinery co-evolved to minimize the energy cost of protein coding given the functional constraints on proteins.
The Coding of Biological Information: From Nucleotide Sequence to Protein Recognition
NASA Astrophysics Data System (ADS)
Štambuk, Nikola
The paper reviews the classic results of Swanson, Dayhoff, Grantham, Blalock and Root-Bernstein, which link genetic code nucleotide patterns to the protein structure, evolution and molecular recognition. Symbolic representation of the binary addresses defining particular nucleotide and amino acid properties is discussed, with consideration of: structure and metric of the code, direct correspondence between amino acid and nucleotide information, and molecular recognition of the interacting protein motifs coded by the complementary DNA and RNA strands.
Caetano-Anollés, Gustavo; Wang, Minglei; Caetano-Anollés, Derek
2013-01-01
The genetic code shapes the genetic repository. Its origin has puzzled molecular scientists for over half a century and remains a long-standing mystery. Here we show that the origin of the genetic code is tightly coupled to the history of aminoacyl-tRNA synthetase enzymes and their interactions with tRNA. A timeline of evolutionary appearance of protein domain families derived from a structural census in hundreds of genomes reveals the early emergence of the ‘operational’ RNA code and the late implementation of the standard genetic code. The emergence of codon specificities and amino acid charging involved tight coevolution of aminoacyl-tRNA synthetases and tRNA structures as well as episodes of structural recruitment. Remarkably, amino acid and dipeptide compositions of single-domain proteins appearing before the standard code suggest archaic synthetases with structures homologous to catalytic domains of tyrosyl-tRNA and seryl-tRNA synthetases were capable of peptide bond formation and aminoacylation. Results reveal that genetics arose through coevolutionary interactions between polypeptides and nucleic acid cofactors as an exacting mechanism that favored flexibility and folding of the emergent proteins. These enhancements of phenotypic robustness were likely internalized into the emerging genetic system with the early rise of modern protein structure. PMID:23991065
Computer analysis of protein functional sites projection on exon structure of genes in Metazoa.
Medvedeva, Irina V; Demenkov, Pavel S; Ivanisenko, Vladimir A
2015-01-01
Study of the relationship between the structural and functional organization of proteins and their coding genes is necessary for an understanding of the evolution of molecular systems and can provide new knowledge for many applications for designing proteins with improved medical and biological properties. It is well known that the functional properties of proteins are determined by their functional sites. Functional sites are usually represented by a small number of amino acid residues that are distantly located from each other in the amino acid sequence. They are highly conserved within their functional group and vary significantly in structure between such groups. According to this facts analysis of the general properties of the structural organization of the functional sites at the protein level and, at the level of exon-intron structure of the coding gene is still an actual problem. One approach to this analysis is the projection of amino acid residue positions of the functional sites along with the exon boundaries to the gene structure. In this paper, we examined the discontinuity of the functional sites in the exon-intron structure of genes and the distribution of lengths and phases of the functional site encoding exons in vertebrate genes. We have shown that the DNA fragments coding the functional sites were in the same exons, or in close exons. The observed tendency to cluster the exons that code functional sites which could be considered as the unit of protein evolution. We studied the characteristics of the structure of the exon boundaries that code, and do not code, functional sites in 11 Metazoa species. This is accompanied by a reduced frequency of intercodon gaps (phase 0) in exons encoding the amino acid residue functional site, which may be evidence of the existence of evolutionary limitations to the exon shuffling. These results characterize the features of the coding exon-intron structure that affect the functionality of the encoded protein and allow a better understanding of the emergence of biological diversity.
Computer analysis of protein functional sites projection on exon structure of genes in Metazoa
2015-01-01
Background Study of the relationship between the structural and functional organization of proteins and their coding genes is necessary for an understanding of the evolution of molecular systems and can provide new knowledge for many applications for designing proteins with improved medical and biological properties. It is well known that the functional properties of proteins are determined by their functional sites. Functional sites are usually represented by a small number of amino acid residues that are distantly located from each other in the amino acid sequence. They are highly conserved within their functional group and vary significantly in structure between such groups. According to this facts analysis of the general properties of the structural organization of the functional sites at the protein level and, at the level of exon-intron structure of the coding gene is still an actual problem. Results One approach to this analysis is the projection of amino acid residue positions of the functional sites along with the exon boundaries to the gene structure. In this paper, we examined the discontinuity of the functional sites in the exon-intron structure of genes and the distribution of lengths and phases of the functional site encoding exons in vertebrate genes. We have shown that the DNA fragments coding the functional sites were in the same exons, or in close exons. The observed tendency to cluster the exons that code functional sites which could be considered as the unit of protein evolution. We studied the characteristics of the structure of the exon boundaries that code, and do not code, functional sites in 11 Metazoa species. This is accompanied by a reduced frequency of intercodon gaps (phase 0) in exons encoding the amino acid residue functional site, which may be evidence of the existence of evolutionary limitations to the exon shuffling. Conclusions These results characterize the features of the coding exon-intron structure that affect the functionality of the encoded protein and allow a better understanding of the emergence of biological diversity. PMID:26693737
Brunak, S; Engelbrecht, J
1996-06-01
A direct comparison of experimentally determined protein structures and their corresponding protein coding mRNA sequences has been performed. We examine whether real world data support the hypothesis that clusters of rare codons correlate with the location of structural units in the resulting protein. The degeneracy of the genetic code allows for a biased selection of codons which may control the translational rate of the ribosome, and may thus in vivo have a catalyzing effect on the folding of the polypeptide chain. A complete search for GenBank nucleotide sequences coding for structural entries in the Brookhaven Protein Data Bank produced 719 protein chains with matching mRNA sequence, amino acid sequence, and secondary structure assignment. By neural network analysis, we found strong signals in mRNA sequence regions surrounding helices and sheets. These signals do not originate from the clustering of rare codons, but from the similarity of codons coding for very abundant amino acid residues at the N- and C-termini of helices and sheets. No correlation between the positioning of rare codons and the location of structural units was found. The mRNA signals were also compared with conserved nucleotide features of 16S-like ribosomal RNA sequences and related to mechanisms for maintaining the correct reading frame by the ribosome.
Hidden Structural Codes in Protein Intrinsic Disorder.
Borkosky, Silvia S; Camporeale, Gabriela; Chemes, Lucía B; Risso, Marikena; Noval, María Gabriela; Sánchez, Ignacio E; Alonso, Leonardo G; de Prat Gay, Gonzalo
2017-10-17
Intrinsic disorder is a major structural category in biology, accounting for more than 30% of coding regions across the domains of life, yet consists of conformational ensembles in equilibrium, a major challenge in protein chemistry. Anciently evolved papillomavirus genomes constitute an unparalleled case for sequence to structure-function correlation in cases in which there are no folded structures. E7, the major transforming oncoprotein of human papillomaviruses, is a paradigmatic example among the intrinsically disordered proteins. Analysis of a large number of sequences of the same viral protein allowed for the identification of a handful of residues with absolute conservation, scattered along the sequence of its N-terminal intrinsically disordered domain, which intriguingly are mostly leucine residues. Mutation of these led to a pronounced increase in both α-helix and β-sheet structural content, reflected by drastic effects on equilibrium propensities and oligomerization kinetics, and uncovers the existence of local structural elements that oppose canonical folding. These folding relays suggest the existence of yet undefined hidden structural codes behind intrinsic disorder in this model protein. Thus, evolution pinpoints conformational hot spots that could have not been identified by direct experimental methods for analyzing or perturbing the equilibrium of an intrinsically disordered protein ensemble.
Informational structure of genetic sequences and nature of gene splicing
NASA Astrophysics Data System (ADS)
Trifonov, E. N.
1991-10-01
Only about 1/20 of DNA of higher organisms codes for proteins, by means of classical triplet code. The rest of DNA sequences is largely silent, with unclear functions, if any. The triplet code is not the only code (message) carried by the sequences. There are three levels of molecular communication, where the same sequence ``talks'' to various bimolecules, while having, respectively, three different appearances: DNA, RNA and protein. Since the molecular structures and, hence, sequence specific preferences of these are substantially different, the original DNA sequence has to carry simultaneously three types of sequence patterns (codes, messages), thus, being a composite structure in which one had the same letter (nucleotide) is frequently involved in several overlapping codes of different nature. This multiplicity and overlapping of the codes is a unique feature of the Gnomic, language of genetic sequences. The coexisting codes have to be degenerate in various degrees to allow an optimal and concerted performance of all the encoded functions. There is an obvious conflict between the best possible performance of a given function and necessity to compromise the quality of a given sequence pattern in favor of other patterns. It appears that the major role of various changes in the sequences on their ``ontogenetic'' way from DNA to RNA to protein, like RNA editing and splicing, or protein post-translational modifications is to resolve such conflicts. New data are presented strongly indicating that the gene splicing is such a device to resolve the conflict between the code of DNA folding in chromatin and the triplet code for protein synthesis.
Sounds of silence: synonymous nucleotides as a key to biological regulation and complexity
Shabalina, Svetlana A.; Spiridonov, Nikolay A.; Kashina, Anna
2013-01-01
Messenger RNA is a key component of an intricate regulatory network of its own. It accommodates numerous nucleotide signals that overlap protein coding sequences and are responsible for multiple levels of regulation and generation of biological complexity. A wealth of structural and regulatory information, which mRNA carries in addition to the encoded amino acid sequence, raises the question of how these signals and overlapping codes are delineated along non-synonymous and synonymous positions in protein coding regions, especially in eukaryotes. Silent or synonymous codon positions, which do not determine amino acid sequences of the encoded proteins, define mRNA secondary structure and stability and affect the rate of translation, folding and post-translational modifications of nascent polypeptides. The RNA level selection is acting on synonymous sites in both prokaryotes and eukaryotes and is more common than previously thought. Selection pressure on the coding gene regions follows three-nucleotide periodic pattern of nucleotide base-pairing in mRNA, which is imposed by the genetic code. Synonymous positions of the coding regions have a higher level of hybridization potential relative to non-synonymous positions, and are multifunctional in their regulatory and structural roles. Recent experimental evidence and analysis of mRNA structure and interspecies conservation suggest that there is an evolutionary tradeoff between selective pressure acting at the RNA and protein levels. Here we provide a comprehensive overview of the studies that define the role of silent positions in regulating RNA structure and processing that exert downstream effects on proteins and their functions. PMID:23293005
USDA-ARS?s Scientific Manuscript database
The Rift Valley fever virus (RVFV) encodes structural proteins, nucleoprotein (N), N-terminus glycoprotein (Gn), C-terminus glycoprotein (Gc) and L protein, 78-kDa and non-structural proteins NSm and NSs. Using the baculovirus system we expressed the full-length coding sequence of N, NSs, NSm, Gc an...
Structure-related statistical singularities along protein sequences: a correlation study.
Colafranceschi, Mauro; Colosimo, Alfredo; Zbilut, Joseph P; Uversky, Vladimir N; Giuliani, Alessandro
2005-01-01
A data set composed of 1141 proteins representative of all eukaryotic protein sequences in the Swiss-Prot Protein Knowledge base was coded by seven physicochemical properties of amino acid residues. The resulting numerical profiles were submitted to correlation analysis after the application of a linear (simple mean) and a nonlinear (Recurrence Quantification Analysis, RQA) filter. The main RQA variables, Recurrence and Determinism, were subsequently analyzed by Principal Component Analysis. The RQA descriptors showed that (i) within protein sequences is embedded specific information neither present in the codes nor in the amino acid composition and (ii) the most sensitive code for detecting ordered recurrent (deterministic) patterns of residues in protein sequences is the Miyazawa-Jernigan hydrophobicity scale. The most deterministic proteins in terms of autocorrelation properties of primary structures were found (i) to be involved in protein-protein and protein-DNA interactions and (ii) to display a significantly higher proportion of structural disorder with respect to the average data set. A study of the scaling behavior of the average determinism with the setting parameters of RQA (embedding dimension and radius) allows for the identification of patterns of minimal length (six residues) as possible markers of zones specifically prone to inter- and intramolecular interactions.
Protein functional features are reflected in the patterns of mRNA translation speed.
López, Daniel; Pazos, Florencio
2015-07-09
The degeneracy of the genetic code makes it possible for the same amino acid string to be coded by different messenger RNA (mRNA) sequences. These "synonymous mRNAs" may differ largely in a number of aspects related to their overall translational efficiency, such as secondary structure content and availability of the encoded transfer RNAs (tRNAs). Consequently, they may render different yields of the translated polypeptides. These mRNA features related to translation efficiency are also playing a role locally, resulting in a non-uniform translation speed along the mRNA, which has been previously related to some protein structural features and also used to explain some dramatic effects of "silent" single-nucleotide-polymorphisms (SNPs). In this work we perform the first large scale analysis of the relationship between three experimental proxies of mRNA local translation efficiency and the local features of the corresponding encoded proteins. We found that a number of protein functional and structural features are reflected in the patterns of ribosome occupancy, secondary structure and tRNA availability along the mRNA. One or more of these proxies of translation speed have distinctive patterns around the mRNA regions coding for certain protein local features. In some cases the three patterns follow a similar trend. We also show specific examples where these patterns of translation speed point to the protein's important structural and functional features. This support the idea that the genome not only codes the protein functional features as sequences of amino acids, but also as subtle patterns of mRNA properties which, probably through local effects on the translation speed, have some consequence on the final polypeptide. These results open the possibility of predicting a protein's functional regions based on a single genomic sequence, and have implications for heterologous protein expression and fine-tuning protein function.
3D RNA and functional interactions from evolutionary couplings
Weinreb, Caleb; Riesselman, Adam; Ingraham, John B.; Gross, Torsten; Sander, Chris; Marks, Debora S.
2016-01-01
Summary Non-coding RNAs are ubiquitous, but the discovery of new RNA gene sequences far outpaces research on their structure and functional interactions. We mine the evolutionary sequence record to derive precise information about function and structure of RNAs and RNA-protein complexes. As in protein structure prediction, we use maximum entropy global probability models of sequence co-variation to infer evolutionarily constrained nucleotide-nucleotide interactions within RNA molecules, and nucleotide-amino acid interactions in RNA-protein complexes. The predicted contacts allow all-atom blinded 3D structure prediction at good accuracy for several known RNA structures and RNA-protein complexes. For unknown structures, we predict contacts in 160 non-coding RNA families. Beyond 3D structure prediction, evolutionary couplings help identify important functional interactions, e.g., at switch points in riboswitches and at a complex nucleation site in HIV. Aided by accelerating sequence accumulation, evolutionary coupling analysis can accelerate the discovery of functional interactions and 3D structures involving RNA. PMID:27087444
Necessities for the First Life to Emerge
NASA Astrophysics Data System (ADS)
Ikehara, K.
2017-07-01
For the first life to emerge, the first protein must be produced by random joining of amino acids in protein 0th-order structure. In addition, the first genetic code and the first double-stranded gene must encode the protein 0th-order structure.
Beyond terrestrial biology: charting the chemical universe of α-amino acid structures.
Meringer, Markus; Cleaves, H James; Freeland, Stephen J
2013-11-25
α-Amino acids are fundamental to biochemistry as the monomeric building blocks with which cells construct proteins according to genetic instructions. However, the 20 amino acids of the standard genetic code represent a tiny fraction of the number of α-amino acid chemical structures that could plausibly play such a role, both from the perspective of natural processes by which life emerged and evolved, and from the perspective of human-engineered genetically coded proteins. Until now, efforts to describe the structures comprising this broader set, or even estimate their number, have been hampered by the complex combinatorial properties of organic molecules. Here, we use computer software based on graph theory and constructive combinatorics in order to conduct an efficient and exhaustive search of the chemical structures implied by two careful and precise definitions of the α-amino acids relevant to coded biological proteins. Our results include two virtual libraries of α-amino acid structures corresponding to these different approaches, comprising 121 044 and 3 846 structures, respectively, and suggest a simple approach to exploring much larger, as yet uncomputed, libraries of interest.
Web3DMol: interactive protein structure visualization based on WebGL.
Shi, Maoxiang; Gao, Juntao; Zhang, Michael Q
2017-07-03
A growing number of web-based databases and tools for protein research are being developed. There is now a widespread need for visualization tools to present the three-dimensional (3D) structure of proteins in web browsers. Here, we introduce our 3D modeling program-Web3DMol-a web application focusing on protein structure visualization in modern web browsers. Users submit a PDB identification code or select a PDB archive from their local disk, and Web3DMol will display and allow interactive manipulation of the 3D structure. Featured functions, such as sequence plot, fragment segmentation, measure tool and meta-information display, are offered for users to gain a better understanding of protein structure. Easy-to-use APIs are available for developers to reuse and extend Web3DMol. Web3DMol can be freely accessed at http://web3dmol.duapp.com/, and the source code is distributed under the MIT license. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Structural architecture of the human long non-coding RNA, steroid receptor RNA activator
Novikova, Irina V.; Hennelly, Scott P.; Sanbonmatsu, Karissa Y.
2012-01-01
While functional roles of several long non-coding RNAs (lncRNAs) have been determined, the molecular mechanisms are not well understood. Here, we report the first experimentally derived secondary structure of a human lncRNA, the steroid receptor RNA activator (SRA), 0.87 kB in size. The SRA RNA is a non-coding RNA that coactivates several human sex hormone receptors and is strongly associated with breast cancer. Coding isoforms of SRA are also expressed to produce proteins, making the SRA gene a unique bifunctional system. Our experimental findings (SHAPE, in-line, DMS and RNase V1 probing) reveal that this lncRNA has a complex structural organization, consisting of four domains, with a variety of secondary structure elements. We examine the coevolution of the SRA gene at the RNA structure and protein structure levels using comparative sequence analysis across vertebrates. Rapid evolutionary stabilization of RNA structure, combined with frame-disrupting mutations in conserved regions, suggests that evolutionary pressure preserves the RNA structural core rather than its translational product. We perform similar experiments on alternatively spliced SRA isoforms to assess their structural features. PMID:22362738
Expanding the proteome: disordered and alternatively-folded proteins
Dyson, H. Jane
2011-01-01
Proteins provide much of the scaffolding for life, as well as undertaking a variety of essential catalytic reactions. These characteristic functions have led us to presuppose that proteins are in general functional only when well-structured and correctly folded. As we begin to explore the repertoire of possible protein sequences inherent in the human and other genomes, two stark facts that belie this supposition become clear: firstly, the number of apparent open reading frames in the human genome is significantly smaller than appears to be necessary to code for all of the diverse proteins in higher organisms, and secondly that a significant proportion of the protein sequences that would be coded by the genome would not be expected to form stable three-dimensional structures. Clearly the genome must include coding for a multitude of alternative forms of proteins, some of which may be partly or fully disordered or incompletely structured in their functional states. At the same time as this likelihood was recognized, experimental studies also began to uncover examples of important protein molecules and domains that were incompletely structured or completely disordered in solution, yet remained perfectly functional. In the ensuing years, we have seen an explosion of experimental and genome-annotation studies that have mapped the extent of the intrinsic disorder phenomenon and explored the possible biological rationales for its widespread occurrence. Answers to the question “why would a particular domain need to be unstructured?” are as varied as the systems where such domains are found. This review provides a survey of recent new directions in this field, and includes an evaluation of the role not only of intrinsically disordered proteins but of partially structured and highly dynamic members of the disorder-order continuum. PMID:21729349
Bhasi, Ashwini; Philip, Philge; Manikandan, Vinu; Senapathy, Periannan
2009-01-01
We have developed ExDom, a unique database for the comparative analysis of the exon–intron structures of 96 680 protein domains from seven eukaryotic organisms (Homo sapiens, Mus musculus, Bos taurus, Rattus norvegicus, Danio rerio, Gallus gallus and Arabidopsis thaliana). ExDom provides integrated access to exon-domain data through a sophisticated web interface which has the following analytical capabilities: (i) intergenomic and intragenomic comparative analysis of exon–intron structure of domains; (ii) color-coded graphical display of the domain architecture of proteins correlated with their corresponding exon-intron structures; (iii) graphical analysis of multiple sequence alignments of amino acid and coding nucleotide sequences of homologous protein domains from seven organisms; (iv) comparative graphical display of exon distributions within the tertiary structures of protein domains; and (v) visualization of exon–intron structures of alternative transcripts of a gene correlated to variations in the domain architecture of corresponding protein isoforms. These novel analytical features are highly suited for detailed investigations on the exon–intron structure of domains and make ExDom a powerful tool for exploring several key questions concerning the function, origin and evolution of genes and proteins. ExDom database is freely accessible at: http://66.170.16.154/ExDom/. PMID:18984624
Zucchelli, Silvia; Patrucco, Laura; Persichetti, Francesca; Gustincich, Stefano; Cotella, Diego
2016-01-01
Mammalian cells are an indispensable tool for the production of recombinant proteins in contexts where function depends on post-translational modifications. Among them, Chinese Hamster Ovary (CHO) cells are the primary factories for the production of therapeutic proteins, including monoclonal antibodies (MAbs). To improve expression and stability, several methodologies have been adopted, including methods based on media formulation, selective pressure and cell- or vector engineering. This review presents current approaches aimed at improving mammalian cell factories that are based on the enhancement of translation. Among well-established techniques (codon optimization and improvement of mRNA secondary structure), we describe SINEUPs, a family of antisense long non-coding RNAs that are able to increase translation of partially overlapping protein-coding mRNAs. By exploiting their modular structure, SINEUP molecules can be designed to target virtually any mRNA of interest, and thus to increase the production of secreted proteins. Thus, synthetic SINEUPs represent a new versatile tool to improve the production of secreted proteins in biomanufacturing processes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bae, Euiyoung; Bingman, Craig A.; Aceti, David J.
LOC79017 (MW 21.0 kDa, residues 1-188) was annotated as a hypothetical protein encoded by Homo sapiens chromosome 7 open reading frame 24. It was selected as a target by the Center for Eukaryotic Structural Genomics (CESG) because it did not share more than 30% sequence identity with any protein for which the three-dimensional structure is known. The biological function of the protein has not been established yet. Parts of LOC79017 were identified as members of uncharacterized Pfam families (residues 1-95 as PB006073 and residues 104-180 as PB031696). BLAST searches revealed homologues of LOC79017 in many eukaryotes, but none of themmore » have been functionally characterized. Here, we report the crystal structure of H. sapiens protein LOC79017 (UniGene code Hs.530024, UniProt code O75223, CESG target number go.35223).« less
Principles of protein folding--a perspective from simple exact models.
Dill, K. A.; Bromberg, S.; Yue, K.; Fiebig, K. M.; Yee, D. P.; Thomas, P. D.; Chan, H. S.
1995-01-01
General principles of protein structure, stability, and folding kinetics have recently been explored in computer simulations of simple exact lattice models. These models represent protein chains at a rudimentary level, but they involve few parameters, approximations, or implicit biases, and they allow complete explorations of conformational and sequence spaces. Such simulations have resulted in testable predictions that are sometimes unanticipated: The folding code is mainly binary and delocalized throughout the amino acid sequence. The secondary and tertiary structures of a protein are specified mainly by the sequence of polar and nonpolar monomers. More specific interactions may refine the structure, rather than dominate the folding code. Simple exact models can account for the properties that characterize protein folding: two-state cooperativity, secondary and tertiary structures, and multistage folding kinetics--fast hydrophobic collapse followed by slower annealing. These studies suggest the possibility of creating "foldable" chain molecules other than proteins. The encoding of a unique compact chain conformation may not require amino acids; it may require only the ability to synthesize specific monomer sequences in which at least one monomer type is solvent-averse. PMID:7613459
Bogdanov, Yuri F; Dadashev, Sergei Y; Grishaeva, Tatiana M
2003-01-01
Evolutionarily distant organisms have not only orthologs, but also nonhomologous proteins that build functionally similar subcellular structures. For instance, this is true with protein components of the synaptonemal complex (SC), a universal ultrastructure that ensures the successful pairing and recombination of homologous chromosomes during meiosis. We aimed at developing a method to search databases for genes that code for such nonhomologous but functionally analogous proteins. Advantage was taken of the ultrastructural parameters of SC and the conformation of SC proteins responsible for these. Proteins involved in SC central space are known to be similar in secondary structure. Using published data, we found a highly significant correlation between the width of the SC central space and the length of rod-shaped central domain of mammalian and yeast intermediate proteins forming transversal filaments in the SC central space. Basing on this, we suggested a method for searching genome databases of distant organisms for genes whose virtual proteins meet the above correlation requirement. Our recent finding of the Drosophila melanogaster CG17604 gene coding for synaptonemal complex transversal filament protein received experimental support from another lab. With the same strategy, we showed that the Arabidopsis thaliana and Caenorhabditis elegans genomes contain unique genes coding for such proteins.
Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil
2015-01-01
The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. PMID:25362073
HippDB: a database of readily targeted helical protein-protein interactions.
Bergey, Christina M; Watkins, Andrew M; Arora, Paramjit S
2013-11-01
HippDB catalogs every protein-protein interaction whose structure is available in the Protein Data Bank and which exhibits one or more helices at the interface. The Web site accepts queries on variables such as helix length and sequence, and it provides computational alanine scanning and change in solvent-accessible surface area values for every interfacial residue. HippDB is intended to serve as a starting point for structure-based small molecule and peptidomimetic drug development. HippDB is freely available on the web at http://www.nyu.edu/projects/arora/hippdb. The Web site is implemented in PHP, MySQL and Apache. Source code freely available for download at http://code.google.com/p/helidb, implemented in Perl and supported on Linux. arora@nyu.edu.
A CPU benchmark for protein crystallographic refinement.
Bourne, P E; Hendrickson, W A
1990-01-01
The CPU time required to complete a cycle of restrained least-squares refinement of a protein structure from X-ray crystallographic data using the FORTRAN codes PROTIN and PROLSQ are reported for 48 different processors, ranging from single-user workstations to supercomputers. Sequential, vector, VLIW, multiprocessor, and RISC hardware architectures are compared using both a small and a large protein structure. Representative compile times for each hardware type are also given, and the improvement in run-time when coding for a specific hardware architecture considered. The benchmarks involve scalar integer and vector floating point arithmetic and are representative of the calculations performed in many scientific disciplines.
Jenkins, Adam M; Waterhouse, Robert M; Muskavitch, Marc A T
2015-04-23
Long non-coding RNAs (lncRNAs) have been defined as mRNA-like transcripts longer than 200 nucleotides that lack significant protein-coding potential, and many of them constitute scaffolds for ribonucleoprotein complexes with critical roles in epigenetic regulation. Various lncRNAs have been implicated in the modulation of chromatin structure, transcriptional and post-transcriptional gene regulation, and regulation of genomic stability in mammals, Caenorhabditis elegans, and Drosophila melanogaster. The purpose of this study is to identify the lncRNA landscape in the malaria vector An. gambiae and assess the evolutionary conservation of lncRNAs and their secondary structures across the Anopheles genus. Using deep RNA sequencing of multiple Anopheles gambiae life stages, we have identified 2,949 lncRNAs and more than 300 previously unannotated putative protein-coding genes. The lncRNAs exhibit differential expression profiles across life stages and adult genders. We find that across the genus Anopheles, lncRNAs display much lower sequence conservation than protein-coding genes. Additionally, we find that lncRNA secondary structure is highly conserved within the Gambiae complex, but diverges rapidly across the rest of the genus Anopheles. This study offers one of the first lncRNA secondary structure analyses in vector insects. Our description of lncRNAs in An. gambiae offers the most comprehensive genome-wide insights to date into lncRNAs in this vector mosquito, and defines a set of potential targets for the development of vector-based interventions that may further curb the human malaria burden in disease-endemic countries.
Making Ordered DNA and Protein Structures from Computer-Printed Transparency Film Cut-Outs
ERIC Educational Resources Information Center
Jittivadhna, Karnyupha; Ruenwongsa, Pintip; Panijpan, Bhinyo
2009-01-01
Instructions are given for building physical scale models of ordered structures of B-form DNA, protein [alpha]-helix, and parallel and antiparallel protein [beta]-pleated sheets made from colored computer printouts designed for transparency film sheets. Cut-outs from these sheets are easily assembled. Conventional color coding for atoms are used…
Fujisawa, Takatomo; Narikawa, Rei; Okamoto, Shinobu; Ehira, Shigeki; Yoshimura, Hidehisa; Suzuki, Iwane; Masuda, Tatsuru; Mochimaru, Mari; Takaichi, Shinichi; Awai, Koichiro; Sekine, Mitsuo; Horikawa, Hiroshi; Yashiro, Isao; Omata, Seiha; Takarada, Hiromi; Katano, Yoko; Kosugi, Hiroki; Tanikawa, Satoshi; Ohmori, Kazuko; Sato, Naoki; Ikeuchi, Masahiko; Fujita, Nobuyuki; Ohmori, Masayuki
2010-01-01
A filamentous non-N2-fixing cyanobacterium, Arthrospira (Spirulina) platensis, is an important organism for industrial applications and as a food supply. Almost the complete genome of A. platensis NIES-39 was determined in this study. The genome structure of A. platensis is estimated to be a single, circular chromosome of 6.8 Mb, based on optical mapping. Annotation of this 6.7 Mb sequence yielded 6630 protein-coding genes as well as two sets of rRNA genes and 40 tRNA genes. Of the protein-coding genes, 78% are similar to those of other organisms; the remaining 22% are currently unknown. A total 612 kb of the genome comprise group II introns, insertion sequences and some repetitive elements. Group I introns are located in a protein-coding region. Abundant restriction-modification systems were determined. Unique features in the gene composition were noted, particularly in a large number of genes for adenylate cyclase and haemolysin-like Ca2+-binding proteins and in chemotaxis proteins. Filament-specific genes were highlighted by comparative genomic analysis. PMID:20203057
He, Yi; Xiao, Yi; Liwo, Adam; Scheraga, Harold A
2009-10-01
We explored the energy-parameter space of our coarse-grained UNRES force field for large-scale ab initio simulations of protein folding, to obtain good initial approximations for hierarchical optimization of the force field with new virtual-bond-angle bending and side-chain-rotamer potentials which we recently introduced to replace the statistical potentials. 100 sets of energy-term weights were generated randomly, and good sets were selected by carrying out replica-exchange molecular dynamics simulations of two peptides with a minimal alpha-helical and a minimal beta-hairpin fold, respectively: the tryptophan cage (PDB code: 1L2Y) and tryptophan zipper (PDB code: 1LE1). Eight sets of parameters produced native-like structures of these two peptides. These eight sets were tested on two larger proteins: the engrailed homeodomain (PDB code: 1ENH) and FBP WW domain (PDB code: 1E0L); two sets were found to produce native-like conformations of these proteins. These two sets were tested further on a larger set of nine proteins with alpha or alpha + beta structure and found to locate native-like structures of most of them. These results demonstrate that, in addition to finding reasonable initial starting points for optimization, an extensive search of parameter space is a powerful method to produce a transferable force field. Copyright 2009 Wiley Periodicals, Inc.
D'Onofrio, Giuseppe; Ghosh, Tapash Chandra
2005-01-17
Fluctuations and increments of both C(3) and G(3) levels along the human coding sequences were investigated comparing two sets of Xenopus/human orthologous genes. The first set of genes shows minor differences of the GC(3) levels, the second shows considerable increments of the GC(3) levels in the human genes. In both data sets, the fluctuations of C(3) and G(3) levels along the coding sequences correlated with the secondary structures of the encoded proteins. The human genes that underwent the compositional transition showed a different increment of the C(3) and G(3) levels within and among the structural units of the proteins. The relative synonymous codon usage (RSCU) of several amino acids were also affected during the compositional transition, showing that there exists a correlation between RSCU and protein secondary structures in human genes. The importance of natural selection for the formation of isochore organization of the human genome has been discussed on the basis of these results.
RBind: computational network method to predict RNA binding sites.
Wang, Kaili; Jian, Yiren; Wang, Huiwen; Zeng, Chen; Zhao, Yunjie
2018-04-26
Non-coding RNA molecules play essential roles by interacting with other molecules to perform various biological functions. However, it is difficult to determine RNA structures due to their flexibility. At present, the number of experimentally solved RNA-ligand and RNA-protein structures is still insufficient. Therefore, binding sites prediction of non-coding RNA is required to understand their functions. Current RNA binding site prediction algorithms produce many false positive nucleotides that are distance away from the binding sites. Here, we present a network approach, RBind, to predict the RNA binding sites. We benchmarked RBind in RNA-ligand and RNA-protein datasets. The average accuracy of 0.82 in RNA-ligand and 0.63 in RNA-protein testing showed that this network strategy has a reliable accuracy for binding sites prediction. The codes and datasets are available at https://zhaolab.com.cn/RBind. yjzhaowh@mail.ccnu.edu.cn. Supplementary data are available at Bioinformatics online.
Genetics of PCOS: A systematic bioinformatics approach to unveil the proteins responsible for PCOS.
Panda, Pritam Kumar; Rane, Riya; Ravichandran, Rahul; Singh, Shrinkhla; Panchal, Hetalkumar
2016-06-01
Polycystic ovary syndrome (PCOS) is a hormonal imbalance in women, which causes problems during menstrual cycle and in pregnancy that sometimes results in fatality. Though the genetics of PCOS is not fully understood, early diagnosis and treatment can prevent long-term effects. In this study, we have studied the proteins involved in PCOS and the structural aspects of the proteins that are taken into consideration using computational tools. The proteins involved are modeled using Modeller 9v14 and Ab-initio programs. All the 43 proteins responsible for PCOS were subjected to phylogenetic analysis to identify the relatedness of the proteins. Further, microarray data analysis of PCOS datasets was analyzed that was downloaded from GEO datasets to find the significant protein-coding genes responsible for PCOS, which is an addition to the reported protein-coding genes. Various statistical analyses were done using R programming to get an insight into the structural aspects of PCOS that can be used as drug targets to treat PCOS and other related reproductive diseases.
2014-01-01
Background The genome is pervasively transcribed but most transcripts do not code for proteins, constituting non-protein-coding RNAs. Despite increasing numbers of functional reports of individual long non-coding RNAs (lncRNAs), assessing the extent of functionality among the non-coding transcriptional output of mammalian cells remains intricate. In the protein-coding world, transcripts differentially expressed in the context of processes essential for the survival of multicellular organisms have been instrumental in the discovery of functionally relevant proteins and their deregulation is frequently associated with diseases. We therefore systematically identified lncRNAs expressed differentially in response to oncologically relevant processes and cell-cycle, p53 and STAT3 pathways, using tiling arrays. Results We found that up to 80% of the pathway-triggered transcriptional responses are non-coding. Among these we identified very large macroRNAs with pathway-specific expression patterns and demonstrated that these are likely continuous transcripts. MacroRNAs contain elements conserved in mammals and sauropsids, which in part exhibit conserved RNA secondary structure. Comparing evolutionary rates of a macroRNA to adjacent protein-coding genes suggests a local action of the transcript. Finally, in different grades of astrocytoma, a tumor disease unrelated to the initially used cell lines, macroRNAs are differentially expressed. Conclusions It has been shown previously that the majority of expressed non-ribosomal transcripts are non-coding. We now conclude that differential expression triggered by signaling pathways gives rise to a similar abundance of non-coding content. It is thus unlikely that the prevalence of non-coding transcripts in the cell is a trivial consequence of leaky or random transcription events. PMID:24594072
Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil
2015-02-01
The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
On the Origin of Protein Superfamilies and Superfolds
NASA Astrophysics Data System (ADS)
Magner, Abram; Szpankowski, Wojciech; Kihara, Daisuke
2015-02-01
Distributions of protein families and folds in genomes are highly skewed, having a small number of prevalent superfamiles/superfolds and a large number of families/folds of a small size. Why are the distributions of protein families and folds skewed? Why are there only a limited number of protein families? Here, we employ an information theoretic approach to investigate the protein sequence-structure relationship that leads to the skewed distributions. We consider that protein sequences and folds constitute an information theoretic channel and computed the most efficient distribution of sequences that code all protein folds. The identified distributions of sequences and folds are found to follow a power law, consistent with those observed for proteins in nature. Importantly, the skewed distributions of sequences and folds are suggested to have different origins: the skewed distribution of sequences is due to evolutionary pressure to achieve efficient coding of necessary folds, whereas that of folds is based on the thermodynamic stability of folds. The current study provides a new information theoretic framework for proteins that could be widely applied for understanding protein sequences, structures, functions, and interactions.
Structure-based barcoding of proteins.
Metri, Rahul; Jerath, Gaurav; Kailas, Govind; Gacche, Nitin; Pal, Adityabarna; Ramakrishnan, Vibin
2014-01-01
A reduced representation in the format of a barcode has been developed to provide an overview of the topological nature of a given protein structure from 3D coordinate file. The molecular structure of a protein coordinate file from Protein Data Bank is first expressed in terms of an alpha-numero code and further converted to a barcode image. The barcode representation can be used to compare and contrast different proteins based on their structure. The utility of this method has been exemplified by comparing structural barcodes of proteins that belong to same fold family, and across different folds. In addition to this, we have attempted to provide an illustration to (i) the structural changes often seen in a given protein molecule upon interaction with ligands and (ii) Modifications in overall topology of a given protein during evolution. The program is fully downloadable from the website http://www.iitg.ac.in/probar/. © 2013 The Protein Society.
The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4).
Huntemann, Marcel; Ivanova, Natalia N; Mavromatis, Konstantinos; Tripp, H James; Paez-Espino, David; Palaniappan, Krishnaveni; Szeto, Ernest; Pillay, Manoj; Chen, I-Min A; Pati, Amrita; Nielsen, Torben; Markowitz, Victor M; Kyrpides, Nikos C
2015-01-01
The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. Structural annotation is followed by assignment of protein product names and functions.
GeneBuilder: interactive in silico prediction of gene structure.
Milanesi, L; D'Angelo, D; Rogozin, I B
1999-01-01
Prediction of gene structure in newly sequenced DNA becomes very important in large genome sequencing projects. This problem is complicated due to the exon-intron structure of eukaryotic genes and because gene expression is regulated by many different short nucleotide domains. In order to be able to analyse the full gene structure in different organisms, it is necessary to combine information about potential functional signals (promoter region, splice sites, start and stop codons, 3' untranslated region) together with the statistical properties of coding sequences (coding potential), information about homologous proteins, ESTs and repeated elements. We have developed the GeneBuilder system which is based on prediction of functional signals and coding regions by different approaches in combination with similarity searches in proteins and EST databases. The potential gene structure models are obtained by using a dynamic programming method. The program permits the use of several parameters for gene structure prediction and refinement. During gene model construction, selecting different exon homology levels with a protein sequence selected from a list of homologous proteins can improve the accuracy of the gene structure prediction. In the case of low homology, GeneBuilder is still able to predict the gene structure. The GeneBuilder system has been tested by using the standard set (Burset and Guigo, Genomics, 34, 353-367, 1996) and the performances are: 0.89 sensitivity and 0.91 specificity at the nucleotide level. The total correlation coefficient is 0.88. The GeneBuilder system is implemented as a part of the WebGene a the URL: http://www.itba.mi. cnr.it/webgene and TRADAT (TRAncription Database and Analysis Tools) launcher URL: http://www.itba.mi.cnr.it/tradat.
Inhibition of Pancreatic Cancer Cell Proliferation by LRH-1 Inhibitors
2014-12-01
coordinates and structure factors have been deposited in the Protein Data Bank, www.pdb.org [ PDB ID codes 4QJR (SF-1/PIP3) and 4QK4 (SF-1/PIP2)]. 1To whom...with Rfree/Rcryst values of 23/19% (Table S2). The structure was deposited with the PDB ID code 4QJR. SF 1/PIP3 (Fig. 1C) adopts the classic NR LBD...PIP2) was solved by molecular replacement, using PDB ID code 1YOW as the search model, and compared with the SF 1/PIP3 structure (Table S2). The
AbouHaidar, Mounir Georges; Venkataraman, Srividhya; Golshani, Ashkan; Liu, Bolin; Ahmad, Tauqeer
2014-10-07
The highly structured (64% GC) covalently closed circular (CCC) RNA (220 nt) of the virusoid associated with rice yellow mottle virus codes for a 16-kDa highly basic protein using novel modalities for coding, translation, and gene expression. This CCC RNA is the smallest among all known viroids and virusoids and the only one that codes proteins. Its sequence possesses an internal ribosome entry site and is directly translated through two (or three) completely overlapping ORFs (shifting to a new reading frame at the end of each round). The initiation and termination codons overlap UGAUGA (underline highlights the initiation codon AUG within the combined initiation-termination sequence). Termination codons can be ignored to obtain larger read-through proteins. This circular RNA with no noncoding sequences is a unique natural supercompact "nanogenome."
Solov'ev, V V; Kel', A E; Kolchanov, N A
1989-01-01
The factors, determining the presence of inverted and symmetrical repeats in genes coding for globular proteins, have been analysed. An interesting property of genetical code has been revealed in the analysis of symmetrical repeats: the pairs of symmetrical codons corresponded to pairs of amino acids with mostly similar physical-chemical parameters. This property may explain the presence of symmetrical repeats and palindromes only in genes coding for beta-structural proteins-polypeptides, where amino acids with similar physical-chemical properties occupy symmetrical positions. A stochastic model of evolution of polynucleotide sequences has been used for analysis of inverted repeats. The modelling demonstrated that only limiting of sequences (uneven frequencies of used codons) is enough for arising of nonrandom inverted repeats in genes.
Chiusano, M L; D'Onofrio, G; Alvarez-Valin, F; Jabbari, K; Colonna, G; Bernardi, G
1999-09-30
We investigated the relationships between the nucleotide substitution rates and the predicted secondary structures in the three states representation (alpha-helix, beta-sheet, and coil). The analysis was carried out on 34 alignments, each of which comprised sequences belonging to at least four different mammalian orders. The rates of synonymous substitution were found to be significantly different in regions predicted to be alpha-helix, beta-sheet, or coil. Likewise, the nonsynonymous rates also differ, although expectedly at a lower extent, in the three types of secondary structure, suggesting that different selective constraints associated with the different structures are affecting in a similar way the synonymous and nonsynonymous rates. Moreover, the base composition of the third codon positions is different in coding sequence regions corresponding to different secondary structures of proteins.
Rubini, Marina; Lepthien, Sandra; Golbik, Ralph; Budisa, Nediljko
2006-07-01
The indole ring of the canonical amino acid tryptophan (Trp) possesses distinguished features, such as sterical bulk, hydrophobicity and the nitrogen atom which is capable of acting as a hydrogen bond donor. The introduction of an amino group into the indole moiety of Trp yields the structural analogs 4-aminotryptophan ((4-NH(2))Trp) and 5-aminotryptophan ((5-NH(2))Trp). Their hydrophobicity and spectral properties are substantially different when compared to those of Trp. They resemble the purine bases of DNA and share their capacity for pH-sensitive intramolecular charge transfer. The Trp --> aminotryptophan substitution in proteins during ribosomal translation is expected to result in related protein variants that acquire these features. These expectations have been fulfilled by incorporating (4-NH(2))Trp and (5-NH(2))Trp into barstar, an intracellular inhibitor of the ribonuclease barnase from Bacillus amyloliquefaciens. The crystal structure of (4-NH(2))Trp-barstar is similar to that of the parent protein, whereas its spectral and thermodynamic behavior is found to be remarkably different. The T(m) value of (4-NH(2))Trp- and (5-NH(2))Trp-barstar is lowered by about 20 degrees Celsius, and they exhibit a strongly reduced unfolding cooperativity and substantial loss of free energy in folding. Furthermore, folding kinetic study of (4-NH(2))Trp-barstar revealed that the denatured state is even preferred over native one. The combination of structural and thermodynamic analyses clearly shows how structures of substituted barstar display a typical structure-function tradeoff: the acquirement of unique pH-sensitive charge transfer as a novel function is achieved at the expense of protein stability. These findings provide a new insight into the evolution of the amino acid repertoire of the universal genetic code and highlight possible problems regarding protein engineering and design by using an expanded genetic code.
Faburay, Bonto; Wilson, William; McVey, D. Scott; Drolet, Barbara S.; Weingartl, Hana; Madden, Daniel; Young, Alan; Ma, Wenjun
2013-01-01
Abstract The Rift Valley fever virus (RVFV) encodes the structural proteins nucleoprotein (N), aminoterminal glycoprotein (Gn), carboxyterminal glycoprotein (Gc), and L protein, 78-kD, and the nonstructural proteins NSm and NSs. Using the baculovirus system, we expressed the full-length coding sequence of N, NSs, NSm, Gc, and the ectodomain of the coding sequence of the Gn glycoprotein derived from the virulent strain of RVFV ZH548. Western blot analysis using anti-His antibodies and monoclonal antibodies against Gn and N confirmed expression of the recombinant proteins, and in vitro biochemical analysis showed that the two glycoproteins, Gn and Gc, were expressed in glycosylated form. Immunoreactivity profiles of the recombinant proteins in western blot and in indirect enzyme-linked immunosorbent assay against a panel of antisera obtained from vaccinated or wild type (RVFV)-challenged sheep confirmed the results obtained with anti-His antibodies and demonstrated the suitability of the baculo-expressed antigens for diagnostic assays. In addition, these recombinant proteins could be valuable for the development of diagnostic methods that differentiate infected from vaccinated animals (DIVA). PMID:23962238
Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification.
Sinclair, Robert M; Ravantti, Janne J; Bamford, Dennis H
2017-04-15
Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. Copyright © 2017 Sinclair et al.
Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification
Sinclair, Robert M.; Ravantti, Janne J.
2017-01-01
ABSTRACT Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. PMID:28122979
Falconi, M; Oteri, F; Eliseo, T; Cicero, D O; Desideri, A
2008-08-01
The structural dynamics of the DNA binding domains of the human papillomavirus strain 16 and the bovine papillomavirus strain 1, complexed with their DNA targets, has been investigated by modeling, molecular dynamics simulations, and nuclear magnetic resonance analysis. The simulations underline different dynamical features of the protein scaffolds and a different mechanical interaction of the two proteins with DNA. The two protein structures, although very similar, show differences in the relative mobility of secondary structure elements. Protein structural analyses, principal component analysis, and geometrical and energetic DNA analyses indicate that the two transcription factors utilize a different strategy in DNA recognition and deformation. Results show that the protein indirect DNA readout is not only addressable to the DNA molecule flexibility but it is finely tuned by the mechanical and dynamical properties of the protein scaffold involved in the interaction.
The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4)
Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos; ...
2015-10-26
The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. In conclusion, structural annotation is followed by assignment of protein product names and functions.
The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos
The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. In conclusion, structural annotation is followed by assignment of protein product names and functions.
Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones
Imanishi, Tadashi; Itoh, Takeshi; Suzuki, Yutaka; O'Donovan, Claire; Fukuchi, Satoshi; Koyanagi, Kanako O; Barrero, Roberto A; Tamura, Takuro; Yamaguchi-Kabata, Yumi; Tanino, Motohiko; Yura, Kei; Miyazaki, Satoru; Ikeo, Kazuho; Homma, Keiichi; Kasprzyk, Arek; Nishikawa, Tetsuo; Hirakawa, Mika; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Ashurst, Jennifer; Jia, Libin; Nakao, Mitsuteru; Thomas, Michael A; Mulder, Nicola; Karavidopoulou, Youla; Jin, Lihua; Kim, Sangsoo; Yasuda, Tomohiro; Lenhard, Boris; Eveno, Eric; Suzuki, Yoshiyuki; Yamasaki, Chisato; Takeda, Jun-ichi; Gough, Craig; Hilton, Phillip; Fujii, Yasuyuki; Sakai, Hiroaki; Tanaka, Susumu; Amid, Clara; Bellgard, Matthew; Bonaldo, Maria de Fatima; Bono, Hidemasa; Bromberg, Susan K; Brookes, Anthony J; Bruford, Elspeth; Carninci, Piero; Chelala, Claude; Couillault, Christine; de Souza, Sandro J.; Debily, Marie-Anne; Devignes, Marie-Dominique; Dubchak, Inna; Endo, Toshinori; Estreicher, Anne; Eyras, Eduardo; Fukami-Kobayashi, Kaoru; R. Gopinath, Gopal; Graudens, Esther; Hahn, Yoonsoo; Han, Michael; Han, Ze-Guang; Hanada, Kousuke; Hanaoka, Hideki; Harada, Erimi; Hashimoto, Katsuyuki; Hinz, Ursula; Hirai, Momoki; Hishiki, Teruyoshi; Hopkinson, Ian; Imbeaud, Sandrine; Inoko, Hidetoshi; Kanapin, Alexander; Kaneko, Yayoi; Kasukawa, Takeya; Kelso, Janet; Kersey, Paul; Kikuno, Reiko; Kimura, Kouichi; Korn, Bernhard; Kuryshev, Vladimir; Makalowska, Izabela; Makino, Takashi; Mano, Shuhei; Mariage-Samson, Regine; Mashima, Jun; Matsuda, Hideo; Mewes, Hans-Werner; Minoshima, Shinsei; Nagai, Keiichi; Nagasaki, Hideki; Nagata, Naoki; Nigam, Rajni; Ogasawara, Osamu; Ohara, Osamu; Ohtsubo, Masafumi; Okada, Norihiro; Okido, Toshihisa; Oota, Satoshi; Ota, Motonori; Ota, Toshio; Otsuki, Tetsuji; Piatier-Tonneau, Dominique; Poustka, Annemarie; Ren, Shuang-Xi; Saitou, Naruya; Sakai, Katsunaga; Sakamoto, Shigetaka; Sakate, Ryuichi; Schupp, Ingo; Servant, Florence; Sherry, Stephen; Shiba, Rie; Shimizu, Nobuyoshi; Shimoyama, Mary; Simpson, Andrew J; Soares, Bento; Steward, Charles; Suwa, Makiko; Suzuki, Mami; Takahashi, Aiko; Tamiya, Gen; Tanaka, Hiroshi; Taylor, Todd; Terwilliger, Joseph D; Unneberg, Per; Veeramachaneni, Vamsi; Watanabe, Shinya; Wilming, Laurens; Yasuda, Norikazu; Yoo, Hyang-Sook; Stodolsky, Marvin; Makalowski, Wojciech; Go, Mitiko; Nakai, Kenta; Takagi, Toshihisa; Kanehisa, Minoru; Sakaki, Yoshiyuki; Quackenbush, John; Okazaki, Yasushi; Hayashizaki, Yoshihide; Hide, Winston; Chakraborty, Ranajit; Nishikawa, Ken; Sugawara, Hideaki; Tateno, Yoshio; Chen, Zhu; Oishi, Michio; Tonellato, Peter; Apweiler, Rolf; Okubo, Kousaku; Wagner, Lukas; Wiemann, Stefan; Strausberg, Robert L; Isogai, Takao; Auffray, Charles; Nomura, Nobuo; Sugano, Sumio
2004-01-01
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology. PMID:15103394
The standard operating procedure of the DOE-JGI Metagenome Annotation Pipeline (MAP v.4)
Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos; ...
2016-02-24
The DOE-JGI Metagenome Annotation Pipeline (MAP v.4) performs structural and functional annotation for metagenomic sequences that are submitted to the Integrated Microbial Genomes with Microbiomes (IMG/M) system for comparative analysis. The pipeline runs on nucleotide sequences provide d via the IMG submission site. Users must first define their analysis projects in GOLD and then submit the associated sequence datasets consisting of scaffolds/contigs with optional coverage information and/or unassembled reads in fasta and fastq file formats. The MAP processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNAs, as well as CRISPR elements. Structural annotation ismore » followed by functional annotation including assignment of protein product names and connection to various protein family databases.« less
The standard operating procedure of the DOE-JGI Metagenome Annotation Pipeline (MAP v.4)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos
The DOE-JGI Metagenome Annotation Pipeline (MAP v.4) performs structural and functional annotation for metagenomic sequences that are submitted to the Integrated Microbial Genomes with Microbiomes (IMG/M) system for comparative analysis. The pipeline runs on nucleotide sequences provide d via the IMG submission site. Users must first define their analysis projects in GOLD and then submit the associated sequence datasets consisting of scaffolds/contigs with optional coverage information and/or unassembled reads in fasta and fastq file formats. The MAP processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNAs, as well as CRISPR elements. Structural annotation ismore » followed by functional annotation including assignment of protein product names and connection to various protein family databases.« less
Meyer, Irmtraud M
2017-05-01
RNA transcripts are the primary products of active genes in any living organism, including many viruses. Their cellular destiny not only depends on primary sequence signals, but can also be determined by RNA structure. Recent experimental evidence shows that many transcripts can be assigned more than a single functional RNA structure throughout their cellular life and that structure formation happens co-transcriptionally, i.e. as the transcript is synthesised in the cell. Moreover, functional RNA structures are not limited to non-coding transcripts, but can also feature in coding transcripts. The picture that now emerges is that RNA structures constitute an additional layer of information that can be encoded in any RNA transcript (and on top of other layers of information such as protein-context) in order to exert a wide range of functional roles. Moreover, different encoded RNA structures can be expressed at different stages of a transcript's life in order to alter the transcript's behaviour depending on its actual cellular context. Similar to the concept of alternative splicing for protein-coding genes, where a single transcript can yield different proteins depending on cellular context, it is thus appropriate to propose the notion of alternative RNA structure expression for any given transcript. This review introduces several computational strategies that my group developed to detect different aspects of RNA structure expression in vivo. Two aspects are of particular interest to us: (1) RNA secondary structure features that emerge during co-transcriptional folding and (2) functional RNA structure features that are expressed at different times of a transcript's life and potentially mutually exclusive. Copyright © 2017. Published by Elsevier Inc.
AbouHaidar, Mounir Georges; Venkataraman, Srividhya; Golshani, Ashkan; Liu, Bolin; Ahmad, Tauqeer
2014-01-01
The highly structured (64% GC) covalently closed circular (CCC) RNA (220 nt) of the virusoid associated with rice yellow mottle virus codes for a 16-kDa highly basic protein using novel modalities for coding, translation, and gene expression. This CCC RNA is the smallest among all known viroids and virusoids and the only one that codes proteins. Its sequence possesses an internal ribosome entry site and is directly translated through two (or three) completely overlapping ORFs (shifting to a new reading frame at the end of each round). The initiation and termination codons overlap UGAUGA (underline highlights the initiation codon AUG within the combined initiation-termination sequence). Termination codons can be ignored to obtain larger read-through proteins. This circular RNA with no noncoding sequences is a unique natural supercompact “nanogenome.” PMID:25253891
Wise, C A; Chiang, L C; Paznekas, W A; Sharma, M; Musy, M M; Ashley, J A; Lovett, M; Jabs, E W
1997-04-01
Treacher Collins Syndrome (TCS) is the most common of the human mandibulofacial dysostosis disorders. Recently, a partial TCOF1 cDNA was identified and shown to contain mutations in TCS families. Here we present the entire exon/intron genomic structure and the complete coding sequence of TCOF1. TCOF1 encodes a low complexity protein of 1,411 amino acids, whose predicted protein structure reveals repeated motifs that mirror the organization of its exons. These motifs are shared with nucleolar trafficking proteins in other species and are predicted to be highly phosphorylated by casein kinase. Consistent with this, the full-length TCOF1 protein sequence also contains putative nuclear and nucleolar localization signals. Throughout the open reading frame, we detected an additional eight mutations in TCS families and several polymorphisms. We postulate that TCS results from defects in a nucleolar trafficking protein that is critically required during human craniofacial development.
Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan
2015-12-11
High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives.
Acquisition of a Thermophoresis Instrument for Molecular Association Thermodynamic Studies
2015-05-20
using NAMD.27 Crystallographic structures of C3d ( PDB code 1C3D) and C3d-CR2 ( PDB code 3OED) were obtained from the protein data bank ( PDB ).28 Missing...This project is funded by DTRA (Defense Threat Reduction Agency) and aims to develop new multienzyme structures for the controlled destruction of...enable detection. Pharmacophore models were developed based on known C3d-ligand interactions and information from computational analysis of structural
On the evolution of primitive genetic codes.
Weberndorfer, Günter; Hofacker, Ivo L; Stadler, Peter F
2003-10-01
The primordial genetic code probably has been a drastically simplified ancestor of the canonical code that is used by contemporary cells. In order to understand how the present-day code came about we first need to explain how the language of the building plan can change without destroying the encoded information. In this work we introduce a minimal organism model that is based on biophysically reasonable descriptions of RNA and protein, namely secondary structure folding and knowledge based potentials. The evolution of a population of such organism under competition for a common resource is simulated explicitly at the level of individual replication events. Starting with very simple codes, and hence greatly reduced amino acid alphabets, we observe a diversification of the codes in most simulation runs. The driving force behind this effect is the possibility to produce fitter proteins when the repertoire of amino acids is enlarged.
Advances in RNA Structure Determination | Center for Cancer Research
The recent years have witnessed a revolution in the field of RNA structure and function. Until recently the main contribution of RNA in cellular and disease functions was considered to be a role defined by the central dogma, namely DNA codes for mRNAs, which in turn encode for proteins, a notion facilitated by non-coding ribosomal RNA and tRNA. It was also assumed at the time
Making the Bend: DNA Tertiary Structure and Protein-DNA Interactions
Harteis, Sabrina; Schneider, Sabine
2014-01-01
DNA structure functions as an overlapping code to the DNA sequence. Rapid progress in understanding the role of DNA structure in gene regulation, DNA damage recognition and genome stability has been made. The three dimensional structure of both proteins and DNA plays a crucial role for their specific interaction, and proteins can recognise the chemical signature of DNA sequence (“base readout”) as well as the intrinsic DNA structure (“shape recognition”). These recognition mechanisms do not exist in isolation but, depending on the individual interaction partners, are combined to various extents. Driving force for the interaction between protein and DNA remain the unique thermodynamics of each individual DNA-protein pair. In this review we focus on the structures and conformations adopted by DNA, both influenced by and influencing the specific interaction with the corresponding protein binding partner, as well as their underlying thermodynamics. PMID:25026169
A Dual Origin of the Xist Gene from a Protein-Coding Gene and a Set of Transposable Elements
Elisaphenko, Eugeny A.; Kolesnikov, Nikolay N.; Shevchenko, Alexander I.; Rogozin, Igor B.; Nesterova, Tatyana B.; Brockdorff, Neil; Zakian, Suren M.
2008-01-01
X-chromosome inactivation, which occurs in female eutherian mammals is controlled by a complex X-linked locus termed the X-inactivation center (XIC). Previously it was proposed that genes of the XIC evolved, at least in part, as a result of pseudogenization of protein-coding genes. In this study we show that the key XIC gene Xist, which displays fragmentary homology to a protein-coding gene Lnx3, emerged de novo in early eutherians by integration of mobile elements which gave rise to simple tandem repeats. The Xist gene promoter region and four out of ten exons found in eutherians retain homology to exons of the Lnx3 gene. The remaining six Xist exons including those with simple tandem repeats detectable in their structure have similarity to different transposable elements. Integration of mobile elements into Xist accompanies the overall evolution of the gene and presumably continues in contemporary eutherian species. Additionally we showed that the combination of remnants of protein-coding sequences and mobile elements is not unique to the Xist gene and is found in other XIC genes producing non-coding nuclear RNA. PMID:18575625
Vicente, Juan J; Galardi-Castilla, María; Escalante, Ricardo; Sastre, Leandro
2008-01-03
The social amoeba Dictyostelium discoideum executes a multicellular development program upon starvation. This morphogenetic process requires the differential regulation of a large number of genes and is coordinated by extracellular signals. The MADS-box transcription factor SrfA is required for several stages of development, including slug migration and spore terminal differentiation. Subtractive hybridization allowed the isolation of a gene, sigN (SrfA-induced gene N), that was dependent on the transcription factor SrfA for expression at the slug stage of development. Homology searches detected the existence of a large family of sigN-related genes in the Dictyostelium discoideum genome. The 13 most similar genes are grouped in two regions of chromosome 2 and have been named Group1 and Group2 sigN genes. The putative encoded proteins are 87-89 amino acids long. All these genes have a similar structure, composed of a first exon containing a 13 nucleotides long open reading frame and a second exon comprising the remaining of the putative coding region. The expression of these genes is induced at10 hours of development. Analyses of their promoter regions indicate that these genes are expressed in the prestalk region of developing structures. The addition of antibodies raised against SigN Group 2 proteins induced disintegration of multi-cellular structures at the mound stage of development. A large family of genes coding for small proteins has been identified in D. discoideum. Two groups of very similar genes from this family have been shown to be specifically expressed in prestalk cells during development. Functional studies using antibodies raised against Group 2 SigN proteins indicate that these genes could play a role during multicellular development.
Do plant cell walls have a code?
Tavares, Eveline Q P; Buckeridge, Marcos S
2015-12-01
A code is a set of rules that establish correspondence between two worlds, signs (consisting of encrypted information) and meaning (of the decrypted message). A third element, the adaptor, connects both worlds, assigning meaning to a code. We propose that a Glycomic Code exists in plant cell walls where signs are represented by monosaccharides and phenylpropanoids and meaning is cell wall architecture with its highly complex association of polymers. Cell wall biosynthetic mechanisms, structure, architecture and properties are addressed according to Code Biology perspective, focusing on how they oppose to cell wall deconstruction. Cell wall hydrolysis is mainly focused as a mechanism of decryption of the Glycomic Code. Evidence for encoded information in cell wall polymers fine structure is highlighted and the implications of the existence of the Glycomic Code are discussed. Aspects related to fine structure are responsible for polysaccharide packing and polymer-polymer interactions, affecting the final cell wall architecture. The question whether polymers assembly within a wall display similar properties as other biological macromolecules (i.e. proteins, DNA, histones) is addressed, i.e. do they display a code? Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Jaag, Hannah Miriam; Kawchuk, Lawrence; Rohde, Wolfgang; Fischer, Rainer; Emans, Neil; Prüfer, Dirk
2003-01-01
Potato leafroll polerovirus (PLRV) genomic RNA acts as a polycistronic mRNA for the production of proteins P0, P1, and P2 translated from the 5′-proximal half of the genome. Within the P1 coding region we identified a 5-kDa replication-associated protein 1 (Rap1) essential for viral multiplication. An internal ribosome entry site (IRES) with unusual structure and location was identified that regulates Rap1 translation. Core structural elements for internal ribosome entry include a conserved AUG codon and a downstream GGAGAGAGAGG motif with inverted symmetry. Reporter gene expression in potato protoplasts confirmed the internal ribosome entry function. Unlike known IRES motifs, the PLRV IRES is located completely within the coding region of Rap1 at the center of the PLRV genome. PMID:12835413
Online interactive analysis of protein structure ensembles with Bio3D-web.
Skjærven, Lars; Jariwala, Shashank; Yao, Xin-Qiu; Grant, Barry J
2016-11-15
Bio3D-web is an online application for analyzing the sequence, structure and conformational heterogeneity of protein families. Major functionality is provided for identifying protein structure sets for analysis, their alignment and refined structure superposition, sequence and structure conservation analysis, mapping and clustering of conformations and the quantitative comparison of their predicted structural dynamics. Bio3D-web is based on the Bio3D and Shiny R packages. All major browsers are supported and full source code is available under a GPL2 license from http://thegrantlab.org/bio3d-web CONTACT: bjgrant@umich.edu or lars.skjarven@uib.no. © The Author 2016. Published by Oxford University Press.
Origins of the protein synthesis cycle
NASA Technical Reports Server (NTRS)
Fox, S. W.
1981-01-01
Largely derived from experiments in molecular evolution, a theory of protein synthesis cycles has been constructed. The sequence begins with ordered thermal proteins resulting from the self-sequencing of mixed amino acids. Ordered thermal proteins then aggregate to cell-like structures. When they contained proteinoids sufficiently rich in lysine, the structures were able to synthesize offspring peptides. Since lysine-rich proteinoid (LRP) also catalyzes the polymerization of nucleoside triphosphate to polynucleotides, the same microspheres containing LRP could have synthesized both original cellular proteins and cellular nucleic acids. The LRP within protocells would have provided proximity advantageous for the origin and evolution of the genetic code.
Shah, M S; Ashraf, A; Khan, M I; Rahman, M; Habib, M; Qureshi, J A
2016-01-01
Fowl adenovirus-4 is an infectious agent causing Hydropericardium syndrome in chickens. Adenovirus are non-enveloped virions having linear, double stranded DNA. Viral genome codes for few structural and non structural proteins. 100K is an important non-structural viral protein. Open reading frame for coding sequence of 100K protein was cloned with oligo histidine tag and expressed in Escherichia coli as a fusion protein. Nucleotide sequence of the gene revealed that 100K gene of FAdV-4 has high homology (98%) with the respective gene of FAdV-10. Recombinant 100K protein was expressed in E. coli and purified by nickel affinity chromatography. Immunization of chickens with recombinant 100K protein elicited significant serum antibody titers. However challenge protection test revealed that 100K protein conferred little protection (40%) to the immunized chicken against pathogenic viral challenge. So it was concluded that 100K gene has 2397 bp length and recombinant 100K protein has molecular weight of 95 kDa. It was also found that the recombinant protein has little capacity to affect the immune response because in-spite of having an important role in intracellular transport & folding of viral capsid proteins during viral replication, it is not exposed on the surface of the virus at any stage. Copyright © 2015 The International Alliance for Biological Standardization. All rights reserved.
Neuhaus, Klaus; Landstorfer, Richard; Fellner, Lea; Simon, Svenja; Schafferhans, Andrea; Goldberg, Tatyana; Marx, Harald; Ozoline, Olga N; Rost, Burkhard; Kuster, Bernhard; Keim, Daniel A; Scherer, Siegfried
2016-02-24
Genomes of E. coli, including that of the human pathogen Escherichia coli O157:H7 (EHEC) EDL933, still harbor undetected protein-coding genes which, apparently, have escaped annotation due to their small size and non-essential function. To find such genes, global gene expression of EHEC EDL933 was examined, using strand-specific RNAseq (transcriptome), ribosomal footprinting (translatome) and mass spectrometry (proteome). Using the above methods, 72 short, non-annotated protein-coding genes were detected. All of these showed signals in the ribosomal footprinting assay indicating mRNA translation. Seven were verified by mass spectrometry. Fifty-seven genes are annotated in other enterobacteriaceae, mainly as hypothetical genes; the remaining 15 genes constitute novel discoveries. In addition, protein structure and function were predicted computationally and compared between EHEC-encoded proteins and 100-times randomly shuffled proteins. Based on this comparison, 61 of the 72 novel proteins exhibit predicted structural and functional features similar to those of annotated proteins. Many of the novel genes show differential transcription when grown under eleven diverse growth conditions suggesting environmental regulation. Three genes were found to confer a phenotype in previous studies, e.g., decreased cattle colonization. These findings demonstrate that ribosomal footprinting can be used to detect novel protein coding genes, contributing to the growing body of evidence that hypothetical genes are not annotation artifacts and opening an additional way to study their functionality. All 72 genes are taxonomically restricted and, therefore, appear to have evolved relatively recently de novo.
Specific and Modular Binding Code for Cytosine Recognition in Pumilio/FBF (PUF) RNA-binding Domains
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dong, Shuyun; Wang, Yang; Cassidy-Amstutz, Caleb
2011-10-28
Pumilio/fem-3 mRNA-binding factor (PUF) proteins possess a recognition code for bases A, U, and G, allowing designed RNA sequence specificity of their modular Pumilio (PUM) repeats. However, recognition side chains in a PUM repeat for cytosine are unknown. Here we report identification of a cytosine-recognition code by screening random amino acid combinations at conserved RNA recognition positions using a yeast three-hybrid system. This C-recognition code is specific and modular as specificity can be transferred to different positions in the RNA recognition sequence. A crystal structure of a modified PUF domain reveals specific contacts between an arginine side chain and themore » cytosine base. We applied the C-recognition code to design PUF domains that recognize targets with multiple cytosines and to generate engineered splicing factors that modulate alternative splicing. Finally, we identified a divergent yeast PUF protein, Nop9p, that may recognize natural target RNAs with cytosine. This work deepens our understanding of natural PUF protein target recognition and expands the ability to engineer PUF domains to recognize any RNA sequence.« less
G2S: a web-service for annotating genomic variants on 3D protein structures.
Wang, Juexin; Sheridan, Robert; Sumer, S Onur; Schultz, Nikolaus; Xu, Dong; Gao, Jianjiong
2018-06-01
Accurately mapping and annotating genomic locations on 3D protein structures is a key step in structure-based analysis of genomic variants detected by recent large-scale sequencing efforts. There are several mapping resources currently available, but none of them provides a web API (Application Programming Interface) that supports programmatic access. We present G2S, a real-time web API that provides automated mapping of genomic variants on 3D protein structures. G2S can align genomic locations of variants, protein locations, or protein sequences to protein structures and retrieve the mapped residues from structures. G2S API uses REST-inspired design and it can be used by various clients such as web browsers, command terminals, programming languages and other bioinformatics tools for bringing 3D structures into genomic variant analysis. The webserver and source codes are freely available at https://g2s.genomenexus.org. g2s@genomenexus.org. Supplementary data are available at Bioinformatics online.
An Amino Acid Packing Code for α-helical Structure and Protein Design
Joo, Hyun; Chavan, Archana G.; Phan, Jamie; Day, Ryan; Tsai, Jerry
2012-01-01
This work demonstrates that all packing in α-helices can be simplified to repetitive patterns of a single motif: the knob-socket. Using the precision of Voronoi Polyhedra/Deluaney Tessellations to identify contacts, the knob-socket is a 4 residue tetrahedral motif: a knob residue on one α-helix packs into the 3 residue socket on another α-helix. The principle of the knob-socket model relates the packing between levels of protein structure: the intra-helical packing arrangements within secondary structure that permit inter-helix tertiary packing interactions. Within an α-helix, the 3 residue sockets arrange residues into a uniform packing lattice. Inter-helix packing results from a definable pattern of interdigitated knob-socket motifs between 2 α-helices. Furthermore, the knob-socket model classifies 3 types of sockets: 1) free: favoring only intra-helical packing, 2) filled: favoring inter-helical interactions and 3) non: disfavoring α-helical structure. The amino acid propensities in these 3 socket classes essentially represent an amino acid code for structure in α-helical packing. Using this code, a novel yet straightforward approach for the design of α-helical structure was used to validate the knob-socket model. Unique sequences for 3 peptides were created to produce a predicted amount of α-helical structure: mostly helical, some helical, and no-helix. These 3 peptides were synthesized and helical content assessed using CD spectroscopy. The measured α-helicity of each peptide was consistent with the expected predictions. These results and analysis demonstrate that the knob-socket motif functions as the basic unit of packing and presents an intuitive tool to decipher the rules governing packing in protein structure. PMID:22426125
Reinhardt, Josephine A.; Wanjiru, Betty M.; Brant, Alicia T.; Saelao, Perot; Begun, David J.; Jones, Corbin D.
2013-01-01
How non-coding DNA gives rise to new protein-coding genes (de novo genes) is not well understood. Recent work has revealed the origins and functions of a few de novo genes, but common principles governing the evolution or biological roles of these genes are unknown. To better define these principles, we performed a parallel analysis of the evolution and function of six putatively protein-coding de novo genes described in Drosophila melanogaster. Reconstruction of the transcriptional history of de novo genes shows that two de novo genes emerged from novel long non-coding RNAs that arose at least 5 MY prior to evolution of an open reading frame. In contrast, four other de novo genes evolved a translated open reading frame and transcription within the same evolutionary interval suggesting that nascent open reading frames (proto-ORFs), while not required, can contribute to the emergence of a new de novo gene. However, none of the genes arose from proto-ORFs that existed long before expression evolved. Sequence and structural evolution of de novo genes was rapid compared to nearby genes and the structural complexity of de novo genes steadily increases over evolutionary time. Despite the fact that these genes are transcribed at a higher level in males than females, and are most strongly expressed in testes, RNAi experiments show that most of these genes are essential in both sexes during metamorphosis. This lethality suggests that protein coding de novo genes in Drosophila quickly become functionally important. PMID:24146629
Crowley, T E; Bond, M W; Meyerowitz, E M
1983-01-01
The polytene chromosome puff at 68C on the Drosophila melanogaster third chromosome is thought from genetic experiments to contain the structural gene for one of the secreted salivary gland glue polypeptides, sgs-3. Previous work has demonstrated that the DNA included in this puff contains sequences that are transcribed to give three different polyadenylated RNAs that are abundant in third-larval-instar salivary glands. These have been called the group II, group III, and group IV RNAs. In the experiments reported here, we used the nucleotide sequence of the DNA coding for these RNAs to predict some of the physical and chemical properties expected of their protein products, including molecular weight, amino acid composition, and amino acid sequence. Salivary gland polypeptides with molecular weights similar to those expected for the 68C RNA translation products, and with the expected degree of incorporation of different radioactive amino acids, were purified. These proteins were shown by amino acid sequencing to correspond to the protein products of the 68C RNAs. It was further shown that each of these proteins is a part of the secreted salivary gland glue: the group IV RNA codes for the previously described sgs-3, whereas the group II and III RNAs code for the newly identified glue polypeptides sgs-8 and sgs-7. Images PMID:6406838
Identification and characterization of long non-coding RNAs in rainbow trout eggs
USDA-ARS?s Scientific Manuscript database
Long non-coding RNAs (lncRNAs) are in general considered as a diverse class of transcripts longer than 200 nucleotides that structurally resemble mRNAs but do not encode proteins. Recent advances in RNA sequencing (RNA-Seq) and bioinformatics methods have provided an opportunity to indentify and ana...
Seligmann, Hervé; Warthi, Ganesh
2017-01-01
A new codon property, codon directional asymmetry in nucleotide content (CDA), reveals a biologically meaningful genetic code dimension: palindromic codons (first and last nucleotides identical, codon structure XZX) are symmetric (CDA = 0), codons with structures ZXX/XXZ are 5'/3' asymmetric (CDA = - 1/1; CDA = - 0.5/0.5 if Z and X are both purines or both pyrimidines, assigning negative/positive (-/+) signs is an arbitrary convention). Negative/positive CDAs associate with (a) Fujimoto's tetrahedral codon stereo-table; (b) tRNA synthetase class I/II (aminoacylate the 2'/3' hydroxyl group of the tRNA's last ribose, respectively); and (c) high/low antiparallel (not parallel) betasheet conformation parameters. Preliminary results suggest CDA-whole organism associations (body temperature, developmental stability, lifespan). Presumably, CDA impacts spatial kinetics of codon-anticodon interactions, affecting cotranslational protein folding. Some synonymous codons have opposite CDA sign (alanine, leucine, serine, and valine), putatively explaining how synonymous mutations sometimes affect protein function. Correlations between CDA and tRNA synthetase classes are weaker than between CDA and antiparallel betasheet conformation parameters. This effect is stronger for mitochondrial genetic codes, and potentially drives mitochondrial codon-amino acid reassignments. CDA reveals information ruling nucleotide-protein relations embedded in reversed (not reverse-complement) sequences (5'-ZXX-3'/5'-XXZ-3').
Crystal structure of the YDR533c S. cerevisiae protein, a class II member of the Hsp31 family.
Graille, Marc; Quevillon-Cheruel, Sophie; Leulliot, Nicolas; Zhou, Cong-Zhao; Li de la Sierra Gallay, Ines; Jacquamet, Lilian; Ferrer, Jean-Luc; Liger, Dominique; Poupon, Anne; Janin, Joel; van Tilbeurgh, Herman
2004-05-01
The ORF YDR533c from Saccharomyces cerevisiae codes for a 25.5 kDa protein of unknown biochemical function. Transcriptome analysis of yeast has shown that this gene is activated in response to various stress conditions together with proteins belonging to the heat shock family. In order to clarify its biochemical function, we determined the crystal structure of YDR533c to 1.85 A resolution by the single anomalous diffraction method. The protein possesses an alpha/beta hydrolase fold and a putative Cys-His-Glu catalytic triad common to a large enzyme family containing proteases, amidotransferases, lipases, and esterases. The protein has strong structural resemblance with the E. coli Hsp31 protein and the intracellular protease I from Pyrococcus horikoshii, which are considered class I and class III members of the Hsp31 family, respectively. Detailed structural analysis strongly suggests that the YDR533c protein crystal structure is the first one of a class II member of the Hsp31 family.
Pre-calculated protein structure alignments at the RCSB PDB website.
Prlic, Andreas; Bliven, Spencer; Rose, Peter W; Bluhm, Wolfgang F; Bizon, Chris; Godzik, Adam; Bourne, Philip E
2010-12-01
With the continuous growth of the RCSB Protein Data Bank (PDB), providing an up-to-date systematic structure comparison of all protein structures poses an ever growing challenge. Here, we present a comparison tool for calculating both 1D protein sequence and 3D protein structure alignments. This tool supports various applications at the RCSB PDB website. First, a structure alignment web service calculates pairwise alignments. Second, a stand-alone application runs alignments locally and visualizes the results. Third, pre-calculated 3D structure comparisons for the whole PDB are provided and updated on a weekly basis. These three applications allow users to discover novel relationships between proteins available either at the RCSB PDB or provided by the user. A web user interface is available at http://www.rcsb.org/pdb/workbench/workbench.do. The source code is available under the LGPL license from http://www.biojava.org. A source bundle, prepared for local execution, is available from http://source.rcsb.org andreas@sdsc.edu; pbourne@ucsd.edu.
Fan, Ming; Zheng, Bin; Li, Lihua
2015-10-01
Knowledge of the structural class of a given protein is important for understanding its folding patterns. Although a lot of efforts have been made, it still remains a challenging problem for prediction of protein structural class solely from protein sequences. The feature extraction and classification of proteins are the main problems in prediction. In this research, we extended our earlier work regarding these two aspects. In protein feature extraction, we proposed a scheme by calculating the word frequency and word position from sequences of amino acid, reduced amino acid, and secondary structure. For an accurate classification of the structural class of protein, we developed a novel Multi-Agent Ada-Boost (MA-Ada) method by integrating the features of Multi-Agent system into Ada-Boost algorithm. Extensive experiments were taken to test and compare the proposed method using four benchmark datasets in low homology. The results showed classification accuracies of 88.5%, 96.0%, 88.4%, and 85.5%, respectively, which are much better compared with the existing methods. The source code and dataset are available on request.
Liberek, K; Osipiuk, J; Zylicz, M; Ang, D; Skorko, J; Georgopoulos, C
1990-02-25
The process of initiation of lambda DNA replication requires the assembly of the proper nucleoprotein complex at the origin of replication, ori lambda. The complex is composed of both phage and host-coded proteins. The lambda O initiator protein binds specifically to ori lambda. The lambda P initiator protein binds to both lambda O and the host-coded dnaB helicase, giving rise to an ori lambda DNA.lambda O.lambda P.dnaB structure. The dnaK and dnaJ heat shock proteins have been shown capable of dissociating this complex. The thus freed dnaB helicase unwinds the duplex DNA template at the replication fork. In this report, through cross-linking, size chromatography, and protein affinity chromatography, we document some of the protein-protein interactions occurring at ori lambda. Our results show that the dnaK protein specifically interacts with both lambda O and lambda P, and that the dnaJ protein specifically interacts with the dnaB helicase.
2008-05-01
4 ). The three-dimensional spatial orientation of the atoms for these resolved solution structures (Protein Data Bank accession codes: 2gt3...Crystal structure of the Escherichia coli peptide methionine sulphoxide reductase at 1.9 Å resolution . Struct. Fold. Des. 8: 1167 – 1178. 2 . Brot...sources (8). There is a 67% sequence identity between the E.coli and human MsrA ( 2 ). N-terminus C-terminus Figure 2 . Three-dimensional structure
Meyer, Philippe; Liger, Dominique; Leulliot, Nicolas; Quevillon-Cheruel, Sophie; Zhou, Cong-Zhao; Borel, Franck; Ferrer, Jean-Luc; Poupon, Anne; Janin, Joël; van Tilbeurgh, Herman
2005-12-01
We have determined the three-dimensional crystal structure of the protein encoded by the open reading frame YFL030w from Saccharomyces cerevisiae to a resolution of 2.6 A using single wavelength anomalous diffraction. YFL030w is a 385 amino-acid protein with sequence similarity to the aminotransferase family. The structure of the protein reveals a homodimer adopting the fold-type I of pyridoxal 5'-phosphate (PLP)-dependent aminotransferases. The PLP co-factor is covalently bound to the active site in the crystal structure. The protein shows close structural resemblance with the human alanine:glyoxylate aminotransferase (EC 2.6.1.44), an enzyme involved in the hereditary kidney stone disease primary hyperoxaluria type 1. In this paper we show that YFL030w codes for an alanine:glyoxylate aminotransferase, highly specific for its amino donor and acceptor substrates.
Molecular Dynamics Analysis of Lysozyme Protein in Ethanol- Water Mixed Solvent
2012-01-01
molecular dynamics simulations of solvent effect on lysozyme protein, using water, ethanol, and different concentrations of water-ethanol mixtures as...understood. This work focuses on detailed molecular dynamics simulations of solvent effect on lysozyme protein, using water, ethanol, and different...using GROMACS molecular dynamics simulation (MD) code. Compared to water environment, the lysozyme structure showed remarkable changes in water
Chen, Frank; Spano, Anthony; Goodman, Benjamin E.; Blasier, Kiev R.; Sabat, Agnes; Jeffery, Erin; Norris, Andrew; Shabanowitz, Jeffrey; Hunt, Donald F.; Lebedev, Nikolai
2010-01-01
The gene transfer agent of Rhodobacter capsulatus (GTA) is a unique phage-like particle that exchanges genetic information between members of this same species of bacterium. Besides being an excellent tool for genetic mapping, the GTA has a number of advantages for biotechnological and nanoengineering purposes. To facilitate the GTA purification and identify the proteins involved in GTA expression, assembly and regulation, in the present work we construct and transform into R. capsulatus Y262 a gene coding for a C-terminally His-tagged capsid protein. The constructed protein was expressed in the cells, assembled into chimeric GTA particles inside the cells and excreted from the cells into surrounding medium. Transmission electron micrographs of phosphotungstate-stained, NiNTA-purified chimeric GTA confirm that its structure is similar to normal GTA particles, with many particles composed both of a head and a tail. The mass spectrometric proteomic analysis of polypeptides present in the GTA recovered outside the cells shows that GTA is composed of at least 9 proteins represented in the GTA gene cluster including proteins coded for by Orf’s 3, 5, 6–9, 11, 13, and 15. PMID:19105630
Chen, Frank; Spano, Anthony; Goodman, Benjamin E; Blasier, Kiev R; Sabat, Agnes; Jeffery, Erin; Norris, Andrew; Shabanowitz, Jeffrey; Hunt, Donald F; Lebedev, Nikolai
2009-02-01
The gene transfer agent of Rhodobacter capsulatus (GTA) is a unique phage-like particle that exchanges genetic information between members of this same species of bacterium. Besides being an excellent tool for genetic mapping, the GTA has a number of advantages for biotechnological and nanoengineering purposes. To facilitate the GTA purification and identify the proteins involved in GTA expression, assembly and regulation, in the present work we construct and transform into R. capsulatus Y262 a gene coding for a C-terminally His-tagged capsid protein. The constructed protein was expressed in the cells, assembled into chimeric GTA particles inside the cells and excreted from the cells into surrounding medium. Transmission electron micrographs of phosphotungstate-stained, NiNTA-purified chimeric GTA confirm that its structure is similar to normal GTA particles, with many particles composed both of a head and a tail. The mass spectrometric proteomic analysis of polypeptides present in the GTA recovered outside the cells shows that GTA is composed of at least 9 proteins represented in the GTA gene cluster including proteins coded for by Orf's 3, 5, 6-9, 11, 13, and 15.
Guttman, Mitchell; Garber, Manuel; Levin, Joshua Z.; Donaghey, Julie; Robinson, James; Adiconis, Xian; Fan, Lin; Koziol, Magdalena J.; Gnirke, Andreas; Nusbaum, Chad; Rinn, John L.; Lander, Eric S.; Regev, Aviv
2010-01-01
RNA-Seq provides an unbiased way to study a transcriptome, including both coding and non-coding genes. To date, most RNA-Seq studies have critically depended on existing annotations, and thus focused on expression levels and variation in known transcripts. Here, we present Scripture, a method to reconstruct the transcriptome of a mammalian cell using only RNA-Seq reads and the genome sequence. We apply it to mouse embryonic stem cells, neuronal precursor cells, and lung fibroblasts to accurately reconstruct the full-length gene structures for the vast majority of known expressed genes. We identify substantial variation in protein-coding genes, including thousands of novel 5′-start sites, 3′-ends, and internal coding exons. We then determine the gene structures of over a thousand lincRNA and antisense loci. Our results open the way to direct experimental manipulation of thousands of non-coding RNAs, and demonstrate the power of ab initio reconstruction to render a comprehensive picture of mammalian transcriptomes. PMID:20436462
Reuther, Peter; Göpfert, Kristina; Dudek, Alexandra H.; Heiner, Monika; Herold, Susanne; Schwemmle, Martin
2015-01-01
Influenza A viruses (IAV) pose a constant threat to the human population and therefore a better understanding of their fundamental biology and identification of novel therapeutics is of upmost importance. Various reporter-encoding IAV were generated to achieve these goals, however, one recurring difficulty was the genetic instability especially of larger reporter genes. We employed the viral NS segment coding for the non-structural protein 1 (NS1) and nuclear export protein (NEP) for stable expression of diverse reporter proteins. This was achieved by converting the NS segment into a single open reading frame (ORF) coding for NS1, the respective reporter and NEP. To allow expression of individual proteins, the reporter genes were flanked by two porcine Teschovirus-1 2A peptide (PTV-1 2A)-coding sequences. The resulting viruses encoding luciferases, fluorescent proteins or a Cre recombinase are characterized by a high genetic stability in vitro and in mice and can be readily employed for antiviral compound screenings, visualization of infected cells or cells that survived acute infection. PMID:26068081
von Grotthuss, Marcin; Plewczynski, Dariusz; Ginalski, Krzysztof; Rychlewski, Leszek; Shakhnovich, Eugene I
2006-02-06
The number of protein structures from structural genomics centers dramatically increases in the Protein Data Bank (PDB). Many of these structures are functionally unannotated because they have no sequence similarity to proteins of known function. However, it is possible to successfully infer function using only structural similarity. Here we present the PDB-UF database, a web-accessible collection of predictions of enzymatic properties using structure-function relationship. The assignments were conducted for three-dimensional protein structures of unknown function that come from structural genomics initiatives. We show that 4 hypothetical proteins (with PDB accession codes: 1VH0, 1NS5, 1O6D, and 1TO0), for which standard BLAST tools such as PSI-BLAST or RPS-BLAST failed to assign any function, are probably methyltransferase enzymes. We suggest that the structure-based prediction of an EC number should be conducted having the different similarity score cutoff for different protein folds. Moreover, performing the annotation using two different algorithms can reduce the rate of false positive assignments. We believe, that the presented web-based repository will help to decrease the number of protein structures that have functions marked as "unknown" in the PDB file. http://paradox.harvard.edu/PDB-UF and http://bioinfo.pl/PDB-UF.
Shao, Wei; Liu, Mingxia; Zhang, Daoqiang
2016-01-01
The systematic study of subcellular location pattern is very important for fully characterizing the human proteome. Nowadays, with the great advances in automated microscopic imaging, accurate bioimage-based classification methods to predict protein subcellular locations are highly desired. All existing models were constructed on the independent parallel hypothesis, where the cellular component classes are positioned independently in a multi-class classification engine. The important structural information of cellular compartments is missed. To deal with this problem for developing more accurate models, we proposed a novel cell structure-driven classifier construction approach (SC-PSorter) by employing the prior biological structural information in the learning model. Specifically, the structural relationship among the cellular components is reflected by a new codeword matrix under the error correcting output coding framework. Then, we construct multiple SC-PSorter-based classifiers corresponding to the columns of the error correcting output coding codeword matrix using a multi-kernel support vector machine classification approach. Finally, we perform the classifier ensemble by combining those multiple SC-PSorter-based classifiers via majority voting. We evaluate our method on a collection of 1636 immunohistochemistry images from the Human Protein Atlas database. The experimental results show that our method achieves an overall accuracy of 89.0%, which is 6.4% higher than the state-of-the-art method. The dataset and code can be downloaded from https://github.com/shaoweinuaa/. dqzhang@nuaa.edu.cn Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bhati, Mugdha; Lee, Christopher; Nancarrow, Amy L.
2008-09-03
LIM-homeodomain (LIM-HD) transcription factors form a combinatorial 'LIM code' that contributes to the specification of cell types. In the ventral spinal cord, the binary LIM homeobox protein 3 (Lhx3)/LIM domain-binding protein 1 (Ldb1) complex specifies the formation of V2 interneurons. The additional expression of islet-1 (Isl1) in adjacent cells instead specifies the formation of motor neurons through assembly of a ternary complex in which Isl1 contacts both Lhx3 and Ldb1, displacing Lhx3 as the binding partner of Ldb1. However, little is known about how this molecular switch occurs. Here, we have identified the 30-residue Lhx3-binding domain on Isl1 (Isl1{sub LBD}).more » Although the LIM interaction domain of Ldb1 (Ldb1{sub LID}) and Isl1{sub LBD} share low levels of sequence homology, X-ray and NMR structures reveal that they bind Lhx3 in an identical manner, that is, Isl1{sub LBD} mimics Ldb1{sub LID}. These data provide a structural basis for the formation of cell type-specific protein-protein interactions in which unstructured linear motifs with diverse sequences compete to bind protein partners. The resulting alternate protein complexes can target different genes to regulate key biological events.« less
3D Printing of Protein Models in an Undergraduate Laboratory: Leucine Zippers
ERIC Educational Resources Information Center
Meyer, Scott C.
2015-01-01
An upper-division undergraduate laboratory experiment is described that explores the structure/function relationship of protein domains, namely leucine zippers, through a molecular graphics computer program and physical models fabricated by 3D printing. By generating solvent accessible surfaces and color-coding hydrophobic, basic, and acidic amino…
Complete mitochondrial genome of a Asian lion (Panthera leo goojratensis).
Li, Yu-Fei; Wang, Qiang; Zhao, Jian-ning
2016-01-01
The entire mitochondrial genome of this Asian lion (Panthera leo goojratensis) was 17,183 bp in length, gene composition and arrangement conformed to other lions, which contained the typical structure of 22 tRNAs, 2 rRNAs, 13 protein-coding genes and a non-coding region. The characteristic of the mitochondrial genome was analyzed in detail.
Piecemeal Buildup of the Genetic Code, Ribosomes, and Genomes from Primordial tRNA Building Blocks
Caetano-Anollés, Derek; Caetano-Anollés, Gustavo
2016-01-01
The origin of biomolecular machinery likely centered around an ancient and central molecule capable of interacting with emergent macromolecular complexity. tRNA is the oldest and most central nucleic acid molecule of the cell. Its co-evolutionary interactions with aminoacyl-tRNA synthetase protein enzymes define the specificities of the genetic code and those with the ribosome their accurate biosynthetic interpretation. Phylogenetic approaches that focus on molecular structure allow reconstruction of evolutionary timelines that describe the history of RNA and protein structural domains. Here we review phylogenomic analyses that reconstruct the early history of the synthetase enzymes and the ribosome, their interactions with RNA, and the inception of amino acid charging and codon specificities in tRNA that are responsible for the genetic code. We also trace the age of domains and tRNA onto ancient tRNA homologies that were recently identified in rRNA. Our findings reveal a timeline of recruitment of tRNA building blocks for the formation of a functional ribosome, which holds both the biocatalytic functions of protein biosynthesis and the ability to store genetic memory in primordial RNA genomic templates. PMID:27918435
Piecemeal Buildup of the Genetic Code, Ribosomes, and Genomes from Primordial tRNA Building Blocks.
Caetano-Anollés, Derek; Caetano-Anollés, Gustavo
2016-12-02
The origin of biomolecular machinery likely centered around an ancient and central molecule capable of interacting with emergent macromolecular complexity. tRNA is the oldest and most central nucleic acid molecule of the cell. Its co-evolutionary interactions with aminoacyl-tRNA synthetase protein enzymes define the specificities of the genetic code and those with the ribosome their accurate biosynthetic interpretation. Phylogenetic approaches that focus on molecular structure allow reconstruction of evolutionary timelines that describe the history of RNA and protein structural domains. Here we review phylogenomic analyses that reconstruct the early history of the synthetase enzymes and the ribosome, their interactions with RNA, and the inception of amino acid charging and codon specificities in tRNA that are responsible for the genetic code. We also trace the age of domains and tRNA onto ancient tRNA homologies that were recently identified in rRNA. Our findings reveal a timeline of recruitment of tRNA building blocks for the formation of a functional ribosome, which holds both the biocatalytic functions of protein biosynthesis and the ability to store genetic memory in primordial RNA genomic templates.
Dubey, Bhawna; Meganathan, P R; Haque, Ikramul
2012-07-01
This paper reports the complete mitochondrial genome sequence of an endangered Indian snake, Python molurus molurus (Indian Rock Python). A typical snake mitochondrial (mt) genome of 17258 bp length comprising of 37 genes including the 13 protein coding genes, 22 tRNA genes, and 2 ribosomal RNA genes along with duplicate control regions is described herein. The P. molurus molurus mt. genome is relatively similar to other snake mt. genomes with respect to gene arrangement, composition, tRNA structures and skews of AT/GC bases. The nucleotide composition of the genome shows that there are more A-C % than T-G% on the positive strand as revealed by positive AT and CG skews. Comparison of individual protein coding genes, with other snake genomes suggests that ATP8 and NADH3 genes have high divergence rates. Codon usage analysis reveals a preference of NNC codons over NNG codons in the mt. genome of P. molurus. Also, the synonymous and non-synonymous substitution rates (ka/ks) suggest that most of the protein coding genes are under purifying selection pressure. The phylogenetic analyses involving the concatenated 13 protein coding genes of P. molurus molurus conformed to the previously established snake phylogeny.
Kikhno, Irina
2014-01-01
Highly homologous sequences 154–157 bp in length grouped under the name of “conserved non-protein-coding element” (CNE) were revealed in all of the sequenced genomes of baculoviruses belonging to the genus Alphabaculovirus. A CNE alignment led to the detection of a set of highly conserved nucleotide clusters that occupy strictly conserved positions in the CNE sequence. The significant length of the CNE and conservation of both its length and cluster architecture were identified as a combination of characteristics that make this CNE different from known viral non-coding functional sequences. The essential role of the CNE in the Alphabaculovirus life cycle was demonstrated through the use of a CNE-knockout Autographa californica multiple nucleopolyhedrovirus (AcMNPV) bacmid. It was shown that the essential function of the CNE was not mediated by the presumed expression activities of the protein- and non-protein-coding genes that overlap the AcMNPV CNE. On the basis of the presented data, the AcMNPV CNE was categorized as a complex-structured, polyfunctional genomic element involved in an essential DNA transaction that is associated with an undefined function of the baculovirus genome. PMID:24740153
Structure and Expression of Genes for Flavivirus Immunogens.
1985-09-01
the same order in YFV i.e., C-M-E-NSI--- NS3---NS5 and an open reading frame extends at least through the C-M-E-NS1 coding region, consistent with...been determined (Castle et al., 1985). Comparison of these results shows that 1) the six major JEV genes mapped thus far occur in the same order in YFV ...pre-M proteins and 3) the predicted structures of the E, NSI and ns2a proteins of JEV and YFV exhibit a high degree of relatedness. The E proteins
Cloud prediction of protein structure and function with PredictProtein for Debian.
Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Staniewski, Cedric; Rost, Burkhard
2013-01-01
We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome.
Cloud Prediction of Protein Structure and Function with PredictProtein for Debian
Kaján, László; Yachdav, Guy; Vicedo, Esmeralda; Steinegger, Martin; Mirdita, Milot; Angermüller, Christof; Böhm, Ariane; Domke, Simon; Ertl, Julia; Mertes, Christian; Reisinger, Eva; Rost, Burkhard
2013-01-01
We report the release of PredictProtein for the Debian operating system and derivatives, such as Ubuntu, Bio-Linux, and Cloud BioLinux. The PredictProtein suite is available as a standard set of open source Debian packages. The release covers the most popular prediction methods from the Rost Lab, including methods for the prediction of secondary structure and solvent accessibility (profphd), nuclear localization signals (predictnls), and intrinsically disordered regions (norsnet). We also present two case studies that successfully utilize PredictProtein packages for high performance computing in the cloud: the first analyzes protein disorder for whole organisms, and the second analyzes the effect of all possible single sequence variants in protein coding regions of the human genome. PMID:23971032
GPCRdb: an information system for G protein-coupled receptors
Isberg, Vignir; Mordalski, Stefan; Munk, Christian; Rataj, Krzysztof; Harpsøe, Kasper; Hauser, Alexander S.; Vroling, Bas; Bojarski, Andrzej J.; Vriend, Gert; Gloriam, David E.
2016-01-01
Recent developments in G protein-coupled receptor (GPCR) structural biology and pharmacology have greatly enhanced our knowledge of receptor structure-function relations, and have helped improve the scientific foundation for drug design studies. The GPCR database, GPCRdb, serves a dual role in disseminating and enabling new scientific developments by providing reference data, analysis tools and interactive diagrams. This paper highlights new features in the fifth major GPCRdb release: (i) GPCR crystal structure browsing, superposition and display of ligand interactions; (ii) direct deposition by users of point mutations and their effects on ligand binding; (iii) refined snake and helix box residue diagram looks; and (iii) phylogenetic trees with receptor classification colour schemes. Under the hood, the entire GPCRdb front- and back-ends have been re-coded within one infrastructure, ensuring a smooth browsing experience and development. GPCRdb is available at http://www.gpcrdb.org/ and it's open source code at https://bitbucket.org/gpcr/protwis. PMID:26582914
A retroviral oncogene, akt, encoding a serine-threonine kinase containing an SH2-like region.
Bellacosa, A; Testa, J R; Staal, S P; Tsichlis, P N
1991-10-11
The v-akt oncogene codes for a 105-kilodalton fusion phosphoprotein containing Gag sequences at its amino terminus. Sequence analysis of v-akt and biochemical characterization of its product revealed that it codes for a protein kinase C-related serine-threonine kinase whose cellular homolog is expressed in most tissues, with the highest amount found in thymus. Although Akt is a serine-threonine kinase, part of its regulatory region is similar to the Src homology-2 domain, a structural motif characteristic of cytoplasmic tyrosine kinases that functions in protein-protein interactions. This suggests that Akt may form a functional link between tyrosine and serine-threonine phosphorylation pathways.
Lery, Letícia M S; Bitar, Mainá; Costa, Mauricio G S; Rössle, Shaila C S; Bisch, Paulo M
2010-12-22
G. diazotrophicus and A. vinelandii are aerobic nitrogen-fixing bacteria. Although oxygen is essential for the survival of these organisms, it irreversibly inhibits nitrogenase, the complex responsible for nitrogen fixation. Both microorganisms deal with this paradox through compensatory mechanisms. In A. vinelandii a conformational protection mechanism occurs through the interaction between the nitrogenase complex and the FeSII protein. Previous studies suggested the existence of a similar system in G. diazotrophicus, but the putative protein involved was not yet described. This study intends to identify the protein coding gene in the recently sequenced genome of G. diazotrophicus and also provide detailed structural information of nitrogenase conformational protection in both organisms. Genomic analysis of G. diazotrophicus sequences revealed a protein coding ORF (Gdia0615) enclosing a conserved "fer2" domain, typical of the ferredoxin family and found in A. vinelandii FeSII. Comparative models of both FeSII and Gdia0615 disclosed a conserved beta-grasp fold. Cysteine residues that coordinate the 2[Fe-S] cluster are in conserved positions towards the metallocluster. Analysis of solvent accessible residues and electrostatic surfaces unveiled an hydrophobic dimerization interface. Dimers assembled by molecular docking presented a stable behaviour and a proper accommodation of regions possibly involved in binding of FeSII to nitrogenase throughout molecular dynamics simulations in aqueous solution. Molecular modeling of the nitrogenase complex of G. diazotrophicus was performed and models were compared to the crystal structure of A. vinelandii nitrogenase. Docking experiments of FeSII and Gdia0615 with its corresponding nitrogenase complex pointed out in both systems a putative binding site presenting shape and charge complementarities at the Fe-protein/MoFe-protein complex interface. The identification of the putative FeSII coding gene in G. diazotrophicus genome represents a large step towards the understanding of the conformational protection mechanism of nitrogenase against oxygen. In addition, this is the first study regarding the structural complementarities of FeSII-nitrogenase interactions in diazotrophic bacteria. The combination of bioinformatic tools for genome analysis, comparative protein modeling, docking calculations and molecular dynamics provided a powerful strategy for the elucidation of molecular mechanisms and structural features of FeSII-nitrogenase interaction.
Serum amyloid A1: Structure, function and gene polymorphism
Sun, Lei; Ye, Richard D.
2017-01-01
Inducible expression of serum amyloid A (SAA) is a hallmark of the acute-phase response, which is a conserved reaction of vertebrates to environmental challenges such as tissue injury, infection and surgery. Human SAA1 is encoded by one of the four SAA genes and is the best-characterized SAA protein. Initially known as a major precursor of amyloid A (AA), SAA1 has been found to play an important role in lipid metabolism and contributes to bacterial clearance, the regulation of inflammation and tumor pathogenesis. SAA1 has five polymorphic coding alleles (SAA1.1 – SAA1.5) that encode distinct proteins with minor amino acid substitutions. Single nucleotide polymorphism (SNP) has been identified in both the coding and non-coding regions of human SAA1. Despite high levels of sequence homology among these variants, SAA1 polymorphisms have been reported as risk factors of cardiovascular diseases and several types of cancer. A recently solved crystal structure of SAA1.1 reveals a hexameric bundle with each of the SAA1 subunits assuming a 4-helix structure stabilized by the C-terminal tail. Analysis of the native SAA1.1 structure has led to the identification of a competing site for high-density lipoprotein (HDL) and heparin, thus providing the structural basis for a role of heparin and heparan sulfate in the conversion of SAA1 to AA. In this brief review, we compares human SAA1 with other forms of human and mouse SAAs, and discuss how structural and genetic studies of SAA1 have advanced our understanding of the physiological functions of the SAA proteins. PMID:26945629
Origin of sphinx, a young chimeric RNA gene in Drosophila melanogaster
Wang, Wen; Brunet, Frédéric G.; Nevo, Eviatar; Long, Manyuan
2002-01-01
Non-protein-coding RNA genes play an important role in various biological processes. How new RNA genes originated and whether this process is controlled by similar evolutionary mechanisms for the origin of protein-coding genes remains unclear. A young chimeric RNA gene that we term sphinx (spx) provides the first insight into the early stage of evolution of RNA genes. spx originated as an insertion of a retroposed sequence of the ATP synthase chain F gene at the cytological region 60DB since the divergence of Drosophila melanogaster from its sibling species 2–3 million years ago. This retrosequence, which is located at 102F on the fourth chromosome, recruited a nearby exon and intron, thereby evolving a chimeric gene structure. This molecular process suggests that the mechanism of exon shuffling, which can generate protein-coding genes, also plays a role in the origin of RNA genes. The subsequent evolutionary process of spx has been associated with a high nucleotide substitution rate, possibly driven by a continuous positive Darwinian selection for a novel function, as is shown in its sex- and development-specific alternative splicing. To test whether spx has adapted to different environments, we investigated its population genetic structure in the unique “Evolution Canyon” in Israel, revealing a similar haplotype structure in spx, and thus similar evolutionary forces operating on spx between environments. PMID:11904380
Shabalina, Svetlana A.; Ogurtsov, Aleksey Y.; Spiridonov, Nikolay A.; Koonin, Eugene V.
2014-01-01
Alternative splicing (AS), alternative transcription initiation (ATI) and alternative transcription termination (ATT) create the extraordinary complexity of transcriptomes and make key contributions to the structural and functional diversity of mammalian proteomes. Analysis of mammalian genomic and transcriptomic data shows that contrary to the traditional view, the joint contribution of ATI and ATT to the transcriptome and proteome diversity is quantitatively greater than the contribution of AS. Although the mean numbers of protein-coding constitutive and alternative nucleotides in gene loci are nearly identical, their distribution along the transcripts is highly non-uniform. On average, coding exons in the variable 5′ and 3′ transcript ends that are created by ATI and ATT contain approximately four times more alternative nucleotides than core protein-coding regions that diversify exclusively via AS. Short upstream exons that encompass alternative 5′-untranslated regions and N-termini of proteins evolve under strong nucleotide-level selection whereas in 3′-terminal exons that encode protein C-termini, protein-level selection is significantly stronger. The groups of genes that are subject to ATI and ATT show major differences in biological roles, expression and selection patterns. PMID:24792168
tRNA acceptor stem and anticodon bases form independent codes related to protein folding
Carter, Charles W.; Wolfenden, Richard
2015-01-01
Aminoacyl-tRNA synthetases recognize tRNA anticodon and 3′ acceptor stem bases. Synthetase Urzymes acylate cognate tRNAs even without anticodon-binding domains, in keeping with the possibility that acceptor stem recognition preceded anticodon recognition. Representing tRNA identity elements with two bits per base, we show that the anticodon encodes the hydrophobicity of each amino acid side-chain as represented by its water-to-cyclohexane distribution coefficient, and this relationship holds true over the entire temperature range of liquid water. The acceptor stem codes preferentially for the surface area or size of each side-chain, as represented by its vapor-to-cyclohexane distribution coefficient. These orthogonal experimental properties are both necessary to account satisfactorily for the exposed surface area of amino acids in folded proteins. Moreover, the acceptor stem codes correctly for β-branched and carboxylic acid side-chains, whereas the anticodon codes for a wider range of such properties, but not for size or β-branching. These and other results suggest that genetic coding of 3D protein structures evolved in distinct stages, based initially on the size of the amino acid and later on its compatibility with globular folding in water. PMID:26034281
Bioinformatics and variability in drug response: a protein structural perspective
Lahti, Jennifer L.; Tang, Grace W.; Capriotti, Emidio; Liu, Tianyun; Altman, Russ B.
2012-01-01
Marketed drugs frequently perform worse in clinical practice than in the clinical trials on which their approval is based. Many therapeutic compounds are ineffective for a large subpopulation of patients to whom they are prescribed; worse, a significant fraction of patients experience adverse effects more severe than anticipated. The unacceptable risk–benefit profile for many drugs mandates a paradigm shift towards personalized medicine. However, prior to adoption of patient-specific approaches, it is useful to understand the molecular details underlying variable drug response among diverse patient populations. Over the past decade, progress in structural genomics led to an explosion of available three-dimensional structures of drug target proteins while efforts in pharmacogenetics offered insights into polymorphisms correlated with differential therapeutic outcomes. Together these advances provide the opportunity to examine how altered protein structures arising from genetic differences affect protein–drug interactions and, ultimately, drug response. In this review, we first summarize structural characteristics of protein targets and common mechanisms of drug interactions. Next, we describe the impact of coding mutations on protein structures and drug response. Finally, we highlight tools for analysing protein structures and protein–drug interactions and discuss their application for understanding altered drug responses associated with protein structural variants. PMID:22552919
Li de La Sierra-Gallay, Ines; Collinet, Bruno; Graille, Marc; Quevillon-Cheruel, Sophie; Liger, Dominique; Minard, Philippe; Blondeau, Karine; Henckes, Gilles; Aufrère, Robert; Leulliot, Nicolas; Zhou, Cong-Zhao; Sorel, Isabelle; Ferrer, Jean-Luc; Poupon, Anne; Janin, Joël; van Tilbeurgh, Herman
2004-03-01
The protein product of the YGR205w gene of Saccharomyces cerevisiae was targeted as part of our yeast structural genomics project. YGR205w codes for a small (290 amino acids) protein with unknown structure and function. The only recognizable sequence feature is the presence of a Walker A motif (P loop) indicating a possible nucleotide binding/converting function. We determined the three-dimensional crystal structure of Se-methionine substituted protein using multiple anomalous diffraction. The structure revealed a well known mononucleotide fold and strong resemblance to the structure of small metabolite phosphorylating enzymes such as pantothenate and phosphoribulo kinase. Biochemical experiments show that YGR205w binds specifically ATP and, less tightly, ADP. The structure also revealed the presence of two bound sulphate ions, occupying opposite niches in a canyon that corresponds to the active site of the protein. One sulphate is bound to the P-loop in a position that corresponds to the position of beta-phosphate in mononucleotide protein ATP complex, suggesting the protein is indeed a kinase. The nature of the phosphate accepting substrate remains to be determined. Copyright 2004 Wiley-Liss, Inc.
An ambiguity principle for assigning protein structural domains.
Postic, Guillaume; Ghouzam, Yassine; Chebrek, Romain; Gelly, Jean-Christophe
2017-01-01
Ambiguity is the quality of being open to several interpretations. For an image, it arises when the contained elements can be delimited in two or more distinct ways, which may cause confusion. We postulate that it also applies to the analysis of protein three-dimensional structure, which consists in dividing the molecule into subunits called domains. Because different definitions of what constitutes a domain can be used to partition a given structure, the same protein may have different but equally valid domain annotations. However, knowledge and experience generally displace our ability to accept more than one way to decompose the structure of an object-in this case, a protein. This human bias in structure analysis is particularly harmful because it leads to ignoring potential avenues of research. We present an automated method capable of producing multiple alternative decompositions of protein structure (web server and source code available at www.dsimb.inserm.fr/sword/). Our innovative algorithm assigns structural domains through the hierarchical merging of protein units, which are evolutionarily preserved substructures that describe protein architecture at an intermediate level, between domain and secondary structure. To validate the use of these protein units for decomposing protein structures into domains, we set up an extensive benchmark made of expert annotations of structural domains and including state-of-the-art domain parsing algorithms. The relevance of our "multipartitioning" approach is shown through numerous examples of applications covering protein function, evolution, folding, and structure prediction. Finally, we introduce a measure for the structural ambiguity of protein molecules.
Bain, Peter A; Papanicolaou, Alexie; Kumar, Anupama
2015-01-01
Murray-Darling rainbowfish (Melanotaenia fluviatilis [Castelnau, 1878]; Atheriniformes: Melanotaeniidae) is a small-bodied teleost currently under development in Australasia as a test species for aquatic toxicological studies. To date, efforts towards the development of molecular biomarkers of contaminant exposure have been hindered by the lack of available sequence data. To address this, we sequenced messenger RNA from brain, liver and gonads of mature male and female fish and generated a high-quality draft transcriptome using a de novo assembly approach. 149,742 clusters of putative transcripts were obtained, encompassing 43,841 non-redundant protein-coding regions. Deduced amino acid sequences were annotated by functional inference based on similarity with sequences from manually curated protein sequence databases. The draft assembly contained protein-coding regions homologous to 95.7% of the complete cohort of predicted proteins from the taxonomically related species, Oryzias latipes (Japanese medaka). The mean length of rainbowfish protein-coding sequences relative to their medaka homologues was 92.1%, indicating that despite the limited number of tissues sampled a large proportion of the total expected number of protein-coding genes was captured in the study. Because of our interest in the effects of environmental contaminants on endocrine pathways, we manually curated subsets of coding regions for putative nuclear receptors and steroidogenic enzymes in the rainbowfish transcriptome, revealing 61 candidate nuclear receptors encompassing all known subfamilies, and 41 putative steroidogenic enzymes representing all major steroidogenic enzymes occurring in teleosts. The transcriptome presented here will be a valuable resource for researchers interested in biomarker development, protein structure and function, and contaminant-response genomics in Murray-Darling rainbowfish.
Quaternionic representation of the genetic code.
Carlevaro, C Manuel; Irastorza, Ramiro M; Vericat, Fernando
2016-03-01
A heuristic diagram of the evolution of the standard genetic code is presented. It incorporates, in a way that resembles the energy levels of an atom, the physical notion of broken symmetry and it is consistent with original ideas by Crick on the origin and evolution of the code as well as with the chronological order of appearance of the amino acids along the evolution as inferred from work that mixtures known experimental results with theoretical speculations. Suggested by the diagram we propose a Hamilton quaternions based mathematical representation of the code as it stands now-a-days. The central object in the description is a codon function that assigns to each amino acid an integer quaternion in such a way that the observed code degeneration is preserved. We emphasize the advantages of a quaternionic representation of amino acids taking as an example the folding of proteins. With this aim we propose an algorithm to go from the quaternions sequence to the protein three dimensional structure which can be compared with the corresponding experimental one stored at the Protein Data Bank. In our criterion the mathematical representation of the genetic code in terms of quaternions merits to be taken into account because it describes not only most of the known properties of the genetic code but also opens new perspectives that are mainly derived from the close relationship between quaternions and rotations. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Pressure Studies of Protein Dynamics.
1987-02-20
applicable ) Office of Naval Research ONR N00014-86-K-0270 kc. ADDRESS (City, State,and ZIP Code) 10. SOURCE OF FUNDING NUMBERS - PROGRAM PROJECT I TASK IWORK...Pressure Studies of Protein Dynamics 12. PERSONAL AUTHOR(S) Hans Frauenfelder and Robert D. Young 13a. TYPE OF REPORT |13b. TIME COVERED 114 DATE OF...relatioihbetween dynamic structure and function of protein protein dyna -bsey observing the phenomena induced by flash photolysis using near ultravfilet
Romero, Roberto; Tarca, Adi L; Chaemsaithong, Piya; Miranda, Jezid; Chaiworapongsa, Tinnakorn; Jia, Hui; Hassan, Sonia S; Kalita, Cynthia A; Cai, Juan; Yeo, Lami; Lipovich, Leonard
2014-09-01
To identify differentially expressed long non-coding RNA (lncRNA) genes in human myometrium in women with spontaneous labor at term. Myometrium was obtained from women undergoing cesarean deliveries who were not in labor (n = 19) and women in spontaneous labor at term (n = 20). RNA was extracted and profiled using an Illumina® microarray platform. We have used computational approaches to bound the extent of long non-coding RNA representation on this platform, and to identify co-differentially expressed and correlated pairs of long non-coding RNA genes and protein-coding genes sharing the same genomic loci. We identified co-differential expression and correlation at two genomic loci that contain coding-lncRNA gene pairs: SOCS2-AK054607 and LMCD1-NR_024065 in women in spontaneous labor at term. This co-differential expression and correlation was validated by qRT-PCR, an experimental method completely independent of the microarray analysis. Intriguingly, one of the two lncRNA genes differentially expressed in term labor had a key genomic structure element, a splice site, that lacked evolutionary conservation beyond primates. We provide, for the first time, evidence for coordinated differential expression and correlation of cis-encoded antisense lncRNAs and protein-coding genes with known as well as novel roles in pregnancy in the myometrium of women in spontaneous labor at term.
The primary structure of the Saccharomyces cerevisiae gene for 3-phosphoglycerate kinase.
Hitzeman, R A; Hagie, F E; Hayflick, J S; Chen, C Y; Seeburg, P H; Derynck, R
1982-01-01
The DNA sequence of the gene for the yeast glycolytic enzyme, 3-phosphoglycerate kinase (PGK), has been obtained by sequencing part of a 3.1 kbp HindIII fragment obtained from the yeast genome. The structural gene sequence corresponds to a reading frame of 1251 bp coding for 416 amino acids with no intervening DNA sequences. The amino acid sequence is approximately 65 percent homologous with human and horse PGK protein sequences and is in general agreement with the published protein sequence for yeast PGK. As for other highly expressed structural genes in yeast, the coding sequence is highly codon biased with 95 percent of the amino acids coded for by a select 25 codons (out of 61 possible). Besides structural DNA sequence, 291 bp of 5'-flanking sequence and 286 bp of 3'-flanking sequence were determined. Transcription starts 36 nucleotides upstream from the translational start and stops 86-93 nucleotides downstream from the translational stop. These results suggest a non-polyadenylated mRNA length of 1373 to 1380 nucleotides, which is consistent with the observed length of 1500 nucleotides for polyadenylated PGK mRNA. A sequence TATATATAAA is found at 145 nucleotides upstream from the translational start. This sequence resembles the TATAAA box that is possibly associated with RNA polymerase II binding. Images PMID:6296791
A 'periodic table' for protein structures.
Taylor, William R
2002-04-11
Current structural genomics programs aim systematically to determine the structures of all proteins coded in both human and other genomes, providing a complete picture of the number and variety of protein structures that exist. In the past, estimates have been made on the basis of the incomplete sample of structures currently known. These estimates have varied greatly (between 1,000 and 10,000; see for example refs 1 and 2), partly because of limited sample size but also owing to the difficulties of distinguishing one structure from another. This distinction is usually topological, based on the fold of the protein; however, in strict topological terms (neglecting to consider intra-chain cross-links), protein chains are open strings and hence are all identical. To avoid this trivial result, topologies are determined by considering secondary links in the form of intra-chain hydrogen bonds (secondary structure) and tertiary links formed by the packing of secondary structures. However, small additions to or loss of structure can make large changes to these perceived topologies and such subjective solutions are neither robust nor amenable to automation. Here I formalize both secondary and tertiary links to allow the rigorous and automatic definition of protein topology.
Possenti, Andrea; Vendruscolo, Michele; Camilloni, Carlo; Tiana, Guido
2018-05-23
Proteins employ the information stored in the genetic code and translated into their sequences to carry out well-defined functions in the cellular environment. The possibility to encode for such functions is controlled by the balance between the amount of information supplied by the sequence and that left after that the protein has folded into its structure. We study the amount of information necessary to specify the protein structure, providing an estimate that keeps into account the thermodynamic properties of protein folding. We thus show that the information remaining in the protein sequence after encoding for its structure (the 'information gap') is very close to what needed to encode for its function and interactions. Then, by predicting the information gap directly from the protein sequence, we show that it may be possible to use these insights from information theory to discriminate between ordered and disordered proteins, to identify unknown functions, and to optimize artificially-designed protein sequences. This article is protected by copyright. All rights reserved. © 2018 Wiley Periodicals, Inc.
2012-01-01
Background Pseudoscorpions are chelicerates and have historically been viewed as being most closely related to solifuges, harvestmen, and scorpions. No mitochondrial genomes of pseudoscorpions have been published, but the mitochondrial genomes of some lineages of Chelicerata possess unusual features, including short rRNA genes and tRNA genes that lack sequence to encode arms of the canonical cloverleaf-shaped tRNA. Additionally, some chelicerates possess an atypical guanine-thymine nucleotide bias on the major coding strand of their mitochondrial genomes. Results We sequenced the mitochondrial genomes of two divergent taxa from the chelicerate order Pseudoscorpiones. We find that these genomes possess unusually short tRNA genes that do not encode cloverleaf-shaped tRNA structures. Indeed, in one genome, all 22 tRNA genes lack sequence to encode canonical cloverleaf structures. We also find that the large ribosomal RNA genes are substantially shorter than those of most arthropods. We inferred secondary structures of the LSU rRNAs from both pseudoscorpions, and find that they have lost multiple helices. Based on comparisons with the crystal structure of the bacterial ribosome, two of these helices were likely contact points with tRNA T-arms or D-arms as they pass through the ribosome during protein synthesis. The mitochondrial gene arrangements of both pseudoscorpions differ from the ancestral chelicerate gene arrangement. One genome is rearranged with respect to the location of protein-coding genes, the small rRNA gene, and at least 8 tRNA genes. The other genome contains 6 tRNA genes in novel locations. Most chelicerates with rearranged mitochondrial genes show a genome-wide reversal of the CA nucleotide bias typical for arthropods on their major coding strand, and instead possess a GT bias. Yet despite their extensive rearrangement, these pseudoscorpion mitochondrial genomes possess a CA bias on the major coding strand. Phylogenetic analyses of all 13 mitochondrial protein-coding gene sequences consistently yield trees that place pseudoscorpions as sister to acariform mites. Conclusion The well-supported phylogenetic placement of pseudoscorpions as sister to Acariformes differs from some previous analyses based on morphology. However, these two lineages share multiple molecular evolutionary traits, including substantial mitochondrial genome rearrangements, extensive nucleotide substitution, and loss of helices in their inferred tRNA and rRNA structures. PMID:22409411
Fragger: a protein fragment picker for structural queries.
Berenger, Francois; Simoncini, David; Voet, Arnout; Shrestha, Rojan; Zhang, Kam Y J
2017-01-01
Protein modeling and design activities often require querying the Protein Data Bank (PDB) with a structural fragment, possibly containing gaps. For some applications, it is preferable to work on a specific subset of the PDB or with unpublished structures. These requirements, along with specific user needs, motivated the creation of a new software to manage and query 3D protein fragments. Fragger is a protein fragment picker that allows protein fragment databases to be created and queried. All fragment lengths are supported and any set of PDB files can be used to create a database. Fragger can efficiently search a fragment database with a query fragment and a distance threshold. Matching fragments are ranked by distance to the query. The query fragment can have structural gaps and the allowed amino acid sequences matching a query can be constrained via a regular expression of one-letter amino acid codes. Fragger also incorporates a tool to compute the backbone RMSD of one versus many fragments in high throughput. Fragger should be useful for protein design, loop grafting and related structural bioinformatics tasks.
@TOME-2: a new pipeline for comparative modeling of protein-ligand complexes.
Pons, Jean-Luc; Labesse, Gilles
2009-07-01
@TOME 2.0 is new web pipeline dedicated to protein structure modeling and small ligand docking based on comparative analyses. @TOME 2.0 allows fold recognition, template selection, structural alignment editing, structure comparisons, 3D-model building and evaluation. These tasks are routinely used in sequence analyses for structure prediction. In our pipeline the necessary software is efficiently interconnected in an original manner to accelerate all the processes. Furthermore, we have also connected comparative docking of small ligands that is performed using protein-protein superposition. The input is a simple protein sequence in one-letter code with no comment. The resulting 3D model, protein-ligand complexes and structural alignments can be visualized through dedicated Web interfaces or can be downloaded for further studies. These original features will aid in the functional annotation of proteins and the selection of templates for molecular modeling and virtual screening. Several examples are described to highlight some of the new functionalities provided by this pipeline. The server and its documentation are freely available at http://abcis.cbs.cnrs.fr/AT2/
RNA-Seq Based Transcriptional Map of Bovine Respiratory Disease Pathogen “Histophilus somni 2336”
Kumar, Ranjit; Lawrence, Mark L.; Watt, James; Cooksey, Amanda M.; Burgess, Shane C.; Nanduri, Bindu
2012-01-01
Genome structural annotation, i.e., identification and demarcation of the boundaries for all the functional elements in a genome (e.g., genes, non-coding RNAs, proteins and regulatory elements), is a prerequisite for systems level analysis. Current genome annotation programs do not identify all of the functional elements of the genome, especially small non-coding RNAs (sRNAs). Whole genome transcriptome analysis is a complementary method to identify “novel” genes, small RNAs, regulatory regions, and operon structures, thus improving the structural annotation in bacteria. In particular, the identification of non-coding RNAs has revealed their widespread occurrence and functional importance in gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Histophilus somni, one of the causative agents of Bovine Respiratory Disease (BRD) as well as bovine infertility, abortion, septicemia, arthritis, myocarditis, and thrombotic meningoencephalitis. In this study, we report a single nucleotide resolution transcriptome map of H. somni strain 2336 using RNA-Seq method. The RNA-Seq based transcriptome map identified 94 sRNAs in the H. somni genome of which 82 sRNAs were never predicted or reported in earlier studies. We also identified 38 novel potential protein coding open reading frames that were absent in the current genome annotation. The transcriptome map allowed the identification of 278 operon (total 730 genes) structures in the genome. When compared with the genome sequence of a non-virulent strain 129Pt, a disproportionate number of sRNAs (∼30%) were located in genomic region unique to strain 2336 (∼18% of the total genome). This observation suggests that a number of the newly identified sRNAs in strain 2336 may be involved in strain-specific adaptations. PMID:22276113
RNA-seq based transcriptional map of bovine respiratory disease pathogen "Histophilus somni 2336".
Kumar, Ranjit; Lawrence, Mark L; Watt, James; Cooksey, Amanda M; Burgess, Shane C; Nanduri, Bindu
2012-01-01
Genome structural annotation, i.e., identification and demarcation of the boundaries for all the functional elements in a genome (e.g., genes, non-coding RNAs, proteins and regulatory elements), is a prerequisite for systems level analysis. Current genome annotation programs do not identify all of the functional elements of the genome, especially small non-coding RNAs (sRNAs). Whole genome transcriptome analysis is a complementary method to identify "novel" genes, small RNAs, regulatory regions, and operon structures, thus improving the structural annotation in bacteria. In particular, the identification of non-coding RNAs has revealed their widespread occurrence and functional importance in gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Histophilus somni, one of the causative agents of Bovine Respiratory Disease (BRD) as well as bovine infertility, abortion, septicemia, arthritis, myocarditis, and thrombotic meningoencephalitis. In this study, we report a single nucleotide resolution transcriptome map of H. somni strain 2336 using RNA-Seq method.The RNA-Seq based transcriptome map identified 94 sRNAs in the H. somni genome of which 82 sRNAs were never predicted or reported in earlier studies. We also identified 38 novel potential protein coding open reading frames that were absent in the current genome annotation. The transcriptome map allowed the identification of 278 operon (total 730 genes) structures in the genome. When compared with the genome sequence of a non-virulent strain 129Pt, a disproportionate number of sRNAs (∼30%) were located in genomic region unique to strain 2336 (∼18% of the total genome). This observation suggests that a number of the newly identified sRNAs in strain 2336 may be involved in strain-specific adaptations.
Schaeffer, E; Sninsky, J J
1984-01-01
Proteins that are related evolutionarily may have diverged at the level of primary amino acid sequence while maintaining similar secondary structures. Computer analysis has been used to compare the open reading frames of the hepatitis B virus to those of the woodchuck hepatitis virus at the level of amino acid sequence, and to predict the relative hydrophilic character and the secondary structure of putative polypeptides. Similarity is seen at the levels of relative hydrophilicity and secondary structure, in the absence of sequence homology. These data reinforce the proposal that these open reading frames encode viral proteins. Computer analysis of this type can be more generally used to establish structural similarities between proteins that do not share obvious sequence homology as well as to assess whether an open reading frame is fortuitous or codes for a protein. PMID:6585835
2015-10-26
amount of the product J and the unreacted N/C in the same lane. Figure 2. Crystal structure of Tip1-Tip1lig ( PDB code: 3IDW). (A...stability (able to retain its trimeric quaternary structure in solutions after boiling in SDS buffer). We reasoned that its very strong...109, 10 is a flexible polyanionic linker and was incorporated as the midblock for water retention. Mixing of the two protein block copolymers
Thermodynamic Effects of Noncoded and Coded Methionine Substitutions in Calmodulin
Yamniuk, Aaron P.; Ishida, Hiroaki; Lippert, Dustin; Vogel, Hans J.
2009-01-01
The methionine residues in the calcium (Ca2+) regulatory protein calmodulin (CaM) are structurally and functionally important. They are buried within the N- and C-domains of apo-CaM but become solvent-exposed in Ca2+-CaM, where they interact with numerous target proteins. Previous structural studies have shown that methionine substitutions to the noncoded amino acids selenomethionine, ethionine, or norleucine, or mutation to leucine do not impact the main chain structure of CaM. Here we used differential scanning calorimetry to show that these substitutions enhance the stability of both domains, with the largest increase in melting temperature (19–26°C) achieved with leucine or norleucine in the apo-C-domain. Nuclear magnetic resonance spectroscopy experiments also revealed the loss of a slow conformational exchange process in the Leu-substituted apo-C-domain. In addition, isothermal titration calorimetry experiments revealed considerable changes in the enthalpy and entropy of target binding to apo-CaM and Ca2+-CaM, but the free energy of binding was largely unaffected due to enthalpy-entropy compensation. Collectively, these results demonstrate that noncoded and coded methionine substitutions can be accommodated in CaM because of the structural plasticity of the protein. However, adjustments in side-chain packing and dynamics lead to significant differences in protein stability and the thermodynamics of target binding. PMID:19217866
Feng, X; Happ, G M
1996-11-14
The cDNA for Sp23, a structural protein of the spermatophore of Tenebrio molitor, had been previously cloned and characterized (Paesen, G.C., Schwartz, M.B., Peferoen, M., Weyda, F. and Happ, G.M. (1992a) Amino acid sequence of Sp23, a structure protein of the spermatophore of the mealworm beetle, Tenebrio molitor. J. Biol. Chem. 257, 18852-18857). Using the labeled cDNA for Sp23 as a probe to screen a library of genomic DNA from Tenebrio molitor, we isolated a genomic clone for Sp23. A 5373-base pair (bp) restriction fragment containing the Sp23 gene was sequenced. The coding region is separated by a 55-bp intron which is located close to the translation start site. Three putative ecdysone response elements (EcRE) are identified in the 5' flanking region of the Sp23 gene. Comparison of the flanking regions of the Sp23 gene with those of the D-protein gene expressed in the accessory glands of Tenebrio reveals similar sequences present in the flanking regions of the two genes. The genomic organization of the coding region of the Sp23 gene shares similarities with that of the D-protein gene, three Drosophila accessory gland genes and two Drosophila 20-OH ecdysone-responsive genes.
Question 6: coevolution theory of the genetic code: a proven theory.
Wong, Jeffrey Tze-Fei
2007-10-01
The coevolution theory proposes that primordial proteins consisted only of those amino acids readily obtainable from the prebiotic environment, representing about half the twenty encoded amino acids of today, and the missing amino acids entered the system as the code expanded along with pathways of amino acid biosynthesis. The isolation of genetic code mutants, and the antiquity of pretran synthesis revealed by the comparative genomics of tRNAs and aminoacyl-tRNA synthetases, have combined to provide a rigorous proof of the four fundamental tenets of the theory, thus solving the riddle of the structure of the universal genetic code.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xiang, Dao Feng; Patskovsky, Yury; Xu, Chengfu
2010-12-07
Two uncharacterized enzymes from the amidohydrolase superfamily belonging to cog1228 were cloned, expressed, and purified to homogeneity. The two proteins, Sgx9260c (gi|44242006) and Sgx9260b (gi|44479596), were derived from environmental DNA samples originating from the Sargasso Sea. The catalytic function and substrate profiles for Sgx9260c and Sgx9260b were determined using a comprehensive library of dipeptides and N-acyl derivative of L-amino acids. Sgx9260c catalyzes the hydrolysis of Gly-L-Pro, L-Ala-L-Pro, and N-acyl derivatives of L-Pro. The best substrate identified to date is N-acetyl-L-Pro with a value of k{sub cat}/K{sub m} of 3 x 10{sup 5} M{sup -1} s{sup -1}. Sgx9260b catalyzes the hydrolysismore » of L-hydrophobic L-Pro dipeptides and N-acyl derivatives of L-Pro. The best substrate identified to date is N-propionyl-L-Pro with a value of k{sub cat}/K{sub m} of 1 x 10{sup 5} M{sup -1} s{sup -1}. Three-dimensional structures of both proteins were determined by X-ray diffraction methods (PDB codes 3MKV and 3FEQ). These proteins fold as distorted ({beta}/{alpha})8-barrels with two divalent cations in the active site. The structure of Sgx9260c was also determined as a complex with the N-methylphosphonate derivative of L-Pro (PDB code 3N2C). In this structure the phosphonate moiety bridges the binuclear metal center, and one oxygen atom interacts with His-140. The {alpha}-carboxylate of the inhibitor interacts with Tyr-231. The proline side chain occupies a small substrate binding cavity formed by residues contributed from the loop that follows {beta}-strand 7 within the ({beta}/{alpha})8-barrel. A total of 38 other proteins from cog1228 are predicted to have the same substrate profile based on conservation of the substrate binding residues. The structure of an evolutionarily related protein, Cc2672 from Caulobacter crecentus, was determined as a complex with the N-methylphosphonate derivative of L-arginine (PDB code 3MTW).« less
An ambiguity principle for assigning protein structural domains
Postic, Guillaume; Ghouzam, Yassine; Chebrek, Romain; Gelly, Jean-Christophe
2017-01-01
Ambiguity is the quality of being open to several interpretations. For an image, it arises when the contained elements can be delimited in two or more distinct ways, which may cause confusion. We postulate that it also applies to the analysis of protein three-dimensional structure, which consists in dividing the molecule into subunits called domains. Because different definitions of what constitutes a domain can be used to partition a given structure, the same protein may have different but equally valid domain annotations. However, knowledge and experience generally displace our ability to accept more than one way to decompose the structure of an object—in this case, a protein. This human bias in structure analysis is particularly harmful because it leads to ignoring potential avenues of research. We present an automated method capable of producing multiple alternative decompositions of protein structure (web server and source code available at www.dsimb.inserm.fr/sword/). Our innovative algorithm assigns structural domains through the hierarchical merging of protein units, which are evolutionarily preserved substructures that describe protein architecture at an intermediate level, between domain and secondary structure. To validate the use of these protein units for decomposing protein structures into domains, we set up an extensive benchmark made of expert annotations of structural domains and including state-of-the-art domain parsing algorithms. The relevance of our “multipartitioning” approach is shown through numerous examples of applications covering protein function, evolution, folding, and structure prediction. Finally, we introduce a measure for the structural ambiguity of protein molecules. PMID:28097215
Cipriano, Andrea; Ballarino, Monica
2018-01-01
The completion of the human genome sequence together with advances in sequencing technologies have shifted the paradigm of the genome, as composed of discrete and hereditable coding entities, and have shown the abundance of functional noncoding DNA. This part of the genome, previously dismissed as “junk” DNA, increases proportionally with organismal complexity and contributes to gene regulation beyond the boundaries of known protein-coding genes. Different classes of functionally relevant nonprotein-coding RNAs are transcribed from noncoding DNA sequences. Among them are the long noncoding RNAs (lncRNAs), which are thought to participate in the basal regulation of protein-coding genes at both transcriptional and post-transcriptional levels. Although knowledge of this field is still limited, the ability of lncRNAs to localize in different cellular compartments, to fold into specific secondary structures and to interact with different molecules (RNA or proteins) endows them with multiple regulatory mechanisms. It is becoming evident that lncRNAs may play a crucial role in most biological processes such as the control of development, differentiation and cell growth. This review places the evolution of the concept of the gene in its historical context, from Darwin's hypothetical mechanism of heredity to the post-genomic era. We discuss how the original idea of protein-coding genes as unique determinants of phenotypic traits has been reconsidered in light of the existence of noncoding RNAs. We summarize the technological developments which have been made in the genome-wide identification and study of lncRNAs and emphasize the methodologies that have aided our understanding of the complexity of lncRNA-protein interactions in recent years. PMID:29560353
Mixture models for protein structure ensembles.
Hirsch, Michael; Habeck, Michael
2008-10-01
Protein structure ensembles provide important insight into the dynamics and function of a protein and contain information that is not captured with a single static structure. However, it is not clear a priori to what extent the variability within an ensemble is caused by internal structural changes. Additional variability results from overall translations and rotations of the molecule. And most experimental data do not provide information to relate the structures to a common reference frame. To report meaningful values of intrinsic dynamics, structural precision, conformational entropy, etc., it is therefore important to disentangle local from global conformational heterogeneity. We consider the task of disentangling local from global heterogeneity as an inference problem. We use probabilistic methods to infer from the protein ensemble missing information on reference frames and stable conformational sub-states. To this end, we model a protein ensemble as a mixture of Gaussian probability distributions of either entire conformations or structural segments. We learn these models from a protein ensemble using the expectation-maximization algorithm. Our first model can be used to find multiple conformers in a structure ensemble. The second model partitions the protein chain into locally stable structural segments or core elements and less structured regions typically found in loops. Both models are simple to implement and contain only a single free parameter: the number of conformers or structural segments. Our models can be used to analyse experimental ensembles, molecular dynamics trajectories and conformational change in proteins. The Python source code for protein ensemble analysis is available from the authors upon request.
Transcripts with in silico predicted RNA structure are enriched everywhere in the mouse brain
2012-01-01
Background Post-transcriptional control of gene expression is mostly conducted by specific elements in untranslated regions (UTRs) of mRNAs, in collaboration with specific binding proteins and RNAs. In several well characterized cases, these RNA elements are known to form stable secondary structures. RNA secondary structures also may have major functional implications for long noncoding RNAs (lncRNAs). Recent transcriptional data has indicated the importance of lncRNAs in brain development and function. However, no methodical efforts to investigate this have been undertaken. Here, we aim to systematically analyze the potential for RNA structure in brain-expressed transcripts. Results By comprehensive spatial expression analysis of the adult mouse in situ hybridization data of the Allen Mouse Brain Atlas, we show that transcripts (coding as well as non-coding) associated with in silico predicted structured probes are highly and significantly enriched in almost all analyzed brain regions. Functional implications of these RNA structures and their role in the brain are discussed in detail along with specific examples. We observe that mRNAs with a structure prediction in their UTRs are enriched for binding, transport and localization gene ontology categories. In addition, after manual examination we observe agreement between RNA binding protein interaction sites near the 3’ UTR structures and correlated expression patterns. Conclusions Our results show a potential use for RNA structures in expressed coding as well as noncoding transcripts in the adult mouse brain, and describe the role of structured RNAs in the context of intracellular signaling pathways and regulatory networks. Based on this data we hypothesize that RNA structure is widely involved in transcriptional and translational regulatory mechanisms in the brain and ultimately plays a role in brain function. PMID:22651826
Structural Relationships Between Minor and Major Proteins of Hepatitis B Surface Antigen
Stibbe, Werner; Gerlich, Wolfram H.
1983-01-01
The minor glycoproteins from hepatitis B surface antigen, GP33 and GP36, contain at their carboxy-terminal part the sequence of the major protein P24. They have 55 additional amino acids at the amino-terminal part which are coded by the pre-S region of the viral DNA. Images PMID:6842680
Origin and evolution of the long non-coding genes in the X-inactivation center.
Romito, Antonio; Rougeulle, Claire
2011-11-01
Random X chromosome inactivation (XCI), the eutherian mechanism of X-linked gene dosage compensation, is controlled by a cis-acting locus termed the X-inactivation center (Xic). One of the striking features that characterize the Xic landscape is the abundance of loci transcribing non-coding RNAs (ncRNAs), including Xist, the master regulator of the inactivation process. Recent comparative genomic analyses have depicted the evolutionary scenario behind the origin of the X-inactivation center, revealing that this locus evolved from a region harboring protein-coding genes. During mammalian radiation, this ancestral protein-coding region was disrupted in the marsupial group, whilst it provided in eutherian lineage the starting material for the non-translated RNAs of the X-inactivation center. The emergence of non-coding genes occurred by a dual mechanism involving loss of protein-coding function of the pre-existing genes and integration of different classes of mobile elements, some of which modeled the structure and sequence of the non-coding genes in a species-specific manner. The rising genes started to produce transcripts that acquired function in regulating the epigenetic status of the X chromosome, as shown for Xist, its antisense Tsix, Jpx, and recently suggested for Ftx. Thus, the appearance of the Xic, which occurred after the divergence between eutherians and marsupials, was the basis for the evolution of random X inactivation as a strategy to achieve dosage compensation. Copyright © 2011. Published by Elsevier Masson SAS.
CONFOLD2: improved contact-driven ab initio protein structure modeling.
Adhikari, Badri; Cheng, Jianlin
2018-01-25
Contact-guided protein structure prediction methods are becoming more and more successful because of the latest advances in residue-residue contact prediction. To support contact-driven structure prediction, effective tools that can quickly build tertiary structural models of good quality from predicted contacts need to be developed. We develop an improved contact-driven protein modelling method, CONFOLD2, and study how it may be effectively used for ab initio protein structure prediction with predicted contacts as input. It builds models using various subsets of input contacts to explore the fold space under the guidance of a soft square energy function, and then clusters the models to obtain the top five models. CONFOLD2 obtains an average reconstruction accuracy of 0.57 TM-score for the 150 proteins in the PSICOV contact prediction dataset. When benchmarked on the CASP11 contacts predicted using CONSIP2 and CASP12 contacts predicted using Raptor-X, CONFOLD2 achieves a mean TM-score of 0.41 on both datasets. CONFOLD2 allows to quickly generate top five structural models for a protein sequence when its secondary structures and contacts predictions at hand. The source code of CONFOLD2 is publicly available at https://github.com/multicom-toolbox/CONFOLD2/ .
Fey, G; Lewis, J B; Grodzicker, T; Bothwell, A
1979-01-01
The adenovirus type 2-simian virus 40 (SV40) hybrid virus Ad2+ND1 dp2 (E. Lukanidin, manuscript in preparation) specified two proteins (molecular weights, 24,000 and 23,000) that are, in part, products of an insertion of SV40 early DNA sequences. This was demonstrated by translation in vitro from viral mRNA that had been selected by hybridization to SV40 DNA. These two phosphorylated, nonvirion proteins were produced late in infection in amounts similar to adenovirus 2 structural proteins and were closely related to each other in tryptic peptide composition. The portion of SV40 DNA (map units 0.17 to 0.22 on the SV40 genome) coding for these proteins was joined to sequences coding for the amino-terminal part of the adenovirus type 2 structural protein IV (fiber). The Ad2+ND1 dp2 23,000- and 24,000-molecular-weight proteins were hybrid polypeptides, with about two-thirds of their tryptic peptides contributed by the fiber protein and the remainder contributed by SV40 T-antigen. They shared with T-antigen (molecular weight, 96,000) a carboxy-terminal proline-rich tryptic peptide. Together, the tryptic peptide composition of these proteins and the known SV40 DNA sequences suggested the reading frame for the translation of T-antigen. The carboxy terminus for T-anigen would then be located on the SV40 genome map next to the TAA terminator triplet at position 0.175, 910 bases away from the cleavage site of the restriction endonuclease EcoRI. Seven host range mutants from Ad2+ND1 dp2 were isolated that had lost the capacity to propagate on monkey cells. They did not induce detectable levels of the hybrid proteins. Three of these mutants had lost the SV40 DNA insertion that codes in part for these proteins. Thus, in analogy to the Ad2+ND1 30,000-molecular-weight protein, the presence of these proteins correlates with the presence of the helper function for adenovirus replication on monkey cells. Images PMID:225516
Recent advances in Hepatitis E virus.
Meng, X J
2010-03-01
Hepatitis E virus (HEV), the causative agent of hepatitis E, belongs to the family Hepeviridae. At least four major genotypes of HEV have been recognized: genotypes 1 and 2 are restricted to humans and associated with epidemics in developing countries, whereas genotypes 3 and 4 are zoonotic and infect humans and several other animals in both developing and industrialized countries. Besides humans, strains of HEV have been genetically identified from swine, chickens, sika deer, mongeese, and rabbits. The genome of HEV consists of three open reading frames (ORFs): ORF1 codes for nonstructural proteins, ORF2 codes for capsid protein, and ORF3 codes for a small multifunctional protein. The ORF2 and ORF3 proteins are translated from a single bicistronic mRNA and overlap each other but neither overlaps ORF1. The recent determination of the 3D crystal structure of the HEV capsid protein should facilitate the development of vaccines and antivirals. The identification and characterization of animal strains of HEV from pigs and chickens and the demonstrated ability of cross-species infection by swine HEV raise public health concerns for zoonosis. Accumulating evidence indicated that hepatitis E is a zoonotic disease and pigs and more likely other animal species are reservoirs for HEV. This article provides an overview of the recent advances in hepatitis E and its causative agent, including nomenclature and genomic organization, gene expression and functions, 3D structure of the virions, changing perspectives on higher mortality during pregnancy and chronic hepatitis E, animal reservoirs, zoonotic risk, food safety, and novel animal models.
Genetic code expansion for multiprotein complex engineering.
Koehler, Christine; Sauter, Paul F; Wawryszyn, Mirella; Girona, Gemma Estrada; Gupta, Kapil; Landry, Jonathan J M; Fritz, Markus Hsi-Yang; Radic, Ksenija; Hoffmann, Jan-Erik; Chen, Zhuo A; Zou, Juan; Tan, Piau Siong; Galik, Bence; Junttila, Sini; Stolt-Bergner, Peggy; Pruneri, Giancarlo; Gyenesei, Attila; Schultz, Carsten; Biskup, Moritz Bosse; Besir, Hueseyin; Benes, Vladimir; Rappsilber, Juri; Jechlinger, Martin; Korbel, Jan O; Berger, Imre; Braese, Stefan; Lemke, Edward A
2016-12-01
We present a baculovirus-based protein engineering method that enables site-specific introduction of unique functionalities in a eukaryotic protein complex recombinantly produced in insect cells. We demonstrate the versatility of this efficient and robust protein production platform, 'MultiBacTAG', (i) for the fluorescent labeling of target proteins and biologics using click chemistries, (ii) for glycoengineering of antibodies, and (iii) for structure-function studies of novel eukaryotic complexes using single-molecule Förster resonance energy transfer as well as site-specific crosslinking strategies.
From Genomes to Protein Models and Back
NASA Astrophysics Data System (ADS)
Tramontano, Anna; Giorgetti, Alejandro; Orsini, Massimiliano; Raimondo, Domenico
2007-12-01
The alternative splicing mechanism allows genes to generate more than one product. When the splicing events occur within protein coding regions they can modify the biological function of the protein. Alternative splicing has been suggested as one way for explaining the discrepancy between the number of human genes and functional complexity. We analysed the putative structure of the alternatively spliced gene products annotated in the ENCODE pilot project and discovered that many of the potential alternative gene products will be unlikely to produce stable functional proteins.
Nucleic acids encoding plant glutamine phenylpyruvate transaminase (GPT) and uses thereof
Unkefer, Pat J.; Anderson, Penelope S.; Knight, Thomas J.
2016-03-29
Glutamine phenylpyruvate transaminase (GPT) proteins, nucleic acid molecules encoding GPT proteins, and uses thereof are disclosed. Provided herein are various GPT proteins and GPT gene coding sequences isolated from a number of plant species. As disclosed herein, GPT proteins share remarkable structural similarity within plant species, and are active in catalyzing the synthesis of 2-hydroxy-5-oxoproline (2-oxoglutaramate), a powerful signal metabolite which regulates the function of a large number of genes involved in the photosynthesis apparatus, carbon fixation and nitrogen metabolism.
Methylation of miRNA genes and oncogenesis.
Loginov, V I; Rykov, S V; Fridman, M V; Braga, E A
2015-02-01
Interaction between microRNA (miRNA) and messenger RNA of target genes at the posttranscriptional level provides fine-tuned dynamic regulation of cell signaling pathways. Each miRNA can be involved in regulating hundreds of protein-coding genes, and, conversely, a number of different miRNAs usually target a structural gene. Epigenetic gene inactivation associated with methylation of promoter CpG-islands is common to both protein-coding genes and miRNA genes. Here, data on functions of miRNAs in development of tumor-cell phenotype are reviewed. Genomic organization of promoter CpG-islands of the miRNA genes located in inter- and intragenic areas is discussed. The literature and our own results on frequency of CpG-island methylation in miRNA genes from tumors are summarized, and data regarding a link between such modification and changed activity of miRNA genes and, consequently, protein-coding target genes are presented. Moreover, the impact of miRNA gene methylation on key oncogenetic processes as well as affected signaling pathways is discussed.
Dimitrieva, Slavica; Anisimova, Maria
2014-01-01
In protein-coding genes, synonymous mutations are often thought not to affect fitness and therefore are not subject to natural selection. Yet increasingly, cases of non-neutral evolution at certain synonymous sites were reported over the last decade. To evaluate the extent and the nature of site-specific selection on synonymous codons, we computed the site-to-site synonymous rate variation (SRV) and identified gene properties that make SRV more likely in a large database of protein-coding gene families and protein domains. To our knowledge, this is the first study that explores the determinants and patterns of the SRV in real data. We show that the SRV is widespread in the evolution of protein-coding sequences, putting in doubt the validity of the synonymous rate as a standard neutral proxy. While protein domains rarely undergo adaptive evolution, the SRV appears to play important role in optimizing the domain function at the level of DNA. In contrast, protein families are more likely to evolve by positive selection, but are less likely to exhibit SRV. Stronger SRV was detected in genes with stronger codon bias and tRNA reusage, those coding for proteins with larger number of interactions or forming larger number of structures, located in intracellular components and those involved in typically conserved complex processes and functions. Genes with extreme SRV show higher expression levels in nearly all tissues. This indicates that codon bias in a gene, which often correlates with gene expression, may often be a site-specific phenomenon regulating the speed of translation along the sequence, consistent with the co-translational folding hypothesis. Strikingly, genes with SRV were strongly overrepresented for metabolic pathways and those associated with several genetic diseases, particularly cancers and diabetes.
The TDP-43 N-terminal domain structure at high resolution.
Mompeán, Miguel; Romano, Valentina; Pantoja-Uceda, David; Stuani, Cristiana; Baralle, Francisco E; Buratti, Emanuele; Laurents, Douglas V
2016-04-01
Transactive response DNA-binding protein 43 kDa (TDP-43) is an RNA transporting and processing protein whose aberrant aggregates are implicated in neurodegenerative diseases. The C-terminal domain of this protein plays a key role in mediating this process. However, the N-terminal domain (residues 1-77) is needed to effectively recruit TDP-43 monomers into this aggregate. In the present study, we report, for the first time, the essentially complete (1) H, (15) N and (13) C NMR assignments and the structure of the N-terminal domain determined on the basis of 26 hydrogen-bond, 60 torsion angle and 1058 unambiguous NOE structural restraints. The structure consists of an α-helix and six β-strands. Two β-strands form a β-hairpin not seen in the ubiquitin fold. All Pro residues are in the trans conformer and the two Cys are reduced and distantly separated on the surface of the protein. The domain has a well defined hydrophobic core composed of F35, Y43, W68, Y73 and 17 aliphatic side chains. The fold is topologically similar to the reported structure of axin 1. The protein is stable and no denatured species are observed at pH 4 and 25 °C. At 4 kcal·mol(-1) , the conformational stability of the domain, as measured by hydrogen/deuterium exchange, is comparable to ubiquitin (6 kcal·mol(-1) ). The β-strands, α-helix, and three of four turns are generally rigid, although the loop formed by residues 47-53 is mobile, as determined by model-free analysis of the (15) N{(1) H}NOE, as well as the translational and transversal relaxation rates. Structural data have been deposited in the Protein Data Bank under accession code: 2n4p. The NMR assignments have been deposited in the BMRB database under access code: 25675. © 2016 Federation of European Biochemical Societies.
Structure of Lmaj006129AAA, a hypothetical protein from Leishmania major
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arakaki, Tracy; Le Trong, Isolde; Structural Genomics of Pathogenic Protozoa
2006-03-01
The crystal structure of a conserved hypothetical protein from L. major, Pfam sequence family PF04543, structural genomics target ID Lmaj006129AAA, has been determined at a resolution of 1.6 Å. The gene product of structural genomics target Lmaj006129 from Leishmania major codes for a 164-residue protein of unknown function. When SeMet expression of the full-length gene product failed, several truncation variants were created with the aid of Ginzu, a domain-prediction method. 11 truncations were selected for expression, purification and crystallization based upon secondary-structure elements and disorder. The structure of one of these variants, Lmaj006129AAH, was solved by multiple-wavelength anomalous diffraction (MAD)more » using ELVES, an automatic protein crystal structure-determination system. This model was then successfully used as a molecular-replacement probe for the parent full-length target, Lmaj006129AAA. The final structure of Lmaj006129AAA was refined to an R value of 0.185 (R{sub free} = 0.229) at 1.60 Å resolution. Structure and sequence comparisons based on Lmaj006129AAA suggest that proteins belonging to Pfam sequence families PF04543 and PF01878 may share a common ligand-binding motif.« less
The Ubiquitin–Proteasome System of Saccharomyces cerevisiae
Finley, Daniel; Ulrich, Helle D.; Sommer, Thomas; Kaiser, Peter
2012-01-01
Protein modifications provide cells with exquisite temporal and spatial control of protein function. Ubiquitin is among the most important modifiers, serving both to target hundreds of proteins for rapid degradation by the proteasome, and as a dynamic signaling agent that regulates the function of covalently bound proteins. The diverse effects of ubiquitylation reflect the assembly of structurally distinct ubiquitin chains on target proteins. The resulting ubiquitin code is interpreted by an extensive family of ubiquitin receptors. Here we review the components of this regulatory network and its effects throughout the cell. PMID:23028185
Conserved thioredoxin fold is present in Pisum sativum L. sieve element occlusion-1 protein
Umate, Pavan; Tuteja, Renu
2010-01-01
Homology-based three-dimensional model for Pisum sativum sieve element occlusion 1 (Ps.SEO1) (forisomes) protein was constructed. A stretch of amino acids (residues 320 to 456) which is well conserved in all known members of forisomes proteins was used to model the 3D structure of Ps.SEO1. The structural prediction was done using Protein Homology/analogY Recognition Engine (PHYRE) web server. Based on studies of local sequence alignment, the thioredoxin-fold containing protein [Structural Classification of Proteins (SCOP) code d1o73a_], a member of the glutathione peroxidase family was selected as a template for modeling the spatial structure of Ps.SEO1. Selection was based on comparison of primary sequence, higher match quality and alignment accuracy. Motif 1 (EVF) is conserved in Ps.SEO1, Vicia faba (Vf.For1) and Medicago truncatula (MT.SEO3); motif 2 (KKED) is well conserved across all forisomes proteins and motif 3 (IGYIGNP) is conserved in Ps.SEO1 and Vf.For1. PMID:20404566
Sequencing proteins with transverse ionic transport in nanochannels.
Boynton, Paul; Di Ventra, Massimiliano
2016-05-03
De novo protein sequencing is essential for understanding cellular processes that govern the function of living organisms and all sequence modifications that occur after a protein has been constructed from its corresponding DNA code. By obtaining the order of the amino acids that compose a given protein one can then determine both its secondary and tertiary structures through structure prediction, which is used to create models for protein aggregation diseases such as Alzheimer's Disease. Here, we propose a new technique for de novo protein sequencing that involves translocating a polypeptide through a synthetic nanochannel and measuring the ionic current of each amino acid through an intersecting perpendicular nanochannel. We find that the distribution of ionic currents for each of the 20 proteinogenic amino acids encoded by eukaryotic genes is statistically distinct, showing this technique's potential for de novo protein sequencing.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kozbial, Piotr; Xu, Qingping; Chiu, Hsiu-Ju
2009-08-28
To extend the structural coverage of proteins with unknown functions, we targeted a novel protein family (Pfam accession number PF08807, DUF1798) for which we proposed and determined the structures of two representative members. The MW1337R gene of Staphylococcus aureus subsp. aureus Rosenbach (Wood 46) encodes a protein with a molecular weight of 13.8 kDa (residues 1-116) and a calculated isoelectric point of 5.15. The lin2004 gene of the nonspore-forming bacterium Listeria innocua Clip11262 encodes a protein with a molecular weight of 14.6 kDa (residues 1-121) and a calculated isoelectric point of 5.45. MW1337R and lin2004, as well as their homologs,more » which, so far, have been found only in Bacillus, Staphylococcus, Listeria, and related genera (Geobacillus, Exiguobacterium, and Oceanobacillus), have unknown functions and are annotated as hypothetical proteins. The genomic contexts of MW1337R and lin2004 are similar and conserved in related species. In prokaryotic genomes, most often, functionally interacting proteins are coded by genes, which are colocated in conserved operons. Proteins from the same operon as MW1337R and lin2004 either have unknown functions (i.e., belong to DUF1273, Pfam accession number PF06908) or are similar to ypsB from Bacillus subtilis. The function of ypsB is unclear, although it has a strong similarity to the N-terminal region of DivIVA, which was characterized as a bifunctional protein with distinct roles during vegetative growth and sporulation. In addition, members of the DUF1273 family display distant sequence similarity with the DprA/Smf protein, which acts downstream of the DNA uptake machinery, possibly in conjunction with RecA. The RecA activities in Bacillus subtilis are modulated by RecU Holliday-junction resolvase. In all analyzed cases, the gene coding for RecU is in the vicinity of MW1337R, lin2004, or their orthologs, but on a different operon located in the complementary DNA strand. Here, we report the crystal structures of MW1337R and lin2004, which were determined using the semiautomated, high-throughput pipeline of the Joint Center for Structural Genomics (JCSG), part of the National Institute of General Medical Sciences Protein Structure Initiative.« less
SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition
Melvin, Iain; Ie, Eugene; Kuang, Rui; Weston, Jason; Stafford, William Noble; Leslie, Christina
2007-01-01
Background Predicting a protein's structural class from its amino acid sequence is a fundamental problem in computational biology. Much recent work has focused on developing new representations for protein sequences, called string kernels, for use with support vector machine (SVM) classifiers. However, while some of these approaches exhibit state-of-the-art performance at the binary protein classification problem, i.e. discriminating between a particular protein class and all other classes, few of these studies have addressed the real problem of multi-class superfamily or fold recognition. Moreover, there are only limited software tools and systems for SVM-based protein classification available to the bioinformatics community. Results We present a new multi-class SVM-based protein fold and superfamily recognition system and web server called SVM-Fold, which can be found at . Our system uses an efficient implementation of a state-of-the-art string kernel for sequence profiles, called the profile kernel, where the underlying feature representation is a histogram of inexact matching k-mer frequencies. We also employ a novel machine learning approach to solve the difficult multi-class problem of classifying a sequence of amino acids into one of many known protein structural classes. Binary one-vs-the-rest SVM classifiers that are trained to recognize individual structural classes yield prediction scores that are not comparable, so that standard "one-vs-all" classification fails to perform well. Moreover, SVMs for classes at different levels of the protein structural hierarchy may make useful predictions, but one-vs-all does not try to combine these multiple predictions. To deal with these problems, our method learns relative weights between one-vs-the-rest classifiers and encodes information about the protein structural hierarchy for multi-class prediction. In large-scale benchmark results based on the SCOP database, our code weighting approach significantly improves on the standard one-vs-all method for both the superfamily and fold prediction in the remote homology setting and on the fold recognition problem. Moreover, our code weight learning algorithm strongly outperforms nearest-neighbor methods based on PSI-BLAST in terms of prediction accuracy on every structure classification problem we consider. Conclusion By combining state-of-the-art SVM kernel methods with a novel multi-class algorithm, the SVM-Fold system delivers efficient and accurate protein fold and superfamily recognition. PMID:17570145
Firth, Andrew E; Atkins, John F
2009-01-01
Japanese encephalitis, West Nile, Usutu and Murray Valley encephalitis viruses form a tight subgroup within the larger Flavivirus genus. These viruses utilize a single-polyprotein expression strategy, resulting in ~10 mature proteins. Plotting the conservation at synonymous sites along the polyprotein coding sequence reveals strong conservation peaks at the very 5' end of the coding sequence, and also at the 5' end of the sequence encoding the NS2A protein. Such peaks are generally indicative of functionally important non-coding sequence elements. The second peak corresponds to a predicted stable pseudoknot structure whose biological importance is supported by compensatory mutations that preserve the structure. The pseudoknot is preceded by a conserved slippery heptanucleotide (Y CCU UUU), thus forming a classical stimulatory motif for -1 ribosomal frameshifting. We hypothesize, therefore, that the functional importance of the pseudoknot is to stimulate a portion of ribosomes to shift -1 nt into a short (45 codon), conserved, overlapping open reading frame, termed foo. Since cleavage at the NS1-NS2A boundary is known to require synthesis of NS2A in cis, the resulting transframe fusion protein is predicted to be NS1-NS2AN-term-FOO. We hypothesize that this may explain the origin of the previously identified NS1 'extension' protein in JEV-group flaviviruses, known as NS1'. PMID:19196463
Zhang, Xu; Wang, Fengshan; Sheng, Juzheng
2016-06-16
Heparan sulfate (HS) is widely distributed in mammalian tissues in the form of HS proteoglycans, which play essential roles in various physiological and pathological processes. In contrast to the template-guided processes involved in the synthesis of DNA and proteins, HS biosynthesis is not believed to involve a template. However, it appears that the final structure of HS chains was strictly regulated. Herein, we report research based hypothesis that two major steps, namely "coding" and "decoding" steps, are involved in the biosynthesis of HS, which strictly regulate its chemical structure and biological activity. The "coding" process in this context is based on the distribution of sulfate moieties on the amino groups of the glucosamine residues in the HS chains. The sulfation of these amine groups is catalyzed by N-deacetylase/N-sulfotransferase, which has four isozymes. The composition and distribution of sulfate groups and iduronic acid residues on the glycan chains of HS are determined by several other modification enzymes, which can recognize these coding sequences (i.e., the "decoding" process). The degree and pattern of the sulfation and epimerization in the HS chains determines the extent of their interactions with several different protein factors, which further influences their biological activity. Copyright © 2016 Elsevier Ltd. All rights reserved.
Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana.
Mayer, K; Schüller, C; Wambutt, R; Murphy, G; Volckaert, G; Pohl, T; Düsterhöft, A; Stiekema, W; Entian, K D; Terryn, N; Harris, B; Ansorge, W; Brandt, P; Grivell, L; Rieger, M; Weichselgartner, M; de Simone, V; Obermaier, B; Mache, R; Müller, M; Kreis, M; Delseny, M; Puigdomenech, P; Watson, M; Schmidtheini, T; Reichert, B; Portatelle, D; Perez-Alonso, M; Boutry, M; Bancroft, I; Vos, P; Hoheisel, J; Zimmermann, W; Wedler, H; Ridley, P; Langham, S A; McCullagh, B; Bilham, L; Robben, J; Van der Schueren, J; Grymonprez, B; Chuang, Y J; Vandenbussche, F; Braeken, M; Weltjens, I; Voet, M; Bastiaens, I; Aert, R; Defoor, E; Weitzenegger, T; Bothe, G; Ramsperger, U; Hilbert, H; Braun, M; Holzer, E; Brandt, A; Peters, S; van Staveren, M; Dirske, W; Mooijman, P; Klein Lankhorst, R; Rose, M; Hauf, J; Kötter, P; Berneiser, S; Hempel, S; Feldpausch, M; Lamberth, S; Van den Daele, H; De Keyser, A; Buysshaert, C; Gielen, J; Villarroel, R; De Clercq, R; Van Montagu, M; Rogers, J; Cronin, A; Quail, M; Bray-Allen, S; Clark, L; Doggett, J; Hall, S; Kay, M; Lennard, N; McLay, K; Mayes, R; Pettett, A; Rajandream, M A; Lyne, M; Benes, V; Rechmann, S; Borkova, D; Blöcker, H; Scharfe, M; Grimm, M; Löhnert, T H; Dose, S; de Haan, M; Maarse, A; Schäfer, M; Müller-Auer, S; Gabel, C; Fuchs, M; Fartmann, B; Granderath, K; Dauner, D; Herzl, A; Neumann, S; Argiriou, A; Vitale, D; Liguori, R; Piravandi, E; Massenet, O; Quigley, F; Clabauld, G; Mündlein, A; Felber, R; Schnabl, S; Hiller, R; Schmidt, W; Lecharny, A; Aubourg, S; Chefdor, F; Cooke, R; Berger, C; Montfort, A; Casacuberta, E; Gibbons, T; Weber, N; Vandenbol, M; Bargues, M; Terol, J; Torres, A; Perez-Perez, A; Purnelle, B; Bent, E; Johnson, S; Tacon, D; Jesse, T; Heijnen, L; Schwarz, S; Scholler, P; Heber, S; Francs, P; Bielke, C; Frishman, D; Haase, D; Lemcke, K; Mewes, H W; Stocker, S; Zaccaria, P; Bevan, M; Wilson, R K; de la Bastide, M; Habermann, K; Parnell, L; Dedhia, N; Gnoj, L; Schutz, K; Huang, E; Spiegel, L; Sehkon, M; Murray, J; Sheet, P; Cordes, M; Abu-Threideh, J; Stoneking, T; Kalicki, J; Graves, T; Harmon, G; Edwards, J; Latreille, P; Courtney, L; Cloud, J; Abbott, A; Scott, K; Johnson, D; Minx, P; Bentley, D; Fulton, B; Miller, N; Greco, T; Kemp, K; Kramer, J; Fulton, L; Mardis, E; Dante, M; Pepin, K; Hillier, L; Nelson, J; Spieth, J; Ryan, E; Andrews, S; Geisel, C; Layman, D; Du, H; Ali, J; Berghoff, A; Jones, K; Drone, K; Cotton, M; Joshu, C; Antonoiu, B; Zidanic, M; Strong, C; Sun, H; Lamar, B; Yordan, C; Ma, P; Zhong, J; Preston, R; Vil, D; Shekher, M; Matero, A; Shah, R; Swaby, I K; O'Shaughnessy, A; Rodriguez, M; Hoffmann, J; Till, S; Granat, S; Shohdy, N; Hasegawa, A; Hameed, A; Lodhi, M; Johnson, A; Chen, E; Marra, M; Martienssen, R; McCombie, W R
1999-12-16
The higher plant Arabidopsis thaliana (Arabidopsis) is an important model for identifying plant genes and determining their function. To assist biological investigations and to define chromosome structure, a coordinated effort to sequence the Arabidopsis genome was initiated in late 1996. Here we report one of the first milestones of this project, the sequence of chromosome 4. Analysis of 17.38 megabases of unique sequence, representing about 17% of the genome, reveals 3,744 protein coding genes, 81 transfer RNAs and numerous repeat elements. Heterochromatic regions surrounding the putative centromere, which has not yet been completely sequenced, are characterized by an increased frequency of a variety of repeats, new repeats, reduced recombination, lowered gene density and lowered gene expression. Roughly 60% of the predicted protein-coding genes have been functionally characterized on the basis of their homology to known genes. Many genes encode predicted proteins that are homologous to human and Caenorhabditis elegans proteins.
USDA-ARS?s Scientific Manuscript database
Molecular epidemiology and evolution of foot-and-mouth disease virus (FMDV) are widely studied using genomic sequences encoding VP1, the capsid protein containing the most relevant antigenic domains. Although sequencing of the full viral genome is not used as a routine diagnostic or surveillance too...
Hazes, Bart
2014-02-28
Protein-coding DNA sequences and their corresponding amino acid sequences are routinely used to study relationships between sequence, structure, function, and evolution. The rapidly growing size of sequence databases increases the power of such comparative analyses but it makes it more challenging to prepare high quality sequence data sets with control over redundancy, quality, completeness, formatting, and labeling. Software tools for some individual steps in this process exist but manual intervention remains a common and time consuming necessity. CDSbank is a database that stores both the protein-coding DNA sequence (CDS) and amino acid sequence for each protein annotated in Genbank. CDSbank also stores Genbank feature annotation, a flag to indicate incomplete 5' and 3' ends, full taxonomic data, and a heuristic to rank the scientific interest of each species. This rich information allows fully automated data set preparation with a level of sophistication that aims to meet or exceed manual processing. Defaults ensure ease of use for typical scenarios while allowing great flexibility when needed. Access is via a free web server at http://hazeslab.med.ualberta.ca/CDSbank/. CDSbank presents a user-friendly web server to download, filter, format, and name large sequence data sets. Common usage scenarios can be accessed via pre-programmed default choices, while optional sections give full control over the processing pipeline. Particular strengths are: extract protein-coding DNA sequences just as easily as amino acid sequences, full access to taxonomy for labeling and filtering, awareness of incomplete sequences, and the ability to take one protein sequence and extract all synonymous CDS or identical protein sequences in other species. Finally, CDSbank can also create labeled property files to, for instance, annotate or re-label phylogenetic trees.
The eukaryotic genome is structurally and functionally more like a social insect colony than a book.
Qiu, Guo-Hua; Yang, Xiaoyan; Zheng, Xintian; Huang, Cuiqin
2017-11-01
Traditionally, the genome has been described as the 'book of life'. However, the metaphor of a book may not reflect the dynamic nature of the structure and function of the genome. In the eukaryotic genome, the number of centrally located protein-coding sequences is relatively constant across species, but the amount of noncoding DNA increases considerably with the increase of organismal evolutional complexity. Therefore, it has been hypothesized that the abundant peripheral noncoding DNA protects the genome and the central protein-coding sequences in the eukaryotic genome. Upon comparison with the habitation, sociality and defense mechanisms of a social insect colony, it is found that the genome is similar to a social insect colony in various aspects. A social insect colony may thus be a better metaphor than a book to describe the spatial organization and physical functions of the genome. The potential implications of the metaphor are also discussed.
Okamoto, Kiyoko; Ami, Yasushi; Suzaki, Yuriko; Otsuki, Noriyuki; Sakata, Masafumi; Takeda, Makoto; Mori, Yoshio
2016-04-01
The marker of Japanese domestic rubella vaccines is their lack of immunogenicity in guinea pigs. This has long been thought to be related to the temperature sensitivity of the viruses, but supporting evidence has not been described. In this study, we generated infectious clones of TO-336.vac, a Japanese domestic vaccine, TO-336.GMK5, the parental virus of TO-336.vac, and their mutants, and determined the molecular bases of their temperature sensitivity and immunogenicity in guinea pigs. The results revealed that Ser(1159) in the non-structural protein-coding region was responsible for the temperature sensitivity of TO-336.vac dominantly, while the structural protein-coding region affected the temperature sensitivity subordinately. The findings further suggested that the temperature sensitivity of TO-336.vac affected the antibody induction in guinea pigs after subcutaneous inoculation. Copyright © 2016 Elsevier Inc. All rights reserved.
Kobayashi, Shintaro; Yoshii, Kentaro; Hirano, Minato; Muto, Memi; Kariwa, Hiroaki
2017-02-01
Reverse genetics systems facilitate investigation of many aspects of the life cycle and pathogenesis of viruses. However, genetic instability in Escherichia coli has hampered development of a reverse genetics system for West Nile virus (WNV). In this study, we developed a novel reverse genetics system for WNV based on homologous recombination in mammalian cells. Introduction of the DNA fragment coding for the WNV structural protein together with a DNA-based replicon resulted in the release of infectious WNV. The growth rate and plaque size of the recombinant virus were almost identical to those of the parent WNV. Furthermore, chimeric WNV was produced by introducing the DNA fragment coding for the structural protein and replicon plasmid derived from various strains. Here, we report development of a novel system that will facilitate research into WNV infection. Copyright © 2016 Elsevier B.V. All rights reserved.
Pharmacophore screening of the protein data bank for specific binding site chemistry.
Campagna-Slater, Valérie; Arrowsmith, Andrew G; Zhao, Yong; Schapira, Matthieu
2010-03-22
A simple computational approach was developed to screen the Protein Data Bank (PDB) for putative pockets possessing a specific binding site chemistry and geometry. The method employs two commonly used 3D screening technologies, namely identification of cavities in protein structures and pharmacophore screening of chemical libraries. For each protein structure, a pocket finding algorithm is used to extract potential binding sites containing the correct types of residues, which are then stored in a large SDF-formatted virtual library; pharmacophore filters describing the desired binding site chemistry and geometry are then applied to screen this virtual library and identify pockets matching the specified structural chemistry. As an example, this approach was used to screen all human protein structures in the PDB and identify sites having chemistry similar to that of known methyl-lysine binding domains that recognize chromatin methylation marks. The selected genes include known readers of the histone code as well as novel binding pockets that may be involved in epigenetic signaling. Putative allosteric sites were identified on the structures of TP53BP1, L3MBTL3, CHEK1, KDM4A, and CREBBP.
Sadhukhan, Ratan; Chowdhury, Priyanka; Ghosh, Sourav; Ghosh, Utpal
2018-06-01
Telomere DNA can form specialized nucleoprotein structure with telomere-associated proteins to hide free DNA ends or G-quadruplex structures under certain conditions especially in presence of G-quadruplex ligand. Telomere DNA is transcribed to form non-coding telomere repeat-containing RNA (TERRA) whose biogenesis and function is poorly understood. Our aim was to find the role of telomere-associated proteins and telomere structures in TERRA transcription. We silenced four [two shelterin (TRF1, TRF2) and two non-shelterin (PARP-1, SLX4)] telomere-associated genes using siRNA and verified depletion in protein level. Knocking down of one gene modulated expression of other telomere-associated genes and increased TERRA from 10q, 15q, XpYp and XqYq chromosomes in A549 cells. Telomere was destabilized or damaged by G-quadruplex ligand pyridostatin (PDS) and bleomycin. Telomere dysfunction-induced foci (TIFs) were observed for each case of depletion of proteins, treatment with PDS or bleomycin. TERRA level was elevated by PDS and bleomycin treatment alone or in combination with depletion of telomere-associated proteins.
Ikonomidis, Alexandros; Grapsa, Anastasia; Pavlioglou, Charikleia; Demiri, Antonia; Batarli, Alexandra; Panopoulou, Maria
2016-12-01
Fifty-six Staphylococcus epidermidis clinical isolates, showing high-level linezolid resistance and causing bacteremia in critically ill patients, were studied. All isolates belonged to ST22 clone and carried the T2504A and C2534T mutations in gene coding for 23SrRNA as well as the C189A, G208A, C209T and G384C missense mutations in L3 protein which resulted in Asp159Tyr, Gly152Asp and Leu94Val substitutions. Other silent mutations were also detected in genes coding for ribosomal proteins L3 and L22. In silico analysis of missense mutations showed that although L3 protein retained the sequence of secondary motifs, the tertiary structure was influenced. The observed alteration in L3 protein folding provides an indication on the putative role of L3-coding gene mutations in high-level linezolid resistance. Furthermore, linezolid pressure in health care settings where linezolid consumption is of high rates might lead to the selection of resistant mutants possessing L3 mutations that might confer high-level linezolid resistance.
Hidden Markov model approach for identifying the modular framework of the protein backbone.
Camproux, A C; Tuffery, P; Chevrolat, J P; Boisvieux, J F; Hazout, S
1999-12-01
The hidden Markov model (HMM) was used to identify recurrent short 3D structural building blocks (SBBs) describing protein backbones, independently of any a priori knowledge. Polypeptide chains are decomposed into a series of short segments defined by their inter-alpha-carbon distances. Basically, the model takes into account the sequentiality of the observed segments and assumes that each one corresponds to one of several possible SBBs. Fitting the model to a database of non-redundant proteins allowed us to decode proteins in terms of 12 distinct SBBs with different roles in protein structure. Some SBBs correspond to classical regular secondary structures. Others correspond to a significant subdivision of their bounding regions previously considered to be a single pattern. The major contribution of the HMM is that this model implicitly takes into account the sequential connections between SBBs and thus describes the most probable pathways by which the blocks are connected to form the framework of the protein structures. Validation of the SBBs code was performed by extracting SBB series repeated in recoding proteins and examining their structural similarities. Preliminary results on the sequence specificity of SBBs suggest promising perspectives for the prediction of SBBs or series of SBBs from the protein sequences.
A subset of conserved mammalian long non-coding RNAs are fossils of ancestral protein-coding genes.
Hezroni, Hadas; Ben-Tov Perry, Rotem; Meir, Zohar; Housman, Gali; Lubelsky, Yoav; Ulitsky, Igor
2017-08-30
Only a small portion of human long non-coding RNAs (lncRNAs) appear to be conserved outside of mammals, but the events underlying the birth of new lncRNAs in mammals remain largely unknown. One potential source is remnants of protein-coding genes that transitioned into lncRNAs. We systematically compare lncRNA and protein-coding loci across vertebrates, and estimate that up to 5% of conserved mammalian lncRNAs are derived from lost protein-coding genes. These lncRNAs have specific characteristics, such as broader expression domains, that set them apart from other lncRNAs. Fourteen lncRNAs have sequence similarity with the loci of the contemporary homologs of the lost protein-coding genes. We propose that selection acting on enhancer sequences is mostly responsible for retention of these regions. As an example of an RNA element from a protein-coding ancestor that was retained in the lncRNA, we describe in detail a short translated ORF in the JPX lncRNA that was derived from an upstream ORF in a protein-coding gene and retains some of its functionality. We estimate that ~ 55 annotated conserved human lncRNAs are derived from parts of ancestral protein-coding genes, and loss of coding potential is thus a non-negligible source of new lncRNAs. Some lncRNAs inherited regulatory elements influencing transcription and translation from their protein-coding ancestors and those elements can influence the expression breadth and functionality of these lncRNAs.
A probabilistic model for detecting rigid domains in protein structures.
Nguyen, Thach; Habeck, Michael
2016-09-01
Large-scale conformational changes in proteins are implicated in many important biological functions. These structural transitions can often be rationalized in terms of relative movements of rigid domains. There is a need for objective and automated methods that identify rigid domains in sets of protein structures showing alternative conformational states. We present a probabilistic model for detecting rigid-body movements in protein structures. Our model aims to approximate alternative conformational states by a few structural parts that are rigidly transformed under the action of a rotation and a translation. By using Bayesian inference and Markov chain Monte Carlo sampling, we estimate all parameters of the model, including a segmentation of the protein into rigid domains, the structures of the domains themselves, and the rigid transformations that generate the observed structures. We find that our Gibbs sampling algorithm can also estimate the optimal number of rigid domains with high efficiency and accuracy. We assess the power of our method on several thousand entries of the DynDom database and discuss applications to various complex biomolecular systems. The Python source code for protein ensemble analysis is available at: https://github.com/thachnguyen/motion_detection : mhabeck@gwdg.de. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Rewiring protein synthesis: From natural to synthetic amino acids.
Fan, Yongqiang; Evans, Christopher R; Ling, Jiqiang
2017-11-01
The protein synthesis machinery uses 22 natural amino acids as building blocks that faithfully decode the genetic information. Such fidelity is controlled at multiple steps and can be compromised in nature and in the laboratory to rewire protein synthesis with natural and synthetic amino acids. This review summarizes the major quality control mechanisms during protein synthesis, including aminoacyl-tRNA synthetases, elongation factors, and the ribosome. We will discuss evolution and engineering of such components that allow incorporation of natural and synthetic amino acids at positions that deviate from the standard genetic code. The protein synthesis machinery is highly selective, yet not fixed, for the correct amino acids that match the mRNA codons. Ambiguous translation of a codon with multiple amino acids or complete reassignment of a codon with a synthetic amino acid diversifies the proteome. Expanding the genetic code with synthetic amino acids through rewiring protein synthesis has broad applications in synthetic biology and chemical biology. Biochemical, structural, and genetic studies of the translational quality control mechanisms are not only crucial to understand the physiological role of translational fidelity and evolution of the genetic code, but also enable us to better design biological parts to expand the proteomes of synthetic organisms. This article is part of a Special Issue entitled "Biochemistry of Synthetic Biology - Recent Developments" Guest Editor: Dr. Ilka Heinemann and Dr. Patrick O'Donoghue. Copyright © 2017 Elsevier B.V. All rights reserved.
Exploration of sequence space as the basis of viral RNA genome segmentation.
Moreno, Elena; Ojosnegros, Samuel; García-Arriaza, Juan; Escarmís, Cristina; Domingo, Esteban; Perales, Celia
2014-05-06
The mechanisms of viral RNA genome segmentation are unknown. On extensive passage of foot-and-mouth disease virus in baby hamster kidney-21 cells, the virus accumulated multiple point mutations and underwent a transition akin to genome segmentation. The standard single RNA genome molecule was replaced by genomes harboring internal in-frame deletions affecting the L- or capsid-coding region. These genomes were infectious and killed cells by complementation. Here we show that the point mutations in the nonstructural protein-coding region (P2, P3) that accumulated in the standard genome before segmentation increased the relative fitness of the segmented version relative to the standard genome. Fitness increase was documented by intracellular expression of virus-coded proteins and infectious progeny production by RNAs with the internal deletions placed in the sequence context of the parental and evolved genome. The complementation activity involved several viral proteins, one of them being the leader proteinase L. Thus, a history of genetic drift with accumulation of point mutations was needed to allow a major variation in the structure of a viral genome. Thus, exploration of sequence space by a viral genome (in this case an unsegmented RNA) can reach a point of the space in which a totally different genome structure (in this case, a segmented RNA) is favored over the form that performed the exploration.
Ribosomes: Ribozymes that Survived Evolution Pressures but Is Paralyzed by Tiny Antibiotics
NASA Astrophysics Data System (ADS)
Yonath, Ada
An impressive number of crystal structures of ribosomes, the universal cellular machines that translate the genetic code into proteins, emerged during the last decade. The determination of ribosome high resolution structure, which was widely considered formidable, led to novel insights into the ribosomal function, namely, fidelity, catalytic mechanism, and polymerize activities. They also led to suggestions concerning its origin and shed light on the action, selectivity and synergism of ribosomal antibiotics; illuminated mechanisms acquiring bacterial resistance and provided structural information for drug improvement and design. These studies required the pioneering and implementation of advanced technologies, which directly influenced the remarkable increase of the number of structures deposited in the Protein Data Bank.
Sikorav, J L; Duval, N; Anselmet, A; Bon, S; Krejci, E; Legay, C; Osterlund, M; Reimund, B; Massoulié, J
1988-01-01
In this paper, we show the existence of alternative splicing in the 3' region of the coding sequence of Torpedo acetylcholinesterase (AChE). We describe two cDNA structures which both diverge from the previously described coding sequence of the catalytic subunit of asymmetric (A) forms (Schumacher et al., 1986; Sikorav et al., 1987). They both contain a coding sequence followed by a non-coding sequence and a poly(A) stretch. Both of these structures were shown to exist in poly(A)+ RNAs, by S1 mapping experiments. The divergent region encoded by the first sequence corresponds to the precursor of the globular dimeric form (G2a), since it contains the expected C-terminal amino acids, Ala-Cys. These amino acids are followed by a 29 amino acid extension which contains a hydrophobic segment and must be replaced by a glycolipid in the mature protein. Analyses of intact G2a AChE showed that the common domain of the protein contains intersubunit disulphide bonds. The divergent region of the second type of cDNA consists of an adjacent genomic sequence, which is removed as an intron in A and Ga mRNAs, but may encode a distinct, less abundant catalytic subunit. The structures of the cDNA clones indicate that they are derived from minor mRNAs, shorter than the three major transcripts which have been described previously (14.5, 10.5 and 5.5 kb). Oligonucleotide probes specific for the asymmetric and globular terminal regions hybridize with the three major transcripts, indicating that their size is determined by 3'-untranslated regions which are not related to the differential splicing leading to A and Ga forms. Images PMID:3181125
PDB-Explorer: a web-based interactive map of the protein data bank in shape space.
Jin, Xian; Awale, Mahendra; Zasso, Michaël; Kostro, Daniel; Patiny, Luc; Reymond, Jean-Louis
2015-10-23
The RCSB Protein Data Bank (PDB) provides public access to experimentally determined 3D-structures of biological macromolecules (proteins, peptides and nucleic acids). While various tools are available to explore the PDB, options to access the global structural diversity of the entire PDB and to perceive relationships between PDB structures remain very limited. A 136-dimensional atom pair 3D-fingerprint for proteins (3DP) counting categorized atom pairs at increasing through-space distances was designed to represent the molecular shape of PDB-entries. Nearest neighbor searches examples were reported exemplifying the ability of 3DP-similarity to identify closely related biomolecules from small peptides to enzyme and large multiprotein complexes such as virus particles. The principle component analysis was used to obtain the visualization of PDB in 3DP-space. The 3DP property space groups proteins and protein assemblies according to their 3D-shape similarity, yet shows exquisite ability to distinguish between closely related structures. An interactive website called PDB-Explorer is presented featuring a color-coded interactive map of PDB in 3DP-space. Each pixel of the map contains one or more PDB-entries which are directly visualized as ribbon diagrams when the pixel is selected. The PDB-Explorer website allows performing 3DP-nearest neighbor searches of any PDB-entry or of any structure uploaded as protein-type PDB file. All functionalities on the website are implemented in JavaScript in a platform-independent manner and draw data from a server that is updated daily with the latest PDB additions, ensuring complete and up-to-date coverage. The essentially instantaneous 3DP-similarity search with the PDB-Explorer provides results comparable to those of much slower 3D-alignment algorithms, and automatically clusters proteins from the same superfamilies in tight groups. A chemical space classification of PDB based on molecular shape was obtained using a new atom-pair 3D-fingerprint for proteins and implemented in a web-based database exploration tool comprising an interactive color-coded map of the PDB chemical space and a nearest neighbor search tool. The PDB-Explorer website is freely available at www.cheminfo.org/pdbexplorer and represents an unprecedented opportunity to interactively visualize and explore the structural diversity of the PDB. ᅟ
cncRNAs: Bi-functional RNAs with protein coding and non-coding functions
Kumari, Pooja; Sampath, Karuna
2015-01-01
For many decades, the major function of mRNA was thought to be to provide protein-coding information embedded in the genome. The advent of high-throughput sequencing has led to the discovery of pervasive transcription of eukaryotic genomes and opened the world of RNA-mediated gene regulation. Many regulatory RNAs have been found to be incapable of protein coding and are hence termed as non-coding RNAs (ncRNAs). However, studies in recent years have shown that several previously annotated non-coding RNAs have the potential to encode proteins, and conversely, some coding RNAs have regulatory functions independent of the protein they encode. Such bi-functional RNAs, with both protein coding and non-coding functions, which we term as ‘cncRNAs’, have emerged as new players in cellular systems. Here, we describe the functions of some cncRNAs identified from bacteria to humans. Because the functions of many RNAs across genomes remains unclear, we propose that RNAs be classified as coding, non-coding or both only after careful analysis of their functions. PMID:26498036
Lua, Rhonald C; Wilson, Stephen J; Konecki, Daniel M; Wilkins, Angela D; Venner, Eric; Morgan, Daniel H; Lichtarge, Olivier
2016-01-04
The structure and function of proteins underlie most aspects of biology and their mutational perturbations often cause disease. To identify the molecular determinants of function as well as targets for drugs, it is central to characterize the important residues and how they cluster to form functional sites. The Evolutionary Trace (ET) achieves this by ranking the functional and structural importance of the protein sequence positions. ET uses evolutionary distances to estimate functional distances and correlates genotype variations with those in the fitness phenotype. Thus, ET ranks are worse for sequence positions that vary among evolutionarily closer homologs but better for positions that vary mostly among distant homologs. This approach identifies functional determinants, predicts function, guides the mutational redesign of functional and allosteric specificity, and interprets the action of coding sequence variations in proteins, people and populations. Now, the UET database offers pre-computed ET analyses for the protein structure databank, and on-the-fly analysis of any protein sequence. A web interface retrieves ET rankings of sequence positions and maps results to a structure to identify functionally important regions. This UET database integrates several ways of viewing the results on the protein sequence or structure and can be found at http://mammoth.bcm.tmc.edu/uet/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Liu, Ye; Li, Nan; Zhang, Shoufeng; Zhang, Fei; Lian, Hai; Wang, Ying; Zhang, Jinxia; Hu, Rongliang
2013-12-01
The genome of Irkut virus, isolate IRKV-THChina12, the first non-rabies lyssavirus from China (of bat origin), has been completely sequenced. In general, coding and non-coding regions of this viral genome are similar to those of other lyssaviruses. However, alignment of the deduced amino acid sequences of the structural proteins of IRKV-THChina12 with those of other lyssavirus representatives revealed significant variability between viral species. The nucleoprotein and matrix protein were found to be the most conserved, followed by the large protein, glycoprotein and phosphoprotein. Differences in the antigenic sites in glycoprotein may result in only partial protection of the available rabies biologics against Irkut virus, which is of particular concern for pre- and post-exposure rabies prophylaxis. Copyright © 2013 Elsevier Inc. All rights reserved.
Krzeminska, Urszula; Wilson, Robyn; Rahman, Sadequr; Song, Beng Kah; Seneviratne, Sampath; Gan, Han Ming; Austin, Christopher M
2016-07-01
The complete mitochondrial genomes of two jungle crows (Corvus macrorhynchos) were sequenced. DNA was extracted from tissue samples obtained from shed feathers collected in the field in Sri Lanka and sequenced using the Illumina MiSeq Personal Sequencer. Jungle crow mitogenomes have a structural organization typical of the genus Corvus and are 16,927 bp and 17,066 bp in length, both comprising 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal subunit genes, and a non-coding control region. In addition, we complement already available house crow (Corvus spelendens) mitogenome resources by sequencing an individual from Singapore. A phylogenetic tree constructed from Corvidae family mitogenome sequences available on GenBank is presented. We confirm the monophyly of the genus Corvus and propose to use complete mitogenome resources for further intra- and interspecies genetic studies.
Ultrafast protein structure-based virtual screening with Panther
NASA Astrophysics Data System (ADS)
Niinivehmas, Sanna P.; Salokas, Kari; Lätti, Sakari; Raunio, Hannu; Pentikäinen, Olli T.
2015-10-01
Molecular docking is by far the most common method used in protein structure-based virtual screening. This paper presents Panther, a novel ultrafast multipurpose docking tool. In Panther, a simple shape-electrostatic model of the ligand-binding area of the protein is created by utilizing the protein crystal structure. The features of the possible ligands are then compared to the model by using a similarity search algorithm. On average, one ligand can be processed in a few minutes by using classical docking methods, whereas using Panther processing takes <1 s. The presented Panther protocol can be used in several applications, such as speeding up the early phases of drug discovery projects, reducing the number of failures in the clinical phase of the drug development process, and estimating the environmental toxicity of chemicals. Panther-code is available in our web pages (http://www.jyu.fi/panther) free of charge after registration.
Ultrafast protein structure-based virtual screening with Panther.
Niinivehmas, Sanna P; Salokas, Kari; Lätti, Sakari; Raunio, Hannu; Pentikäinen, Olli T
2015-10-01
Molecular docking is by far the most common method used in protein structure-based virtual screening. This paper presents Panther, a novel ultrafast multipurpose docking tool. In Panther, a simple shape-electrostatic model of the ligand-binding area of the protein is created by utilizing the protein crystal structure. The features of the possible ligands are then compared to the model by using a similarity search algorithm. On average, one ligand can be processed in a few minutes by using classical docking methods, whereas using Panther processing takes <1 s. The presented Panther protocol can be used in several applications, such as speeding up the early phases of drug discovery projects, reducing the number of failures in the clinical phase of the drug development process, and estimating the environmental toxicity of chemicals. Panther-code is available in our web pages (http://www.jyu.fi/panther) free of charge after registration.
Sinha, Sangita; Rappu, Pekka; Lange, S. C.; Mäntsälä, Pekka; Zalkin, Howard; Smith, Janet L.
1999-01-01
The yabJ gene in Bacillus subtilis is required for adenine-mediated repression of purine biosynthetic genes in vivo and codes for an acid-soluble, 14-kDa protein. The molecular mechanism of YabJ is unknown. YabJ is a member of a large, widely distributed family of proteins of unknown biochemical function. The 1.7-Å crystal structure of YabJ reveals a trimeric organization with extensive buried hydrophobic surface and an internal water-filled cavity. The most important finding in the structure is a deep, narrow cleft between subunits lined with nine side chains that are invariant among the 25 most similar homologs. This conserved site is proposed to be a binding or catalytic site for a ligand or substrate that is common to YabJ and other members of the YER057c/YjgF/UK114 family of proteins. PMID:10557275
Meza-Aguilar, J. Domingo; Fromme, Petra; Torres-Larios, Alfredo; Mendoza-Hernández, Guillermo; Hernandez-Chiñas, Ulises; Monteros, Roberto A. Arreguin-Espinosa de los; Campos, Carlos A. Eslava; Fromme, Raimund
2014-01-01
Autotransporters (ATs) represent a superfamily of proteins produced by a variety of pathogenic bacteria, which include the pathogenic groups of Escherichia coli (E. coli) associated with gastrointestinal and urinary tract infections. We present the first X-ray structure of the passenger domain from the Plasmid-encoded toxin (Pet) a 100 kDa protein at 2.3 Å resolution which is a cause of acute diarrhea in both developing and industrialized countries. Pet is a cytoskeleton-altering toxin that induces loss of actin stress fibers. While Pet (pdb code: 4OM9) shows only a sequence identity of 50 % compared to the closest related protein sequence, extracellular serine protease plasmid (EspP) the structural features of both proteins are conserved. A closer structural look reveals that Pet contains a β-pleaded sheet at the sequence region of residues 181-190, the corresponding structural domain in EspP consists of a coiled loop. Secondary, the Pet passenger domain features a more pronounced beta sheet between residues 135-143 compared to the structure of EspP. PMID:24530907
Sivakolundu, Sivashankar G; Mabrouk, Patricia Ann
2003-05-01
The complete solution structure of ferrocytochrome c in 30% acetonitrile/70% water has been determined using high-field 1D and 2D (1)H NMR methods and deposited in the Protein Data Bank with codes 1LC1 and 1LC2. This is the first time a complete solution protein structure has been determined for a protein in nonaqueous media. Ferrocyt c retains a native protein secondary structure (five alpha-helices and two omega loops) in 30% acetonitrile. H18 and M80 residues are the axial heme ligands, as in aqueous solution. Residues believed to be axial heme ligands in the alkaline-like conformers of ferricyt c, specifically H33 and K72, are positioned close to the heme iron. The orientations of both heme propionates are markedly different in 30% acetonitrile/70% water. Comparative structural analysis of reduced cyt c in 30% acetonitrile/70% water solution with cyt c in different environments has given new insight into the cyt c folding mechanism, the electron transfer pathway, and cell apoptosis.
PLIP: fully automated protein-ligand interaction profiler.
Salentin, Sebastian; Schreiber, Sven; Haupt, V Joachim; Adasme, Melissa F; Schroeder, Michael
2015-07-01
The characterization of interactions in protein-ligand complexes is essential for research in structural bioinformatics, drug discovery and biology. However, comprehensive tools are not freely available to the research community. Here, we present the protein-ligand interaction profiler (PLIP), a novel web service for fully automated detection and visualization of relevant non-covalent protein-ligand contacts in 3D structures, freely available at projects.biotec.tu-dresden.de/plip-web. The input is either a Protein Data Bank structure, a protein or ligand name, or a custom protein-ligand complex (e.g. from docking). In contrast to other tools, the rule-based PLIP algorithm does not require any structure preparation. It returns a list of detected interactions on single atom level, covering seven interaction types (hydrogen bonds, hydrophobic contacts, pi-stacking, pi-cation interactions, salt bridges, water bridges and halogen bonds). PLIP stands out by offering publication-ready images, PyMOL session files to generate custom images and parsable result files to facilitate successive data processing. The full python source code is available for download on the website. PLIP's command-line mode allows for high-throughput interaction profiling. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Molecular mimicry between protein and tRNA.
Nakamura, Y
2001-01-01
Mimicry is a sophisticated development in animals, fish, and plants that allows them to fool others by imitating a shape or color for diverse purposes, such as to prey, evade, lure, pollinate, or threaten. This is not restricted to the macro-world, but extends to the micro-world as molecular mimicry. Recent advances in structural and molecular biology uncovered a set of translation factors that resembles a tRNA shape and, in one case, even mimics a tRNA function for deciphering the genetic code. Nature must have evolved this art of molecular mimicry between protein and ribonucleic acid by using different protein structures until the translation factors sat in the cockpit of a ribosome machine, on behalf of tRNA, and achieved diverse actions. Structural, functional, and evolutionary aspects of molecular mimicry will be discussed.
Polar bears, antibiotics, and the evolving ribosome (Nobel Lecture).
Yonath, Ada
2010-06-14
High-resolution structures of ribosomes, the cellular machines that translate the genetic code into proteins, revealed the decoding mechanism, detected the mRNA path, identified the sites of the tRNA molecules in the ribosome, elucidated the position and the nature of the nascent proteins exit tunnel, illuminated the interactions of the ribosome with non-ribosomal factors, such as the initiation, release and recycling factors, and provided valuable information on ribosomal antibiotics, their binding sites, modes of action, principles of selectivity and the mechanisms leading to their resistance. Notably, these structures proved that the ribosome is a ribozyme whose active site, namely where the peptide bonds are being formed, is situated within a universal symmetrical region that is embedded in the otherwise asymmetric ribosome structure. As this symmetrical region is highly conserved and provides the machinery required for peptide bond formation and for ribosome polymerase activity, it may be the remnant of the proto-ribosome, a dimeric prebiotic machine that formed peptide bonds and non-coded polypeptide chains. Structures of complexes of ribosomes with antibiotics targeting them revealed the principles allowing for their clinical use, identified resistance mechanisms and showed the structural bases for discriminating pathogenic bacteria from hosts, hence providing valuable structural information for antibiotics improvement and for the design of novel compounds that can serve as antibiotics.
Khedkar, Santosh A; Malde, Alpeshkumar K; Coutinho, Evans C
2005-01-01
Mycobacterium tuberculosis (Mtb) is a successful pathogen that overcomes the numerous challenges presented by the immune system of the host. In the last 40 years few anti-TB drugs have been developed, while the drug-resistance problem is increasing; there is thus a pressing need to develop new anti-TB drugs active against both the acute and chronic growth phases of the mycobacterium. Methionine S-adenosyltransferase (MAT) is an enzyme involved in the synthesis of S-adenosylmethionine (SAM), a methyl donor essential for mycolipid biosynthesis. As an anti-TB drug target, Mtb-MAT has been well validated. A homology model of MAT has been constructed using the X-ray structures of E. coli MAT (PDB code: 1MXA) and rat MAT (PDB code: 1QM4) as templates, by comparative protein modeling principles. The resulting model has the correct stereochemistry as gauged from the Ramachandran plot and good three-dimensional (3D) structure compatibility as assessed by the Profiles-3D score. The structurally and functionally important residues (active site) of Mtb-MAT have been identified using the E. coli and rat MAT crystal structures and the reported point mutation data. The homology model conserves the topological and active site features of the MAT family of proteins. The differences in the molecular electrostatic potentials (MEP) of Mtb and human MAT provide evidences that selective and specific Mtb-MAT inhibitors can be designed using the homology model, by the structure-based drug design approaches.
Mutations that Cause Human Disease: A Computational/Experimental Approach
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beernink, P; Barsky, D; Pesavento, B
International genome sequencing projects have produced billions of nucleotides (letters) of DNA sequence data, including the complete genome sequences of 74 organisms. These genome sequences have created many new scientific opportunities, including the ability to identify sequence variations among individuals within a species. These genetic differences, which are known as single nucleotide polymorphisms (SNPs), are particularly important in understanding the genetic basis for disease susceptibility. Since the report of the complete human genome sequence, over two million human SNPs have been identified, including a large-scale comparison of an entire chromosome from twenty individuals. Of the protein coding SNPs (cSNPs), approximatelymore » half leads to a single amino acid change in the encoded protein (non-synonymous coding SNPs). Most of these changes are functionally silent, while the remainder negatively impact the protein and sometimes cause human disease. To date, over 550 SNPs have been found to cause single locus (monogenic) diseases and many others have been associated with polygenic diseases. SNPs have been linked to specific human diseases, including late-onset Parkinson disease, autism, rheumatoid arthritis and cancer. The ability to predict accurately the effects of these SNPs on protein function would represent a major advance toward understanding these diseases. To date several attempts have been made toward predicting the effects of such mutations. The most successful of these is a computational approach called ''Sorting Intolerant From Tolerant'' (SIFT). This method uses sequence conservation among many similar proteins to predict which residues in a protein are functionally important. However, this method suffers from several limitations. First, a query sequence must have a sufficient number of relatives to infer sequence conservation. Second, this method does not make use of or provide any information on protein structure, which can be used to understand how an amino acid change affects the protein. The experimental methods that provide the most detailed structural information on proteins are X-ray crystallography and NMR spectroscopy. However, these methods are labor intensive and currently cannot be carried out on a genomic scale. Nonetheless, Structural Genomics projects are being pursued by more than a dozen groups and consortia worldwide and as a result the number of experimentally determined structures is rising exponentially. Based on the expectation that protein structures will continue to be determined at an ever-increasing rate, reliable structure prediction schemes will become increasingly valuable, leading to information on protein function and disease for many different proteins. Given known genetic variability and experimentally determined protein structures, can we accurately predict the effects of single amino acid substitutions? An objective assessment of this question would involve comparing predicted and experimentally determined structures, which thus far has not been rigorously performed. The completed research leveraged existing expertise at LLNL in computational and structural biology, as well as significant computing resources, to address this question.« less
Fan, SiGang; Hu, ChaoQun; Wen, Jing; Zhang, LvPing
2011-05-01
The complete mitochondrial DNA sequence contains useful information for phylogenetic analyses of metazoa. In this study, the complete mitochondrial DNA sequence of sea cucumber Stichopus horrens (Holothuroidea: Stichopodidae: Stichopus) is presented. The complete sequence was determined using normal and long PCRs. The mitochondrial genome of Stichopus horrens is a circular molecule 16257 bps long, composed of 13 protein-coding genes, two ribosomal RNA genes and 22 transfer RNA genes. Most of these genes are coded on the heavy strand except for one protein-coding gene (nad6) and five tRNA genes (tRNA ( Ser(UCN) ), tRNA ( Gln ), tRNA ( Ala ), tRNA ( Val ), tRNA ( Asp )) which are coded on the light strand. The composition of the heavy strand is 30.8% A, 23.7% C, 16.2% G, and 29.3% T bases (AT skew=0.025; GC skew=-0.188). A non-coding region of 675 bp was identified as a putative control region because of its location and AT richness. The intergenic spacers range from 1 to 50 bp in size, totaling 227 bp. A total of 25 overlapping nucleotides, ranging from 1 to 10 bp in size, exist among 11 genes. All 13 protein-coding genes are initiated with an ATG. The TAA codon is used as the stop codon in all the protein coding genes except nad3 and nad4 that use TAG as their termination codon. The most frequently used amino acids are Leu (16.29%), Ser (10.34%) and Phe (8.37%). All of the tRNA genes have the potential to fold into typical cloverleaf secondary structures. We also compared the order of the genes in the mitochondrial DNA from the five holothurians that are now available and found a novel gene arrangement in the mitochondrial DNA of Stichopus horrens.
Genomics of Clostridium taeniosporum, an organism which forms endospores with ribbon-like appendages
Cambridge, Joshua M.; Blinkova, Alexandra L.; Salvador Rocha, Erick I.; Bode Hernández, Addys; Moreno, Maday; Ginés-Candelaria, Edwin; Goetz, Benjamin M.; Hunicke-Smith, Scott; Satterwhite, Ed; Tucker, Haley O.
2018-01-01
Clostridium taeniosporum, a non-pathogenic anaerobe closely related to the C. botulinum Group II members, was isolated from Crimean lake silt about 60 years ago. Its endospores are surrounded by an encasement layer which forms a trunk at one spore pole to which about 12–14 large, ribbon-like appendages are attached. The genome consists of one 3,264,813 bp, circular chromosome (with 26.6% GC) and three plasmids. The chromosome contains 2,892 potential protein coding sequences: 2,124 have specific functions, 147 have general functions, 228 are conserved but without known function and 393 are hypothetical based on the fact that no statistically significant orthologs were found. The chromosome also contains 101 genes for stable RNAs, including 7 rRNA clusters. Over 84% of the protein coding sequences and 96% of the stable RNA coding regions are oriented in the same direction as replication. The three known appendage genes are located within a single cluster with five other genes, the protein products of which are closely related, in terms of sequence, to the known appendage proteins. The relatedness of the deduced protein products suggests that all or some of the closely related genes might code for minor appendage proteins or assembly factors. The appendage genes might be unique among the known clostridia; no statistically significant orthologs were found within other clostridial genomes for which sequence data are available. The C. taeniosporum chromosome contains two functional prophages, one Siphoviridae and one Myoviridae, and one defective prophage. Three plasmids of 5.9, 69.7 and 163.1 Kbp are present. These data are expected to contribute to future studies of developmental, structural and evolutionary biology and to potential industrial applications of this organism. PMID:29293521
Cambridge, Joshua M; Blinkova, Alexandra L; Salvador Rocha, Erick I; Bode Hernández, Addys; Moreno, Maday; Ginés-Candelaria, Edwin; Goetz, Benjamin M; Hunicke-Smith, Scott; Satterwhite, Ed; Tucker, Haley O; Walker, James R
2018-01-01
Clostridium taeniosporum, a non-pathogenic anaerobe closely related to the C. botulinum Group II members, was isolated from Crimean lake silt about 60 years ago. Its endospores are surrounded by an encasement layer which forms a trunk at one spore pole to which about 12-14 large, ribbon-like appendages are attached. The genome consists of one 3,264,813 bp, circular chromosome (with 26.6% GC) and three plasmids. The chromosome contains 2,892 potential protein coding sequences: 2,124 have specific functions, 147 have general functions, 228 are conserved but without known function and 393 are hypothetical based on the fact that no statistically significant orthologs were found. The chromosome also contains 101 genes for stable RNAs, including 7 rRNA clusters. Over 84% of the protein coding sequences and 96% of the stable RNA coding regions are oriented in the same direction as replication. The three known appendage genes are located within a single cluster with five other genes, the protein products of which are closely related, in terms of sequence, to the known appendage proteins. The relatedness of the deduced protein products suggests that all or some of the closely related genes might code for minor appendage proteins or assembly factors. The appendage genes might be unique among the known clostridia; no statistically significant orthologs were found within other clostridial genomes for which sequence data are available. The C. taeniosporum chromosome contains two functional prophages, one Siphoviridae and one Myoviridae, and one defective prophage. Three plasmids of 5.9, 69.7 and 163.1 Kbp are present. These data are expected to contribute to future studies of developmental, structural and evolutionary biology and to potential industrial applications of this organism.
Wang, Jiajia; Li, Hu; Dai, Renhuai
2017-12-01
Here, we describe the first complete mitochondrial genome (mitogenome) sequence of the leafhopper Taharana fasciana (Coelidiinae). The mitogenome sequence contains 15,161 bp with an A + T content of 77.9%. It includes 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, and one non-coding (A + T-rich) region; in addition, a repeat region is also present (GenBank accession no. KY886913). These genes/regions are in the same order as in the inferred insect ancestral mitogenome. All protein-coding genes have ATN as the start codon, and TAA or single T as the stop codons, except the gene ND3, which ends with TAG. Furthermore, we predicted the secondary structures of the rRNAs in T. fasciana. Six domains (domain III is absent in arthropods) and 41 helices were predicted for 16S rRNA, and 12S rRNA comprised three structural domains and 24 helices. Phylogenetic tree analysis confirmed that T. fasciana and other members of the Cicadellidae are clustered into a clade, and it identified the relationships among the subfamilies Deltocephalinae, Coelidiinae, Idiocerinae, Cicadellinae, and Typhlocybinae.
Effects of GWAS-Associated Genetic Variants on lncRNAs within IBD and T1D Candidate Loci
Brorsson, Caroline A.; Pociot, Flemming
2014-01-01
Long non-coding RNAs are a new class of non-coding RNAs that are at the crosshairs in many human diseases such as cancers, cardiovascular disorders, inflammatory and autoimmune disease like Inflammatory Bowel Disease (IBD) and Type 1 Diabetes (T1D). Nearly 90% of the phenotype-associated single-nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS) lie outside of the protein coding regions, and map to the non-coding intervals. However, the relationship between phenotype-associated loci and the non-coding regions including the long non-coding RNAs (lncRNAs) is poorly understood. Here, we systemically identified all annotated IBD and T1D loci-associated lncRNAs, and mapped nominally significant GWAS/ImmunoChip SNPs for IBD and T1D within these lncRNAs. Additionally, we identified tissue-specific cis-eQTLs, and strong linkage disequilibrium (LD) signals associated with these SNPs. We explored sequence and structure based attributes of these lncRNAs, and also predicted the structural effects of mapped SNPs within them. We also identified lncRNAs in IBD and T1D that are under recent positive selection. Our analysis identified putative lncRNA secondary structure-disruptive SNPs within and in close proximity (+/−5 kb flanking regions) of IBD and T1D loci-associated candidate genes, suggesting that these RNA conformation-altering polymorphisms might be associated with diseased-phenotype. Disruption of lncRNA secondary structure due to presence of GWAS SNPs provides valuable information that could be potentially useful for future structure-function studies on lncRNAs. PMID:25144376
Seligmann, Hervé
2013-05-07
GenBank's EST database includes RNAs matching exactly human mitochondrial sequences assuming systematic asymmetric nucleotide exchange-transcription along exchange rules: A→G→C→U/T→A (12 ESTs), A→U/T→C→G→A (4 ESTs), C→G→U/T→C (3 ESTs), and A→C→G→U/T→A (1 EST), no RNAs correspond to other potential asymmetric exchange rules. Hypothetical polypeptides translated from nucleotide-exchanged human mitochondrial protein coding genes align with numerous GenBank proteins, predicted secondary structures resemble their putative GenBank homologue's. Two independent methods designed to detect overlapping genes (one based on nucleotide contents analyses in relation to replicative deamination gradients at third codon positions, and circular code analyses of codon contents based on frame redundancy), confirm nucleotide-exchange-encrypted overlapping genes. Methods converge on which genes are most probably active, and which not, and this for the various exchange rules. Mean EST lengths produced by different nucleotide exchanges are proportional to (a) extents that various bioinformatics analyses confirm the protein coding status of putative overlapping genes; (b) known kinetic chemistry parameters of the corresponding nucleotide substitutions by the human mitochondrial DNA polymerase gamma (nucleotide DNA misinsertion rates); (c) stop codon densities in predicted overlapping genes (stop codon readthrough and exchanging polymerization regulate gene expression by counterbalancing each other). Numerous rarely expressed proteins seem encoded within regular mitochondrial genes through asymmetric nucleotide exchange, avoiding lengthening genomes. Intersecting evidence between several independent approaches confirms the working hypothesis status of gene encryption by systematic nucleotide exchanges. Copyright © 2013 Elsevier Ltd. All rights reserved.
Caetano-Anollés, Gustavo; Kim, Kyung Mo; Caetano-Anollés, Derek
2012-02-01
The complexity of modern biochemistry developed gradually on early Earth as new molecules and structures populated the emerging cellular systems. Here, we generate a historical account of the gradual discovery of primordial proteins, cofactors, and molecular functions using phylogenomic information in the sequence of 420 genomes. We focus on structural and functional annotations of the 54 most ancient protein domains. We show how primordial functions are linked to folded structures and how their interaction with cofactors expanded the functional repertoire. We also reveal protocell membranes played a crucial role in early protein evolution and show translation started with RNA and thioester cofactor-mediated aminoacylation. Our findings allow elaboration of an evolutionary model of early biochemistry that is firmly grounded in phylogenomic information and biochemical, biophysical, and structural knowledge. The model describes how primordial α-helical bundles stabilized membranes, how these were decorated by layered arrangements of β-sheets and α-helices, and how these arrangements became globular. Ancient forms of aminoacyl-tRNA synthetase (aaRS) catalytic domains and ancient non-ribosomal protein synthetase (NRPS) modules gave rise to primordial protein synthesis and the ability to generate a code for specificity in their active sites. These structures diversified producing cofactor-binding molecular switches and barrel structures. Accretion of domains and molecules gave rise to modern aaRSs, NRPS, and ribosomal ensembles, first organized around novel emerging cofactors (tRNA and carrier proteins) and then more complex cofactor structures (rRNA). The model explains how the generation of protein structures acted as scaffold for nucleic acids and resulted in crystallization of modern translation.
Yong, Hoi-Sen; Song, Sze-Looi; Lim, Phaik-Eem; Chan, Kok-Gan; Chow, Wan-Loo; Eamsobhana, Praphathip
2015-01-01
The whole mitochondrial genome of the pest fruit fly Bactrocera arecae was obtained from next-generation sequencing of genomic DNA. It had a total length of 15,900 bp, consisting of 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and a non-coding region (A + T-rich control region). The control region (952 bp) was flanked by rrnS and trnI genes. The start codons included 6 ATG, 3 ATT and 1 each of ATA, ATC, GTG and TCG. Eight TAA, two TAG, one incomplete TA and two incomplete T stop codons were represented in the protein-coding genes. The cloverleaf structure for trnS1 lacked the D-loop, and that of trnN and trnF lacked the TΨC-loop. Molecular phylogeny based on 13 protein-coding genes was concordant with 37 mitochondrial genes, with B. arecae having closest genetic affinity to B. tryoni. The subgenus Bactrocera of Dacini tribe and the Dacinae subfamily (Dacini and Ceratitidini tribes) were monophyletic. The whole mitogenome of B. arecae will serve as a useful dataset for studying the genetics, systematics and phylogenetic relationships of the many species of Bactrocera genus in particular, and tephritid fruit flies in general. PMID:26472633
Single Amino Acid Repeats in the Proteome World: Structural, Functional, and Evolutionary Insights
Kumar, Amitha Sampath; Sowpati, Divya Tej; Mishra, Rakesh K.
2016-01-01
Microsatellites or simple sequence repeats (SSR) are abundant, highly diverse stretches of short DNA repeats present in all genomes. Tandem mono/tri/hexanucleotide repeats in the coding regions contribute to single amino acids repeats (SAARs) in the proteome. While SSRs in the coding region always result in amino acid repeats, a majority of SAARs arise due to a combination of various codons representing the same amino acid and not as a consequence of SSR events. Certain amino acids are abundant in repeat regions indicating a positive selection pressure behind the accumulation of SAARs. By analysing 22 proteomes including the human proteome, we explored the functional and structural relationship of amino acid repeats in an evolutionary context. Only ~15% of repeats are present in any known functional domain, while ~74% of repeats are present in the disordered regions, suggesting that SAARs add to the functionality of proteins by providing flexibility, stability and act as linker elements between domains. Comparison of SAAR containing proteins across species reveals that while shorter repeats are conserved among orthologs, proteins with longer repeats, >15 amino acids, are unique to the respective organism. Lysine repeats are well conserved among orthologs with respect to their length and number of occurrences in a protein. Other amino acids such as glutamic acid, proline, serine and alanine repeats are generally conserved among the orthologs with varying repeat lengths. These findings suggest that SAARs have accumulated in the proteome under positive selection pressure and that they provide flexibility for optimal folding of functional/structural domains of proteins. The insights gained from our observations can help in effective designing and engineering of proteins with novel features. PMID:27893794
Liu, Kaiyu; Li, Yi; Jousset, Françoise-Xavière; Zadori, Zoltan; Szelei, Jozsef; Yu, Qian; Pham, Hanh Thi; Lépine, François; Bergoin, Max; Tijssen, Peter
2011-01-01
The Acheta domesticus densovirus (AdDNV), isolated from crickets, has been endemic in Europe for at least 35 years. Severe epizootics have also been observed in American commercial rearings since 2009 and 2010. The AdDNV genome was cloned and sequenced for this study. The transcription map showed that splicing occurred in both the nonstructural (NS) and capsid protein (VP) multicistronic RNAs. The splicing pattern of NS mRNA predicted 3 nonstructural proteins (NS1 [576 codons], NS2 [286 codons], and NS3 [213 codons]). The VP gene cassette contained two VP open reading frames (ORFs), of 597 (ORF-A) and 268 (ORF-B) codons. The VP2 sequence was shown by N-terminal Edman degradation and mass spectrometry to correspond with ORF-A. Mass spectrometry, sequencing, and Western blotting of baculovirus-expressed VPs versus native structural proteins demonstrated that the VP1 structural protein was generated by joining ORF-A and -B via splicing (splice II), eliminating the N terminus of VP2. This splice resulted in a nested set of VP1 (816 codons), VP3 (467 codons), and VP4 (429 codons) structural proteins. In contrast, the two splices within ORF-B (Ia and Ib) removed the donor site of intron II and resulted in VP2, VP3, and VP4 expression. ORF-B may also code for several nonstructural proteins, of 268, 233, and 158 codons. The small ORF-B contains the coding sequence for a phospholipase A2 motif found in VP1, which was shown previously to be critical for cellular uptake of the virus. These splicing features are unique among parvoviruses and define a new genus of ambisense densoviruses. PMID:21775445
Crystal structure of a two-subunit TrkA octameric gating ring assembly
Deller, Marc C.; Johnson, Hope A.; Miller, Mitchell D.; ...
2015-03-31
The TM1088 locus of T. maritima codes for two proteins designated TM1088A and TM1088B, which combine to form the cytosolic portion of a putative Trk K⁺ transporter. We report the crystal structure of this assembly to a resolution of 3.45 Å. The high resolution crystal structures of the components of the assembly, TM1088A and TM1088B, were also determined independently to 1.50 Å and 1.55 Å, respectively. The TM1088 proteins are structurally homologous to each other and to other K⁺ transporter proteins, such as TrkA. These proteins form a cytosolic gating ring assembly that controls the flow of K⁺ ions acrossmore » the membrane. TM1088 represents the first structure of a two-subunit Trk assembly. Despite the atypical genetics and chain organization of the TM1088 assembly, it shares significant structural homology and an overall quaternary organization with other single-subunit K⁺ gating ring assemblies. This structure provides the first structural insights into what may be an evolutionary ancestor of more modern single-subunit K⁺ gating ring assemblies.« less
Investigation of protein folding by coarse-grained molecular dynamics with the UNRES force field.
Maisuradze, Gia G; Senet, Patrick; Czaplewski, Cezary; Liwo, Adam; Scheraga, Harold A
2010-04-08
Coarse-grained molecular dynamics simulations offer a dramatic extension of the time-scale of simulations compared to all-atom approaches. In this article, we describe the use of the physics-based united-residue (UNRES) force field, developed in our laboratory, in protein-structure simulations. We demonstrate that this force field offers about a 4000-times extension of the simulation time scale; this feature arises both from averaging out the fast-moving degrees of freedom and reduction of the cost of energy and force calculations compared to all-atom approaches with explicit solvent. With massively parallel computers, microsecond folding simulation times of proteins containing about 1000 residues can be obtained in days. A straightforward application of canonical UNRES/MD simulations, demonstrated with the example of the N-terminal part of the B-domain of staphylococcal protein A (PDB code: 1BDD, a three-alpha-helix bundle), discerns the folding mechanism and determines kinetic parameters by parallel simulations of several hundred or more trajectories. Use of generalized-ensemble techniques, of which the multiplexed replica exchange method proved to be the most effective, enables us to compute thermodynamics of folding and carry out fully physics-based prediction of protein structure, in which the predicted structure is determined as a mean over the most populated ensemble below the folding-transition temperature. By using principal component analysis of the UNRES folding trajectories of the formin-binding protein WW domain (PDB code: 1E0L; a three-stranded antiparallel beta-sheet) and 1BDD, we identified representative structures along the folding pathways and demonstrated that only a few (low-indexed) principal components can capture the main structural features of a protein-folding trajectory; the potentials of mean force calculated along these essential modes exhibit multiple minima, as opposed to those along the remaining modes that are unimodal. In addition, a comparison between the structures that are representative of the minima in the free-energy profile along the essential collective coordinates of protein folding (computed by principal component analysis) and the free-energy profile projected along the virtual-bond dihedral angles gamma of the backbone revealed the key residues involved in the transitions between the different basins of the folding free-energy profile, in agreement with existing experimental data for 1E0L .
Brown, Peter; Pullan, Wayne; Yang, Yuedong; Zhou, Yaoqi
2016-02-01
The three dimensional tertiary structure of a protein at near atomic level resolution provides insight alluding to its function and evolution. As protein structure decides its functionality, similarity in structure usually implies similarity in function. As such, structure alignment techniques are often useful in the classifications of protein function. Given the rapidly growing rate of new, experimentally determined structures being made available from repositories such as the Protein Data Bank, fast and accurate computational structure comparison tools are required. This paper presents SPalignNS, a non-sequential protein structure alignment tool using a novel asymmetrical greedy search technique. The performance of SPalignNS was evaluated against existing sequential and non-sequential structure alignment methods by performing trials with commonly used datasets. These benchmark datasets used to gauge alignment accuracy include (i) 9538 pairwise alignments implied by the HOMSTRAD database of homologous proteins; (ii) a subset of 64 difficult alignments from set (i) that have low structure similarity; (iii) 199 pairwise alignments of proteins with similar structure but different topology; and (iv) a subset of 20 pairwise alignments from the RIPC set. SPalignNS is shown to achieve greater alignment accuracy (lower or comparable root-mean squared distance with increased structure overlap coverage) for all datasets, and the highest agreement with reference alignments from the challenging dataset (iv) above, when compared with both sequentially constrained alignments and other non-sequential alignments. SPalignNS was implemented in C++. The source code, binary executable, and a web server version is freely available at: http://sparks-lab.org yaoqi.zhou@griffith.edu.au. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Sanchez-Martinez, M; Crehuet, R
2014-12-21
We present a method based on the maximum entropy principle that can re-weight an ensemble of protein structures based on data from residual dipolar couplings (RDCs). The RDCs of intrinsically disordered proteins (IDPs) provide information on the secondary structure elements present in an ensemble; however even two sets of RDCs are not enough to fully determine the distribution of conformations, and the force field used to generate the structures has a pervasive influence on the refined ensemble. Two physics-based coarse-grained force fields, Profasi and Campari, are able to predict the secondary structure elements present in an IDP, but even after including the RDC data, the re-weighted ensembles differ between both force fields. Thus the spread of IDP ensembles highlights the need for better force fields. We distribute our algorithm in an open-source Python code.
Four RNA families with functional transient structures
Zhu, Jing Yun A; Meyer, Irmtraud M
2015-01-01
Protein-coding and non-coding RNA transcripts perform a wide variety of cellular functions in diverse organisms. Several of their functional roles are expressed and modulated via RNA structure. A given transcript, however, can have more than a single functional RNA structure throughout its life, a fact which has been previously overlooked. Transient RNA structures, for example, are only present during specific time intervals and cellular conditions. We here introduce four RNA families with transient RNA structures that play distinct and diverse functional roles. Moreover, we show that these transient RNA structures are structurally well-defined and evolutionarily conserved. Since Rfam annotates one structure for each family, there is either no annotation for these transient structures or no such family. Thus, our alignments either significantly update and extend the existing Rfam families or introduce a new RNA family to Rfam. For each of the four RNA families, we compile a multiple-sequence alignment based on experimentally verified transient and dominant (dominant in terms of either the thermodynamic stability and/or attention received so far) RNA secondary structures using a combination of automated search via covariance model and manual curation. The first alignment is the Trp operon leader which regulates the operon transcription in response to tryptophan abundance through alternative structures. The second alignment is the HDV ribozyme which we extend to the 5′ flanking sequence. This flanking sequence is involved in the regulation of the transcript's self-cleavage activity. The third alignment is the 5′ UTR of the maturation protein from Levivirus which contains a transient structure that temporarily postpones the formation of the final inhibitory structure to allow translation of maturation protein. The fourth and last alignment is the SAM riboswitch which regulates the downstream gene expression by assuming alternative structures upon binding of SAM. All transient and dominant structures are mapped to our new alignments introduced here. PMID:25751035
Four RNA families with functional transient structures.
Zhu, Jing Yun A; Meyer, Irmtraud M
2015-01-01
Protein-coding and non-coding RNA transcripts perform a wide variety of cellular functions in diverse organisms. Several of their functional roles are expressed and modulated via RNA structure. A given transcript, however, can have more than a single functional RNA structure throughout its life, a fact which has been previously overlooked. Transient RNA structures, for example, are only present during specific time intervals and cellular conditions. We here introduce four RNA families with transient RNA structures that play distinct and diverse functional roles. Moreover, we show that these transient RNA structures are structurally well-defined and evolutionarily conserved. Since Rfam annotates one structure for each family, there is either no annotation for these transient structures or no such family. Thus, our alignments either significantly update and extend the existing Rfam families or introduce a new RNA family to Rfam. For each of the four RNA families, we compile a multiple-sequence alignment based on experimentally verified transient and dominant (dominant in terms of either the thermodynamic stability and/or attention received so far) RNA secondary structures using a combination of automated search via covariance model and manual curation. The first alignment is the Trp operon leader which regulates the operon transcription in response to tryptophan abundance through alternative structures. The second alignment is the HDV ribozyme which we extend to the 5' flanking sequence. This flanking sequence is involved in the regulation of the transcript's self-cleavage activity. The third alignment is the 5' UTR of the maturation protein from Levivirus which contains a transient structure that temporarily postpones the formation of the final inhibitory structure to allow translation of maturation protein. The fourth and last alignment is the SAM riboswitch which regulates the downstream gene expression by assuming alternative structures upon binding of SAM. All transient and dominant structures are mapped to our new alignments introduced here.
Liwo, Adam; Ołdziej, Stanisław; Czaplewski, Cezary; Kleinerman, Dana S.; Blood, Philip; Scheraga, Harold A.
2010-01-01
We report the implementation of our united-residue UNRES force field for simulations of protein structure and dynamics with massively parallel architectures. In addition to coarse-grained parallelism already implemented in our previous work, in which each conformation was treated by a different task, we introduce a fine-grained level in which energy and gradient evaluation are split between several tasks. The Message Passing Interface (MPI) libraries have been utilized to construct the parallel code. The parallel performance of the code has been tested on a professional Beowulf cluster (Xeon Quad Core), a Cray XT3 supercomputer, and two IBM BlueGene/P supercomputers with canonical and replica-exchange molecular dynamics. With IBM BlueGene/P, about 50 % efficiency and 120-fold speed-up of the fine-grained part was achieved for a single trajectory of a 767-residue protein with use of 256 processors/trajectory. Because of averaging over the fast degrees of freedom, UNRES provides an effective 1000-fold speed-up compared to the experimental time scale and, therefore, enables us to effectively carry out millisecond-scale simulations of proteins with 500 and more amino-acid residues in days of wall-clock time. PMID:20305729
Dynamics of Lysozyme in a Glycerol-Water system
NASA Astrophysics Data System (ADS)
Ghatty, Pavan; Carri, Gustavo
2007-03-01
Bio-preservation of proteins is of great commercial and academic interest. A variety of sugars have been found to be effective in preserving the structure of proteins. This has been attributed and in some cases proved to their ability to form strong hydrogen bonds with proteins thus restricting their motion. The work presented here explores the hypothesis that glycerol, a tri-alcohol curbs the motion of protein. We have carried out a 10ns Molecular Dynamics simulation to study the phenomenon. The structure of Lysozyme (PDB code 193L) has been studied in three solutions of 10, 20 and 30 % by weight of glycerol in water. Glycerol molecules in all three solutions have shown a tendency to agglomerate around the protein. Strong hydrogen bonding has also been observed between glycerol molecules and the protein. With increasing time, the g(r) of glycerol molecules around proteins shows two peaks with increasing prominence suggesting the movement of glycerol cluster to positions closer to the protein surface.
Wagner, Tristan; Alexandre, Matthieu; Duran, Rosario; Barilone, Nathalie; Wehenkel, Annemarie; Alzari, Pedro M; Bellinzoni, Marco
2015-05-01
Signal transduction mediated by Ser/Thr phosphorylation in Mycobacterium tuberculosis has been intensively studied in the last years, as its genome harbors eleven genes coding for eukaryotic-like Ser/Thr kinases. Here we describe the crystal structure and the autophosphorylation sites of the catalytic domain of PknA, one of two protein kinases essential for pathogen's survival. The structure of the ligand-free kinase domain shows an auto-inhibited conformation similar to that observed in human Tyr kinases of the Src-family. These results reinforce the high conservation of structural hallmarks and regulation mechanisms between prokaryotic and eukaryotic protein kinases. © 2015 Wiley Periodicals, Inc.
Protein structural similarity search by Ramachandran codes
Lo, Wei-Cheng; Huang, Po-Jung; Chang, Chih-Hung; Lyu, Ping-Chiang
2007-01-01
Background Protein structural data has increased exponentially, such that fast and accurate tools are necessary to access structure similarity search. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, the accuracy is usually sacrificed and the speed is still unable to match sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases. Results We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Then, classical sequence similarity search methods can be applied to the structural similarity search. Its accuracy is similar to Combinatorial Extension (CE) and works over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented into a web service and a stand-alone Java program that is able to run on many different platforms. Conclusion As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools are supposed applicable to automated and high-throughput functional annotations or predictions for the ever increasing number of published protein structures in this post-genomic era. PMID:17716377
@TOME-2: a new pipeline for comparative modeling of protein–ligand complexes
Pons, Jean-Luc; Labesse, Gilles
2009-01-01
@TOME 2.0 is new web pipeline dedicated to protein structure modeling and small ligand docking based on comparative analyses. @TOME 2.0 allows fold recognition, template selection, structural alignment editing, structure comparisons, 3D-model building and evaluation. These tasks are routinely used in sequence analyses for structure prediction. In our pipeline the necessary software is efficiently interconnected in an original manner to accelerate all the processes. Furthermore, we have also connected comparative docking of small ligands that is performed using protein–protein superposition. The input is a simple protein sequence in one-letter code with no comment. The resulting 3D model, protein–ligand complexes and structural alignments can be visualized through dedicated Web interfaces or can be downloaded for further studies. These original features will aid in the functional annotation of proteins and the selection of templates for molecular modeling and virtual screening. Several examples are described to highlight some of the new functionalities provided by this pipeline. The server and its documentation are freely available at http://abcis.cbs.cnrs.fr/AT2/ PMID:19443448
Analysis and recognition of 5′ UTR intron splice sites in human pre-mRNA
Eden, E.; Brunak, S.
2004-01-01
Prediction of splice sites in non-coding regions of genes is one of the most challenging aspects of gene structure recognition. We perform a rigorous analysis of such splice sites embedded in human 5′ untranslated regions (UTRs), and investigate correlations between this class of splice sites and other features found in the adjacent exons and introns. By restricting the training of neural network algorithms to ‘pure’ UTRs (not extending partially into protein coding regions), we for the first time investigate the predictive power of the splicing signal proper, in contrast to conventional splice site prediction, which typically relies on the change in sequence at the transition from protein coding to non-coding. By doing so, the algorithms were able to pick up subtler splicing signals that were otherwise masked by ‘coding’ noise, thus enhancing significantly the prediction of 5′ UTR splice sites. For example, the non-coding splice site predicting networks pick up compositional and positional bias in the 3′ ends of non-coding exons and 5′ non-coding intron ends, where cytosine and guanine are over-represented. This compositional bias at the true UTR donor sites is also visible in the synaptic weights of the neural networks trained to identify UTR donor sites. Conventional splice site prediction methods perform poorly in UTRs because the reading frame pattern is absent. The NetUTR method presented here performs 2–3-fold better compared with NetGene2 and GenScan in 5′ UTRs. We also tested the 5′ UTR trained method on protein coding regions, and discovered, surprisingly, that it works quite well (although it cannot compete with NetGene2). This indicates that the local splicing pattern in UTRs and coding regions is largely the same. The NetUTR method is made publicly available at www.cbs.dtu.dk/services/NetUTR. PMID:14960723
The solution structure of the pentatricopeptide repeat protein PPR10 upon binding atpH RNA
Gully, Benjamin S.; Cowieson, Nathan; Stanley, Will A.; Shearston, Kate; Small, Ian D.; Barkan, Alice; Bond, Charles S.
2015-01-01
The pentatricopeptide repeat (PPR) protein family is a large family of RNA-binding proteins that is characterized by tandem arrays of a degenerate 35-amino-acid motif which form an α-solenoid structure. PPR proteins influence the editing, splicing, translation and stability of specific RNAs in mitochondria and chloroplasts. Zea mays PPR10 is amongst the best studied PPR proteins, where sequence-specific binding to two RNA transcripts, atpH and psaJ, has been demonstrated to follow a recognition code where the identity of two amino acids per repeat determines the base-specificity. A recently solved ZmPPR10:psaJ complex crystal structure suggested a homodimeric complex with considerably fewer sequence-specific protein–RNA contacts than inferred previously. Here we describe the solution structure of the ZmPPR10:atpH complex using size-exclusion chromatography-coupled synchrotron small-angle X-ray scattering (SEC-SY-SAXS). Our results support prior evidence that PPR10 binds RNA as a monomer, and that it does so in a manner that is commensurate with a canonical and predictable RNA-binding mode across much of the RNA–protein interface. PMID:25609698
How the Sequence of a Gene Specifies Structural Symmetry in Proteins
Shen, Xiaojuan; Huang, Tongcheng; Wang, Guanyu; Li, Guanglin
2015-01-01
Internal symmetry is commonly observed in the majority of fundamental protein folds. Meanwhile, sufficient evidence suggests that nascent polypeptide chains of proteins have the potential to start the co-translational folding process and this process allows mRNA to contain additional information on protein structure. In this paper, we study the relationship between gene sequences and protein structures from the viewpoint of symmetry to explore how gene sequences code for structural symmetry in proteins. We found that, for a set of two-fold symmetric proteins from left-handed beta-helix fold, intragenic symmetry always exists in their corresponding gene sequences. Meanwhile, codon usage bias and local mRNA structure might be involved in modulating translation speed for the formation of structural symmetry: a major decrease of local codon usage bias in the middle of the codon sequence can be identified as a common feature; and major or consecutive decreases in local mRNA folding energy near the boundaries of the symmetric substructures can also be observed. The results suggest that gene duplication and fusion may be an evolutionarily conserved process for this protein fold. In addition, the usage of rare codons and the formation of higher order of secondary structure near the boundaries of symmetric substructures might have coevolved as conserved mechanisms to slow down translation elongation and to facilitate effective folding of symmetric substructures. These findings provide valuable insights into our understanding of the mechanisms of translation and its evolution, as well as the design of proteins via symmetric modules. PMID:26641668
Ezkurdia, Iakes; del Pozo, Angela; Frankish, Adam; Rodriguez, Jose Manuel; Harrow, Jennifer; Ashman, Keith; Valencia, Alfonso; Tress, Michael L.
2012-01-01
Advances in high-throughput mass spectrometry are making proteomics an increasingly important tool in genome annotation projects. Peptides detected in mass spectrometry experiments can be used to validate gene models and verify the translation of putative coding sequences (CDSs). Here, we have identified peptides that cover 35% of the genes annotated by the GENCODE consortium for the human genome as part of a comprehensive analysis of experimental spectra from two large publicly available mass spectrometry databases. We detected the translation to protein of “novel” and “putative” protein-coding transcripts as well as transcripts annotated as pseudogenes and nonsense-mediated decay targets. We provide a detailed overview of the population of alternatively spliced protein isoforms that are detectable by peptide identification methods. We found that 150 genes expressed multiple alternative protein isoforms. This constitutes the largest set of reliably confirmed alternatively spliced proteins yet discovered. Three groups of genes were highly overrepresented. We detected alternative isoforms for 10 of the 25 possible heterogeneous nuclear ribonucleoproteins, proteins with a key role in the splicing process. Alternative isoforms generated from interchangeable homologous exons and from short indels were also significantly enriched, both in human experiments and in parallel analyses of mouse and Drosophila proteomics experiments. Our results show that a surprisingly high proportion (almost 25%) of the detected alternative isoforms are only subtly different from their constitutive counterparts. Many of the alternative splicing events that give rise to these alternative isoforms are conserved in mouse. It was striking that very few of these conserved splicing events broke Pfam functional domains or would damage globular protein structures. This evidence of a strong bias toward subtle differences in CDS and likely conserved cellular function and structure is remarkable and strongly suggests that the translation of alternative transcripts may be subject to selective constraints. PMID:22446687
Comparison of stochastic optimization methods for all-atom folding of the Trp-Cage protein.
Schug, Alexander; Herges, Thomas; Verma, Abhinav; Lee, Kyu Hwan; Wenzel, Wolfgang
2005-12-09
The performances of three different stochastic optimization methods for all-atom protein structure prediction are investigated and compared. We use the recently developed all-atom free-energy force field (PFF01), which was demonstrated to correctly predict the native conformation of several proteins as the global optimum of the free energy surface. The trp-cage protein (PDB-code 1L2Y) is folded with the stochastic tunneling method, a modified parallel tempering method, and the basin-hopping technique. All the methods correctly identify the native conformation, and their relative efficiency is discussed.
Nmf9 Encodes a Highly Conserved Protein Important to Neurological Function in Mice and Flies.
Zhang, Shuxiao; Ross, Kevin D; Seidner, Glen A; Gorman, Michael R; Poon, Tiffany H; Wang, Xiaobo; Keithley, Elizabeth M; Lee, Patricia N; Martindale, Mark Q; Joiner, William J; Hamilton, Bruce A
2015-07-01
Many protein-coding genes identified by genome sequencing remain without functional annotation or biological context. Here we define a novel protein-coding gene, Nmf9, based on a forward genetic screen for neurological function. ENU-induced and genome-edited null mutations in mice produce deficits in vestibular function, fear learning and circadian behavior, which correlated with Nmf9 expression in inner ear, amygdala, and suprachiasmatic nuclei. Homologous genes from unicellular organisms and invertebrate animals predict interactions with small GTPases, but the corresponding domains are absent in mammalian Nmf9. Intriguingly, homozygotes for null mutations in the Drosophila homolog, CG45058, show profound locomotor defects and premature death, while heterozygotes show striking effects on sleep and activity phenotypes. These results link a novel gene orthology group to discrete neurological functions, and show conserved requirement across wide phylogenetic distance and domain level structural changes.
F2Dock: Fast Fourier Protein-Protein Docking
Bajaj, Chandrajit; Chowdhury, Rezaul; Siddavanahalli, Vinay
2009-01-01
The functions of proteins is often realized through their mutual interactions. Determining a relative transformation for a pair of proteins and their conformations which form a stable complex, reproducible in nature, is known as docking. It is an important step in drug design, structure determination and understanding function and structure relationships. In this paper we extend our non-uniform fast Fourier transform docking algorithm to include an adaptive search phase (both translational and rotational) and thereby speed up its execution. We have also implemented a multithreaded version of the adaptive docking algorithm for even faster execution on multicore machines. We call this protein-protein docking code F2Dock (F2 = Fast Fourier). We have calibrated F2Dock based on an extensive experimental study on a list of benchmark complexes and conclude that F2Dock works very well in practice. Though all docking results reported in this paper use shape complementarity and Coulombic potential based scores only, F2Dock is structured to incorporate Lennard-Jones potential and re-ranking docking solutions based on desolvation energy. PMID:21071796
Miller, Andrew D
2015-02-01
A sense peptide can be defined as a peptide whose sequence is coded by the nucleotide sequence (read 5' → 3') of the sense (positive) strand of DNA. Conversely, an antisense (complementary) peptide is coded by the corresponding nucleotide sequence (read 5' → 3') of the antisense (negative) strand of DNA. Research has been accumulating steadily to suggest that sense peptides are capable of specific interactions with their corresponding antisense peptides. Unfortunately, although more and more examples of specific sense-antisense peptide interactions are emerging, the very idea of such interactions does not conform to standard biology dogma and so there remains a sizeable challenge to lift this concept from being perceived as a peripheral phenomenon if not worse, into becoming part of the scientific mainstream. Specific interactions have now been exploited for the inhibition of number of widely different protein-protein and protein-receptor interactions in vitro and in vivo. Further, antisense peptides have also been used to induce the production of antibodies targeted to specific receptors or else the production of anti-idiotypic antibodies targeted against auto-antibodies. Such illustrations of utility would seem to suggest that observed sense-antisense peptide interactions are not just the consequence of a sequence of coincidental 'lucky-hits'. Indeed, at the very least, one might conclude that sense-antisense peptide interactions represent a potentially new and different source of leads for drug discovery. But could there be more to come from studies in this area? Studies on the potential mechanism of sense-antisense peptide interactions suggest that interactions may be driven by amino acid residue interactions specified from the genetic code. If so, such specified amino acid residue interactions could form the basis for an even wider amino acid residue interaction code (proteomic code) that links gene sequences to actual protein structure and function, even entire genomes to entire proteomes. The possibility that such a proteomic code should exist is discussed. So too the potential implications for biology and pharmaceutical science are also discussed were such a code to exist.
APPRIS 2017: principal isoforms for multiple gene sets
Rodriguez-Rivas, Juan; Di Domenico, Tomás; Vázquez, Jesús; Valencia, Alfonso
2018-01-01
Abstract The APPRIS database (http://appris-tools.org) uses protein structural and functional features and information from cross-species conservation to annotate splice isoforms in protein-coding genes. APPRIS selects a single protein isoform, the ‘principal’ isoform, as the reference for each gene based on these annotations. A single main splice isoform reflects the biological reality for most protein coding genes and APPRIS principal isoforms are the best predictors of these main proteins isoforms. Here, we present the updates to the database, new developments that include the addition of three new species (chimpanzee, Drosophila melangaster and Caenorhabditis elegans), the expansion of APPRIS to cover the RefSeq gene set and the UniProtKB proteome for six species and refinements in the core methods that make up the annotation pipeline. In addition APPRIS now provides a measure of reliability for individual principal isoforms and updates with each release of the GENCODE/Ensembl and RefSeq reference sets. The individual GENCODE/Ensembl, RefSeq and UniProtKB reference gene sets for six organisms have been merged to produce common sets of splice variants. PMID:29069475
Crystal structure of reverse gyrase: insights into the positive supercoiling of DNA
Rodríguez, A.Chapin; Stock, Daniela
2002-01-01
Reverse gyrase is the only topoisomerase known to positively supercoil DNA. The protein appears to be unique to hyperthermophiles, where its activity is believed to protect the genome from denaturation. The 120 kDa enzyme is the only member of the type I topoisomerase family that requires ATP, which is bound and hydrolysed by a helicase-like domain. We have determined the crystal structure of reverse gyrase from Archaeoglobus fulgidus in the presence and absence of nucleotide cofactor. The structure provides the first view of an intact supercoiling enzyme, explains mechanistic differences from other type I topoisomerases and suggests a model for how the two domains of the protein cooperate to positively supercoil DNA. Coordinates have been deposited in the Protein Data Bank under accession codes 1GKU and 1GL9. PMID:11823434
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bienz, K.; Egger, D.; Troxler, M.
1990-03-01
Transcriptionally active replication complexes bound to smooth membrane vesicles were isolated from poliovirus-infected cells. In electron microscopic, negatively stained preparations, the replication complex appeared as an irregularly shaped, oblong structure attached to several virus-induced vesicles of a rosettelike arrangement. Electron microscopic immunocytochemistry of such preparations demonstrated that the poliovirus replication complex contains the proteins coded by the P2 genomic region (P2 proteins) in a membrane-associated form. In addition, the P2 proteins are also associated with viral RNA, and they can be cross-linked to viral RNA by UV irradiation. Guanidine hydrochloride prevented the P2 proteins from becoming membrane bound but didmore » not change their association with viral RNA. The findings allow the conclusion that the protein 2C or 2C-containing precursor(s) is responsible for the attachment of the viral RNA to the vesicular membrane and for the spatial organization of the replication complex necessary for its proper functioning in viral transcription. A model for the structure of the viral replication complex and for the function of the 2C-containing P2 protein(s) and the vesicular membranes is proposed.« less
A DEK Domain-Containing Protein Modulates Chromatin Structure and Function in Arabidopsis[W][OPEN
Waidmann, Sascha; Kusenda, Branislav; Mayerhofer, Juliane; Mechtler, Karl; Jonak, Claudia
2014-01-01
Chromatin is a major determinant in the regulation of virtually all DNA-dependent processes. Chromatin architectural proteins interact with nucleosomes to modulate chromatin accessibility and higher-order chromatin structure. The evolutionarily conserved DEK domain-containing protein is implicated in important chromatin-related processes in animals, but little is known about its DNA targets and protein interaction partners. In plants, the role of DEK has remained elusive. In this work, we identified DEK3 as a chromatin-associated protein in Arabidopsis thaliana. DEK3 specifically binds histones H3 and H4. Purification of other proteins associated with nuclear DEK3 also established DNA topoisomerase 1α and proteins of the cohesion complex as in vivo interaction partners. Genome-wide mapping of DEK3 binding sites by chromatin immunoprecipitation followed by deep sequencing revealed enrichment of DEK3 at protein-coding genes throughout the genome. Using DEK3 knockout and overexpressor lines, we show that DEK3 affects nucleosome occupancy and chromatin accessibility and modulates the expression of DEK3 target genes. Furthermore, functional levels of DEK3 are crucial for stress tolerance. Overall, data indicate that DEK3 contributes to modulation of Arabidopsis chromatin structure and function. PMID:25387881
Romero, Roberto; Tarca, Adi; Chaemsaithong, Piya; Miranda, Jezid; Chaiworapongsa, Tinnakorn; Jia, Hui; Hassan, Sonia S.; Kalita, Cynthia A.; Cai, Juan; Yeo, Lami; Lipovich, Leonard
2014-01-01
Objective The mechanisms responsible for normal and abnormal parturition are poorly understood. Myometrial activation leading to regular uterine contractions is a key component of labor. Dysfunctional labor (arrest of dilatation and/or descent) is a leading indication for cesarean delivery. Compelling evidence suggests that most of these disorders are functional in nature, and not the result of cephalopelvic disproportion. The methodology and the datasets afforded by the post-genomic era provide novel opportunities to understand and target gene functions in these disorders. In 2012, the ENCODE Consortium elucidated the extraordinary abundance and functional complexity of long non-coding RNA genes in the human genome. The purpose of the study was to identify differentially expressed long non-coding RNA genes in human myometrium in women in spontaneous labor at term. Materials and Methods Myometrium was obtained from women undergoing cesarean deliveries who were not in labor (n=19) and women in spontaneous labor at term (n=20). RNA was extracted and profiled using an Illumina® microarray platform. The analysis of the protein coding genes from this study has been previously reported. Here, we have used computational approaches to bound the extent of long non-coding RNA representation on this platform, and to identify co-differentially expressed and correlated pairs of long non-coding RNA genes and protein-coding genes sharing the same genomic loci. Results Upon considering more than 18,498 distinct lncRNA genes compiled nonredundantly from public experimental data sources, and interrogating 2,634 that matched Illumina microarray probes, we identified co-differential expression and correlation at two genomic loci that contain coding-lncRNA gene pairs: SOCS2-AK054607 and LMCD1-NR_024065 in women in spontaneous labor at term. This co-differential expression and correlation was validated by qRT-PCR, an independent experimental method. Intriguingly, one of the two lncRNA genes differentially expressed in term labor had a key genomic structure element, a splice site that lacked evolutionary conservation beyond primates. Conclusions We provide for the first time evidence for coordinated differential expression and correlation of cis-encoded antisense lncRNAs and protein-coding genes with known, as well as novel roles in pregnancy in the myometrium of women in spontaneous labor at term. PMID:24168098
Haas, Brian J; Salzberg, Steven L; Zhu, Wei; Pertea, Mihaela; Allen, Jonathan E; Orvis, Joshua; White, Owen; Buell, C Robin; Wortman, Jennifer R
2008-01-01
EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation. PMID:18190707
DOE Office of Scientific and Technical Information (OSTI.GOV)
Domingo Meza-Aguilar, J.; Laboratorio de Patogenicidad Bacteriana, Unidad de Hemato Oncología e Investigación, Hospital Infantil de México Federico Gómez 06720, D.F.; Fromme, Petra
Highlights: • X-ray crystal structure of the passenger domain of Plasmid encoded toxin at 2.3 Å. • Structural differences between Pet passenger domain and EspP protein are described. • High flexibility of the C-terminal beta helix is structurally assigned. - Abstract: Autotransporters (ATs) represent a superfamily of proteins produced by a variety of pathogenic bacteria, which include the pathogenic groups of Escherichia coli (E. coli) associated with gastrointestinal and urinary tract infections. We present the first X-ray structure of the passenger domain from the Plasmid-encoded toxin (Pet) a 100 kDa protein at 2.3 Å resolution which is a cause ofmore » acute diarrhea in both developing and industrialized countries. Pet is a cytoskeleton-altering toxin that induces loss of actin stress fibers. While Pet (pdb code: 4OM9) shows only a sequence identity of 50% compared to the closest related protein sequence, extracellular serine protease plasmid (EspP) the structural features of both proteins are conserved. A closer structural look reveals that Pet contains a β-pleaded sheet at the sequence region of residues 181–190, the corresponding structural domain in EspP consists of a coiled loop. Secondary, the Pet passenger domain features a more pronounced beta sheet between residues 135 and 143 compared to the structure of EspP.« less
Accelerating the Pace of Protein Functional Annotation With Intel Xeon Phi Coprocessors.
Feinstein, Wei P; Moreno, Juana; Jarrell, Mark; Brylinski, Michal
2015-06-01
Intel Xeon Phi is a new addition to the family of powerful parallel accelerators. The range of its potential applications in computationally driven research is broad; however, at present, the repository of scientific codes is still relatively limited. In this study, we describe the development and benchmarking of a parallel version of eFindSite, a structural bioinformatics algorithm for the prediction of ligand-binding sites in proteins. Implemented for the Intel Xeon Phi platform, the parallelization of the structure alignment portion of eFindSite using pragma-based OpenMP brings about the desired performance improvements, which scale well with the number of computing cores. Compared to a serial version, the parallel code runs 11.8 and 10.1 times faster on the CPU and the coprocessor, respectively; when both resources are utilized simultaneously, the speedup is 17.6. For example, ligand-binding predictions for 501 benchmarking proteins are completed in 2.1 hours on a single Stampede node equipped with the Intel Xeon Phi card compared to 3.1 hours without the accelerator and 36.8 hours required by a serial version. In addition to the satisfactory parallel performance, porting existing scientific codes to the Intel Xeon Phi architecture is relatively straightforward with a short development time due to the support of common parallel programming models by the coprocessor. The parallel version of eFindSite is freely available to the academic community at www.brylinski.org/efindsite.
PROVAT: a tool for Voronoi tessellation analysis of protein structures and complexes.
Gore, Swanand P; Burke, David F; Blundell, Tom L
2005-08-01
Voronoi tessellation has proved to be a useful tool in protein structure analysis. We have developed PROVAT, a versatile public domain software that enables computation and visualization of Voronoi tessellations of proteins and protein complexes. It is a set of Python scripts that integrate freely available specialized software (Qhull, Pymol etc.) into a pipeline. The calculation component of the tool computes Voronoi tessellation of a given protein system in a way described by a user-supplied XML recipe and stores resulting neighbourhood information as text files with various styles. The Python pickle file generated in the process is used by the visualization component, a Pymol plug-in, that offers a GUI to explore the tessellation visually. PROVAT source code can be downloaded from http://raven.bioc.cam.ac.uk/~swanand/Provat1, which also provides a webserver for its calculation component, documentation and examples.
Molecular dynamics simulation of the structure and dynamics of 5-HT3 serotonin receptor
NASA Astrophysics Data System (ADS)
Antonov, M. Yu.; Popinako, A. V.; Prokopiev, G. A.
2016-10-01
In this work, we investigated structure, dynamics and ion transportation in transmembrane domain of the 5-HT3 serotonin receptor. High-resolution (0.35 nm) structure of the 5-HT3 receptor in complex with stabilizing nanobodies was determined by protein crystallography in 2014 (Protein data bank (PDB) code 4PIR). Transmembrane domain of the structure was prepared in complex with explicit membrane environment (1-palmitoyl-2-oleoyl-sn-glycero-3-phosphatidylcholine (POPC)) and solvent (TIP3P water model). Molecular dynamics protocols for simulation and stabilization of the transmembrane domain of the 5-HT3 receptor model were developed and 60 ns simulation of the structure was conducted in order to explore structural parameters of the system. We estimated the mean force profile for Na+ ions using umbrella sampling method.
Identification and characterization of a novel zebrafish (Danio rerio) pentraxin-carbonic anhydrase.
Patrikainen, Maarit S; Tolvanen, Martti E E; Aspatwar, Ashok; Barker, Harlan R; Ortutay, Csaba; Jänis, Janne; Laitaoja, Mikko; Hytönen, Vesa P; Azizi, Latifeh; Manandhar, Prajwol; Jáger, Edit; Vullo, Daniela; Kukkurainen, Sampo; Hilvo, Mika; Supuran, Claudiu T; Parkkila, Seppo
2017-01-01
Carbonic anhydrases (CAs) are ubiquitous, essential enzymes which catalyze the conversion of carbon dioxide and water to bicarbonate and H + ions. Vertebrate genomes generally contain gene loci for 15-21 different CA isoforms, three of which are enzymatically inactive. CA VI is the only secretory protein of the enzymatically active isoforms. We discovered that non-mammalian CA VI contains a C-terminal pentraxin (PTX) domain, a novel combination for both CAs and PTXs. We isolated and sequenced zebrafish ( Danio rerio ) CA VI cDNA, complete with the sequence coding for the PTX domain, and produced the recombinant CA VI-PTX protein. Enzymatic activity and kinetic parameters were measured with a stopped-flow instrument. Mass spectrometry, analytical gel filtration and dynamic light scattering were used for biophysical characterization. Sequence analyses and Bayesian phylogenetics were used in generating hypotheses of protein structure and CA VI gene evolution. A CA VI-PTX antiserum was produced, and the expression of CA VI protein was studied by immunohistochemistry. A knock-down zebrafish model was constructed, and larvae were observed up to five days post-fertilization (dpf). The expression of ca6 mRNA was quantitated by qRT-PCR in different developmental times in morphant and wild-type larvae and in different adult fish tissues. Finally, the swimming behavior of the morphant fish was compared to that of wild-type fish. The recombinant enzyme has a very high carbonate dehydratase activity. Sequencing confirms a 530-residue protein identical to one of the predicted proteins in the Ensembl database (ensembl.org). The protein is pentameric in solution, as studied by gel filtration and light scattering, presumably joined by the PTX domains. Mass spectrometry confirms the predicted signal peptide cleavage and disulfides, and N-glycosylation in two of the four observed glycosylation motifs. Molecular modeling of the pentamer is consistent with the modifications observed in mass spectrometry. Phylogenetics and sequence analyses provide a consistent hypothesis of the evolutionary history of domains associated with CA VI in mammals and non-mammals. Briefly, the evidence suggests that ancestral CA VI was a transmembrane protein, the exon coding for the cytoplasmic domain was replaced by one coding for PTX domain, and finally, in the therian lineage, the PTX-coding exon was lost. We knocked down CA VI expression in zebrafish embryos with antisense morpholino oligonucleotides, resulting in phenotype features of decreased buoyancy and swim bladder deflation in 4 dpf larvae. These findings provide novel insights into the evolution, structure, and function of this unique CA form.
Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures
Stark, Alexander; Lin, Michael F.; Kheradpour, Pouya; Pedersen, Jakob S.; Parts, Leopold; Carlson, Joseph W.; Crosby, Madeline A.; Rasmussen, Matthew D.; Roy, Sushmita; Deoras, Ameya N.; Ruby, J. Graham; Brennecke, Julius; Hodges, Emily; Hinrichs, Angie S.; Caspi, Anat; Paten, Benedict; Park, Seung-Won; Han, Mira V.; Maeder, Morgan L.; Polansky, Benjamin J.; Robson, Bryanne E.; Aerts, Stein; van Helden, Jacques; Hassan, Bassem; Gilbert, Donald G.; Eastman, Deborah A.; Rice, Michael; Weir, Michael; Hahn, Matthew W.; Park, Yongkyu; Dewey, Colin N.; Pachter, Lior; Kent, W. James; Haussler, David; Lai, Eric C.; Bartel, David P.; Hannon, Gregory J.; Kaufman, Thomas C.; Eisen, Michael B.; Clark, Andrew G.; Smith, Douglas; Celniker, Susan E.; Gelbart, William M.; Kellis, Manolis
2008-01-01
Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or ‘evolutionary signatures’, dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies. PMID:17994088
DOE Office of Scientific and Technical Information (OSTI.GOV)
White, Richard A.; Brown, Joseph M.; Colby, Sean M.
ATLAS (Automatic Tool for Local Assembly Structures) is a comprehensive multiomics data analysis pipeline that is massively parallel and scalable. ATLAS contains a modular analysis pipeline for assembly, annotation, quantification and genome binning of metagenomics and metatranscriptomics data and a framework for reference metaproteomic database construction. ATLAS transforms raw sequence data into functional and taxonomic data at the microbial population level and provides genome-centric resolution through genome binning. ATLAS provides robust taxonomy based on majority voting of protein coding open reading frames rolled-up at the contig level using modified lowest common ancestor (LCA) analysis. ATLAS provides robust taxonomy based onmore » majority voting of protein coding open reading frames rolled-up at the contig level using modified lowest common ancestor (LCA) analysis. ATLAS is user-friendly, easy install through bioconda maintained as open-source on GitHub, and is implemented in Snakemake for modular customizable workflows.« less
Structural Genomics of Bacterial Virulence Factors
2006-05-01
positioned in the unit cell by Molecular Replacement (Protein Data Bank ( PDB ) ID code 1acc)6 using MOLREP, and refined with REFMAC version 5.0 (ref. 24...increase our understanding of the molecular mechanisms of pathogenicity, putting us in a stronger position to anticipate and react to emerging...term, the accumulated structural information will generate important and testable hypotheses that will increase our understanding of the molecular
Greenwald, Scott H.; Kuchenbecker, James A.; Rowlan, Jessica S.; Neitz, Jay; Neitz, Maureen
2017-01-01
Purpose Human long (L) and middle (M) wavelength cone opsin genes are highly variable due to intermixing. Two L/M cone opsin interchange mutants, designated LIAVA and LVAVA, are associated with clinical diagnoses, including red-green color vision deficiency, blue cone monochromacy, cone degeneration, myopia, and Bornholm Eye Disease. Because the protein and splicing codes are carried by the same nucleotides, intermixing L and M genes can cause disease by affecting protein structure and splicing. Methods Genetically engineered mice were created to allow investigation of the consequences of altered protein structure alone, and the effects on cone morphology were examined using immunohistochemistry. In humans and mice, cone function was evaluated using the electroretinogram (ERG) under L/M- or short (S) wavelength cone isolating conditions. Effects of LIAVA and LVAVA genes on splicing were evaluated using a minigene assay. Results ERGs and histology in mice revealed protein toxicity for the LVAVA but not for the LIAVA opsin. Minigene assays showed that the dominant messenger RNA (mRNA) was aberrantly spliced for both variants; however, the LVAVA gene produced a small but significant amount of full-length mRNA and LVAVA subjects had correspondingly reduced ERG amplitudes. In contrast, the LIAVA subject had no L/M cone ERG. Conclusions Dramatic differences in phenotype can result from seemingly minor differences in genotype through divergent effects on the dual amino acid and splicing codes. Translational Relevance The mechanism by which individual mutations contribute to clinical phenotypes provides valuable information for diagnosis and prognosis of vision disorders associated with L/M interchange mutations, and it informs strategies for developing therapies. PMID:28516000
Prediction of plant lncRNA by ensemble machine learning classifiers.
Simopoulos, Caitlin M A; Weretilnyk, Elizabeth A; Golding, G Brian
2018-05-02
In plants, long non-protein coding RNAs are believed to have essential roles in development and stress responses. However, relative to advances on discerning biological roles for long non-protein coding RNAs in animal systems, this RNA class in plants is largely understudied. With comparatively few validated plant long non-coding RNAs, research on this potentially critical class of RNA is hindered by a lack of appropriate prediction tools and databases. Supervised learning models trained on data sets of mostly non-validated, non-coding transcripts have been previously used to identify this enigmatic RNA class with applications largely focused on animal systems. Our approach uses a training set comprised only of empirically validated long non-protein coding RNAs from plant, animal, and viral sources to predict and rank candidate long non-protein coding gene products for future functional validation. Individual stochastic gradient boosting and random forest classifiers trained on only empirically validated long non-protein coding RNAs were constructed. In order to use the strengths of multiple classifiers, we combined multiple models into a single stacking meta-learner. This ensemble approach benefits from the diversity of several learners to effectively identify putative plant long non-coding RNAs from transcript sequence features. When the predicted genes identified by the ensemble classifier were compared to those listed in GreeNC, an established plant long non-coding RNA database, overlap for predicted genes from Arabidopsis thaliana, Oryza sativa and Eutrema salsugineum ranged from 51 to 83% with the highest agreement in Eutrema salsugineum. Most of the highest ranking predictions from Arabidopsis thaliana were annotated as potential natural antisense genes, pseudogenes, transposable elements, or simply computationally predicted hypothetical protein. Due to the nature of this tool, the model can be updated as new long non-protein coding transcripts are identified and functionally verified. This ensemble classifier is an accurate tool that can be used to rank long non-protein coding RNA predictions for use in conjunction with gene expression studies. Selection of plant transcripts with a high potential for regulatory roles as long non-protein coding RNAs will advance research in the elucidation of long non-protein coding RNA function.
Houston, Simon; Lithgow, Karen Vivien; Osbak, Kara Krista; Kenyon, Chris Richard; Cameron, Caroline E
2018-05-16
Syphilis continues to be a major global health threat with 11 million new infections each year, and a global burden of 36 million cases. The causative agent of syphilis, Treponema pallidum subspecies pallidum, is a highly virulent bacterium, however the molecular mechanisms underlying T. pallidum pathogenesis remain to be definitively identified. This is due to the fact that T. pallidum is currently uncultivatable, inherently fragile and thus difficult to work with, and phylogenetically distinct with no conventional virulence factor homologs found in other pathogens. In fact, approximately 30% of its predicted protein-coding genes have no known orthologs or assigned functions. Here we employed a structural bioinformatics approach using Phyre2-based tertiary structure modeling to improve our understanding of T. pallidum protein function on a proteome-wide scale. Phyre2-based tertiary structure modeling generated high-confidence predictions for 80% of the T. pallidum proteome (780/978 predicted proteins). Tertiary structure modeling also inferred the same function as primary structure-based annotations from genome sequencing pipelines for 525/605 proteins (87%), which represents 54% (525/978) of all T. pallidum proteins. Of the 175 T. pallidum proteins modeled with high confidence that were not assigned functions in the previously annotated published proteome, 167 (95%) were able to be assigned predicted functions. Twenty-one of the 175 hypothetical proteins modeled with high confidence were also predicted to exhibit significant structural similarity with proteins experimentally confirmed to be required for virulence in other pathogens. Phyre2-based structural modeling is a powerful bioinformatics tool that has provided insight into the potential structure and function of the majority of T. pallidum proteins and helped validate the primary structure-based annotation of more than 50% of all T. pallidum proteins with high confidence. This work represents the first T. pallidum proteome-wide structural modeling study and is one of few studies to apply this approach for the functional annotation of a whole proteome.
InterPred: A pipeline to identify and model protein-protein interactions.
Mirabello, Claudio; Wallner, Björn
2017-06-01
Protein-protein interactions (PPI) are crucial for protein function. There exist many techniques to identify PPIs experimentally, but to determine the interactions in molecular detail is still difficult and very time-consuming. The fact that the number of PPIs is vastly larger than the number of individual proteins makes it practically impossible to characterize all interactions experimentally. Computational approaches that can bridge this gap and predict PPIs and model the interactions in molecular detail are greatly needed. Here we present InterPred, a fully automated pipeline that predicts and model PPIs from sequence using structural modeling combined with massive structural comparisons and molecular docking. A key component of the method is the use of a novel random forest classifier that integrate several structural features to distinguish correct from incorrect protein-protein interaction models. We show that InterPred represents a major improvement in protein-protein interaction detection with a performance comparable or better than experimental high-throughput techniques. We also show that our full-atom protein-protein complex modeling pipeline performs better than state of the art protein docking methods on a standard benchmark set. In addition, InterPred was also one of the top predictors in the latest CAPRI37 experiment. InterPred source code can be downloaded from http://wallnerlab.org/InterPred Proteins 2017; 85:1159-1170. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
Ligand Binding Site Detection by Local Structure Alignment and Its Performance Complementarity
Lee, Hui Sun; Im, Wonpil
2013-01-01
Accurate determination of potential ligand binding sites (BS) is a key step for protein function characterization and structure-based drug design. Despite promising results of template-based BS prediction methods using global structure alignment (GSA), there is a room to improve the performance by properly incorporating local structure alignment (LSA) because BS are local structures and often similar for proteins with dissimilar global folds. We present a template-based ligand BS prediction method using G-LoSA, our LSA tool. A large benchmark set validation shows that G-LoSA predicts drug-like ligands’ positions in single-chain protein targets more precisely than TM-align, a GSA-based method, while the overall success rate of TM-align is better. G-LoSA is particularly efficient for accurate detection of local structures conserved across proteins with diverse global topologies. Recognizing the performance complementarity of G-LoSA to TM-align and a non-template geometry-based method, fpocket, a robust consensus scoring method, CMCS-BSP (Complementary Methods and Consensus Scoring for ligand Binding Site Prediction), is developed and shows improvement on prediction accuracy. The G-LoSA source code is freely available at http://im.bioinformatics.ku.edu/GLoSA. PMID:23957286
Pokkuluri, P Raj; Dwulit-Smith, Jeff; Duke, Norma E; Wilton, Rosemarie; Mack, Jamey C; Bearden, Jessica; Rakowski, Ella; Babnigg, Gyorgy; Szurmant, Hendrik; Joachimiak, Andrzej; Schiffer, Marianne
2013-01-01
Anaeromyxobacter dehalogenans is a δ-proteobacterium found in diverse soils and sediments. It is of interest in bioremediation efforts due to its dechlorination and metal-reducing capabilities. To gain an understanding on A. dehalogenans' abilities to adapt to diverse environments we analyzed its signal transduction proteins. The A. dehalogenans genome codes for a large number of sensor histidine kinases (HK) and methyl-accepting chemotaxis proteins (MCP); among these 23 HK and 11 MCP proteins have a sensor domain in the periplasm. These proteins most likely contribute to adaptation to the organism's surroundings. We predicted their three-dimensional folds and determined the structures of two of the periplasmic sensor domains by X-ray diffraction. Most of the domains are predicted to have either PAS-like or helical bundle structures, with two predicted to have solute-binding protein fold, and another predicted to have a 6-phosphogluconolactonase like fold. Atomic structures of two sensor domains confirmed the respective fold predictions. The Adeh_2942 sensor (HK) was found to have a helical bundle structure, and the Adeh_3718 sensor (MCP) has a PAS-like structure. Interestingly, the Adeh_3718 sensor has an acetate moiety bound in a binding site typical for PAS-like domains. Future work is needed to determine whether Adeh_3718 is involved in acetate sensing by A. dehalogenans. PMID:23897711
Computational RNomics of Drosophilids
Rose, Dominic; Hackermüller, Jörg; Washietl, Stefan; Reiche, Kristin; Hertel, Jana; Findeiß, Sven; Stadler, Peter F; Prohaska, Sonja J
2007-01-01
Background Recent experimental and computational studies have provided overwhelming evidence for a plethora of diverse transcripts that are unrelated to protein-coding genes. One subclass consists of those RNAs that require distinctive secondary structure motifs to exert their biological function and hence exhibit distinctive patterns of sequence conservation characteristic for positive selection on RNA secondary structure. The deep-sequencing of 12 drosophilid species coordinated by the NHGRI provides an ideal data set of comparative computational approaches to determine those genomic loci that code for evolutionarily conserved RNA motifs. This class of loci includes the majority of the known small ncRNAs as well as structured RNA motifs in mRNAs. We report here on a genome-wide survey using RNAz. Results We obtain 16 000 high quality predictions among which we recover the majority of the known ncRNAs. Taking a pessimistically estimated false discovery rate of 40% into account, this implies that at least some ten thousand loci in the Drosophila genome show the hallmarks of stabilizing selection action of RNA structure, and hence are most likely functional at the RNA level. A subset of RNAz predictions overlapping with TRF1 and BRF binding sites [Isogai et al., EMBO J. 26: 79–89 (2007)], which are plausible candidates of Pol III transcripts, have been studied in more detail. Among these sequences we identify several "clusters" of ncRNA candidates with striking structural similarities. Conclusion The statistical evaluation of the RNAz predictions in comparison with a similar analysis of vertebrate genomes [Washietl et al., Nat. Biotech. 23: 1383–1390 (2005)] shows that qualitatively similar fractions of structured RNAs are found in introns, UTRs, and intergenic regions. The intergenic RNA structures, however, are concentrated much more closely around known protein-coding loci, suggesting that flies have significantly smaller complement of independent structured ncRNAs compared to mammals. PMID:17996037
Graf, Louis; Kim, Yae Jin; Cho, Ga Youn; Miller, Kathy Ann
2017-01-01
Coccophora langsdorfii (Turner) Greville (Fucales) is an intertidal brown alga that is endemic to Northeast Asia and increasingly endangered by habitat loss and climate change. We sequenced the complete circular plastid and mitochondrial genomes of C. langsdorfii. The circular plastid genome is 124,450 bp and contains 139 protein-coding, 28 tRNA and 6 rRNA genes. The circular mitochondrial genome is 35,660 bp and contains 38 protein-coding, 25 tRNA and 3 rRNA genes. The structure and gene content of the C. langsdorfii plastid genome is similar to those of other species in the Fucales. The plastid genomes of brown algae in other orders share similar gene content but exhibit large structural recombination. The large in-frame insert in the cox2 gene in the mitochondrial genome of C. langsdorfii is typical of other brown algae. We explored the effect of this insertion on the structure and function of the cox2 protein. We estimated the usefulness of 135 plastid genes and 35 mitochondrial genes for developing molecular markers. This study shows that 29 organellar genes will prove efficient for resolving brown algal phylogeny. In addition, we propose a new molecular marker suitable for the study of intraspecific genetic diversity that should be tested in a large survey of populations of C. langsdorfii. PMID:29095864
Complete genome sequence of Enterobacter aerogenes KCTC 2190.
Shin, Sang Heum; Kim, Sewhan; Kim, Jae Young; Lee, Soojin; Um, Youngsoon; Oh, Min-Kyu; Kim, Young-Rok; Lee, Jinwon; Yang, Kap-Seok
2012-05-01
This is the first complete genome sequence of the Enterobacter aerogenes species. Here we present the genome sequence of E. aerogenes KCTC 2190, which contains 5,280,350 bp with a G + C content of 54.8 mol%, 4,912 protein-coding genes, and 109 structural RNAs.
The Evolution and Expression Pattern of Human Overlapping lncRNA and Protein-coding Gene Pairs.
Ning, Qianqian; Li, Yixue; Wang, Zhen; Zhou, Songwen; Sun, Hong; Yu, Guangjun
2017-03-27
Long non-coding RNA overlapping with protein-coding gene (lncRNA-coding pair) is a special type of overlapping genes. Protein-coding overlapping genes have been well studied and increasing attention has been paid to lncRNAs. By studying lncRNA-coding pairs in human genome, we showed that lncRNA-coding pairs were more likely to be generated by overprinting and retaining genes in lncRNA-coding pairs were given higher priority than non-overlapping genes. Besides, the preference of overlapping configurations preserved during evolution was based on the origin of lncRNA-coding pairs. Further investigations showed that lncRNAs promoting the splicing of their embedded protein-coding partners was a unilateral interaction, but the existence of overlapping partners improving the gene expression was bidirectional and the effect was decreased with the increased evolutionary age of genes. Additionally, the expression of lncRNA-coding pairs showed an overall positive correlation and the expression correlation was associated with their overlapping configurations, local genomic environment and evolutionary age of genes. Comparison of the expression correlation of lncRNA-coding pairs between normal and cancer samples found that the lineage-specific pairs including old protein-coding genes may play an important role in tumorigenesis. This work presents a systematically comprehensive understanding of the evolution and the expression pattern of human lncRNA-coding pairs.
López-Wilchis, Ricardo; Del Río-Portilla, Miguel Ángel; Guevara-Chumacero, Luis Manuel
2017-02-01
We described the complete mitochondrial genome (mitogenome) of the Wagner's mustached bat, Pteronotus personatus, a species belonging to the family Mormoopidae, and compared it with other published mitogenomes of bats (Chiroptera). The mitogenome of P. personatus was 16,570 bp long and contained a typically conserved structure including 13 protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes, and one control region (D-loop). Most of the genes were encoded on the H-strand, except for eight tRNA and the ND6 genes. The order of protein-coding and rRNA genes was highly conserved in all mitogenomes. All protein-coding genes started with an ATG codon, except for ND2, ND3, and ND5, which initiated with ATA, and terminated with the typical stop codon TAA/TAG or the codon AGA. Phylogenetic trees constructed using Maximum Parsimony, Maximum Likelihood, and Bayesian inference methods showed an identical topology and indicated the monophyly of different families of bats (Mormoopidae, Phyllostomidae, Vespertilionidae, Rhinolophidae, and Pteropopidae) and the existence of two major clades corresponding to the suborders Yangochiroptera and Yinpterochiroptera. The mitogenome sequence provided here will be useful for further phylogenetic analyses and population genetic studies in mormoopid bats.
Bohm, C.; Chen, F.; Sevalle, J.; Qamar, S.; Dodd, R.; Li, Y.; Schmitt-Ulms, G.; Fraser, P.E.; St George-Hyslop, P.H.
2015-01-01
Inherited variants in multiple different genes are associated with increased risk for Alzheimer's disease (AD). In many of these genes, the inherited variants alter some aspect of the production or clearance of the neurotoxic amyloid β-peptide (Aβ). Thus missense, splice site or duplication mutants in the presenilin 1 (PS1), presenilin 2 (PS2) or the amyloid precursor protein (APP) genes, which alter the levels or shift the balance of Aβ produced, are associated with rare, highly penetrant autosomal dominant forms of Familial Alzheimer's Disease (FAD). Similarly, the more prevalent late-onset forms of AD are associated with both coding and non-coding variants in genes such as SORL1, PICALM and ABCA7 that affect the production and clearance of Aβ. This review summarises some of the recent molecular and structural work on the role of these genes and the proteins coded by them in the biology of Aβ. We also briefly outline how the emerging knowledge about the pathways involved in Aβ generation and clearance can be potentially targeted therapeutically. This article is part of Special Issue entitled "Neuronal Protein". PMID:25748120
LaPolla, R J; Mayne, K M; Davidson, N
1984-01-01
A mouse cDNA clone has been isolated that contains the complete coding region of a protein highly homologous to the delta subunit of the Torpedo acetylcholine receptor (AcChoR). The cDNA library was constructed in the vector lambda 10 from membrane-associated poly(A)+ RNA from BC3H-1 mouse cells. Surprisingly, the delta clone was selected by hybridization with cDNA encoding the gamma subunit of the Torpedo AcChoR. The nucleotide sequence of the mouse cDNA clone contains an open reading frame of 520 amino acids. This amino acid sequence exhibits 59% and 50% sequence homology to the Torpedo AcChoR delta and gamma subunits, respectively. However, the mouse nucleotide sequence has several stretches of high homology with the Torpedo gamma subunit cDNA, but not with delta. The mouse protein has the same general structural features as do the Torpedo subunits. It is encoded by a 3.3-kilobase mRNA. There is probably only one, but at most two, chromosomal genes coding for this or closely related sequences. Images PMID:6096870
Statistical inference of protein structural alignments using information and compression.
Collier, James H; Allison, Lloyd; Lesk, Arthur M; Stuckey, Peter J; Garcia de la Banda, Maria; Konagurthu, Arun S
2017-04-01
Structural molecular biology depends crucially on computational techniques that compare protein three-dimensional structures and generate structural alignments (the assignment of one-to-one correspondences between subsets of amino acids based on atomic coordinates). Despite its importance, the structural alignment problem has not been formulated, much less solved, in a consistent and reliable way. To overcome these difficulties, we present here a statistical framework for the precise inference of structural alignments, built on the Bayesian and information-theoretic principle of Minimum Message Length (MML). The quality of any alignment is measured by its explanatory power-the amount of lossless compression achieved to explain the protein coordinates using that alignment. We have implemented this approach in MMLigner , the first program able to infer statistically significant structural alignments. We also demonstrate the reliability of MMLigner 's alignment results when compared with the state of the art. Importantly, MMLigner can also discover different structural alignments of comparable quality, a challenging problem for oligomers and protein complexes. Source code, binaries and an interactive web version are available at http://lcb.infotech.monash.edu.au/mmligner . arun.konagurthu@monash.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Crystal Structure of Cocosin, A Potential Food Allergen from Coconut (Cocos nucifera).
Jin, Tengchuan; Wang, Cheng; Zhang, Caiying; Wang, Yang; Chen, Yu-Wei; Guo, Feng; Howard, Andrew; Cao, Min-Jie; Fu, Tong-Jen; McHugh, Tara H; Zhang, Yuzhu
2017-08-30
Coconut (Cocos nucifera) is an important palm tree. Coconut fruit is widely consumed. The most abundant storage protein in coconut fruit is cocosin (a likely food allergen), which belongs to the 11S globulin family. Cocosin was crystallized near a century ago, but its structure remains unknown. By optimizing crystallization conditions and cryoprotectant solutions, we were able to obtain cocosin crystals that diffracted to 1.85 Å. The cocosin gene was cloned from genomic DNA isolated from dry coconut tissue. The protein sequence deduced from the predicted cocosin coding sequence was used to guide model building and structure refinement. The structure of cocosin was determined for the first time, and it revealed a typical 11S globulin feature of a double layer doughnut-shaped hexamer.
Benyo, B; Biro, J C; Benyo, Z
2004-01-01
The theory of "codon-amino acid coevolution" was first proposed by Woese in 1967. It suggests that there is a stereochemical matching - that is, affinity - between amino acids and certain of the base triplet sequences that code for those amino acids. We have constructed a common periodic table of codons and amino acids, where the nucleic acid table showed perfect axial symmetry for codons and the corresponding amino acid table also displayed periodicity regarding the biochemical properties (charge and hydrophobicity) of the 20 amino acids and the position of the stop signals. The table indicates that the middle (2/sup nd/) amino acid in the codon has a prominent role in determining some of the structural features of the amino acids. The possibility that physical contact between codons and amino acids might exist was tested on restriction enzymes. Many recognition site-like sequences were found in the coding sequences of these enzymes and as many as 73 examples of codon-amino acid co-location were observed in the 7 known 3D structures (December 2003) of endonuclease-nucleic acid complexes. These results indicate that the smallest possible units of specific nucleic acid-protein interaction are indeed the stereochemically compatible codons and amino acids.
SGP-1: Prediction and Validation of Homologous Genes Based on Sequence Alignments
Wiehe, Thomas; Gebauer-Jung, Steffi; Mitchell-Olds, Thomas; Guigó, Roderic
2001-01-01
Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors. PMID:11544202
Biophysics of protein evolution and evolutionary protein biophysics
Sikosek, Tobias; Chan, Hue Sun
2014-01-01
The study of molecular evolution at the level of protein-coding genes often entails comparing large datasets of sequences to infer their evolutionary relationships. Despite the importance of a protein's structure and conformational dynamics to its function and thus its fitness, common phylogenetic methods embody minimal biophysical knowledge of proteins. To underscore the biophysical constraints on natural selection, we survey effects of protein mutations, highlighting the physical basis for marginal stability of natural globular proteins and how requirement for kinetic stability and avoidance of misfolding and misinteractions might have affected protein evolution. The biophysical underpinnings of these effects have been addressed by models with an explicit coarse-grained spatial representation of the polypeptide chain. Sequence–structure mappings based on such models are powerful conceptual tools that rationalize mutational robustness, evolvability, epistasis, promiscuous function performed by ‘hidden’ conformational states, resolution of adaptive conflicts and conformational switches in the evolution from one protein fold to another. Recently, protein biophysics has been applied to derive more accurate evolutionary accounts of sequence data. Methods have also been developed to exploit sequence-based evolutionary information to predict biophysical behaviours of proteins. The success of these approaches demonstrates a deep synergy between the fields of protein biophysics and protein evolution. PMID:25165599
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dolan, Kyle T.; Duguid, Erica M.; He, Chuan
2011-11-17
SlyA is a master virulence regulator that controls the transcription of numerous genes in Salmonella enterica. We present here crystal structures of SlyA by itself and bound to a high-affinity DNA operator sequence in the slyA gene. SlyA interacts with DNA through direct recognition of a guanine base by Arg-65, as well as interactions between conserved Arg-86 and the minor groove and a large network of non-base-specific contacts with the sugar phosphate backbone. Our structures, together with an unpublished structure of SlyA bound to the small molecule effector salicylate (Protein Data Bank code 3DEU), reveal that, unlike many other MarRmore » family proteins, SlyA dissociates from DNA without large conformational changes when bound to this effector. We propose that SlyA and other MarR global regulators rely more on indirect readout of DNA sequence to exert control over many genes, in contrast to proteins (such as OhrR) that recognize a single operator.« less
On fuzzy semantic similarity measure for DNA coding.
Ahmad, Muneer; Jung, Low Tang; Bhuiyan, Md Al-Amin
2016-02-01
A coding measure scheme numerically translates the DNA sequence to a time domain signal for protein coding regions identification. A number of coding measure schemes based on numerology, geometry, fixed mapping, statistical characteristics and chemical attributes of nucleotides have been proposed in recent decades. Such coding measure schemes lack the biologically meaningful aspects of nucleotide data and hence do not significantly discriminate coding regions from non-coding regions. This paper presents a novel fuzzy semantic similarity measure (FSSM) coding scheme centering on FSSM codons׳ clustering and genetic code context of nucleotides. Certain natural characteristics of nucleotides i.e. appearance as a unique combination of triplets, preserving special structure and occurrence, and ability to own and share density distributions in codons have been exploited in FSSM. The nucleotides׳ fuzzy behaviors, semantic similarities and defuzzification based on the center of gravity of nucleotides revealed a strong correlation between nucleotides in codons. The proposed FSSM coding scheme attains a significant enhancement in coding regions identification i.e. 36-133% as compared to other existing coding measure schemes tested over more than 250 benchmarked and randomly taken DNA datasets of different organisms. Copyright © 2015 Elsevier Ltd. All rights reserved.
Accessory proteins of SARS-CoV and other coronaviruses.
Liu, Ding Xiang; Fung, To Sing; Chong, Kelvin Kian-Long; Shukla, Aditi; Hilgenfeld, Rolf
2014-09-01
The huge RNA genome of SARS coronavirus comprises a number of open reading frames that code for a total of eight accessory proteins. Although none of these are essential for virus replication, some appear to have a role in virus pathogenesis. Notably, some SARS-CoV accessory proteins have been shown to modulate the interferon signaling pathways and the production of pro-inflammatory cytokines. The structural information on these proteins is also limited, with only two (p7a and p9b) having their structures determined by X-ray crystallography. This review makes an attempt to summarize the published knowledge on SARS-CoV accessory proteins, with an emphasis on their involvement in virus-host interaction. The accessory proteins of other coronaviruses are also briefly discussed. This paper forms part of a series of invited articles in Antiviral Research on "From SARS to MERS: 10 years of research on highly pathogenic human coronaviruses" (see Introduction by Hilgenfeld and Peiris (2013)). Copyright © 2014 Elsevier B.V. All rights reserved.
Transcription initiation complex structures elucidate DNA opening.
Plaschka, C; Hantsche, M; Dienemann, C; Burzinski, C; Plitzko, J; Cramer, P
2016-05-19
Transcription of eukaryotic protein-coding genes begins with assembly of the RNA polymerase (Pol) II initiation complex and promoter DNA opening. Here we report cryo-electron microscopy (cryo-EM) structures of yeast initiation complexes containing closed and open DNA at resolutions of 8.8 Å and 3.6 Å, respectively. DNA is positioned and retained over the Pol II cleft by a network of interactions between the TATA-box-binding protein TBP and transcription factors TFIIA, TFIIB, TFIIE, and TFIIF. DNA opening occurs around the tip of the Pol II clamp and the TFIIE 'extended winged helix' domain, and can occur in the absence of TFIIH. Loading of the DNA template strand into the active centre may be facilitated by movements of obstructing protein elements triggered by allosteric binding of the TFIIE 'E-ribbon' domain. The results suggest a unified model for transcription initiation with a key event, the trapping of open promoter DNA by extended protein-protein and protein-DNA contacts.
Foldability of a Natural De Novo Evolved Protein.
Bungard, Dixie; Copple, Jacob S; Yan, Jing; Chhun, Jimmy J; Kumirov, Vlad K; Foy, Scott G; Masel, Joanna; Wysocki, Vicki H; Cordes, Matthew H J
2017-11-07
The de novo evolution of protein-coding genes from noncoding DNA is emerging as a source of molecular innovation in biology. Studies of random sequence libraries, however, suggest that young de novo proteins will not fold into compact, specific structures typical of native globular proteins. Here we show that Bsc4, a functional, natural de novo protein encoded by a gene that evolved recently from noncoding DNA in the yeast S. cerevisiae, folds to a partially specific three-dimensional structure. Bsc4 forms soluble, compact oligomers with high β sheet content and a hydrophobic core, and undergoes cooperative, reversible denaturation. Bsc4 lacks a specific quaternary state, however, existing instead as a continuous distribution of oligomer sizes, and binds dyes indicative of amyloid oligomers or molten globules. The combination of native-like and non-native-like properties suggests a rudimentary fold that could potentially act as a functional intermediate in the emergence of new folded proteins de novo. Copyright © 2017 Elsevier Ltd. All rights reserved.
Molecular Dynamics Analysis of Lysozyme Protein in Ethanol-Water Mixed Solvent Environment
NASA Astrophysics Data System (ADS)
Ochije, Henry Ikechukwu
Effect of protein-solvent interaction on the protein structure is widely studied using both experimental and computational techniques. Despite such extensive studies molecular level understanding of proteins and some simple solvents is still not fully understood. This work focuses on detailed molecular dynamics simulations to study of solvent effect on lysozyme protein, using water, alcohol and different concentrations of water-alcohol mixtures as solvents. The lysozyme protein structure in water, alcohol and alcohol-water mixture (0-12% alcohol) was studied using GROMACS molecular dynamics simulation code. Compared to water environment, the lysozome structure showed remarkable changes in solvents with increasing alcohol concentration. In particular, significant changes were observed in the protein secondary structure involving alpha helices. The influence of alcohol on the lysozyme protein was investigated by studying thermodynamic and structural properties. With increasing ethanol concentration we observed a systematic increase in total energy, enthalpy, root mean square deviation (RMSD), and radius of gyration. a polynomial interpolation approach. Using the resulting polynomial equation, we could determine above quantities for any intermediate alcohol percentage. In order to validate this approach, we selected an intermediate ethanol percentage and carried out full MD simulation. The results from MD simulation were in reasonably good agreement with that obtained using polynomial approach. Hence, the polynomial approach based method proposed here eliminates the need for computationally intensive full MD analysis for the concentrations within the range (0-12%) studied in this work.
Evaluation of 10 genes encoding cardiac proteins in Doberman Pinschers with dilated cardiomyopathy.
O'Sullivan, M Lynne; O'Grady, Michael R; Pyle, W Glen; Dawson, John F
2011-07-01
To identify a causative mutation for dilated cardiomyopathy (DCM) in Doberman Pinschers by sequencing the coding regions of 10 cardiac genes known to be associated with familial DCM in humans. 5 Doberman Pinschers with DCM and congestive heart failure and 5 control mixed-breed dogs that were euthanized or died. RNA was extracted from frozen ventricular myocardial samples from each dog, and first-strand cDNA was synthesized via reverse transcription, followed by PCR amplification with gene-specific primers. Ten cardiac genes were analyzed: cardiac actin, α-actinin, α-tropomyosin, β-myosin heavy chain, metavinculin, muscle LIM protein, myosinbinding protein C, tafazzin, titin-cap (telethonin), and troponin T. Sequences for DCM-affected and control dogs and the published canine genome were compared. None of the coding sequences yielded a common causative mutation among all Doberman Pinscher samples. However, 3 variants were identified in the α-actinin gene in the DCM-affected Doberman Pinschers. One of these variants, identified in 2 of the 5 Doberman Pinschers, resulted in an amino acid change in the rod-forming triple coiled-coil domain. Mutations in the coding regions of several genes associated with DCM in humans did not appear to consistently account for DCM in Doberman Pinschers. However, an α-actinin variant was detected in some Doberman Pinschers that may contribute to the development of DCM given its potential effect on the structure of this protein. Investigation of additional candidate gene coding and noncoding regions and further evaluation of the role of α-actinin in development of DCM in Doberman Pinschers are warranted.
Cis-regulatory somatic mutations and gene-expression alteration in B-cell lymphomas.
Mathelier, Anthony; Lefebvre, Calvin; Zhang, Allen W; Arenillas, David J; Ding, Jiarui; Wasserman, Wyeth W; Shah, Sohrab P
2015-04-23
With the rapid increase of whole-genome sequencing of human cancers, an important opportunity to analyze and characterize somatic mutations lying within cis-regulatory regions has emerged. A focus on protein-coding regions to identify nonsense or missense mutations disruptive to protein structure and/or function has led to important insights; however, the impact on gene expression of mutations lying within cis-regulatory regions remains under-explored. We analyzed somatic mutations from 84 matched tumor-normal whole genomes from B-cell lymphomas with accompanying gene expression measurements to elucidate the extent to which these cancers are disrupted by cis-regulatory mutations. We characterize mutations overlapping a high quality set of well-annotated transcription factor binding sites (TFBSs), covering a similar portion of the genome as protein-coding exons. Our results indicate that cis-regulatory mutations overlapping predicted TFBSs are enriched in promoter regions of genes involved in apoptosis or growth/proliferation. By integrating gene expression data with mutation data, our computational approach culminates with identification of cis-regulatory mutations most likely to participate in dysregulation of the gene expression program. The impact can be measured along with protein-coding mutations to highlight key mutations disrupting gene expression and pathways in cancer. Our study yields specific genes with disrupted expression triggered by genomic mutations in either the coding or the regulatory space. It implies that mutated regulatory components of the genome contribute substantially to cancer pathways. Our analyses demonstrate that identifying genomically altered cis-regulatory elements coupled with analysis of gene expression data will augment biological interpretation of mutational landscapes of cancers.
SIBIS: a Bayesian model for inconsistent protein sequence estimation.
Khenoussi, Walyd; Vanhoutrève, Renaud; Poch, Olivier; Thompson, Julie D
2014-09-01
The prediction of protein coding genes is a major challenge that depends on the quality of genome sequencing, the accuracy of the model used to elucidate the exonic structure of the genes and the complexity of the gene splicing process leading to different protein variants. As a consequence, today's protein databases contain a huge amount of inconsistency, due to both natural variants and sequence prediction errors. We have developed a new method, called SIBIS, to detect such inconsistencies based on the evolutionary information in multiple sequence alignments. A Bayesian framework, combined with Dirichlet mixture models, is used to estimate the probability of observing specific amino acids and to detect inconsistent or erroneous sequence segments. We evaluated the performance of SIBIS on a reference set of protein sequences with experimentally validated errors and showed that the sensitivity is significantly higher than previous methods, with only a small loss of specificity. We also assessed a large set of human sequences from the UniProt database and found evidence of inconsistency in 48% of the previously uncharacterized sequences. We conclude that the integration of quality control methods like SIBIS in automatic analysis pipelines will be critical for the robust inference of structural, functional and phylogenetic information from these sequences. Source code, implemented in C on a linux system, and the datasets of protein sequences are freely available for download at http://www.lbgi.fr/∼julie/SIBIS. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
A Real-Time All-Atom Structural Search Engine for Proteins
Gonzalez, Gabriel; Hannigan, Brett; DeGrado, William F.
2014-01-01
Protein designers use a wide variety of software tools for de novo design, yet their repertoire still lacks a fast and interactive all-atom search engine. To solve this, we have built the Suns program: a real-time, atomic search engine integrated into the PyMOL molecular visualization system. Users build atomic-level structural search queries within PyMOL and receive a stream of search results aligned to their query within a few seconds. This instant feedback cycle enables a new “designability”-inspired approach to protein design where the designer searches for and interactively incorporates native-like fragments from proven protein structures. We demonstrate the use of Suns to interactively build protein motifs, tertiary interactions, and to identify scaffolds compatible with hot-spot residues. The official web site and installer are located at http://www.degradolab.org/suns/ and the source code is hosted at https://github.com/godotgildor/Suns (PyMOL plugin, BSD license), https://github.com/Gabriel439/suns-cmd (command line client, BSD license), and https://github.com/Gabriel439/suns-search (search engine server, GPLv2 license). PMID:25079944
A real-time all-atom structural search engine for proteins.
Gonzalez, Gabriel; Hannigan, Brett; DeGrado, William F
2014-07-01
Protein designers use a wide variety of software tools for de novo design, yet their repertoire still lacks a fast and interactive all-atom search engine. To solve this, we have built the Suns program: a real-time, atomic search engine integrated into the PyMOL molecular visualization system. Users build atomic-level structural search queries within PyMOL and receive a stream of search results aligned to their query within a few seconds. This instant feedback cycle enables a new "designability"-inspired approach to protein design where the designer searches for and interactively incorporates native-like fragments from proven protein structures. We demonstrate the use of Suns to interactively build protein motifs, tertiary interactions, and to identify scaffolds compatible with hot-spot residues. The official web site and installer are located at http://www.degradolab.org/suns/ and the source code is hosted at https://github.com/godotgildor/Suns (PyMOL plugin, BSD license), https://github.com/Gabriel439/suns-cmd (command line client, BSD license), and https://github.com/Gabriel439/suns-search (search engine server, GPLv2 license).
IRaPPA: Information retrieval based integration of biophysical models for protein assembly selection
Moal, Iain H.; Barradas-Bautista, Didier; Jiménez-García, Brian; Torchala, Mieczyslaw; van der Velde, Arjan; Vreven, Thom; Weng, Zhiping; Bates, Paul A.; Fernández-Recio, Juan
2018-01-01
Motivation In order to function, proteins frequently bind to one another and form 3D assemblies. Knowledge of the atomic details of these structures helps our understanding of how proteins work together, how mutations can lead to disease, and facilitates the designing of drugs which prevent or mimic the interaction. Results Atomic modeling of protein-protein interactions requires the selection of near-native structures from a set of docked poses based on their calculable properties. By considering this as an information retrieval problem, we have adapted methods developed for Internet search ranking and electoral voting into IRaPPA, a pipeline integrating biophysical properties. The approach enhances the identification of near-native structures when applied to four docking methods, resulting in a near-native appearing in the top 10 solutions for up to 50% of complexes benchmarked, and up to 70% in the top 100. Availability IRaPPA has been implemented in the SwarmDock server (http://bmm.crick.ac.uk/~SwarmDock/), pyDock server (http://life.bsc.es/pid/pydockrescoring/) and ZDOCK server (http://zdock.umassmed.edu/), with code available on request. PMID:28200016
Montandon, P E; Vasserot, A; Stutz, E
1986-01-01
We retrieved a 1.6 kbp intron separating two exons of the psb C gene which codes for the 44 kDa reaction center protein of photosystem II. This intron is 3 to 4 times the size of all previously sequenced Euglena gracilis chloroplast introns. It contains an open reading frame of 458 codons potentially coding for a basic protein of 54 kDa of yet unknown function. The intron boundaries follow consensus sequences established for chloroplast introns related to class II and nuclear pre-mRNA introns. Its 3'-terminal segment has structural features similar to class II mitochondrial introns with an invariant base A as possible branch point for lariat formation.
Molecular cloning of chitinase 33 (chit33) gene from Trichoderma atroviride
Matroudi, S.; Zamani, M.R.; Motallebi, M.
2008-01-01
In this study Trichoderma atroviride was selected as over producer of chitinase enzyme among 30 different isolates of Trichoderma sp. on the basis of chitinase specific activity. From this isolate the genomic and cDNA clones encoding chit33 have been isolated and sequenced. Comparison of genomic and cDNA sequences for defining gene structure indicates that this gene contains three short introns and also an open reading frame coding for a protein of 321 amino acids. The deduced amino acid sequence includes a 19 aa putative signal peptide. Homology between this sequence and other reported Trichoderma Chit33 proteins are discussed. The coding sequence of chit33 gene was cloned in pEt26b(+) expression vector and expressed in E. coli. PMID:24031242
Novel numerical and graphical representation of DNA sequences and proteins.
Randić, M; Novic, M; Vikić-Topić, D; Plavsić, D
2006-12-01
We have introduced novel numerical and graphical representations of DNA, which offer a simple and unique characterization of DNA sequences. The numerical representation of a DNA sequence is given as a sequence of real numbers derived from a unique graphical representation of the standard genetic code. There is no loss of information on the primary structure of a DNA sequence associated with this numerical representation. The novel representations are illustrated with the coding sequences of the first exon of beta-globin gene of half a dozen species in addition to human. The method can be extended to proteins as is exemplified by humanin, a 24-aa peptide that has recently been identified as a specific inhibitor of neuronal cell death induced by familial Alzheimer's disease mutant genes.
Robust enzyme design: bioinformatic tools for improved protein stability.
Suplatov, Dmitry; Voevodin, Vladimir; Švedas, Vytas
2015-03-01
The ability of proteins and enzymes to maintain a functionally active conformation under adverse environmental conditions is an important feature of biocatalysts, vaccines, and biopharmaceutical proteins. From an evolutionary perspective, robust stability of proteins improves their biological fitness and allows for further optimization. Viewed from an industrial perspective, enzyme stability is crucial for the practical application of enzymes under the required reaction conditions. In this review, we analyze bioinformatic-driven strategies that are used to predict structural changes that can be applied to wild type proteins in order to produce more stable variants. The most commonly employed techniques can be classified into stochastic approaches, empirical or systematic rational design strategies, and design of chimeric proteins. We conclude that bioinformatic analysis can be efficiently used to study large protein superfamilies systematically as well as to predict particular structural changes which increase enzyme stability. Evolution has created a diversity of protein properties that are encoded in genomic sequences and structural data. Bioinformatics has the power to uncover this evolutionary code and provide a reproducible selection of hotspots - key residues to be mutated in order to produce more stable and functionally diverse proteins and enzymes. Further development of systematic bioinformatic procedures is needed to organize and analyze sequences and structures of proteins within large superfamilies and to link them to function, as well as to provide knowledge-based predictions for experimental evaluation. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
A rate distortion approach to protein symmetry.
Wallace, Rodrick
2010-08-01
A spontaneous symmetry breaking argument is applied to the problem of protein folding, via a rate distortion analysis of the relation between genome coding and the final condensation of the protein molten globule that is, in spirit, analogous to Tlusty's (2007) exploration of the evolution of the genetic code. In the 'energy' picture, the average distortion between codon message and final protein structure, under constraints driven by evolutionary selection, serves as a temperature analog, so that low values limit the possible distribution of protein forms, producing the canonical folding funnel. A dual 'developmental' perspective sees the rate distortion function itself as the temperature analog, and permits incorporation of chaperons or toxic exposures as catalysts, driving the system to different possible outcomes or affecting the rate of convergence. The rate distortion function appears constrained by the availability of metabolic free energy, with implications for prebiotic evolution, and a nonequilibrium empirical Onsager treatment provides an adaptable statistical model that can be fitted to data, in the same manner as a regression equation. In sum, mechanistic models of protein folding fail to account for the observed spectrum of protein folding and aggregation disorders, suggesting that a biologically based cognitive paradigm describing folding will be needed for understanding the etiology, prevention, and treatment of these diseases. The developmental formalism introduced here may contribute substantially to such a paradigm.
Quantifying the mechanisms of domain gain in animal proteins.
Buljan, Marija; Frankish, Adam; Bateman, Alex
2010-01-01
Protein domains are protein regions that are shared among different proteins and are frequently functionally and structurally independent from the rest of the protein. Novel domain combinations have a major role in evolutionary innovation. However, the relative contributions of the different molecular mechanisms that underlie domain gains in animals are still unknown. By using animal gene phylogenies we were able to identify a set of high confidence domain gain events and by looking at their coding DNA investigate the causative mechanisms. Here we show that the major mechanism for gains of new domains in metazoan proteins is likely to be gene fusion through joining of exons from adjacent genes, possibly mediated by non-allelic homologous recombination. Retroposition and insertion of exons into ancestral introns through intronic recombination are, in contrast to previous expectations, only minor contributors to domain gains and have accounted for less than 1% and 10% of high confidence domain gain events, respectively. Additionally, exonization of previously non-coding regions appears to be an important mechanism for addition of disordered segments to proteins. We observe that gene duplication has preceded domain gain in at least 80% of the gain events. The interplay of gene duplication and domain gain demonstrates an important mechanism for fast neofunctionalization of genes.
Reiche, Kristin; Kasack, Katharina; Schreiber, Stephan; Lüders, Torben; Due, Eldri U.; Naume, Bjørn; Riis, Margit; Kristensen, Vessela N.; Horn, Friedemann; Børresen-Dale, Anne-Lise; Hackermüller, Jörg; Baumbusch, Lars O.
2014-01-01
Breast cancer, the second leading cause of cancer death in women, is a highly heterogeneous disease, characterized by distinct genomic and transcriptomic profiles. Transcriptome analyses prevalently assessed protein-coding genes; however, the majority of the mammalian genome is expressed in numerous non-coding transcripts. Emerging evidence supports that many of these non-coding RNAs are specifically expressed during development, tumorigenesis, and metastasis. The focus of this study was to investigate the expression features and molecular characteristics of long non-coding RNAs (lncRNAs) in breast cancer. We investigated 26 breast tumor and 5 normal tissue samples utilizing a custom expression microarray enclosing probes for mRNAs as well as novel and previously identified lncRNAs. We identified more than 19,000 unique regions significantly differentially expressed between normal versus breast tumor tissue, half of these regions were non-coding without any evidence for functional open reading frames or sequence similarity to known proteins. The identified non-coding regions were primarily located in introns (53%) or in the intergenic space (33%), frequently orientated in antisense-direction of protein-coding genes (14%), and commonly distributed at promoter-, transcription factor binding-, or enhancer-sites. Analyzing the most diverse mRNA breast cancer subtypes Basal-like versus Luminal A and B resulted in 3,025 significantly differentially expressed unique loci, including 682 (23%) for non-coding transcripts. A notable number of differentially expressed protein-coding genes displayed non-synonymous expression changes compared to their nearest differentially expressed lncRNA, including an antisense lncRNA strongly anticorrelated to the mRNA coding for histone deacetylase 3 (HDAC3), which was investigated in more detail. Previously identified chromatin-associated lncRNAs (CARs) were predominantly downregulated in breast tumor samples, including CARs located in the protein-coding genes for CALD1, FTX, and HNRNPH1. In conclusion, a number of differentially expressed lncRNAs have been identified with relation to cancer-related protein-coding genes. PMID:25264628
Reiche, Kristin; Kasack, Katharina; Schreiber, Stephan; Lüders, Torben; Due, Eldri U; Naume, Bjørn; Riis, Margit; Kristensen, Vessela N; Horn, Friedemann; Børresen-Dale, Anne-Lise; Hackermüller, Jörg; Baumbusch, Lars O
2014-01-01
Breast cancer, the second leading cause of cancer death in women, is a highly heterogeneous disease, characterized by distinct genomic and transcriptomic profiles. Transcriptome analyses prevalently assessed protein-coding genes; however, the majority of the mammalian genome is expressed in numerous non-coding transcripts. Emerging evidence supports that many of these non-coding RNAs are specifically expressed during development, tumorigenesis, and metastasis. The focus of this study was to investigate the expression features and molecular characteristics of long non-coding RNAs (lncRNAs) in breast cancer. We investigated 26 breast tumor and 5 normal tissue samples utilizing a custom expression microarray enclosing probes for mRNAs as well as novel and previously identified lncRNAs. We identified more than 19,000 unique regions significantly differentially expressed between normal versus breast tumor tissue, half of these regions were non-coding without any evidence for functional open reading frames or sequence similarity to known proteins. The identified non-coding regions were primarily located in introns (53%) or in the intergenic space (33%), frequently orientated in antisense-direction of protein-coding genes (14%), and commonly distributed at promoter-, transcription factor binding-, or enhancer-sites. Analyzing the most diverse mRNA breast cancer subtypes Basal-like versus Luminal A and B resulted in 3,025 significantly differentially expressed unique loci, including 682 (23%) for non-coding transcripts. A notable number of differentially expressed protein-coding genes displayed non-synonymous expression changes compared to their nearest differentially expressed lncRNA, including an antisense lncRNA strongly anticorrelated to the mRNA coding for histone deacetylase 3 (HDAC3), which was investigated in more detail. Previously identified chromatin-associated lncRNAs (CARs) were predominantly downregulated in breast tumor samples, including CARs located in the protein-coding genes for CALD1, FTX, and HNRNPH1. In conclusion, a number of differentially expressed lncRNAs have been identified with relation to cancer-related protein-coding genes.
Integrating protein structural dynamics and evolutionary analysis with Bio3D.
Skjærven, Lars; Yao, Xin-Qiu; Scarabelli, Guido; Grant, Barry J
2014-12-10
Popular bioinformatics approaches for studying protein functional dynamics include comparisons of crystallographic structures, molecular dynamics simulations and normal mode analysis. However, determining how observed displacements and predicted motions from these traditionally separate analyses relate to each other, as well as to the evolution of sequence, structure and function within large protein families, remains a considerable challenge. This is in part due to the general lack of tools that integrate information of molecular structure, dynamics and evolution. Here, we describe the integration of new methodologies for evolutionary sequence, structure and simulation analysis into the Bio3D package. This major update includes unique high-throughput normal mode analysis for examining and contrasting the dynamics of related proteins with non-identical sequences and structures, as well as new methods for quantifying dynamical couplings and their residue-wise dissection from correlation network analysis. These new methodologies are integrated with major biomolecular databases as well as established methods for evolutionary sequence and comparative structural analysis. New functionality for directly comparing results derived from normal modes, molecular dynamics and principal component analysis of heterogeneous experimental structure distributions is also included. We demonstrate these integrated capabilities with example applications to dihydrofolate reductase and heterotrimeric G-protein families along with a discussion of the mechanistic insight provided in each case. The integration of structural dynamics and evolutionary analysis in Bio3D enables researchers to go beyond a prediction of single protein dynamics to investigate dynamical features across large protein families. The Bio3D package is distributed with full source code and extensive documentation as a platform independent R package under a GPL2 license from http://thegrantlab.org/bio3d/ .
Urvoas, Agathe; Guellouz, Asma; Valerio-Lepiniec, Marie; Graille, Marc; Durand, Dominique; Desravines, Danielle C; van Tilbeurgh, Herman; Desmadril, Michel; Minard, Philippe
2010-11-26
Repeat proteins have a modular organization and a regular architecture that make them attractive models for design and directed evolution experiments. HEAT repeat proteins, although very common, have not been used as a scaffold for artificial proteins, probably because they are made of long and irregular repeats. Here, we present and validate a consensus sequence for artificial HEAT repeat proteins. The sequence was defined from the structure-based sequence analysis of a thermostable HEAT-like repeat protein. Appropriate sequences were identified for the N- and C-caps. A library of genes coding for artificial proteins based on this sequence design, named αRep, was assembled using new and versatile methodology based on circular amplification. Proteins picked randomly from this library are expressed as soluble proteins. The biophysical properties of proteins with different numbers of repeats and different combinations of side chains in hypervariable positions were characterized. Circular dichroism and differential scanning calorimetry experiments showed that all these proteins are folded cooperatively and are very stable (T(m) >70 °C). Stability of these proteins increases with the number of repeats. Detailed gel filtration and small-angle X-ray scattering studies showed that the purified proteins form either monomers or dimers. The X-ray structure of a stable dimeric variant structure was solved. The protein is folded with a highly regular topology and the repeat structure is organized, as expected, as pairs of alpha helices. In this protein variant, the dimerization interface results directly from the variable surface enriched in aromatic residues located in the randomized positions of the repeats. The dimer was crystallized both in an apo and in a PEG-bound form, revealing a very well defined binding crevice and some structure flexibility at the interface. This fortuitous binding site could later prove to be a useful binding site for other low molecular mass partners. Copyright © 2010 Elsevier Ltd. All rights reserved.
Evaluating bacterial gene-finding HMM structures as probabilistic logic programs.
Mørk, Søren; Holmes, Ian
2012-03-01
Probabilistic logic programming offers a powerful way to describe and evaluate structured statistical models. To investigate the practicality of probabilistic logic programming for structure learning in bioinformatics, we undertook a simplified bacterial gene-finding benchmark in PRISM, a probabilistic dialect of Prolog. We evaluate Hidden Markov Model structures for bacterial protein-coding gene potential, including a simple null model structure, three structures based on existing bacterial gene finders and two novel model structures. We test standard versions as well as ADPH length modeling and three-state versions of the five model structures. The models are all represented as probabilistic logic programs and evaluated using the PRISM machine learning system in terms of statistical information criteria and gene-finding prediction accuracy, in two bacterial genomes. Neither of our implementations of the two currently most used model structures are best performing in terms of statistical information criteria or prediction performances, suggesting that better-fitting models might be achievable. The source code of all PRISM models, data and additional scripts are freely available for download at: http://github.com/somork/codonhmm. Supplementary data are available at Bioinformatics online.
A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics
Tang, Haixu; Li, Sujun; Ye, Yuzhen
2016-01-01
Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach, in which complete or fragmental protein-coding genes are first directly predicted from metagenomic (and/or metatranscriptomic) sequences or from their assemblies, and the resulting protein sequences are then used as the reference database for peptide/protein identification from MS/MS spectra. This approach is often limited because protein coding genes predicted from metagenomes are incomplete and fragmental. In this paper, we present a graph-centric approach to improving metagenome-guided peptide and protein identification in metaproteomics. Our method exploits the de Bruijn graph structure reported by metagenome assembly algorithms to generate a comprehensive database of protein sequences encoded in the community. We tested our method using several public metaproteomic datasets with matched metagenomic and metatranscriptomic sequencing data acquired from complex microbial communities in a biological wastewater treatment plant. The results showed that many more peptides and proteins can be identified when assembly graphs were utilized, improving the characterization of the proteins expressed in the microbial communities. The additional proteins we identified contribute to the characterization of important pathways such as those involved in degradation of chemical hazards. Our tools are released as open-source software on github at https://github.com/COL-IU/Graph2Pro. PMID:27918579
Venselaar, Hanka; Te Beek, Tim A H; Kuipers, Remko K P; Hekkelman, Maarten L; Vriend, Gert
2010-11-08
Many newly detected point mutations are located in protein-coding regions of the human genome. Knowledge of their effects on the protein's 3D structure provides insight into the protein's mechanism, can aid the design of further experiments, and eventually can lead to the development of new medicines and diagnostic tools. In this article we describe HOPE, a fully automatic program that analyzes the structural and functional effects of point mutations. HOPE collects information from a wide range of information sources including calculations on the 3D coordinates of the protein by using WHAT IF Web services, sequence annotations from the UniProt database, and predictions by DAS services. Homology models are built with YASARA. Data is stored in a database and used in a decision scheme to identify the effects of a mutation on the protein's 3D structure and function. HOPE builds a report with text, figures, and animations that is easy to use and understandable for (bio)medical researchers. We tested HOPE by comparing its output to the results of manually performed projects. In all straightforward cases HOPE performed similar to a trained bioinformatician. The use of 3D structures helps optimize the results in terms of reliability and details. HOPE's results are easy to understand and are presented in a way that is attractive for researchers without an extensive bioinformatics background.
Representation mutations from standard genetic codes
NASA Astrophysics Data System (ADS)
Aisah, I.; Suyudi, M.; Carnia, E.; Suhendi; Supriatna, A. K.
2018-03-01
Graph is widely used in everyday life especially to describe model problem and describe it concretely and clearly. In addition graph is also used to facilitate solve various kinds of problems that are difficult to be solved by calculation. In Biology, graph can be used to describe the process of protein synthesis in DNA. Protein has an important role for DNA (deoxyribonucleic acid) or RNA (ribonucleic acid). Proteins are composed of amino acids. In this study, amino acids are related to genetics, especially the genetic code. The genetic code is also known as the triplet or codon code which is a three-letter arrangement of DNA nitrogen base. The bases are adenine (A), thymine (T), guanine (G) and cytosine (C). While on RNA thymine (T) is replaced with Urasil (U). The set of all Nitrogen bases in RNA is denoted by N = {C U, A, G}. This codon works at the time of protein synthesis inside the cell. This codon also encodes the stop signal as a sign of the stop of protein synthesis process. This paper will examine the process of protein synthesis through mathematical studies and present it in three-dimensional space or graph. The study begins by analysing the set of all codons denoted by NNN such that to obtain geometric representations. At this stage there is a matching between the sets of all nitrogen bases N with Z 2 × Z 2; C=(\\overline{0},\\overline{0}),{{U}}=(\\overline{0},\\overline{1}),{{A}}=(\\overline{1},\\overline{0}),{{G}}=(\\overline{1},\\overline{1}). By matching the algebraic structure will be obtained such as group, group Klein-4,Quotien group etc. With the help of Geogebra software, the set of all codons denoted by NNN can be presented in a three-dimensional space as a multicube NNN and also can be represented as a graph, so that can easily see relationship between the codon.
Accurate Structural Correlations from Maximum Likelihood Superpositions
Theobald, Douglas L; Wuttke, Deborah S
2008-01-01
The cores of globular proteins are densely packed, resulting in complicated networks of structural interactions. These interactions in turn give rise to dynamic structural correlations over a wide range of time scales. Accurate analysis of these complex correlations is crucial for understanding biomolecular mechanisms and for relating structure to function. Here we report a highly accurate technique for inferring the major modes of structural correlation in macromolecules using likelihood-based statistical analysis of sets of structures. This method is generally applicable to any ensemble of related molecules, including families of nuclear magnetic resonance (NMR) models, different crystal forms of a protein, and structural alignments of homologous proteins, as well as molecular dynamics trajectories. Dominant modes of structural correlation are determined using principal components analysis (PCA) of the maximum likelihood estimate of the correlation matrix. The correlations we identify are inherently independent of the statistical uncertainty and dynamic heterogeneity associated with the structural coordinates. We additionally present an easily interpretable method (“PCA plots”) for displaying these positional correlations by color-coding them onto a macromolecular structure. Maximum likelihood PCA of structural superpositions, and the structural PCA plots that illustrate the results, will facilitate the accurate determination of dynamic structural correlations analyzed in diverse fields of structural biology. PMID:18282091
Nabuurs, Sanne M; Westphal, Adrie H; aan den Toorn, Marije; Lindhoud, Simon; van Mierlo, Carlo P M
2009-06-17
Partially folded protein species transiently exist during folding of most proteins. Often these species are molten globules, which may be on- or off-pathway to native protein. Molten globules have a substantial amount of secondary structure but lack virtually all the tertiary side-chain packing characteristic of natively folded proteins. These ensembles of interconverting conformers are prone to aggregation and potentially play a role in numerous devastating pathologies, and thus attract considerable attention. The molten globule that is observed during folding of apoflavodoxin from Azotobacter vinelandii is off-pathway, as it has to unfold before native protein can be formed. Here we report that this species can be trapped under nativelike conditions by substituting amino acid residue F44 by Y44, allowing spectroscopic characterization of its conformation. Whereas native apoflavodoxin contains a parallel beta-sheet surrounded by alpha-helices (i.e., the flavodoxin-like or alpha-beta parallel topology), it is shown that the molten globule has a totally different topology: it is helical and contains no beta-sheet. The presence of this remarkably nonnative species shows that single polypeptide sequences can code for distinct folds that swap upon changing conditions. Topological switching between unrelated protein structures is likely a general phenomenon in the protein structure universe.
Meitinger, T; Meindl, A; Bork, P; Rost, B; Sander, C; Haasemann, M; Murken, J
1993-12-01
The X-lined gene for Norrie disease, which is characterized by blindness, deafness and mental retardation has been cloned recently. This gene has been thought to code for a putative extracellular factor; its predicted amino acid sequence is homologous to the C-terminal domain of diverse extracellular proteins. Sequence pattern searches and three-dimensional modelling now suggest that the Norrie disease protein (NDP) has a tertiary structure similar to that of transforming growth factor beta (TGF beta). Our model identifies NDP as a member of an emerging family of growth factors containing a cystine knot motif, with direct implications for the physiological role of NDP. The model also sheds light on sequence related domains such as the C-terminal domain of mucins and of von Willebrand factor.
Structural Isosteres of Phosphate Groups in the Protein Data Bank.
Zhang, Yuezhou; Borrel, Alexandre; Ghemtio, Leo; Regad, Leslie; Boije Af Gennäs, Gustav; Camproux, Anne-Claude; Yli-Kauhaluoma, Jari; Xhaard, Henri
2017-03-27
We developed a computational workflow to mine the Protein Data Bank for isosteric replacements that exist in different binding site environments but have not necessarily been identified and exploited in compound design. Taking phosphate groups as examples, the workflow was used to construct 157 data sets, each composed of a reference protein complexed with AMP, ADP, ATP, or pyrophosphate as well other ligands. Phosphate binding sites appear to have a high hydration content and large size, resulting in U-shaped bioactive conformations recurrently found across unrelated protein families. A total of 16 413 replacements were extracted, filtered for a significant structural overlap on phosphate groups, and sorted according to their SMILES codes. In addition to the classical isosteres of phosphate, such as carboxylate, sulfone, or sulfonamide, unexpected replacements that do not conserve charge or polarity, such as aryl, aliphatic, or positively charged groups, were found.
ClusCo: clustering and comparison of protein models.
Jamroz, Michal; Kolinski, Andrzej
2013-02-22
The development, optimization and validation of protein modeling methods require efficient tools for structural comparison. Frequently, a large number of models need to be compared with the target native structure. The main reason for the development of Clusco software was to create a high-throughput tool for all-versus-all comparison, because calculating similarity matrix is the one of the bottlenecks in the protein modeling pipeline. Clusco is fast and easy-to-use software for high-throughput comparison of protein models with different similarity measures (cRMSD, dRMSD, GDT_TS, TM-Score, MaxSub, Contact Map Overlap) and clustering of the comparison results with standard methods: K-means Clustering or Hierarchical Agglomerative Clustering. The application was highly optimized and written in C/C++, including the code for parallel execution on CPU and GPU, which resulted in a significant speedup over similar clustering and scoring computation programs.
RNA 3D Structural Motifs: Definition, Identification, Annotation, and Database Searching
NASA Astrophysics Data System (ADS)
Nasalean, Lorena; Stombaugh, Jesse; Zirbel, Craig L.; Leontis, Neocles B.
Structured RNA molecules resemble proteins in the hierarchical organization of their global structures, folding and broad range of functions. Structured RNAs are composed of recurrent modular motifs that play specific functional roles. Some motifs direct the folding of the RNA or stabilize the folded structure through tertiary interactions. Others bind ligands or proteins or catalyze chemical reactions. Therefore, it is desirable, starting from the RNA sequence, to be able to predict the locations of recurrent motifs in RNA molecules. Conversely, the potential occurrence of one or more known 3D RNA motifs may indicate that a genomic sequence codes for a structured RNA molecule. To identify known RNA structural motifs in new RNA sequences, precise structure-based definitions are needed that specify the core nucleotides of each motif and their conserved interactions. By comparing instances of each recurrent motif and applying base pair isosteriCity relations, one can identify neutral mutations that preserve its structure and function in the contexts in which it occurs.
Finding Inspiration in the Protein Data Bank to Chemically Antagonize Readers of the Histone Code.
Campagna-Slater, Valérie; Schapira, Matthieu
2010-04-12
Members of the Royal family of proteins are readers of the histone code that contain aromatic cages capable of recognizing specific sequences and lysine methylation states on histone tails. These binding modules play a key role in epigenetic signalling, and are part of a larger group of epigenetic targets that are becoming increasingly attractive for drug discovery. In the current study, pharmacophore representations of the aromatic cages forming the methyl-lysine (Me-Lys) recognition site were used to search the Protein Data Bank (PDB) for ligand binding pockets possessing similar chemical and geometrical features in unrelated proteins. The small molecules bound to these sites were then extracted from the PDB, and clustered based on fragments binding to the aromatic cages. The compounds collected are numerous and structurally diverse, but point to a limited set of preferred chemotypes; these include quaternary ammonium, sulfonium, and primary, secondary and tertiary amine moieties, as well as aromatic, aliphatic or orthogonal rings, and bicyclic systems. The chemical tool-kit identified can be used to design antagonists of the Royal family and related proteins. Copyright © 2010 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Smith, Colin A; Kortemme, Tanja
2011-01-01
Predicting the set of sequences that are tolerated by a protein or protein interface, while maintaining a desired function, is useful for characterizing protein interaction specificity and for computationally designing sequence libraries to engineer proteins with new functions. Here we provide a general method, a detailed set of protocols, and several benchmarks and analyses for estimating tolerated sequences using flexible backbone protein design implemented in the Rosetta molecular modeling software suite. The input to the method is at least one experimentally determined three-dimensional protein structure or high-quality model. The starting structure(s) are expanded or refined into a conformational ensemble using Monte Carlo simulations consisting of backrub backbone and side chain moves in Rosetta. The method then uses a combination of simulated annealing and genetic algorithm optimization methods to enrich for low-energy sequences for the individual members of the ensemble. To emphasize certain functional requirements (e.g. forming a binding interface), interactions between and within parts of the structure (e.g. domains) can be reweighted in the scoring function. Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. We provide an extensive description of the protocol and its parameters, all source code, example analysis scripts and three tests applying this method to finding sequences predicted to stabilize proteins or protein interfaces. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.
Lee, Woonghee; Kim, Jin Hae; Westler, William M; Markley, John L
2011-06-15
PONDEROSA (Peak-picking Of Noe Data Enabled by Restriction of Shift Assignments) accepts input information consisting of a protein sequence, backbone and sidechain NMR resonance assignments, and 3D-NOESY ((13)C-edited and/or (15)N-edited) spectra, and returns assignments of NOESY crosspeaks, distance and angle constraints, and a reliable NMR structure represented by a family of conformers. PONDEROSA incorporates and integrates external software packages (TALOS+, STRIDE and CYANA) to carry out different steps in the structure determination. PONDEROSA implements internal functions that identify and validate NOESY peak assignments and assess the quality of the calculated three-dimensional structure of the protein. The robustness of the analysis results from PONDEROSA's hierarchical processing steps that involve iterative interaction among the internal and external modules. PONDEROSA supports a variety of input formats: SPARKY assignment table (.shifts) and spectrum file formats (.ucsf), XEASY proton file format (.prot), and NMR-STAR format (.star). To demonstrate the utility of PONDEROSA, we used the package to determine 3D structures of two proteins: human ubiquitin and Escherichia coli iron-sulfur scaffold protein variant IscU(D39A). The automatically generated structural constraints and ensembles of conformers were as good as or better than those determined previously by much less automated means. The program, in the form of binary code along with tutorials and reference manuals, is available at http://ponderosa.nmrfam.wisc.edu/.
Ma, Dejian; Tillman, Tommy S; Tang, Pei; Meirovitch, Eva; Eckenhoff, Roderic; Carnini, Anna; Xu, Yan
2008-10-28
Structural studies of polytopic membrane proteins are often hampered by the vagaries of these proteins in membrane mimetic environments and by the difficulties in handling them with conventional techniques. Designing and creating water-soluble analogues with preserved native structures offer an attractive alternative. We report here solution NMR studies of WSK3, a water-soluble analogue of the potassium channel KcsA. The WSK3 NMR structure (PDB ID code 2K1E) resembles the KcsA crystal structures, validating the approach. By more stringent comparison criteria, however, the introduction of several charged residues aimed at improving water solubility seems to have led to the possible formations of a few salt bridges and hydrogen bonds not present in the native structure, resulting in slight differences in the structure of WSK3 relative to KcsA. NMR dynamics measurements show that WSK3 is highly flexible in the absence of a lipid environment. Reduced spectral density mapping and model-free analyses reveal dynamic characteristics consistent with an isotropically tumbling tetramer experiencing slow (nanosecond) motions with unusually low local ordering. An altered hydrogen-bond network near the selectivity filter and the pore helix, and the intrinsically dynamic nature of the selectivity filter, support the notion that this region is crucial for slow inactivation. Our results have implications not only for the design of water-soluble analogues of membrane proteins but also for our understanding of the basic determinants of intrinsic protein structure and dynamics.
Where to attach dye molecules to a protein: lessons from the computer program WHAT IF
NASA Astrophysics Data System (ADS)
Altenberg-Greulich, B.; Vriend, G.
2001-10-01
Genomic and proteomic projects are producing a flood of data that all require interpretation which often is best performed based on a three dimensional structure of the molecule(s) involved. These structures can be determined experimentally, or modelled by homology. Because of the complexity of the questions and the heterogeneity of the data, the software used for modelling proteins must become even more versatile. We describe several case studies in which the questions asked, the data, and the requirements on the software all are very different. It is shown how structural knowledge about a protein helps to determine the best place to bind a fluorescent dye. Such dyes are needed to determine protein-protein, protein-DNA interactions or intrinsic fluorescence microscopy. Further, using dyes you can trace molecules in the cell and thus get a handle on subcellular localisation. The first example (OCT-1) involves the search for free amino groups in a protein-DNA complex. The second example (BPTI) is a case, in which the amino acid distribution shows that amino groups are spread all over the structure, so that the natural structure has to be modified to get an answer. The third example (HFE) involves a model built by homology. In this case the amino group distribution can also be predicted. All these studies were performed using the WHAT IF software package. This package is available including source code, documentation, etc. See http://www.cmbi.kun.nl/whatif/
A discriminatory function for prediction of protein-DNA interactions based on alpha shape modeling.
Zhou, Weiqiang; Yan, Hong
2010-10-15
Protein-DNA interaction has significant importance in many biological processes. However, the underlying principle of the molecular recognition process is still largely unknown. As more high-resolution 3D structures of protein-DNA complex are becoming available, the surface characteristics of the complex become an important research topic. In our work, we apply an alpha shape model to represent the surface structure of the protein-DNA complex and developed an interface-atom curvature-dependent conditional probability discriminatory function for the prediction of protein-DNA interaction. The interface-atom curvature-dependent formalism captures atomic interaction details better than the atomic distance-based method. The proposed method provides good performance in discriminating the native structures from the docking decoy sets, and outperforms the distance-dependent formalism in terms of the z-score. Computer experiment results show that the curvature-dependent formalism with the optimal parameters can achieve a native z-score of -8.17 in discriminating the native structure from the highest surface-complementarity scored decoy set and a native z-score of -7.38 in discriminating the native structure from the lowest RMSD decoy set. The interface-atom curvature-dependent formalism can also be used to predict apo version of DNA-binding proteins. These results suggest that the interface-atom curvature-dependent formalism has a good prediction capability for protein-DNA interactions. The code and data sets are available for download on http://www.hy8.com/bioinformatics.htm kenandzhou@hotmail.com.
Identification of Phosphorylation Codes for Arrestin Recruitment by G Protein-Coupled Receptors.
Zhou, X Edward; He, Yuanzheng; de Waal, Parker W; Gao, Xiang; Kang, Yanyong; Van Eps, Ned; Yin, Yanting; Pal, Kuntal; Goswami, Devrishi; White, Thomas A; Barty, Anton; Latorraca, Naomi R; Chapman, Henry N; Hubbell, Wayne L; Dror, Ron O; Stevens, Raymond C; Cherezov, Vadim; Gurevich, Vsevolod V; Griffin, Patrick R; Ernst, Oliver P; Melcher, Karsten; Xu, H Eric
2017-07-27
G protein-coupled receptors (GPCRs) mediate diverse signaling in part through interaction with arrestins, whose binding promotes receptor internalization and signaling through G protein-independent pathways. High-affinity arrestin binding requires receptor phosphorylation, often at the receptor's C-terminal tail. Here, we report an X-ray free electron laser (XFEL) crystal structure of the rhodopsin-arrestin complex, in which the phosphorylated C terminus of rhodopsin forms an extended intermolecular β sheet with the N-terminal β strands of arrestin. Phosphorylation was detected at rhodopsin C-terminal tail residues T336 and S338. These two phospho-residues, together with E341, form an extensive network of electrostatic interactions with three positively charged pockets in arrestin in a mode that resembles binding of the phosphorylated vasopressin-2 receptor tail to β-arrestin-1. Based on these observations, we derived and validated a set of phosphorylation codes that serve as a common mechanism for phosphorylation-dependent recruitment of arrestins by GPCRs. Copyright © 2017 Elsevier Inc. All rights reserved.
Nadalin, Francesca; Carbone, Alessandra
2018-02-01
Large-scale computational docking will be increasingly used in future years to discriminate protein-protein interactions at the residue resolution. Complete cross-docking experiments make in silico reconstruction of protein-protein interaction networks a feasible goal. They ask for efficient and accurate screening of the millions structural conformations issued by the calculations. We propose CIPS (Combined Interface Propensity for decoy Scoring), a new pair potential combining interface composition with residue-residue contact preference. CIPS outperforms several other methods on screening docking solutions obtained either with all-atom or with coarse-grain rigid docking. Further testing on 28 CAPRI targets corroborates CIPS predictive power over existing methods. By combining CIPS with atomic potentials, discrimination of correct conformations in all-atom structures reaches optimal accuracy. The drastic reduction of candidate solutions produced by thousands of proteins docked against each other makes large-scale docking accessible to analysis. CIPS source code is freely available at http://www.lcqb.upmc.fr/CIPS. alessandra.carbone@lip6.fr. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Origin and evolution of spliceosomal introns
2012-01-01
Evolution of exon-intron structure of eukaryotic genes has been a matter of long-standing, intensive debate. The introns-early concept, later rebranded ‘introns first’ held that protein-coding genes were interrupted by numerous introns even at the earliest stages of life's evolution and that introns played a major role in the origin of proteins by facilitating recombination of sequences coding for small protein/peptide modules. The introns-late concept held that introns emerged only in eukaryotes and new introns have been accumulating continuously throughout eukaryotic evolution. Analysis of orthologous genes from completely sequenced eukaryotic genomes revealed numerous shared intron positions in orthologous genes from animals and plants and even between animals, plants and protists, suggesting that many ancestral introns have persisted since the last eukaryotic common ancestor (LECA). Reconstructions of intron gain and loss using the growing collection of genomes of diverse eukaryotes and increasingly advanced probabilistic models convincingly show that the LECA and the ancestors of each eukaryotic supergroup had intron-rich genes, with intron densities comparable to those in the most intron-rich modern genomes such as those of vertebrates. The subsequent evolution in most lineages of eukaryotes involved primarily loss of introns, with only a few episodes of substantial intron gain that might have accompanied major evolutionary innovations such as the origin of metazoa. The original invasion of self-splicing Group II introns, presumably originating from the mitochondrial endosymbiont, into the genome of the emerging eukaryote might have been a key factor of eukaryogenesis that in particular triggered the origin of endomembranes and the nucleus. Conversely, splicing errors gave rise to alternative splicing, a major contribution to the biological complexity of multicellular eukaryotes. There is no indication that any prokaryote has ever possessed a spliceosome or introns in protein-coding genes, other than relatively rare mobile self-splicing introns. Thus, the introns-first scenario is not supported by any evidence but exon-intron structure of protein-coding genes appears to have evolved concomitantly with the eukaryotic cell, and introns were a major factor of evolution throughout the history of eukaryotes. This article was reviewed by I. King Jordan, Manuel Irimia (nominated by Anthony Poole), Tobias Mourier (nominated by Anthony Poole), and Fyodor Kondrashov. For the complete reports, see the Reviewers’ Reports section. PMID:22507701
Yafremava, Liudmila S; Di Giulio, Massimo; Caetano-Anollés, Gustavo
2013-01-01
Amino acid substitution patterns between the nonbarophilic Pyrococcus furiosus and its barophilic relative P. abyssi confirm that hydrostatic pressure asymmetry indices reflect the extent to which amino acids are preferred by barophilic archaeal organisms. Substitution patterns in entire protein sequences, shared protein domains defined at fold superfamily level, domains in homologous sequence pairs, and domains of very ancient and very recent origin now provide further clues about the environment that led to the genetic code and diversified life. The pyrococcal proteomes are very similar and share a very early ancestor. Relative amino acid abundance analyses showed that biases in the use of amino acids are due to their shared fold superfamilies. Within these repertoires, only two of the five amino acids that are preferentially barophilic, aspartic acid and arginine, displayed this preference significantly and consistently across structure and in domains appearing in the ancestor. The more primordial asparagine, lysine and threonine displayed a consistent preference for nonbarophily across structure and in the ancestor. Since barophilic preferences are already evident in ancient domains that are at least ~3 billion year old, we conclude that barophily is a very ancient trait that unfolded concurrently with genetic idiosyncrasies in convergence towards a universal code.
RNA G-quadruplexes: emerging mechanisms in disease
Cammas, Anne
2017-01-01
Abstract RNA G-quadruplexes (G4s) are formed by G-rich RNA sequences in protein-coding (mRNA) and non-coding (ncRNA) transcripts that fold into a four-stranded conformation. Experimental studies and bioinformatic predictions support the view that these structures are involved in different cellular functions associated to both DNA processes (telomere elongation, recombination and transcription) and RNA post-transcriptional mechanisms (including pre-mRNA processing, mRNA turnover, targeting and translation). An increasing number of different diseases have been associated with the inappropriate regulation of RNA G4s exemplifying the potential importance of these structures on human health. Here, we review the different molecular mechanisms underlying the link between RNA G4s and human diseases by proposing several overlapping models of deregulation emerging from recent research, including (i) sequestration of RNA-binding proteins, (ii) aberrant expression or localization of RNA G4-binding proteins, (iii) repeat associated non-AUG (RAN) translation, (iv) mRNA translational blockade and (v) disabling of protein–RNA G4 complexes. This review also provides a comprehensive survey of the functional RNA G4 and their mechanisms of action. Finally, we highlight future directions for research aimed at improving our understanding on RNA G4-mediated regulatory mechanisms linked to diseases. PMID:28013268
Predicting Gene Structure Changes Resulting from Genetic Variants via Exon Definition Features.
Majoros, William H; Holt, Carson; Campbell, Michael S; Ware, Doreen; Yandell, Mark; Reddy, Timothy E
2018-04-25
Genetic variation that disrupts gene function by altering gene splicing between individuals can substantially influence traits and disease. In those cases, accurately predicting the effects of genetic variation on splicing can be highly valuable for investigating the mechanisms underlying those traits and diseases. While methods have been developed to generate high quality computational predictions of gene structures in reference genomes, the same methods perform poorly when used to predict the potentially deleterious effects of genetic changes that alter gene splicing between individuals. Underlying that discrepancy in predictive ability are the common assumptions by reference gene finding algorithms that genes are conserved, well-formed, and produce functional proteins. We describe a probabilistic approach for predicting recent changes to gene structure that may or may not conserve function. The model is applicable to both coding and noncoding genes, and can be trained on existing gene annotations without requiring curated examples of aberrant splicing. We apply this model to the problem of predicting altered splicing patterns in the genomes of individual humans, and we demonstrate that performing gene-structure prediction without relying on conserved coding features is feasible. The model predicts an unexpected abundance of variants that create de novo splice sites, an observation supported by both simulations and empirical data from RNA-seq experiments. While these de novo splice variants are commonly misinterpreted by other tools as coding or noncoding variants of little or no effect, we find that in some cases they can have large effects on splicing activity and protein products, and we propose that they may commonly act as cryptic factors in disease. The software is available from geneprediction.org/SGRF. bmajoros@duke.edu. Supplementary information is available at Bioinformatics online.
Wise, Carol A.; Chiang, Lydia C.; Paznekas, William A.; Sharma, Mridula; Musy, Maurice M.; Ashley, Jennifer A.; Lovett, Michael; Jabs, Ethylin W.
1997-01-01
Treacher Collins Syndrome (TCS) is the most common of the human mandibulofacial dysostosis disorders. Recently, a partial TCOF1 cDNA was identified and shown to contain mutations in TCS families. Here we present the entire exon/intron genomic structure and the complete coding sequence of TCOF1. TCOF1 encodes a low complexity protein of 1,411 amino acids, whose predicted protein structure reveals repeated motifs that mirror the organization of its exons. These motifs are shared with nucleolar trafficking proteins in other species and are predicted to be highly phosphorylated by casein kinase. Consistent with this, the full-length TCOF1 protein sequence also contains putative nuclear and nucleolar localization signals. Throughout the open reading frame, we detected an additional eight mutations in TCS families and several polymorphisms. We postulate that TCS results from defects in a nucleolar trafficking protein that is critically required during human craniofacial development. PMID:9096354
NASA Astrophysics Data System (ADS)
Yuan, Ye; Wang, Xiuli; Guo, Sheping; Qiu, Xuemei
2011-06-01
Gram-negative Vibrio parahaemolyticus is a common pathogen in humans and marine animals. The outer membrane protein of bacteria plays an important role in the infection and pathogenicity to the host. Thus, the outer membrane proteins are an ideal target for vaccines. We amplified a complete outer membrane protein gene (ompW) from V. parahaemolyticus ATCC 17802. We then cloned and expressed the gene into Escherichia coli BL21 (DE3) cells. The gene coded for a protein that was 42.78 kDa. We purified the protein using Ni-NTA affinity chromatography and Anti-His antibody Western blotting, respectively. Our results provide a basis for future application of the OmpW protein as a vaccine candidate against infection by V. parahaemolyticus. In addition, the purified OmpW protein can be used for further functional and structural studies.
Performance of protein-structure predictions with the physics-based UNRES force field in CASP11.
Krupa, Paweł; Mozolewska, Magdalena A; Wiśniewska, Marta; Yin, Yanping; He, Yi; Sieradzan, Adam K; Ganzynkowicz, Robert; Lipska, Agnieszka G; Karczyńska, Agnieszka; Ślusarz, Magdalena; Ślusarz, Rafał; Giełdoń, Artur; Czaplewski, Cezary; Jagieła, Dawid; Zaborowski, Bartłomiej; Scheraga, Harold A; Liwo, Adam
2016-11-01
Participating as the Cornell-Gdansk group, we have used our physics-based coarse-grained UNited RESidue (UNRES) force field to predict protein structure in the 11th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP11). Our methodology involved extensive multiplexed replica exchange simulations of the target proteins with a recently improved UNRES force field to provide better reproductions of the local structures of polypeptide chains. All simulations were started from fully extended polypeptide chains, and no external information was included in the simulation process except for weak restraints on secondary structure to enable us to finish each prediction within the allowed 3-week time window. Because of simplified UNRES representation of polypeptide chains, use of enhanced sampling methods, code optimization and parallelization and sufficient computational resources, we were able to treat, for the first time, all 55 human prediction targets with sizes from 44 to 595 amino acid residues, the average size being 251 residues. Complete structures of six single-domain proteins were predicted accurately, with the highest accuracy being attained for the T0769, for which the CαRMSD was 3.8 Å for 97 residues of the experimental structure. Correct structures were also predicted for 13 domains of multi-domain proteins with accuracy comparable to that of the best template-based modeling methods. With further improvements of the UNRES force field that are now underway, our physics-based coarse-grained approach to protein-structure prediction will eventually reach global prediction capacity and, consequently, reliability in simulating protein structure and dynamics that are important in biochemical processes. Freely available on the web at http://www.unres.pl/ CONTACT: has5@cornell.edu. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Li, Guohui; Hu, Zhaoyang; Guo, Xuli; Li, Guangtian; Tang, Qi; Wang, Peng; Chen, Keping; Yao, Qin
2013-06-01
Bombyx mori bidensovirus (BmBDV) VD1-ORF4 (open reading frame 4, ORF4) consists of 3,318 nucleotides, which codes for a predicted 1,105-amino acid protein containing a conserved DNA polymerase motif. However, its functions in viral propagation remain unknown. In the current study, the transcription of VD1-ORF4 was examined from 6 to 96 h postinfection (p.i.) by RT-PCR, 5'-RACE revealed the transcription initiation site of BmBDV ORF4 to be -16 nucleotides upstream from the start codon, and 3'-RACE revealed the transcription termination site of VD1-ORF4 to be +7 nucleotides downstream from termination codon. Three different proteins were examined in the extracts of BmBDV-infected silkworms midguts by Western blot using raised antibodies against VD1-ORF4 deduced amino acid, and a specific protein band about 53 kDa was further detected in purified virions using the same antibodies. Taken together, BmBDV VD1-ORF4 codes for three or more proteins during the viral life cycle, one of which is a 53 kDa protein and confirmed to be a component of BmBDV virion.
A Simple Test of Class-Level Genetic Association Can Reveal Novel Cardiometabolic Trait Loci.
Qian, Jing; Nunez, Sara; Reed, Eric; Reilly, Muredach P; Foulkes, Andrea S
2016-01-01
Characterizing the genetic determinants of complex diseases can be further augmented by incorporating knowledge of underlying structure or classifications of the genome, such as newly developed mappings of protein-coding genes, epigenetic marks, enhancer elements and non-coding RNAs. We apply a simple class-level testing framework, termed Genetic Class Association Testing (GenCAT), to identify protein-coding gene association with 14 cardiometabolic (CMD) related traits across 6 publicly available genome wide association (GWA) meta-analysis data resources. GenCAT uses SNP-level meta-analysis test statistics across all SNPs within a class of elements, as well as the size of the class and its unique correlation structure, to determine if the class is statistically meaningful. The novelty of findings is evaluated through investigation of regional signals. A subset of findings are validated using recently updated, larger meta-analysis resources. A simulation study is presented to characterize overall performance with respect to power, control of family-wise error and computational efficiency. All analysis is performed using the GenCAT package, R version 3.2.1. We demonstrate that class-level testing complements the common first stage minP approach that involves individual SNP-level testing followed by post-hoc ascribing of statistically significant SNPs to genes and loci. GenCAT suggests 54 protein-coding genes at 41 distinct loci for the 13 CMD traits investigated in the discovery analysis, that are beyond the discoveries of minP alone. An additional application to biological pathways demonstrates flexibility in defining genetic classes. We conclude that it would be prudent to include class-level testing as standard practice in GWA analysis. GenCAT, for example, can be used as a simple, complementary and efficient strategy for class-level testing that leverages existing data resources, requires only summary level data in the form of test statistics, and adds significant value with respect to its potential for identifying multiple novel and clinically relevant trait associations.
RPS8—a New Informative DNA Marker for Phylogeny of Babesia and Theileria Parasites in China
Tian, Zhan-Cheng; Liu, Guang-Yuan; Yin, Hong; Luo, Jian-Xun; Guan, Gui-Quan; Luo, Jin; Xie, Jun-Ren; Shen, Hui; Tian, Mei-Yuan; Zheng, Jin-feng; Yuan, Xiao-song; Wang, Fang-fang
2013-01-01
Piroplasmosis is a serious debilitating and sometimes fatal disease. Phylogenetic relationships within piroplasmida are complex and remain unclear. We compared the intron–exon structure and DNA sequences of the RPS8 gene from Babesia and Theileria spp. isolates in China. Similar to 18S rDNA, the 40S ribosomal protein S8 gene, RPS8, including both coding and non-coding regions is a useful and novel genetic marker for defining species boundaries and for inferring phylogenies because it tends to have little intra-specific variation but considerable inter-specific difference. However, more samples are needed to verify the usefulness of the RPS8 (coding and non-coding regions) gene as a marker for the phylogenetic position and detection of most Babesia and Theileria species, particularly for some closely related species. PMID:24244571
Protein change in plant evolution: tracing one thread connecting molecular and phenotypic diversity
Bartlett, Madelaine E.; Whipple, Clinton J.
2013-01-01
Proteins change over the course of evolutionary time. New protein-coding genes and gene families emerge and diversify, ultimately affecting an organism’s phenotype and interactions with its environment. Here we survey the range of structural protein change observed in plants and review the role these changes have had in the evolution of plant form and function. Verified examples tying evolutionary change in protein structure to phenotypic change remain scarce. We will review the existing examples, as well as draw from investigations into domestication, and quantitative trait locus (QTL) cloning studies searching for the molecular underpinnings of natural variation. The evolutionary significance of many cloned QTL has not been assessed, but all the examples identified so far have begun to reveal the extent of protein structural diversity tolerated in natural systems. This molecular (and phenotypic) diversity could come to represent part of natural selection’s source material in the adaptive evolution of novel traits. Protein structure and function can change in many distinct ways, but the changes we identified in studies of natural diversity and protein evolution were predicted to fall primarily into one of six categories: altered active and binding sites; altered protein–protein interactions; altered domain content; altered activity as an activator or repressor; altered protein stability; and hypomorphic and hypermorphic alleles. There was also variability in the evolutionary scale at which particular changes were observed. Some changes were detected at both micro- and macroevolutionary timescales, while others were observed primarily at deep or shallow phylogenetic levels. This variation might be used to determine the trajectory of future investigations in structural molecular evolution. PMID:24124420
Vucetic, Slobodan; Xie, Hongbo; Iakoucheva, Lilia M.; Oldfield, Christopher J.; Dunker, A. Keith; Obradovic, Zoran; Uversky, Vladimir N.
2008-01-01
Biologically active proteins without stable ordered structure (i.e., intrinsically disordered proteins) are attracting increased attention. Functional repertoires of ordered and disordered proteins are very different, and the ability to differentiate whether a given function is associated with intrinsic disorder or with a well-folded protein is crucial for modern protein science. However, there is a large gap between the number of proteins experimentally confirmed to be disordered and their actual number in nature. As a result, studies of functional properties of confirmed disordered proteins, while helpful in revealing the functional diversity of protein disorder, provide only a limited view. To overcome this problem, a bioinformatics approach for comprehensive study of functional roles of protein disorder was proposed in the first paper of this series (Xie H., Vucetic S., Iakoucheva L.M., Oldfield C.J., Dunker A.K., Obradovic Z., Uversky V.N. (2006) Functional anthology of intrinsic disorder. I. Biological processes and functions of proteins with long disordered regions. J. Proteome Res.). Applying this novel approach to Swiss-Prot sequences and functional keywords, we found over 238 and 302 keywords to be strongly positively or negatively correlated, respectively, with long intrinsically disordered regions. This paper describes ~90 Swiss-Prot keywords attributed to the cellular components, domains, technical terms, developmental processes and coding sequence diversities possessing strong positive and negative correlation with long disordered regions. PMID:17391015
Vucetic, Slobodan; Xie, Hongbo; Iakoucheva, Lilia M; Oldfield, Christopher J; Dunker, A Keith; Obradovic, Zoran; Uversky, Vladimir N
2007-05-01
Biologically active proteins without stable ordered structure (i.e., intrinsically disordered proteins) are attracting increased attention. Functional repertoires of ordered and disordered proteins are very different, and the ability to differentiate whether a given function is associated with intrinsic disorder or with a well-folded protein is crucial for modern protein science. However, there is a large gap between the number of proteins experimentally confirmed to be disordered and their actual number in nature. As a result, studies of functional properties of confirmed disordered proteins, while helpful in revealing the functional diversity of protein disorder, provide only a limited view. To overcome this problem, a bioinformatics approach for comprehensive study of functional roles of protein disorder was proposed in the first paper of this series (Xie, H.; Vucetic, S.; Iakoucheva, L. M.; Oldfield, C. J.; Dunker, A. K.; Obradovic, Z.; Uversky, V. N. Functional anthology of intrinsic disorder. 1. Biological processes and functions of proteins with long disordered regions. J. Proteome Res. 2007, 5, 1882-1898). Applying this novel approach to Swiss-Prot sequences and functional keywords, we found over 238 and 302 keywords to be strongly positively or negatively correlated, respectively, with long intrinsically disordered regions. This paper describes approximately 90 Swiss-Prot keywords attributed to the cellular components, domains, technical terms, developmental processes, and coding sequence diversities possessing strong positive and negative correlation with long disordered regions.
Significance of duon mutations in cancer genomes
NASA Astrophysics Data System (ADS)
Yadav, Vinod Kumar; Smith, Kyle S.; Flinders, Colin; Mumenthaler, Shannon M.; de, Subhajyoti
2016-06-01
Functional mutations in coding regions not only affect the structure and function of the protein products, but may also modulate their expression in some cases. This class of mutations, recently dubbed “duon mutations” due to their dual roles, can potentially have major impacts on downstream pathways. However their significance in diseases such as cancer remain unclear. In a survey covering 4606 samples from 19 cancer types, and integrating allelic expression, overall mRNA expression, regulatory motif perturbation, and chromatin signatures in one composite index called REDACT score, we identified potential duon mutations. Several such mutations are detected in known cancer genes in multiple cancer types. For instance a potential duon mutation in TP53 is associated with increased expression of the mutant allelic gene copy, thereby possibly amplifying the functional effects on the downstream pathways. Another potential duon mutation in SF3B1 is associated with abnormal splicing and changes in angiogenesis and matrix degradation related pathways. Our findings emphasize the need to interrogate the mutations in coding regions beyond their obvious effects on protein structures.
Cioffi, Anna Valentina; Ferrara, Diana; Cubellis, Maria Vittoria; Aniello, Francesco; Corrado, Marcella; Liguori, Francesca; Amoroso, Alessandro; Fucci, Laura; Branno, Margherita
2002-08-01
Analysis of the genome structure of the Paracentrotus lividus (sea urchin) DNA methyltransferase (DNA MTase) gene showed the presence of an open reading frame, named METEX, in intron 7 of the gene. METEX expression is developmentally regulated, showing no correlation with DNA MTase expression. In fact, DNA MTase transcripts are present at high concentrations in the early developmental stages, while METEX is expressed at late stages of development. Two METEX cDNA clones (Met1 and Met2) that are different in the 3' end have been isolated in a cDNA library screening. The putative translated protein from Met2 cDNA clone showed similarity with Escherichia coli endonuclease III on the basis of sequence and predictive three-dimensional structure. The protein, overexpressed in E. coli and purified, had functional properties similar to the endonuclease specific for apurinic/apyrimidinic (AP) sites on the basis of the lyase activity. Therefore the open reading frame, present in intron 7 of the P. lividus DNA MTase gene, codes for a functional AP endonuclease designated SuAP1.
Using cellular automata to generate image representation for biological sequences.
Xiao, X; Shao, S; Ding, Y; Huang, Z; Chen, X; Chou, K-C
2005-02-01
A novel approach to visualize biological sequences is developed based on cellular automata (Wolfram, S. Nature 1984, 311, 419-424), a set of discrete dynamical systems in which space and time are discrete. By transforming the symbolic sequence codes into the digital codes, and using some optimal space-time evolvement rules of cellular automata, a biological sequence can be represented by a unique image, the so-called cellular automata image. Many important features, which are originally hidden in a long and complicated biological sequence, can be clearly revealed thru its cellular automata image. With biological sequences entering into databanks rapidly increasing in the post-genomic era, it is anticipated that the cellular automata image will become a very useful vehicle for investigation into their key features, identification of their function, as well as revelation of their "fingerprint". It is anticipated that by using the concept of the pseudo amino acid composition (Chou, K.C. Proteins: Structure, Function, and Genetics, 2001, 43, 246-255), the cellular automata image approach can also be used to improve the quality of predicting protein attributes, such as structural class and subcellular location.
Jézéquel, Laetitia; Loeper, Jacqueline; Pompon, Denis
2008-11-01
Combinatorial libraries coding for mosaic enzymes with predefined crossover points constitute useful tools to address and model structure-function relationships and for functional optimization of enzymes based on multivariate statistics. The presented method, called sequence-independent generation of a chimera-ordered library (SIGNAL), allows easy shuffling of any predefined amino acid segment between two or more proteins. This method is particularly well adapted to the exchange of protein structural modules. The procedure could also be well suited to generate ordered combinatorial libraries independent of sequence similarities in a robotized manner. Sequence segments to be recombined are first extracted by PCR from a single-stranded template coding for an enzyme of interest using a biotin-avidin-based method. This technique allows the reduction of parental template contamination in the final library. Specific PCR primers allow amplification of two complementary mosaic DNA fragments, overlapping in the region to be exchanged. Fragments are finally reassembled using a fusion PCR. The process is illustrated via the construction of a set of mosaic CYP2B enzymes using this highly modular approach.
Molecular basis for the wide range of affinity found in Csr/Rsm protein-RNA recognition.
Duss, Olivier; Michel, Erich; Diarra dit Konté, Nana; Schubert, Mario; Allain, Frédéric H-T
2014-04-01
The carbon storage regulator/regulator of secondary metabolism (Csr/Rsm) type of small non-coding RNAs (sRNAs) is widespread throughout bacteria and acts by sequestering the global translation repressor protein CsrA/RsmE from the ribosome binding site of a subset of mRNAs. Although we have previously described the molecular basis of a high affinity RNA target bound to RsmE, it remains unknown how other lower affinity targets are recognized by the same protein. Here, we have determined the nuclear magnetic resonance solution structures of five separate GGA binding motifs of the sRNA RsmZ of Pseudomonas fluorescens in complex with RsmE. The structures explain how the variation of sequence and structural context of the GGA binding motifs modulate the binding affinity for RsmE by five orders of magnitude (∼10 nM to ∼3 mM, Kd). Furthermore, we see that conformational adaptation of protein side-chains and RNA enable recognition of different RNA sequences by the same protein contributing to binding affinity without conferring specificity. Overall, our findings illustrate how the variability in the Csr/Rsm protein-RNA recognition allows a fine-tuning of the competition between mRNAs and sRNAs for the CsrA/RsmE protein.
2010-01-01
Background Intragenic tandem repeats occur throughout all domains of life and impart functional and structural variability to diverse translation products. Repeat proteins confer distinctive surface phenotypes to many unicellular organisms, including those with minimal genomes such as the wall-less bacterial monoderms, Mollicutes. One such repeat pattern in this clade is distributed in a manner suggesting its exchange by horizontal gene transfer (HGT). Expanding genome sequence databases reveal the pattern in a widening range of bacteria, and recently among eucaryotic microbes. We examined the genomic flux and consequences of the motif by determining its distribution, predicted structural features and association with membrane-targeted proteins. Results Using a refined hidden Markov model, we document a 25-residue protein sequence motif tandemly arrayed in variable-number repeats in ORFs lacking assigned functions. It appears sporadically in unicellular microbes from disparate bacterial and eucaryotic clades, representing diverse lifestyles and ecological niches that include host parasitic, marine and extreme environments. Tracts of the repeats predict a malleable configuration of recurring domains, with conserved hydrophobic residues forming an amphipathic secondary structure in which hydrophilic residues endow extensive sequence variation. Many ORFs with these domains also have membrane-targeting sequences that predict assorted topologies; others may comprise reservoirs of sequence variants. We demonstrate expressed variants among surface lipoproteins that distinguish closely related animal pathogens belonging to a subgroup of the Mollicutes. DNA sequences encoding the tandem domains display dyad symmetry. Moreover, in some taxa the domains occur in ORFs selectively associated with mobile elements. These features, a punctate phylogenetic distribution, and different patterns of dispersal in genomes of related taxa, suggest that the repeat may be disseminated by HGT and intra-genomic shuffling. Conclusions We describe novel features of PARCELs (Palindromic Amphipathic Repeat Coding ELements), a set of widely distributed repeat protein domains and coding sequences that were likely acquired through HGT by diverse unicellular microbes, further mobilized and diversified within genomes, and co-opted for expression in the membrane proteome of some taxa. Disseminated by multiple gene-centric vehicles, ORFs harboring these elements enhance accessory gene pools as part of the "mobilome" connecting genomes of various clades, in taxa sharing common niches. PMID:20626840
Goz, Eli; Zafrir, Zohar; Tuller, Tamir
2018-04-30
Understanding how viruses co-evolve with their hosts and adapt various genomic level strategies in order to ensure their fitness may have essential implications in unveiling the secrets of viral evolution, and in developing new vaccines and therapeutic approaches. Here, based on a novel genomic analysis of 2,625 different viruses and 439 corresponding host organisms, we provide evidence of universal evolutionary selection for high dimensional 'silent' patterns of information hidden in the redundancy of viral genetic code. Our model suggests that long substrings of nucleotides in the coding regions of viruses from all classes, often also repeat in the corresponding viral hosts from all domains of life. Selection for these substrings cannot be explained only by such phenomena as codon usage bias, horizontal gene transfer, and the encoded proteins. Genes encoding structural proteins responsible for building the core of the viral particles were found to include more host-repeating substrings, and these substrings tend to appear in the middle parts of the viral coding regions. In addition, in human viruses these substrings tend to be enriched with motives related to transcription factors and RNA binding proteins. The host-repeating substrings are possibly related to the evolutionary pressure on the viruses to effectively interact with host's intracellular factors and to efficiently escape from the host's immune system. tamirtul@post.tau.ac.il (TT). Supplementary data are available at Bioinformatics online.
NASA Astrophysics Data System (ADS)
Chen, Tao; Yang, Jie; Wang, Yuelang; Zhan, Chenyang; Zang, Yuhui; Qin, Junchuan
2005-05-01
Stem cell factor (SCF) and macrophage colony stimulating factor (M-CSF) can act in synergistic way to promote the growth of mononuclear phagocytes. SCF-M-CSF fusion proteins were designed on the computer using the Homology and Biopolymer modules of the software packages InsightII. Several existing crystal structures were used as templates to generate models of the complexes of receptor with fusion protein. The structure rationality of the fusion protein incorporated a series of flexible linker peptide was analyzed on InsightII system. Then, a suitable peptide GGGGSGGGGSGG was chosen for the fusion protein. Two recombinant SCF-M-CSF fusion proteins were generated by construction of a plasmid in which the coding regions of human SCF (1-165aa) and M-CSF (1-149aa) cDNA were connected by this linker peptide coding sequence followed by subsequent expression in insect cell. The results of Western blot and activity analysis showed that these two recombinant fusion proteins existed as a dimer with a molecular weight of 84 KD under non-reducing conditions and a monomer of 42 KD at reducing condition. The results of cell proliferation assays showed that each fusion protein induced a dose-dependent proliferative response. At equimolar concentration, SCF/M-CSF was about 20 times more potent than the standard monomeric SCF in stimulating TF-1 cell line growth, while M-CSF/SCF was 10 times of monomeric SCF. No activity difference of M-CSF/SCF or SCF/M-CSF to M-CSF (at same molar) was found in stimulating the HL-60 cell linear growth. The synergistic effect of SCF and M-CSF moieties in the fusion proteins was demonstrated by the result of clonogenic assay performed with human bone mononuclear, in which both SCF/M-CSF and M-CSF/SCF induced much higher number of CFU-M than equimolar amount of SCF or M-CSF or that of two cytokines mixture.
Kastner, Philomena; Mosgoeller, Wilhelm; Fang-Kircher, Susanne; Kitzmueller, Erwin; Kirchner, Liselotte; Hoeger, Harald; Seither, Peter; Lubec, Gert; Lubec, Barbara
2003-01-01
RNA polymerases (POL) are integral constituents of the protein synthesis machinery, with POL I and POL III coding for ribosomal RNA and POL II coding for protein. POL I is located in the nucleolus and transcribes class I genes, those that code for large ribosomal RNA. It has been reported that the POL system is seriously affected in perinatal asphyxia (PA) immediately after birth. Because POL I is necessary for protein synthesis and brain protein synthesis was shown to be deranged after hypoxic-ischemic conditions, we aimed to study whether POL derangement persists in a simple, well-documented animal model of graded global PA at the activity, mRNA, protein, and morphologic level until 8 d after the asphyctic insult. Nuclear POL I activity was determined according to a radiochemical method; mRNA steady state and protein levels of RPA4O-an essential subunit of POL I and III-were evaluated by blotting methods; and the POL I subunit polymerase activating factor-53 was evaluated using immunohistochemistry. Silver staining and transmission electron microscopy were used to examine the nucleolus. At the eighth day after PA, nuclear POL I decreased with the length of the asphyctic period, whereas mRNA and protein levels for RPA4O were unchanged. The subunit polymerase activating factor-53, however, was unambiguously reduced in several brain regions. Dramatic changes of nucleolar morphology were observed, the main finding being nucleolar disintegration at the electron microscopy level. We suggest that severe acidosis and/or deficient protein kinase C in the brain during the asphyctic period may be responsible for disintegration of the nucleolus as well as for decreased POL activity persisting until the eighth day after PA. The biologic effect may be that PA causes impaired RNA and protein synthesis, which has been already observed in hypoxic-ischemic states.
NASA Astrophysics Data System (ADS)
Faure, Guilhem; Koonin, Eugene V.
2015-05-01
Robustness to destabilizing effects of mutations is thought of as a key factor of protein evolution. The connections between two measures of robustness, the relative core size and the computationally estimated effect of mutations on protein stability (ΔΔG), protein abundance and the selection pressure on protein-coding genes (dN/dS) were analyzed for the organisms with a large number of available protein structures including four eukaryotes, two bacteria and one archaeon. The distribution of the effects of mutations in the core on protein stability is universal and indistinguishable in eukaryotes and bacteria, centered at slightly destabilizing amino acid replacements, and with a heavy tail of more strongly destabilizing replacements. The distribution of mutational effects in the hyperthermophilic archaeon Thermococcus gammatolerans is significantly shifted toward strongly destabilizing replacements which is indicative of stronger constraints that are imposed on proteins in hyperthermophiles. The median effect of mutations is strongly, positively correlated with the relative core size, in evidence of the congruence between the two measures of protein robustness. However, both measures show only limited correlations to the expression level and selection pressure on protein-coding genes. Thus, the degree of robustness reflected in the universal distribution of mutational effects appears to be a fundamental, ancient feature of globular protein folds whereas the observed variations are largely neutral and uncoupled from short term protein evolution. A weak anticorrelation between protein core size and selection pressure is observed only for surface residues in prokaryotes but a stronger anticorrelation is observed for all residues in eukaryotic proteins. This substantial difference between proteins of prokaryotes and eukaryotes is likely to stem from the demonstrable higher compactness of prokaryotic proteins.
Zhang, Yulong; Shao, Dandan; Cai, Miao; Yin, Hong; Zhang, Daochuan
2016-01-01
The complete mitochondrial genome of Gryllotalpa unispina was 15,513 bp in length and contained 70.9% AT. All G. unispina protein-coding sequences except for the nad2 started with a typical ATN codon. The usual termination codons (TAA) and incomplete stop codons (T) were found from 13 protein-coding genes. All tRNA genes were folded into the typical cloverleaf secondary structure, except trnS(AGN) lacking the dihydrouridine arm. The sizes of the large and small ribosomal RNA genes were 1245 and 725 bp, respectively. The A + T-rich region was 917 bp in length with 76.8%. The orientation and gene order of the G. unispina mitogenome were identical to the G. orientalis and G. pluvialis, there was no phenomenon of "DK rearrangement" which has been widely reported in Caelifera.
Yatawara, Lalani; Wickramasinghe, Susiji; Rajapakse, R P V J; Agatsuma, Takeshi
2010-09-01
In the present study, we determined the complete mitochondrial (mt) genome sequence (13,839bp) of parasitic nematode Setaria digitata and its structure and organization compared with Onchocerca volvulus, Dirofilaria immitis and Brugia malayi. The mt genome of S. digitata is slightly larger than the mt genomes of other filarial nematodes. S. digitata mt genome contains 36 genes (12 protein-coding genes, 22 transfer RNAs and 2 ribosomal RNAs) that are typically found in metazoans. This genome contains a high A+T (75.1%) content and low G+C content (24.9%). The mt gene order for S. digitata is the same as those for O. volvulus, D. immitis and B. malayi but it is distinctly different from other nematodes compared. The start codons inferred in the mt genome of S. digitata are TTT, ATT, TTG, ATG, GTT and ATA. Interestingly, the initiation codon TTT is unique to S. digitata mt genome and four protein-coding genes use this codon as a translation initiation codon. Five protein-coding genes use TAG as a stop codon whereas three genes use TAA and four genes use T as a termination codon. Out of 64 possible codons, only 57 are used for mitochondrial protein-coding genes of S. digitata. T-rich codons such as TTT (18.9%), GTT (7.9%), TTG (7.8%), TAT (7%), ATT (5.7%), TCT (4.8%) and TTA (4.1%) are used more frequently. This pattern of codon usage reflects the strong bias for T in the mt genome of S. digitata. In conclusion, the present investigation provides new molecular data for future studies of the comparative mitochondrial genomics and systematic of parasitic nematodes of socio-economic importance. 2010 Elsevier B.V. All rights reserved.
Mu, Chuang; Wang, Ruijia; Li, Tianqi; Li, Yuqiang; Tian, Meilin; Jiao, Wenqian; Huang, Xiaoting; Zhang, Lingling; Hu, Xiaoli; Wang, Shi; Bao, Zhenmin
2016-08-01
Long non-coding RNA (lncRNA) structurally resembles mRNA but cannot be translated into protein. Although the systematic identification and characterization of lncRNAs have been increasingly reported in model species, information concerning non-model species is still lacking. Here, we report the first systematic identification and characterization of lncRNAs in two sea cucumber species: (1) Apostichopus japonicus during lipopolysaccharide (LPS) challenge and in heathy tissues and (2) Holothuria glaberrima during radial organ complex regeneration, using RNA-seq datasets and bioinformatics analysis. We identified A. japonicus and H. glaberrima lncRNAs that were differentially expressed during LPS challenge and radial organ complex regeneration, respectively. Notably, the predicted lncRNA-microRNA-gene trinities revealed that, in addition to targeting protein-coding transcripts, miRNAs might also target lncRNAs, thereby participating in a potential novel layer of regulatory interactions among non-coding RNA classes in echinoderms. Furthermore, the constructed coding-non-coding network implied the potential involvement of lncRNA-gene interactions during the regulation of several important genes (e.g., Toll-like receptor 1 [TLR1] and transglutaminase-1 [TGM1]) in response to LPS challenge and radial organ complex regeneration in sea cucumbers. Overall, this pioneer systematic identification, annotation, and characterization of lncRNAs in echinoderm pave the way for similar studies and future genetic, genomic, and evolutionary research in non-model species.
Coding of Class I and II aminoacyl-tRNA synthetases
Carter, Charles W.
2018-01-01
SUMMARY The aminoacyl-tRNA synthetases and their cognate transfer RNAs translate the universal genetic code. The twenty canonical amino acids are sufficiently diverse to create a selective advantage for dividing amino acid activation between two distinct, apparently unrelated superfamilies of synthetases, Class I amino acids being generally larger and less polar, Class II amino acids smaller and more polar. Biochemical, bioinformatic, and protein engineering experiments support the hypothesis that the two Classes descended from opposite strands of the same ancestral gene. Parallel experimental deconstructions of Class I and II synthetases reveal parallel losses in catalytic proficiency at two novel modular levels—protozymes and Urzymes—associated with the evolution of catalytic activity. Bi-directional coding supports an important unification of the proteome; affords a genetic relatedness metric—middle base-pairing frequencies in sense/antisense alignments—that probes more deeply into the evolutionary history of translation than do single multiple sequence alignments; and has facilitated the analysis of hitherto unknown coding relationships in tRNA sequences. Reconstruction of native synthetases by modular thermodynamic cycles facilitated by domain engineering emphasizes the subtlety associated with achieving high specificity, shedding new light on allosteric relationships in contemporary synthetases. Synthetase Urzyme structural biology suggests that they are catalytically active molten globules, broadening the potential manifold of polypeptide catalysts accessible to primitive genetic coding and motivating revisions of the origins of catalysis. Finally, bi-directional genetic coding of some of the oldest genes in the proteome places major limitations on the likelihood that any RNA World preceded the origins of coded proteins. PMID:28828732
Identification & Characterization of Fungal Ice Nucleation Proteins
NASA Astrophysics Data System (ADS)
Scheel, Jan Frederik; Kunert, Anna Theresa; Kampf, Christopher Johannes; Mauri, Sergio; Weidner, Tobias; Pöschl, Ulrich; Fröhlich-Nowoisky, Janine
2016-04-01
Freezing of water at relatively warm subfreezing temperatures is dependent on ice nucleation catalysis facilitated by ice nuclei (IN). These IN can be of various origins and although extensive research was done and progress was achieved, the nature and mechanisms leading to an effective IN are to date still poorly understood. Some of the most important processes of our geosphere like the water cycle are highly dependent on effective ice nucleation at temperatures between -2°C - -8°C, a temperature range which is almost exclusively covered by biological IN (BioIN). BioIN are usually macromolecular structures of biological polymers. Sugars as well as proteins have been reported to serve as IN and the best characterized BioIN are ice nucleation proteins (IN-P) from gram negative bacteria. Fungal strains from Fusarium spp. were described to be effective IN at subfreezing temperatures up to -2°C already 25 years ago and more and more fungal species are described to serve as efficient IN. Fungal IN are also thought to be proteins or at least contain a proteinaceous compound, but to date the fungal IN-P primary structure as well as their coding genetic elements of all IN active fungi are unknown. The aim of this study is a.) to identify the proteins and their coding genetic elements from IN active fungi (F. acuminatum, F. avenaceum, M. alpina) and b.) to characterize the mechanisms by which fungal IN serve as effective IN. We designed an interdisciplinary approach using biological, analytical and physical methods to identify fungal IN-P and describe their biological, chemical, and physical properties.
The Diversity of Yellow-Related Proteins in Sand Flies (Diptera: Psychodidae)
Sima, Michal; Novotny, Marian; Pravda, Lukas; Sumova, Petra; Rohousova, Iva; Volf, Petr
2016-01-01
Yellow-related proteins (YRPs) present in sand fly saliva act as affinity binders of bioamines, and help the fly to complete a bloodmeal by scavenging the physiological signals of damaged cells. They are also the main antigens in sand fly saliva and their recombinant form is used as a marker of host exposure to sand flies. Moreover, several salivary proteins and plasmids coding these proteins induce strong immune response in hosts bitten by sand flies and are being used to design protecting vaccines against Leishmania parasites. In this study, thirty two 3D models of different yellow-related proteins from thirteen sand fly species of two genera were constructed based on the known protein structure from Lutzomyia longipalpis. We also studied evolutionary relationships among species based on protein sequences as well as sequence and structural variability of their ligand-binding site. All of these 33 sand fly YRPs shared a similar structure, including a unique tunnel that connects the ligand-binding site with the solvent by two independent paths. However, intraspecific modifications found among these proteins affects the charges of the entrances to the tunnel, the length of the tunnel and its hydrophobicity. We suggest that these structural and sequential differences influence the ligand-binding abilities of these proteins and provide sand flies with a greater number of YRP paralogs with more nuanced answers to bioamines. All these characteristics allow us to better evaluate these proteins with respect to their potential use as part of anti-Leishmania vaccines or as an antigen to measure host exposure to sand flies. PMID:27812196
Veiga, Ana B. G.; Ribeiro, José M. C.; Guimarães, Jorge A.; Francischetti, Ivo M.B.
2010-01-01
Accidents with the caterpillar Lonomia obliqua are often associated with a coagulation disorder and hemorrhagic syndrome in humans. In the present study, we have constructed cDNA libraries from two venomous structures of the caterpillar, namely the tegument and the bristle. High-throughput sequencing and bioinformatics analyses were performed in parallel. Over one thousand cDNAs were obtained and clustered to produce a database of 538 contigs and singletons (clusters) for the tegument library and 368 for the bristle library. We have thus identified dozens of full-length cDNAs coding for proteins with sequence homology to snake venom prothrombin activator, trypsin-like enzymes, blood coagulation factors and prophenoloxidase cascade activators. We also report cDNA coding for cysteine proteases, Group III phospholipase A2, C-type lectins, lipocalins, in addition to protease inhibitors including serpins, Kazal-type inhibitors, cystatins and trypsin inhibitor-like molecules. Antibacterial proteins and housekeeping genes are also described. A significant number of sequences were devoid of database matches, suggesting that their biologic function remains to be defined. We also report the N-terminus of the most abundant proteins present in the bristle, tegument, hemolymph, and "cryosecretion". Thus, we have created a catalog that contains the predicted molecular weight, isoelectric point, accession number, and putative function for each selected molecule from the venomous structures of L. obliqua. The role of these molecules in the coagulation disorder and hemorrhagic syndrome caused by envenomation with this caterpillar is discussed. All sequence information and the Supplemental Data, including Figures and Tables with hyperlinks to FASTA-formatted files for each contig and the best match to the Databases, are available at http://www.ncbi.nih.gov/projects/omes. PMID:16023793
Benítez-Burraco, A
FOXP2 is the first gene linked to a hereditary variant of specific language impairment and seems to code for a transcriptional repressor that intervenes in the regulation of the development and the functioning of certain thalamic-cortical-striatal circuits. In the last three years, significant progress has been made in the determination of the structural and functional properties of the gene. These advances essentially have to do with the precise analysis of the most important structural motifs of the protein that it codes for and the main parameters that determine its interaction with DNA. They also concern the determination of the functional and behavioural properties in vivo of the main isoforms of the FOXP2 protein, the exact determination of the pattern of expression of new orthologues of the gene, and the identification of the different target genes for factor FOXP2. This new evidence suggests that protein FOXP2 protein has a high degree of versatility in vivo when it comes to binding to DNA; that its different isoforms are biologically functional; and that the FOXP2 gene is functional during embryonic development and during the adult phase. It also suggests that it is involved in the development and/or functioning of the thalamic-cortical-striatal circuits associated to motor planning, sequential behaviour and procedural learning (a significant saving in developmental terms of the regulatory mechanism in which the gene is involved), as well as the accuracy of the models of linguistic processing that consider language to be, to a large extent, the result of an interaction between certain cortical and subcortical structures.
Solís, D; Jiménez-Barbero, J; Kaltner, H; Romero, A; Siebert, H C; von der Lieth, C W; Gabius, H J
2001-01-01
The term 'code' in biological information transfer appears to be tightly and hitherto exclusively connected with the genetic code based on nucleotides and translated into functional activities via proteins. However, the recent appreciation of the enormous coding capacity of oligosaccharide chains of natural glycoconjugates has spurred to give heed to a new concept: versatile glycan assembly by the genetically encoded glycosyltransferases endows cells with a probably not yet fully catalogued array of meaningful messages. Enciphered by sugar receptors such as endogenous lectins the information of code words established by a series of covalently linked monosaccharides as letters for example guides correct intra- and intercellular routing of glycoproteins, modulates cell proliferation or migration and mediates cell adhesion. Evidently, the elucidation of the structural frameworks and the recognition strategies within the operation of the sugar code poses a fascinating conundrum. The far-reaching impact of this recognition mode on the level of cells, tissues and organs has fueled vigorous investigations to probe the subtleties of protein-carbohydrate interactions. This review presents information on the necessarily concerted approach using X-ray crystallography, molecular modeling, nuclear magnetic resonance spectroscopy, thermodynamic analysis and engineered ligands and receptors. This part of the treatise is flanked by exemplarily chosen insights made possible by these techniques. Copyright 2001 S. Karger AG, Basel
Prestes, R C; Silva, L B; Torri, A M P; Kubota, E H; Rosa, C S; Roman, S S; Kempka, A P; Demiate, I M
2015-07-01
The objective of this work was to evaluate the effect of adding different starches (native and modified) on the physicochemical, sensory, structural and microbiological characteristics of low-fat chicken mortadella. Two formulations containing native cassava and regular corn starch, coded CASS (5.0 % of cassava starch) and CORN (5.0 % of regular corn starch), and one formulation produced with physically treated starch coded as MOD1 (2.5 % of Novation 2300) and chemically modified starch coded as MOD2 (2.5 % of Thermtex) were studied. The following tests were performed: physicochemical characterization (moisture, ash, protein, starch and lipid contents, and water activity); cooling, freezing and reheating losses; texture (texture profile test); color coordinates (L*, a*, b*, C and h); microbiological evaluation; sensory evaluation (multiple comparison and preference test); and histological evaluation (light microscopy). There was no significant difference (p > 0.05) for ash, protein, cooling loss, cohesiveness or in the preference test for the tested samples. The other evaluated parameters showed significant differences (p < 0.05). Histological study allowed for a qualitative evaluation between the physical properties of the food and its microscopic structure. The best results were obtained for formulation MOD2 (2.5 % Thermtex). The addition of modified starch resulted in a better performance than the native starch in relation to the evaluated technological parameters, mainly in relation to reheating losses, which demonstrated the good interaction between the modified starch in the structure of the product and the possibility of the application of this type of starch in other types of functional meat products.
Beltrame, M; Bianchi, M E
1990-01-01
We have cloned the genes for small acidic ribosomal proteins (A-proteins) of the fission yeast Schizosaccharomyces pombe. S. pombe contains four transcribed genes for small A-proteins per haploid genome, as is the case for Saccharomyces cerevisiae. In contrast, multicellular eucaryotes contain two transcribed genes per haploid genome. The four proteins of S. pombe, besides sharing a high overall similarity, form two couples of nearly identical sequences. Their corresponding genes have a very conserved structure and are transcribed to a similar level. Surprisingly, of each couple of genes coding for nearly identical proteins, one is essential for cell growth, whereas the other is not. We suggest that the unequal importance of the four small A-proteins for cell survival is related to their physical organization in 60S ribosomal subunits. Images PMID:2325655
Westholm, Jakub O.; Miura, Pedro; Olson, Sara; Shenker, Sol; Joseph, Brian; Sanfilippo, Piero; Celniker, Susan E.; Graveley, Brenton R.; Lai, Eric C.
2014-01-01
Circularization was recently recognized to broadly expand transcriptome complexity. Here, we exploit massive Drosophila total RNA-sequencing data, >5 billion paired-end reads from >100 libraries covering diverse developmental stages, tissues and cultured cells, to rigorously annotate >2500 fruitfly circular RNAs. These mostly derive from back-splicing of protein-coding genes and lack poly(A) tails, and circularization of hundreds of genes is conserved across multiple Drosophila species. We elucidate structural and sequence properties of Drosophila circular RNAs, which exhibit commonalities and distinctions from mammalian circles. Notably, Drosophila circular RNAs harbor >1000 well-conserved canonical miRNA seed matches, especially within coding regions, and coding conserved miRNA sites reside preferentially within circularized exons. Finally, we analyze the developmental and tissue specificity of circular RNAs, and note their preferred derivation from neural genes and enhanced accumulation in neural tissues. Interestingly, circular isoforms increase dramatically relative to linear isoforms during CNS aging, and constitute a novel aging biomarker. PMID:25544350
Decoding Mechanisms by which Silent Codon Changes Influence Protein Biogenesis and Function
Bali, Vedrana; Bebok, Zsuzsanna
2015-01-01
Scope Synonymous codon usage has been a focus of investigation since the discovery of the genetic code and its redundancy. The occurrences of synonymous codons vary between species and within genes of the same genome, known as codon usage bias. Today, bioinformatics and experimental data allow us to compose a global view of the mechanisms by which the redundancy of the genetic code contributes to the complexity of biological systems from affecting survival in prokaryotes, to fine tuning the structure and function of proteins in higher eukaryotes. Studies analyzing the consequences of synonymous codon changes in different organisms have revealed that they impact nucleic acid stability, protein levels, structure and function without altering amino acid sequence. As such, synonymous mutations inevitably contribute to the pathogenesis of complex human diseases. Yet, fundamental questions remain unresolved regarding the impact of silent mutations in human disorders. In the present review we describe developments in this area concentrating on mechanisms by which synonymous mutations may affect protein function and human health. Purpose This synopsis illustrates the significance of synonymous mutations in disease pathogenesis. We review the different steps of gene expression affected by silent mutations, and assess the benefits and possible harmful effects of codon optimization applied in the development of therapeutic biologics. Physiological and medical relevance Understanding mechanisms by which synonymous mutations contribute to complex diseases such as cancer, neurodegeneration and genetic disorders, including the limitations of codon-optimized biologics, provides insight concerning interpretation of silent variants and future molecular therapies. PMID:25817479
Marsella, Luca; Sirocco, Francesco; Trovato, Antonio; Seno, Flavio; Tosatto, Silvio C.E.
2009-01-01
Motivation: Proteins with solenoid repeats evolve more quickly than non-repetitive ones and their periodicity may be rapidly hidden at sequence level, while still evident in structure. In order to identify these repeats, we propose here a novel method based on a metric characterizing amino-acid properties (polarity, secondary structure, molecular volume, codon diversity, electric charge) using five previously derived numerical functions. Results: The five spectra of the candidate sequences coding for structural repeats, obtained by Discrete Fourier Transform (DFT), show common features allowing determination of repeat periodicity with excellent results. Moreover it is possible to introduce a phase space parameterized by two quantities related to the Fourier spectra which allow for a clear distinction between a non-homologous set of globular proteins and proteins with solenoid repeats. The DFT method is shown to be competitive with other state of the art methods in the detection of solenoid structures, while improving its performance especially in the identification of periodicities, since it is able to recognize the actual repeat length in most cases. Moreover it highlights the relevance of local structural propensities in determining solenoid repeats. Availability: A web tool implementing the algorithm presented in the article (REPETITA) is available with additional details on the data sets at the URL: http://protein.bio.unipd.it/repetita/. Contact: silvio.tosatto@unipd.it PMID:19478001
CCBuilder 2.0: Powerful and accessible coiled-coil modeling.
Wood, Christopher W; Woolfson, Derek N
2018-01-01
The increased availability of user-friendly and accessible computational tools for biomolecular modeling would expand the reach and application of biomolecular engineering and design. For protein modeling, one key challenge is to reduce the complexities of 3D protein folds to sets of parametric equations that nonetheless capture the salient features of these structures accurately. At present, this is possible for a subset of proteins, namely, repeat proteins. The α-helical coiled coil provides one such example, which represents ≈ 3-5% of all known protein-encoding regions of DNA. Coiled coils are bundles of α helices that can be described by a small set of structural parameters. Here we describe how this parametric description can be implemented in an easy-to-use web application, called CCBuilder 2.0, for modeling and optimizing both α-helical coiled coils and polyproline-based collagen triple helices. This has many applications from providing models to aid molecular replacement for X-ray crystallography, in silico model building and engineering of natural and designed protein assemblies, and through to the creation of completely de novo "dark matter" protein structures. CCBuilder 2.0 is available as a web-based application, the code for which is open-source and can be downloaded freely. http://coiledcoils.chm.bris.ac.uk/ccbuilder2. We have created CCBuilder 2.0, an easy to use web-based application that can model structures for a whole class of proteins, the α-helical coiled coil, which is estimated to account for 3-5% of all proteins in nature. CCBuilder 2.0 will be of use to a large number of protein scientists engaged in fundamental studies, such as protein structure determination, through to more-applied research including designing and engineering novel proteins that have potential applications in biotechnology. © 2017 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dyer, K.D.; Handen, J.S.; Rosenberg, H.F.
The Charcot-Leyden crystal (CLC) protein, or eosinophil lysophospholipase, is a characteristic protein of human eosinophils and basophils; recent work has demonstrated that the CLC protein is both structurally and functionally related to the galectin family of {beta}-galactoside binding proteins. The galectins as a group share a number of features in common, including a linear ligand binding site encoded on a single exon. In this work, we demonstrate that the intron-exon structure of the gene encoding CLC is analogous to those encoding the galectins. The coding sequence of the CLC gene is divided into four exons, with the entire {beta}-galactoside bindingmore » site encoded by exon III. We have isolated CLC {beta}-galactoside binding sites from both orangutan (Pongo pygmaeus) and murine (Mus musculus) genomic DNAs, both encoded on single exons, and noted conservation of the amino acids shown to interact directly with the {beta}-galactoside ligand. The most likely interpretation of these results suggests the occurrence of one or more exon duplication and insertion events, resulting in the distribution of this lectin domain to CLC as well as to the multiple galectin genes. 35 refs., 3 figs.« less
Masso, Majid; Vaisman, Iosif I
2014-01-01
The AUTO-MUTE 2.0 stand-alone software package includes a collection of programs for predicting functional changes to proteins upon single residue substitutions, developed by combining structure-based features with trained statistical learning models. Three of the predictors evaluate changes to protein stability upon mutation, each complementing a distinct experimental approach. Two additional classifiers are available, one for predicting activity changes due to residue replacements and the other for determining the disease potential of mutations associated with nonsynonymous single nucleotide polymorphisms (nsSNPs) in human proteins. These five command-line driven tools, as well as all the supporting programs, complement those that run our AUTO-MUTE web-based server. Nevertheless, all the codes have been rewritten and substantially altered for the new portable software, and they incorporate several new features based on user feedback. Included among these upgrades is the ability to perform three highly requested tasks: to run "big data" batch jobs; to generate predictions using modified protein data bank (PDB) structures, and unpublished personal models prepared using standard PDB file formatting; and to utilize NMR structure files that contain multiple models.
Identifying functionally informative evolutionary sequence profiles.
Gil, Nelson; Fiser, Andras
2018-04-15
Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases. Availability and implementation: A freely accessible program, including source code, implementing SAMMI is available at https://github.com/nelsongil92/SAMMI.git. andras.fiser@einstein.yu.edu. Supplementary data are available at Bioinformatics online.
Naville, M; Warren, I A; Haftek-Terreau, Z; Chalopin, D; Brunet, F; Levin, P; Galiana, D; Volff, J-N
2016-04-01
Viruses and transposable elements, once considered as purely junk and selfish sequences, have repeatedly been used as a source of novel protein-coding genes during the evolution of most eukaryotic lineages, a phenomenon called 'molecular domestication'. This is exemplified perfectly in mammals and other vertebrates, where many genes derived from long terminal repeat (LTR) retroelements (retroviruses and LTR retrotransposons) have been identified through comparative genomics and functional analyses. In particular, genes derived from gag structural protein and envelope (env) genes, as well as from the integrase-coding and protease-coding sequences, have been identified in humans and other vertebrates. Retroelement-derived genes are involved in many important biological processes including placenta formation, cognitive functions in the brain and immunity against retroelements, as well as in cell proliferation, apoptosis and cancer. These observations support an important role of retroelement-derived genes in the evolution and diversification of the vertebrate lineage. Copyright © 2016 European Society of Clinical Microbiology and Infectious Diseases. Published by Elsevier Ltd. All rights reserved.
Motomura, Kenta; Nakamura, Morikazu; Otaki, Joji M.
2013-01-01
Protein structure and function information is coded in amino acid sequences. However, the relationship between primary sequences and three-dimensional structures and functions remains enigmatic. Our approach to this fundamental biochemistry problem is based on the frequencies of short constituent sequences (SCSs) or words. A protein amino acid sequence is considered analogous to an English sentence, where SCSs are equivalent to words. Availability scores, which are defined as real SCS frequencies in the non-redundant amino acid database relative to their probabilistically expected frequencies, demonstrate the biological usage bias of SCSs. As a result, this frequency-based linguistic approach is expected to have diverse applications, such as secondary structure specifications by structure-specific SCSs and immunological adjuvants with rare or non-existent SCSs. Linguistic similarities (e.g., wide ranges of scale-free distributions) and dissimilarities (e.g., behaviors of low-rank samples) between proteins and the natural English language have been revealed in the rank-frequency relationships of SCSs or words. We have developed a web server, the SCS Package, which contains five applications for analyzing protein sequences based on the linguistic concept. These tools have the potential to assist researchers in deciphering structurally and functionally important protein sites, species-specific sequences, and functional relationships between SCSs. The SCS Package also provides researchers with a tool to construct amino acid sequences de novo based on the idiomatic usage of SCSs. PMID:24688703
Motomura, Kenta; Nakamura, Morikazu; Otaki, Joji M
2013-01-01
Protein structure and function information is coded in amino acid sequences. However, the relationship between primary sequences and three-dimensional structures and functions remains enigmatic. Our approach to this fundamental biochemistry problem is based on the frequencies of short constituent sequences (SCSs) or words. A protein amino acid sequence is considered analogous to an English sentence, where SCSs are equivalent to words. Availability scores, which are defined as real SCS frequencies in the non-redundant amino acid database relative to their probabilistically expected frequencies, demonstrate the biological usage bias of SCSs. As a result, this frequency-based linguistic approach is expected to have diverse applications, such as secondary structure specifications by structure-specific SCSs and immunological adjuvants with rare or non-existent SCSs. Linguistic similarities (e.g., wide ranges of scale-free distributions) and dissimilarities (e.g., behaviors of low-rank samples) between proteins and the natural English language have been revealed in the rank-frequency relationships of SCSs or words. We have developed a web server, the SCS Package, which contains five applications for analyzing protein sequences based on the linguistic concept. These tools have the potential to assist researchers in deciphering structurally and functionally important protein sites, species-specific sequences, and functional relationships between SCSs. The SCS Package also provides researchers with a tool to construct amino acid sequences de novo based on the idiomatic usage of SCSs.
Mix, Heiko; Lobanov, Alexey V.; Gladyshev, Vadim N.
2007-01-01
Expression of selenocysteine (Sec)-containing proteins requires the presence of a cis-acting mRNA structure, called selenocysteine insertion sequence (SECIS) element. In bacteria, this structure is located in the coding region immediately downstream of the Sec-encoding UGA codon, whereas in eukaryotes a completely different SECIS element has evolved in the 3′-untranslated region. Here, we report that SECIS elements in the coding regions of selenoprotein mRNAs support Sec insertion in higher eukaryotes. Comprehensive computational analysis of all available viral genomes revealed a SECIS element within the ORF of a naturally occurring selenoprotein homolog of glutathione peroxidase 4 in fowlpox virus. The fowlpox SECIS element supported Sec insertion when expressed in mammalian cells as part of the coding region of viral or mammalian selenoproteins. In addition, readthrough at UGA was observed when the viral SECIS element was located upstream of the Sec codon. We also demonstrate successful de novo design of a functional SECIS element in the coding region of a mammalian selenoprotein. Our data provide evidence that the location of the SECIS element in the untranslated region is not a functional necessity but rather is an evolutionary adaptation to enable a more efficient synthesis of selenoproteins. PMID:17169995
Wen, Jingran; Scoles, Daniel R.; Facelli, Julio C.
2017-01-01
Spinocerebellar ataxia type 2 (SCA2) and type 3 (SCA3) are two common autosomal-dominant inherited ataxia syndromes, both of which are related to the unstable expansion of tri-nucleotide CAG repeats in the coding region of the related ATXN2 and ATXN3 genes, respectively. The poly-glutamine (poly-Q) tract encoded by the CAG repeats has long been recognized as an important factor in disease pathogenesis and progress. In this study, using the I-TASSER method for 3D structure prediction, we investigated the effect of poly-Q tract enlargement on the structure and folding of ataxin-2 and ataxin-3 proteins. Our results show good agreement with the known experimental structures of the Josephin and UIM domains providing credence to the simulation results presented here, which show that the enlargement of the poly-Q region not only affects the local structure of these regions but also affects the structures of functional domains as well as the whole protein. The changes observed in the predicted models of the UIM domains in ataxin-3 when the poly-Q track is enlarged provide new insights on possible pathogenic mechanisms. PMID:26861241
Protein collapse is encoded in the folded state architecture.
Samanta, Himadri S; Zhuravlev, Pavel I; Hinczewski, Michael; Hori, Naoto; Chakrabarti, Shaon; Thirumalai, D
2017-05-21
Folded states of single domain globular proteins are compact with high packing density. The radius of gyration, R g , of both the folded and unfolded states increase as N ν where N is the number of amino acids in the protein. The values of the Flory exponent ν are, respectively, ≈⅓ and ≈0.6 in the folded and unfolded states, coinciding with those for homopolymers. However, the extent of compaction of the unfolded state of a protein under low denaturant concentration (collapsibility), conditions favoring the formation of the folded state, is unknown. We develop a theory that uses the contact map of proteins as input to quantitatively assess collapsibility of proteins. Although collapsibility is universal, the propensity to be compact depends on the protein architecture. Application of the theory to over two thousand proteins shows that collapsibility depends not only on N but also on the contact map reflecting the native structure. A major prediction of the theory is that β-sheet proteins are far more collapsible than structures dominated by α-helices. The theory and the accompanying simulations, validating the theoretical predictions, provide insights into the differing conclusions reached using different experimental probes assessing the extent of compaction of proteins. By calculating the criterion for collapsibility as a function of protein length we provide quantitative insights into the reasons why single domain proteins are small and the physical reasons for the origin of multi-domain proteins. Collapsibility of non-coding RNA molecules is similar β-sheet proteins structures adding support to "Compactness Selection Hypothesis".
DeepQA: improving the estimation of single protein model quality with deep belief networks.
Cao, Renzhi; Bhattacharya, Debswapna; Hou, Jie; Cheng, Jianlin
2016-12-05
Protein quality assessment (QA) useful for ranking and selecting protein models has long been viewed as one of the major challenges for protein tertiary structure prediction. Especially, estimating the quality of a single protein model, which is important for selecting a few good models out of a large model pool consisting of mostly low-quality models, is still a largely unsolved problem. We introduce a novel single-model quality assessment method DeepQA based on deep belief network that utilizes a number of selected features describing the quality of a model from different perspectives, such as energy, physio-chemical characteristics, and structural information. The deep belief network is trained on several large datasets consisting of models from the Critical Assessment of Protein Structure Prediction (CASP) experiments, several publicly available datasets, and models generated by our in-house ab initio method. Our experiments demonstrate that deep belief network has better performance compared to Support Vector Machines and Neural Networks on the protein model quality assessment problem, and our method DeepQA achieves the state-of-the-art performance on CASP11 dataset. It also outperformed two well-established methods in selecting good outlier models from a large set of models of mostly low quality generated by ab initio modeling methods. DeepQA is a useful deep learning tool for protein single model quality assessment and protein structure prediction. The source code, executable, document and training/test datasets of DeepQA for Linux is freely available to non-commercial users at http://cactus.rnet.missouri.edu/DeepQA/ .
Chen, Yang; Li, Yaxing; Zhong, Jiayong; Zhang, Jing; Chen, Zhipeng; Yang, Lijuan; Cao, Xin; He, Qing-Yu; Zhang, Gong; Wang, Tong
2015-09-04
Finding protein evidence (PE) for protein coding genes is a primary task of the Phase I Chromosome-Centric Human Proteome Project (C-HPP). Currently, there are 2948 PE level 2-4 coding genes per neXtProt, which are deemed missing proteins in the human proteome. As most samples prepared and analyzed in the C-HPP framework were focusing on detergent soluble proteins, we posit that as a natural composition the cytoplasmic detergent-insoluble proteins (DIPs) represent a source of finding missing proteins. We optimized a workflow and separated cytoplasmic DIPs from three human lung and three human hepatoma cell lines via differential speed centrifugation. We verified that the detergent-soluble proteins (DSPs) could be sufficiently depleted and the cytoplasmic DIP isolation was partially reproducible with Spearman r > 0.70 according to two independent SILAC MS experiments. Through label-free MS, we identified 4524 and 4156 DIPs from lung and liver cells, respectively. Among them, a total of 23 missing proteins (22 PE2 and 1 PE4) were identified by MS, and 18 of them had translation evidence; in addition, six PE5 proteins were identified by MS, three with translation evidence. We showed that cytoplasmic DIPs were not an enrichment of transmembrane proteins and were chromosome-, cell type-, and tissue-specific. Furthermore, we demonstrated that DIPs were distinct from DSPs in terms of structural and physical-chemical features. In conclusion, we have found 23 missing proteins and 6 PE5 proteins from the cytoplasmic insoluble proteome that is biologically and physical-chemically different from the soluble proteome, suggesting that cytoplasmic DIPs carry comprehensive and valuable information for finding PE of missing proteins. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium with the data set identifier PXD001694.
Proudhon, D; Wei, J; Briat, J; Theil, E C
1996-03-01
Ferritin, a protein widespread in nature, concentrates iron approximately 10(11)-10(12)-fold above the solubility within a spherical shell of 24 subunits; it derives in plants and animals from a common ancestor (based on sequence) but displays a cytoplasmic location in animals compared to the plastid in contemporary plants. Ferritin gene regulation in plants and animals is altered by development, hormones, and excess iron; iron signals target DNA in plants but mRNA in animals. Evolution has thus conserved the two end points of ferritin gene expression, the physiological signals and the protein structure, while allowing some divergence of the genetic mechanisms. Comparison of ferritin gene organization in plants and animals, made possible by the cloning of a dicot (soybean) ferritin gene presented here and the recent cloning of two monocot (maize) ferritin genes, shows evolutionary divergence in ferritin gene organization between plants and animals but conservation among plants or among animals; divergence in the genetic mechanism for iron regulation is reflected by the absence in all three plant genes of the IRE, a highly conserved, noncoding sequence in vertebrate animal ferritin mRNA. In plant ferritin genes, the number of introns (n = 7) is higher than in animals (n = 3). Second, no intron positions are conserved when ferritin genes of plants and animals are compared, although all ferritin gene introns are in the coding region; within kingdoms, the intron positions in ferritin genes are conserved. Finally, secondary protein structure has no apparent relationship to intron/exon boundaries in plant ferritin genes, whereas in animal ferritin genes the correspondence is high. The structural differences in introns/exons among phylogenetically related ferritin coding sequences and the high conservation of the gene structure within plant or animal kingdoms of the gene structure within plant or animal kingdoms suggest that kingdom-specific functional constraints may exist to maintain a particular intron/exon pattern within ferritin genes. In the case of plants, where ferritin gene intron placement is unrelated to triplet codons or protein structure, and where ferritin is targeted to the plastid, the selection pressure on gene organization may relate to RNA function and plastid/nuclear signaling.
Subramaniam, Saravanan; Mohapatra, Jajati K; Das, Biswajit; Sharma, Gaurav K; Biswal, Jitendra K; Mahajan, Sonalika; Misri, Jyoti; Dash, Bana B; Pattnaik, Bramhadev
2015-07-01
Foot-and-mouth disease virus (FMDV) serotype Asia1 was first reported in India in 1951, where three major genetic lineages (B, C and D) of this serotype have been described until now. In this study, the capsid protein coding region of serotype Asia1 viruses (n = 99) from India were analyzed, giving importance to the viruses circulating since 2007. All of the isolates (n = 50) recovered during 2007-2013 were found to group within the re-emerging cluster of lineage C (designated as sublineage C(R)). The evolutionary rate of sublineage C(R) was estimated to be slightly higher than that of the serotype as a whole, and the time of the most recent common ancestor for this cluster was estimated to be approximately 2001. In comparison to the older isolates of lineage C (1993-2001), the re-emerging viruses showed variation at eight amino acid positions, including substitutions at the antigenically critical residues VP279 and VP2131. However, no direct correlation was found between sequence variations and antigenic relationships. The number of codons under positive selection and the nature of the selection pressure varied widely among the structural proteins, implying a heterogeneous pattern of evolution in serotype Asia1. While episodic diversifying selection appears to play a major role in shaping the evolution of VP1 and VP3, selection pressure acting on codons of VP2 is largely pervasive. Further, episodic positive selection appears to be responsible for the early diversification of lineage C. Recombination events identified in the structural protein coding region indicates its probable role in adaptive evolution of serotype Asia1 viruses.
Bioinformatics of prokaryotic RNAs
Backofen, Rolf; Amman, Fabian; Costa, Fabrizio; Findeiß, Sven; Richter, Andreas S; Stadler, Peter F
2014-01-01
The genome of most prokaryotes gives rise to surprisingly complex transcriptomes, comprising not only protein-coding mRNAs, often organized as operons, but also harbors dozens or even hundreds of highly structured small regulatory RNAs and unexpectedly large levels of anti-sense transcripts. Comprehensive surveys of prokaryotic transcriptomes and the need to characterize also their non-coding components is heavily dependent on computational methods and workflows, many of which have been developed or at least adapted specifically for the use with bacterial and archaeal data. This review provides an overview on the state-of-the-art of RNA bioinformatics focusing on applications to prokaryotes. PMID:24755880
Structure and expression of canary myc family genes.
Collum, R G; Clayton, D F; Alt, F W
1991-01-01
We found that the canary N-myc gene is highly related to mammalian N-myc genes in both the protein-coding region and the long 3' untranslated region. Examined coding regions of the canary c-myc gene were also highly related to their mammalian counterparts, but in contrast to N-myc, the canary and mammalian c-myc genes were quite divergent in their 3' untranslated regions. We readily detected N-myc and c-myc expression in the adult canary brain and found N-myc expression both at sites of proliferating neuronal precursors and in mature neurons. Images PMID:1996121
The complete mitochondrial genome of the butterfly Apatura metis (Lepidoptera: Nymphalidae).
Zhang, Min; Nie, Xinping; Cao, Tianwen; Wang, Juping; Li, Tao; Zhang, Xiaonan; Guo, Yaping; Ma, Enbo; Zhong, Yang
2012-06-01
As an important pest in the Slender Leaved Willow (Salix alba), Apatura metis is called Freyer's purple emperor, and its mitochondrial genome is 15,236 bp long. The encoded genes for 22 tRNA genes, two ribosomal RNA (rrnL and rrnS) genes, and 13 protein-coding genes (PCGs), and a control region in the A. metis mitochondria are highly homologous to other lepidopteran species. The mitochondrial genome of A. metis is biased toward a high A + T content (A + T = 80.5%). All protein-coding genes, except for COI begins with the CGA codon as observed in other lepidopterans, start with a typical ATN initiation codon. All tRNAs show the classic clover-leaf structure, except that the dihydrouridine (DHU) arm of tRNA(Ser(AGN)) forms a simple loop. The A. metis A + T-rich region contains some conserved structures including a structure combining the motif 'ATAGA' and 19 bp poly (T) stretch, which is similar to those found in other lepidopteran mitogenomes. The phylogenetic analyses of lepidopterans based on mitogenomes sequences demonstrate that each of the six superfamilies is monophyletic, and the relationship among them is (((Noctuoidea + (Geometroidea + Bombycoidea)) + Pyraloidea) + Papilionoidea) + Tortricoidea. In Papilionoidea group, our conclusion argues that ((Lycaenidae + Pieridae) + Nymphalidae) + Papilionidae.
Protein and gene structure of a blue laccase from Pleurotus ostreatus1.
Giardina, P; Palmieri, G; Scaloni, A; Fontanella, B; Faraco, V; Cennamo, G; Sannia, G
1999-01-01
A new laccase isoenzyme (POXA1b, where POX is phenol oxidase), produced by Pleurotus ostreatus in cultures supplemented with copper sulphate, has been purified and fully characterized. The main characteristics of this protein (molecular mass in native and denaturing conditions, pI and catalytic properties) are almost identical to the previously studied laccase POXA1w. However, POXA1b contains four copper atoms per molecule instead of one copper, two zinc and one iron atom per molecule of POXA1w. Furthermore, POXA1b shows an unusually high stability at alkaline pH. The gene and cDNA coding for POXA1b have been cloned and sequenced. The gene coding sequence contains 1599 bp, interrupted by 15 introns. Comparison of the structure of the poxa1b gene with the two previously studied P. ostreatus laccase genes (pox1 and poxc) suggests that these genes belong to two different subfamilies. The amino acid sequence of POXA1b deduced from the cDNA sequence has been almost completely verified by means of matrix-assisted laser desorption ionization MS. It has been demonstrated that three out of six putative glycosylation sites are post-translationally modified and the structure of the bound glycosidic moieties has been determined, whereas two other putative glycosylation sites are unmodified. PMID:10417329
Carraro, Nicola; Tisdale-Orr, Tracy Eizabeth; Clouse, Ronald Matthew; Knöller, Anne Sophie; Spicer, Rachel
2012-01-01
Intercellular transport of the plant hormone auxin is mediated by three families of membrane-bound protein carriers, with the PIN and ABCB families coding primarily for efflux proteins and the AUX/LAX family coding for influx proteins. In the last decade our understanding of gene and protein function for these transporters in Arabidopsis has expanded rapidly but very little is known about their role in woody plant development. Here we present a comprehensive account of all three families in the model woody species Populus, including chromosome distribution, protein structure, quantitative gene expression, and evolutionary relationships. The PIN and AUX/LAX gene families in Populus comprise 16 and 8 members respectively and show evidence for the retention of paralogs following a relatively recent whole genome duplication. There is also differential expression across tissues within many gene pairs. The ABCB family is previously undescribed in Populus and includes 20 members, showing a much deeper evolutionary history, including both tandem and whole genome duplication as well as probable gene loss. A striking number of these transporters are expressed in developing Populus stems and we suggest that evolutionary and structural relationships with known auxin transporters in Arabidopsis can point toward candidate genes for further study in Populus. This is especially important for the ABCBs, which is a large family and includes members in Arabidopsis that are able to transport other substrates in addition to auxin. Protein modeling, sequence alignment and expression data all point to ABCB1.1 as a likely auxin transport protein in Populus. Given that basipetal auxin flow through the cambial zone shapes the development of woody stems, it is important that we identify the full complement of genes involved in this process. This work should lay the foundation for studies targeting specific proteins for functional characterization and in situ localization. PMID:22645571
Redwan, R M; Saidin, A; Kumar, S V
2015-08-12
Pineapple (Ananas comosus var. comosus) is known as the king of fruits for its crown and is the third most important tropical fruit after banana and citrus. The plant, which is indigenous to South America, is the most important species in the Bromeliaceae family and is largely traded for fresh fruit consumption. Here, we report the complete chloroplast sequence of the MD-2 pineapple that was sequenced using the PacBio sequencing technology. In this study, the high error rate of PacBio long sequence reads of A. comosus's total genomic DNA were improved by leveraging on the high accuracy but short Illumina reads for error-correction via the latest error correction module from Novocraft. Error corrected long PacBio reads were assembled by using a single tool to produce a contig representing the pineapple chloroplast genome. The genome of 159,636 bp in length is featured with the conserved quadripartite structure of chloroplast containing a large single copy region (LSC) with a size of 87,482 bp, a small single copy region (SSC) with a size of 18,622 bp and two inverted repeat regions (IRA and IRB) each with the size of 26,766 bp. Overall, the genome contained 117 unique coding regions and 30 were repeated in the IR region with its genes contents, structure and arrangement similar to its sister taxon, Typha latifolia. A total of 35 repeats structure were detected in both the coding and non-coding regions with a majority being tandem repeats. In addition, 205 SSRs were detected in the genome with six protein-coding genes contained more than two SSRs. Comparative chloroplast genomes from the subclass Commelinidae revealed a conservative protein coding gene albeit located in a highly divergence region. Analysis of selection pressure on protein-coding genes using Ka/Ks ratio showed significant positive selection exerted on the rps7 gene of the pineapple chloroplast with P less than 0.05. Phylogenetic analysis confirmed the recent taxonomical relation among the member of commelinids which support the monophyly relationship between Arecales and Dasypogonaceae and between Zingiberales to the Poales, which includes the A. comosus. The complete sequence of the chloroplast of pineapple provides insights to the divergence of genic chloroplast sequences from the members of the subclass Commelinidae. The complete pineapple chloroplast will serve as a reference for in-depth taxonomical studies in the Bromeliaceae family when more species under the family are sequenced in the future. The genetic sequence information will also make feasible other molecular applications of the pineapple chloroplast for plant genetic improvement.
Residues with similar hexagon neighborhoods share similar side-chain conformations.
Li, Shuai Cheng; Bu, Dongbo; Li, Ming
2012-01-01
We present in this study a new approach to code protein side-chain conformations into hexagon substructures. Classical side-chain packing methods consist of two steps: first, side-chain conformations, known as rotamers, are extracted from known protein structures as candidates for each residue; second, a searching method along with an energy function is used to resolve conflicts among residues and to optimize the combinations of side chain conformations for all residues. These methods benefit from the fact that the number of possible side-chain conformations is limited, and the rotamer candidates are readily extracted; however, these methods also suffer from the inaccuracy of energy functions. Inspired by threading and Ab Initio approaches to protein structure prediction, we propose to use hexagon substructures to implicitly capture subtle issues of energy functions. Our initial results indicate that even without guidance from an energy function, hexagon structures alone can capture side-chain conformations at an accuracy of 83.8 percent, higher than 82.6 percent by the state-of-art side-chain packing methods.
Highlights on distinctive structural and functional properties of HTLV Tax proteins
Romanelli, Maria Grazia; Diani, Erica; Bergamo, Elisa; Casoli, Claudio; Ciminale, Vincenzo; Bex, Françoise; Bertazzoni, Umberto
2013-01-01
Human T cell leukemia viruses (HTLVs) are complex human retroviruses of the Deltaretrovirus genus. Four types have been identified thus far, with HTLV-1 and HTLV-2 much more prevalent than HTLV-3 or HTLV-4. HTLV-1 and HTLV-2 possess strictly related genomic structures, but differ significantly in pathogenicity, as HTLV-1 is the causative agent of adult T cell leukemia and of HTLV-associated myelopathy/tropical spastic paraparesis, whereas HTLV-2 is not associated with neoplasia. HTLVs code for a protein named Tax that is responsible for enhancing viral expression and drives cell transformation. Much effort has been invested to dissect the impact of Tax on signal transduction pathways and to identify functional differences between the HTLV Tax proteins that may explain the distinct oncogenic potential of HTLV-1 and HTLV-2. This review summarizes our current knowledge of Tax-1 and Tax-2 with emphasis on their structure, role in activation of the NF-κB (nuclear factor kappa-B) pathway, and interactions with host factors. PMID:24058363
Milanesi, Luciano; Petrillo, Mauro; Sepe, Leandra; Boccia, Angelo; D'Agostino, Nunzio; Passamano, Myriam; Di Nardo, Salvatore; Tasco, Gianluca; Casadio, Rita; Paolella, Giovanni
2005-01-01
Background Protein kinases are a well defined family of proteins, characterized by the presence of a common kinase catalytic domain and playing a significant role in many important cellular processes, such as proliferation, maintenance of cell shape, apoptosys. In many members of the family, additional non-kinase domains contribute further specialization, resulting in subcellular localization, protein binding and regulation of activity, among others. About 500 genes encode members of the kinase family in the human genome, and although many of them represent well known genes, a larger number of genes code for proteins of more recent identification, or for unknown proteins identified as kinase only after computational studies. Results A systematic in silico study performed on the human genome, led to the identification of 5 genes, on chromosome 1, 11, 13, 15 and 16 respectively, and 1 pseudogene on chromosome X; some of these genes are reported as kinases from NCBI but are absent in other databases, such as KinBase. Comparative analysis of 483 gene regions and subsequent computational analysis, aimed at identifying unannotated exons, indicates that a large number of kinase may code for alternately spliced forms or be incorrectly annotated. An InterProScan automated analysis was perfomed to study domain distribution and combination in the various families. At the same time, other structural features were also added to the annotation process, including the putative presence of transmembrane alpha helices, and the cystein propensity to participate into a disulfide bridge. Conclusion The predicted human kinome was extended by identifiying both additional genes and potential splice variants, resulting in a varied panorama where functionality may be searched at the gene and protein level. Structural analysis of kinase proteins domains as defined in multiple sources together with transmembrane alpha helices and signal peptide prediction provides hints to function assignment. The results of the human kinome analysis are collected in the KinWeb database, available for browsing and searching over the internet, where all results from the comparative analysis and the gene structure annotation are made available, alongside the domain information. Kinases may be searched by domain combinations and the relative genes may be viewed in a graphic browser at various level of magnification up to gene organization on the full chromosome set. PMID:16351747
Proteomic analysis in giant axonal neuropathy: new insights into disease mechanisms.
Mussche, Silke; De Paepe, Boel; Smet, Joél; Devreese, Katrien; Lissens, Willy; Rasic, Vedrana Milic; Murnane, Matthew; Devreese, Bart; Van Coster, Rudy
2012-08-01
Giant axonal neuropathy (GAN) is a progressive hereditary disease that affects the peripheral and central nervous systems. It is characterized morphologically by aggregates of intermediate filaments in different tissues. Mutations have been reported in the gene that codes for gigaxonin. Nevertheless, the underlying molecular mechanism remains obscure. Cell lines from 4 GAN patients and 4 controls were analyzed by iTRAQ. Among the dysregulated proteins were ribosomal protein L29, ribosomal protein L37, galectin-1, glia-derived nexin, and aminopeptidase N. Also, nuclear proteins linked to formin-binding proteins were found to be dysregulated. Although the major role of gigaxonin is reported to be degradation of cytoskeleton-associated proteins, the amount of 76 structural cytoskeletal proteins was unaltered. Several of the dysregulated proteins play a role in cytoskeletal reorganization. Based on these findings, we speculate that disturbed cytoskeletal regulation is responsible for formation of aggregates of intermediate filaments. Copyright © 2012 Wiley Periodicals, Inc.
Feinauer, Christoph; Procaccini, Andrea; Zecchina, Riccardo; Weigt, Martin; Pagnani, Andrea
2014-01-01
In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims at extracting such constraints from rapidly accumulating sequence data, and thereby at inferring protein structure and function from sequence information alone. Recently, global statistical inference methods (e.g. direct-coupling analysis, sparse inverse covariance estimation) have achieved a breakthrough towards this aim, and their predictions have been successfully implemented into tertiary and quaternary protein structure prediction methods. However, due to the discrete nature of the underlying variable (amino-acids), exact inference requires exponential time in the protein length, and efficient approximations are needed for practical applicability. Here we propose a very efficient multivariate Gaussian modeling approach as a variant of direct-coupling analysis: the discrete amino-acid variables are replaced by continuous Gaussian random variables. The resulting statistical inference problem is efficiently and exactly solvable. We show that the quality of inference is comparable or superior to the one achieved by mean-field approximations to inference with discrete variables, as done by direct-coupling analysis. This is true for (i) the prediction of residue-residue contacts in proteins, and (ii) the identification of protein-protein interaction partner in bacterial signal transduction. An implementation of our multivariate Gaussian approach is available at the website http://areeweb.polito.it/ricerca/cmp/code. PMID:24663061
Analysis of informational redundancy in the protein-assembling machinery
NASA Astrophysics Data System (ADS)
Berkovich, Simon
2004-03-01
Entropy analysis of the DNA structure does not reveal a significant departure from randomness indicating lack of informational redundancy. This signifies the absence of a hidden meaning in the genome text and supports the 'barcode' interpretation of DNA given in [1]. Lack of informational redundancy is a characteristic property of an identification label rather than of a message of instructions. Yet randomness of DNA has to induce non-random structures of the proteins. Protein synthesis is a two-step process: transcription into RNA with gene splicing and formation a structure of amino acids. Entropy estimations, performed by A. Djebbari, show typical values of redundancy of the biomolecules along these pathways: DNA gene 4proteins 15-40in gene expression, the RNA copy carries the same information as the original DNA template. Randomness is essentially eliminated only at the step of the protein creation by a degenerate code. According to [1], the significance of the substitution of U for T with a subsequent gene splicing is that these transformations result in a different pattern of RNA oscillations, so the vital DNA communications are protected against extraneous noise coming from the protein making activities. 1. S. Berkovich, "On the 'barcode' functionality of DNA, or the Phenomenon of Life in the Physical Universe", Dorrance Publishing Co., Pittsburgh, 2003
NCAD, a database integrating the intrinsic conformational preferences of non-coded amino acids
Revilla-López, Guillem; Torras, Juan; Curcó, David; Casanovas, Jordi; Calaza, M. Isabel; Zanuy, David; Jiménez, Ana I.; Cativiela, Carlos; Nussinov, Ruth; Grodzinski, Piotr; Alemán, Carlos
2010-01-01
Peptides and proteins find an ever-increasing number of applications in the biomedical and materials engineering fields. The use of non-proteinogenic amino acids endowed with diverse physicochemical and structural features opens the possibility to design proteins and peptides with novel properties and functions. Moreover, non-proteinogenic residues are particularly useful to control the three-dimensional arrangement of peptidic chains, which is a crucial issue for most applications. However, information regarding such amino acids –also called non-coded, non-canonical or non-standard– is usually scattered among publications specialized in quite diverse fields as well as in patents. Making all these data useful to the scientific community requires new tools and a framework for their assembly and coherent organization. We have successfully compiled, organized and built a database (NCAD, Non-Coded Amino acids Database) containing information about the intrinsic conformational preferences of non-proteinogenic residues determined by quantum mechanical calculations, as well as bibliographic information about their synthesis, physical and spectroscopic characterization, conformational propensities established experimentally, and applications. The architecture of the database is presented in this work together with the first family of non-coded residues included, namely, α-tetrasubstituted α-amino acids. Furthermore, the NCAD usefulness is demonstrated through a test-case application example. PMID:20455555
Structure of Thermotoga maritima Stationary Phase Survival Protein SurE: A Novel Acid Phosphatase
Zhang, R.-G.; Skarina, T.; Katz, J.E.; Beasley, S.; Khachatryan, A.; Vyas, S.; Arrowsmith, C.H.; Clarke, S.; Edwards, A.; Joachimiak, A.; Savchenko, A.
2009-01-01
Summary Background The rpoS, nlpD, pcm, and surE genes are among many whose expression is induced during the stationary phase of bacterial growth. rpoS codes for the stationary-phase RNA polymerase σ subunit, and nlpD codes for a lipoprotein. The pcm gene product repairs damaged proteins by converting the atypical isoaspartyl residues back to L-aspartyls. The physiological and biochemical functions of surE are unknown, but its importance in stress is supported by the duplication of the surE gene in E. coli subjected to high-temperature growth. The pcm and surE genes are highly conserved in bacteria, archaea, and plants. Results The structure of SurE from Thermotoga maritima was determined at 2.0 Å. The SurE monomer is composed of two domains; a conserved N-terminal domain, a Rossman fold, and a C-terminal oligomerization domain, a new fold. Monomers form a dimer that assembles into a tetramer. Biochemical analysis suggests that SurE is an acid phosphatase, with an optimum pH of 5.5–6.2. The active site was identified in the N-terminal domain through analysis of conserved residues. Structure-based site-directed point mutations abolished phosphatase activity. T. maritima SurE intra- and inter-subunit salt bridges were identified that may explain the SurE thermostability. Conclusions The structure of SurE provided information about the protein’s fold, oligomeric state, and active site. The protein possessed magnesium-dependent acid phosphatase activity, but the physiologically relevant substrate(s) remains to be identified. The importance of three of the assigned active site residues in catalysis was confirmed by site-directed mutagenesis. PMID:11709173
A combinatorial code for pattern formation in Drosophila oogenesis.
Yakoby, Nir; Bristow, Christopher A; Gong, Danielle; Schafer, Xenia; Lembong, Jessica; Zartman, Jeremiah J; Halfon, Marc S; Schüpbach, Trudi; Shvartsman, Stanislav Y
2008-11-01
Two-dimensional patterning of the follicular epithelium in Drosophila oogenesis is required for the formation of three-dimensional eggshell structures. Our analysis of a large number of published gene expression patterns in the follicle cells suggests that they follow a simple combinatorial code based on six spatial building blocks and the operations of union, difference, intersection, and addition. The building blocks are related to the distribution of inductive signals, provided by the highly conserved epidermal growth factor receptor and bone morphogenetic protein signaling pathways. We demonstrate the validity of the code by testing it against a set of patterns obtained in a large-scale transcriptional profiling experiment. Using the proposed code, we distinguish 36 distinct patterns for 81 genes expressed in the follicular epithelium and characterize their joint dynamics over four stages of oogenesis. The proposed combinatorial framework allows systematic analysis of the diversity and dynamics of two-dimensional transcriptional patterns and guides future studies of gene regulation.
Decoding the genome beyond sequencing: the new phase of genomic research.
Heng, Henry H Q; Liu, Guo; Stevens, Joshua B; Bremer, Steven W; Ye, Karen J; Abdallah, Batoul Y; Horne, Steven D; Ye, Christine J
2011-10-01
While our understanding of gene-based biology has greatly improved, it is clear that the function of the genome and most diseases cannot be fully explained by genes and other regulatory elements. Genes and the genome represent distinct levels of genetic organization with their own coding systems; Genes code parts like protein and RNA, but the genome codes the structure of genetic networks, which are defined by the whole set of genes, chromosomes and their topological interactions within a cell. Accordingly, the genetic code of DNA offers limited understanding of genome functions. In this perspective, we introduce the genome theory which calls for the departure of gene-centric genomic research. To make this transition for the next phase of genomic research, it is essential to acknowledge the importance of new genome-based biological concepts and to establish new technology platforms to decode the genome beyond sequencing. Copyright © 2011 Elsevier Inc. All rights reserved.
Cloud4Psi: cloud computing for 3D protein structure similarity searching.
Mrozek, Dariusz; Małysiak-Mrozek, Bożena; Kłapciński, Artur
2014-10-01
Popular methods for 3D protein structure similarity searching, especially those that generate high-quality alignments such as Combinatorial Extension (CE) and Flexible structure Alignment by Chaining Aligned fragment pairs allowing Twists (FATCAT) are still time consuming. As a consequence, performing similarity searching against large repositories of structural data requires increased computational resources that are not always available. Cloud computing provides huge amounts of computational power that can be provisioned on a pay-as-you-go basis. We have developed the cloud-based system that allows scaling of the similarity searching process vertically and horizontally. Cloud4Psi (Cloud for Protein Similarity) was tested in the Microsoft Azure cloud environment and provided good, almost linearly proportional acceleration when scaled out onto many computational units. Cloud4Psi is available as Software as a Service for testing purposes at: http://cloud4psi.cloudapp.net/. For source code and software availability, please visit the Cloud4Psi project home page at http://zti.polsl.pl/dmrozek/science/cloud4psi.htm. © The Author 2014. Published by Oxford University Press.
Cloud4Psi: cloud computing for 3D protein structure similarity searching
Mrozek, Dariusz; Małysiak-Mrozek, Bożena; Kłapciński, Artur
2014-01-01
Summary: Popular methods for 3D protein structure similarity searching, especially those that generate high-quality alignments such as Combinatorial Extension (CE) and Flexible structure Alignment by Chaining Aligned fragment pairs allowing Twists (FATCAT) are still time consuming. As a consequence, performing similarity searching against large repositories of structural data requires increased computational resources that are not always available. Cloud computing provides huge amounts of computational power that can be provisioned on a pay-as-you-go basis. We have developed the cloud-based system that allows scaling of the similarity searching process vertically and horizontally. Cloud4Psi (Cloud for Protein Similarity) was tested in the Microsoft Azure cloud environment and provided good, almost linearly proportional acceleration when scaled out onto many computational units. Availability and implementation: Cloud4Psi is available as Software as a Service for testing purposes at: http://cloud4psi.cloudapp.net/. For source code and software availability, please visit the Cloud4Psi project home page at http://zti.polsl.pl/dmrozek/science/cloud4psi.htm. Contact: dariusz.mrozek@polsl.pl PMID:24930141
NASA Astrophysics Data System (ADS)
Anderson, Lissa C.; Håkansson, Maria; Walse, Björn; Nilsson, Carol L.
2017-09-01
Structural technologies are an essential component in the design of precision therapeutics. Precision medicine entails the development of therapeutics directed toward a designated target protein, with the goal to deliver the right drug to the right patient at the right time. In the field of oncology, protein structural variants are often associated with oncogenic potential. In a previous proteogenomic screen of patient-derived glioblastoma (GBM) tumor materials, we identified a sequence variant of human mitochondrial branched-chain amino acid aminotransferase 2 as a putative factor of resistance of GBM to standard-of-care-treatments. The enzyme generates glutamate, which is neurotoxic. To elucidate structural coordinates that may confer altered substrate binding or activity of the variant BCAT2 T186R, a 45 kDa protein, we applied combined ETD and CID top-down mass spectrometry in a LC-FT-ICR MS at 21 T, and X-Ray crystallography in the study of both the variant and non-variant intact proteins. The combined ETD/CID fragmentation pattern allowed for not only extensive sequence coverage but also confident localization of the amino acid variant to its position in the sequence. The crystallographic experiments confirmed the hypothesis generated by in silico structural homology modeling, that the Lys59 side-chain of BCAT2 may repulse the Arg186 in the variant protein (PDB code: 5MPR), leading to destabilization of the protein dimer and altered enzyme kinetics. Taken together, the MS and novel 3D structural data give us reason to further pursue BCAT2 T186R as a precision drug target in GBM. [Figure not available: see fulltext.
Walia, Rasna R; Caragea, Cornelia; Lewis, Benjamin A; Towfic, Fadi; Terribilini, Michael; El-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant
2012-05-10
RNA molecules play diverse functional and structural roles in cells. They function as messengers for transferring genetic information from DNA to proteins, as the primary genetic material in many viruses, as catalysts (ribozymes) important for protein synthesis and RNA processing, and as essential and ubiquitous regulators of gene expression in living organisms. Many of these functions depend on precisely orchestrated interactions between RNA molecules and specific proteins in cells. Understanding the molecular mechanisms by which proteins recognize and bind RNA is essential for comprehending the functional implications of these interactions, but the recognition 'code' that mediates interactions between proteins and RNA is not yet understood. Success in deciphering this code would dramatically impact the development of new therapeutic strategies for intervening in devastating diseases such as AIDS and cancer. Because of the high cost of experimental determination of protein-RNA interfaces, there is an increasing reliance on statistical machine learning methods for training predictors of RNA-binding residues in proteins. However, because of differences in the choice of datasets, performance measures, and data representations used, it has been difficult to obtain an accurate assessment of the current state of the art in protein-RNA interface prediction. We provide a review of published approaches for predicting RNA-binding residues in proteins and a systematic comparison and critical assessment of protein-RNA interface residue predictors trained using these approaches on three carefully curated non-redundant datasets. We directly compare two widely used machine learning algorithms (Naïve Bayes (NB) and Support Vector Machine (SVM)) using three different data representations in which features are encoded using either sequence- or structure-based windows. Our results show that (i) Sequence-based classifiers that use a position-specific scoring matrix (PSSM)-based representation (PSSMSeq) outperform those that use an amino acid identity based representation (IDSeq) or a smoothed PSSM (SmoPSSMSeq); (ii) Structure-based classifiers that use smoothed PSSM representation (SmoPSSMStr) outperform those that use PSSM (PSSMStr) as well as sequence identity based representation (IDStr). PSSMSeq classifiers, when tested on an independent test set of 44 proteins, achieve performance that is comparable to that of three state-of-the-art structure-based predictors (including those that exploit geometric features) in terms of Matthews Correlation Coefficient (MCC), although the structure-based methods achieve substantially higher Specificity (albeit at the expense of Sensitivity) compared to sequence-based methods. We also find that the expected performance of the classifiers on a residue level can be markedly different from that on a protein level. Our experiments show that the classifiers trained on three different non-redundant protein-RNA interface datasets achieve comparable cross-validation performance. However, we find that the results are significantly affected by differences in the distance threshold used to define interface residues. Our results demonstrate that protein-RNA interface residue predictors that use a PSSM-based encoding of sequence windows outperform classifiers that use other encodings of sequence windows. While structure-based methods that exploit geometric features can yield significant increases in the Specificity of protein-RNA interface residue predictions, such increases are offset by decreases in Sensitivity. These results underscore the importance of comparing alternative methods using rigorous statistical procedures, multiple performance measures, and datasets that are constructed based on several alternative definitions of interface residues and redundancy cutoffs as well as including evaluations on independent test sets into the comparisons.
Dasgupta, R; Kaesberg, P
1982-01-01
The nucleotide sequences of the subgenomic coat protein messengers (RNA4's) of two related bromoviruses, brome mosaic virus (BMV) and cowpea chlorotic mottle virus (CCMV), have been determined by direct RNA and CDNA sequencing without cloning. BMV RNA4 is 876 b long including a 5' noncoding region of nine nucleotides and a 3' noncoding region of 300 nucleotides. CCMV RNA 4 is 824 b long, including a 5' noncoding region of 10 nucleotides and a 3' noncoding region of 244 nucleotides. The encoded coat proteins are similar in length (188 amino acids for BMV and 189 amino acids for CCMV) and display about 70% homology in their amino acid sequences. Length difference between the two RNAs is due mostly to a single deletion, in CCMV with respect to BMV, of about 57 b immediately following the coding region. Allowing for this deletion the RNAs are indicate that mutations leading to divergence were constrained in the coding region primarily by the requirement of maintaining a favorable coat protein structure and in the 3' noncoding region primarily by the requirement of maintaining a favorable RNA spatial configuration. PMID:6895941
Okura, Hiromichi; Mihara, Hisakazu; Takahashi, Tsuyoshi
2013-10-01
The molecular recognition ability of proteins is essential in biological systems, and therefore a considerable amount of effort has been devoted to constructing desired target-binding proteins using a variety of naturally occurring proteins as scaffolds. However, since generating a binding site in a native protein can often affect its structural properties, highly stable de novo protein scaffolds may be more amenable than the native proteins. We previously reported the generation of de novo proteins comprising three α-helices and three β-strands (α3β3) from a genetic library coding simplified amino acid sets. Two α3β3 de novo proteins, vTAJ13 and vTAJ36, fold into a native-like stable and molten globule-like structures, respectively, even though the proteins have similar amino acid compositions. Here, we attempted to create binding sites for the vTAJ13 and vTAJ36 proteins to prove the utility of de novo designed artificial proteins as a molecular recognition tool. Randomization of six amino acids at two linker sites of vTAJ13 and vTAJ36 followed by biopanning generated binding proteins that recognize the target molecules, fluorescein and green fluorescent protein, with affinities of 10(-7)-10(-8) M. Of note, the selected proteins from the vTAJ13-based library tended to recognize the target molecules with high specificity, probably due to the native-like stable structure of vTAJ13. Our studies provide an example of the potential of de novo protein scaffolds, which are composed of a simplified amino acid set, to recognize a variety of target compounds.
GPU.proton.DOCK: Genuine Protein Ultrafast proton equilibria consistent DOCKing.
Kantardjiev, Alexander A
2011-07-01
GPU.proton.DOCK (Genuine Protein Ultrafast proton equilibria consistent DOCKing) is a state of the art service for in silico prediction of protein-protein interactions via rigorous and ultrafast docking code. It is unique in providing stringent account of electrostatic interactions self-consistency and proton equilibria mutual effects of docking partners. GPU.proton.DOCK is the first server offering such a crucial supplement to protein docking algorithms--a step toward more reliable and high accuracy docking results. The code (especially the Fast Fourier Transform bottleneck and electrostatic fields computation) is parallelized to run on a GPU supercomputer. The high performance will be of use for large-scale structural bioinformatics and systems biology projects, thus bridging physics of the interactions with analysis of molecular networks. We propose workflows for exploring in silico charge mutagenesis effects. Special emphasis is given to the interface-intuitive and user-friendly. The input is comprised of the atomic coordinate files in PDB format. The advanced user is provided with a special input section for addition of non-polypeptide charges, extra ionogenic groups with intrinsic pK(a) values or fixed ions. The output is comprised of docked complexes in PDB format as well as interactive visualization in a molecular viewer. GPU.proton.DOCK server can be accessed at http://gpudock.orgchm.bas.bg/.
Rampino, Patrizia; De Pascali, Mariarosaria; De Caroli, Monica; Luvisi, Andrea; De Bellis, Luigi; Piro, Gabriella; Perrotta, Carla
2017-11-01
Wheat, the main food source for a third of world population, appears strongly under threat because of predicted increasing temperatures coupled to drought. Plant complex molecular response to drought stress relies on the gene network controlling cell reactions to abiotic stress. In the natural environment, plants are subjected to the combination of abiotic and biotic stresses. Also the response of plants to biotic stress, to cope with pathogens, involves the activation of a molecular network. Investigations on combination of abiotic and biotic stresses indicate the existence of cross-talk between the two networks and a kind of overlapping can be hypothesized. In this work we describe the isolation and characterization of a drought-related durum wheat (Triticum durum Desf.) gene, identified in a previous study, coding for a protein combining features of NBS-LRR type resistance protein with a S/TPK domain, involved in drought stress response. This is one of the few examples reported where all three domains are present in a single protein and, to our knowledge, it is the first report on a gene specifically induced by drought stress and drought-related conditions, with this particular structure. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
Genomic analysis of organismal complexity in the multicellular green alga Volvox carteri
DOE Office of Scientific and Technical Information (OSTI.GOV)
Prochnik, Simon E.; Umen, James; Nedelcu, Aurora
2010-07-01
Analysis of the Volvox carteri genome reveals that this green alga's increased organismal complexity and multicellularity are associated with modifications in protein families shared with its unicellular ancestor, and not with large-scale innovations in protein coding capacity. The multicellular green alga Volvox carteri and its morphologically diverse close relatives (the volvocine algae) are uniquely suited for investigating the evolution of multicellularity and development. We sequenced the 138 Mb genome of V. carteri and compared its {approx}14,500 predicted proteins to those of its unicellular relative, Chlamydomonas reinhardtii. Despite fundamental differences in organismal complexity and life history, the two species have similarmore » protein-coding potentials, and few species-specific protein-coding gene predictions. Interestingly, volvocine algal-specific proteins are enriched in Volvox, including those associated with an expanded and highly compartmentalized extracellular matrix. Our analysis shows that increases in organismal complexity can be associated with modifications of lineage-specific proteins rather than large-scale invention of protein-coding capacity.« less
Occurrence, Functions and Biological Significance of Arginine-Rich Proteins.
Chandana, Thimmegowda; Venkatesh, Yeldur P
2016-01-01
Arginine, the most basic among the 20 amino acids, occurs less frequently than lysine in proteins despite being coded by six codons. Only a few important proteins of biological significance have been found to be abundant in arginine. It has been established that these arginine-rich proteins have been assigned important roles in the biological systems. Arginine-rich cationic proteins are known to stabilize macromolecular structures by establishing appropriate interactions (salt bridges, hydrogen bonds and cation-π interactions). These proteins are also known to be the key members of many regulatory pathways such as gene expression, chromatin stability, expurgation of introns from naïve mRNA, mRNA splicing, membrane-penetrating activity and pathogenesis-related defense, to name a few. Further, arginine occurs in various combinations with other amino acids (serine, lysine, proline, tryptophan, valine, glycine and glutamic acid) which diversify the potential functions of arginine-rich proteins. Arginine-rich proteins known till date from dietary sources have been described in terms of their structure and functional properties. A variety of activities such as bactericidal, membrane-penetrating, antimicrobial, anti-hypertensive, pro-angiogenic and others have been reported for arginine-rich proteins. This review attempts to collate the occurrence, functions and the biological significance of this unique class of proteins rich in arginine.
Li, Jian-Long; Liu, Min; Hu, Xue-Yi
2016-01-01
The complete mitochondrial (mt) genome of the saddleback clownfish Amphiprion polymnus was obtained in this study. The circular mtDNA molecule was 16,804 bp in size and the overall nucleotide composition of the H-strand was 29.59% A, 25.93% T, 15.44% G and 29.04% C, with an A + T bias. The complete mitogenome encoded 13 protein-coding genes, 2 rRNAs, 22 tRNAs and 1 control region (D-loop), with the gene arrangement and translation direction basically identical to other typical vertebrate mitogenomes. We found A. polymnus (KJ101554) and A. bicinctus (JQ030887) had the same length in the protein-coding gene ND5 with 1869 bp, while the ND5 in A. ocellaris (AP006017) was 3 bp less than that of A. polymnus and A. bicinctus. Both structures of ND5, however, could translate to amino acid successfully.
Genomic structure of two ras family genes in the slime mold Physarum polycephalum.
Trzcińska-Danielewicz, Joanna; Kozlowski, Piotr; Gierdal, Katarzyna; Wiejak, Jolanta; Jagielski, Adam; Toczko, Kazimierz; Fronk, Jan
2002-08-01
Genomic structure of two Physarum polycephalum ras family genes, Ppras2 and Pprap1, has been determined, including the upstream region of the latter. The genes are interrupted by three and four introns, respectively. The first intron of Ppras2 has the same location within the coding sequence as the first intron in another ras homolog from this organism, Ppras1 [Trzcińska-Danielewicz, J., Kozlowski, P., and Toczko, K. (1996). "Cloning and genomic sequence of the Physarum polycephalum Ppras1 gene, a homologue of the ras protooncogene", Gene 169, pp. 143-144]. All introns, ranging from 53 to ca. 460 base pairs, have the canonical 5' and 3' ends, are greatly enriched in pyrimidines in the coding strand and have frequent pyrimidines-only tracts. These latter features seem to be responsible for the difficulties in cloning and sequencing of parts of these genes. Short sequences shared with P. polycephalum transposon-like repeats are common in the introns, indicating a possible role of transposition in intron evolution. In all three ras family genes phase zero introns are located mostly between sequences coding for regular protein secondary structure elements.
Small Molecule Ligands of Methyl-Lysine Binding Proteins
Herold, J. Martin; Wigle, Tim J.; Norris, Jacqueline L.; Lam, Robert; Korboukh, Victoria K.; Gao, Cen; Ingerman, Lindsey A.; Kireev, Dmitri B.; Senisterra, Guillermo; Vedadi, Masoud; Tripathy, Ashutosh; Brown, Peter J.; Arrowsmith, Cheryl H.; Jin, Jian; Janzen, William P.; Frye, Stephen V.
2011-01-01
Proteins which bind methylated lysines (“readers” of the histone code) are important components in the epigenetic regulation of gene expression and can also modulate other proteins that contain methyl-lysine such as p53 and Rb. Recognition of methyl-lysine marks by MBT domains leads to compaction of chromatin and a repressed transcriptional state. Antagonists of MBT domains would serve as probes to interrogate the functional role of these proteins and initiate the chemical biology of methyl-lysine readers as a target class. Small molecule MBT antagonists were designed based on the structure of histone peptide-MBT complexes and their interaction with MBT domains determined using a chemiluminescent assay and ITC. The ligands discovered antagonize native histone peptide binding, exhibiting 5-fold stronger binding affinity to L3MBTL1 than its preferred histone peptide. The first co-crystal structure of a small molecule bound to L3MBTL1 was determined and provides new insights into binding requirements for further ligand design. PMID:21417280
Development of Lead Compounds as Fusion Inhibitors for Dengue Virus
2009-08-01
19a. NAME OF RESPONSIBLE PERSON USAMRMC a. REPORT U b . ABSTRACT U c. THIS PAGE U UU 61 19b. TELEPHONE NUMBER (include area code...and III (blue). B ) Structural alignment of E2 protein monomer in the absence and presence of βOG (pdbIDs 1OAN and 1OKE respectively), with the kl-β...hairpin loop colored as follows: prefusion state (yellow), intermediate βOG-E2 complex (blue), secondary structure colored by B -factor from blue
Leherte, Laurence; Vercauteren, Daniel P
2017-10-26
We investigate the influence of various solvent models on the structural stability and protein-water interface of three ubiquitin complexes (PDB access codes: 1Q0W , 2MBB , 2G3Q ) modeled using the Amber99sb force field (FF) and two different point charge distributions. A previously developed reduced point charge model (RPCM), wherein each amino acid residue is described by a limited number of point charges, is tested and compared to its all-atom (AA) version. The complexes are solvated in TIP4P-Ew or TIP3P type water molecules, involving either the scaling of the Lennard-Jones protein-O water interaction parameters, or the coarse-grain (CG) SIRAH water description. The best agreements between the RPCM and AA models were obtained for structural, protein-water, and ligand-ubiquitin properties when using the TIP4P-Ew water FF with a scaling factor γ of 0.7. At the RPCM level, a decrease in γ, or the inclusion of SIRAH particles, allows weakening of the protein-water interactions. It results in a slight collapse of the protein structure and a less compact hydration shell and, thus, in a decrease in the number of protein-water and water-water H-bonds. The dynamics of the surface protein atoms and of the water shell molecules are also slightly refrained, which allow the generation of stable RPCM trajectories.
Death of a dogma: eukaryotic mRNAs can code for more than one protein
Mouilleron, Hélène; Delcourt, Vivian; Roucou, Xavier
2016-01-01
mRNAs carry the genetic information that is translated by ribosomes. The traditional view of a mature eukaryotic mRNA is a molecule with three main regions, the 5′ UTR, the protein coding open reading frame (ORF) or coding sequence (CDS), and the 3′ UTR. This concept assumes that ribosomes translate one ORF only, generally the longest one, and produce one protein. As a result, in the early days of genomics and bioinformatics, one CDS was associated with each protein-coding gene. This fundamental concept of a single CDS is being challenged by increasing experimental evidence indicating that annotated proteins are not the only proteins translated from mRNAs. In particular, mass spectrometry (MS)-based proteomics and ribosome profiling have detected productive translation of alternative open reading frames. In several cases, the alternative and annotated proteins interact. Thus, the expression of two or more proteins translated from the same mRNA may offer a mechanism to ensure the co-expression of proteins which have functional interactions. Translational mechanisms already described in eukaryotic cells indicate that the cellular machinery is able to translate different CDSs from a single viral or cellular mRNA. In addition to summarizing data showing that the protein coding potential of eukaryotic mRNAs has been underestimated, this review aims to challenge the single translated CDS dogma. PMID:26578573
Arieti, Fabiana; Gabus, Caroline; Tambalo, Margherita; Huet, Tiphaine; Round, Adam; Thore, Stéphane
2014-01-01
The Split Ends (SPEN) protein was originally discovered in Drosophila in the late 1990s. Since then, homologous proteins have been identified in eukaryotic species ranging from plants to humans. Every family member contains three predicted RNA recognition motifs (RRMs) in the N-terminal region of the protein. We have determined the crystal structure of the region of the human SPEN homolog that contains these RRMs—the SMRT/HDAC1 Associated Repressor Protein (SHARP), at 2.0 Å resolution. SHARP is a co-regulator of the nuclear receptors. We demonstrate that two of the three RRMs, namely RRM3 and RRM4, interact via a highly conserved interface. Furthermore, we show that the RRM3–RRM4 block is the main platform mediating the stable association with the H12–H13 substructure found in the steroid receptor RNA activator (SRA), a long, non-coding RNA previously shown to play a crucial role in nuclear receptor transcriptional regulation. We determine that SHARP association with SRA relies on both single- and double-stranded RNA sequences. The crystal structure of the SHARP–RRM fragment, together with the associated RNA-binding studies, extend the repertoire of nucleic acid binding properties of RRM domains suggesting a new hypothesis for a better understanding of SPEN protein functions. PMID:24748666
Li, S.-F.; Xu, J.-W.; Yang, Q.-L.; Wang, C.H.; Chen, Q.; Chapman, D.C.; Lu, G.
2009-01-01
Based upon morphological characters, Silver carp Hypophthalmichthys molitrix and bighead carp Hypophthalmichthys nobilis (or Aristichthys nobilis) have been classified into either the same genus or two distinct genera. Consequently, the taxonomic relationship of the two species at the generic level remains equivocal. This issue is addressed by sequencing complete mitochondrial genomes of H. molitrix and H. nobilis, comparing their mitogenome organization, structure and sequence similarity, and conducting a comprehensive phylogenetic analysis of cyprinid species. As with other cyprinid fishes, the mitogenomes of the two species were structurally conserved, containing 37 genes including 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA (tRNAs) genes and a putative control region (D-loop). Sequence similarity between the two mitogenomes varied in different genes or regions, being highest in the tRNA genes (98??8%), lowest in the control region (89??4%) and intermediate in the protein-coding genes (94??2%). Analyses of the sequence comparison and phylogeny using concatenated protein sequences support the view that the two species belong to the genus Hypophthalmichthys. Further studies using nuclear markers and involving more closely related species, and the systematic combination of traditional biology and molecular biology are needed in order to confirm this conclusion. ?? 2009 The Fisheries Society of the British Isles.
Genetic characterisation of the recent foot-and-mouth disease virus subtype A/IRN/2005
Klein, Joern; Hussain, Manzoor; Ahmad, Munir; Normann, Preben; Afzal, Muhammad; Alexandersen, Soren
2007-01-01
Background According to the World Reference Laboratory for FMD, a new subtype of FMDV serotype A was detected in Iran in 2005. This subtype was designated A/IRN/2005, and rapidly spread throughout Iran and moved westwards into Saudi Arabia and Turkey where it was initially detected from August 2005 and subsequently caused major disease problems in the spring of 2006. The same subtype reached Jordan in 2007. As part of an ongoing project we have also detected this subtype in Pakistan with the first positive samples detected in April 2006. To characterise this subtype in detail, we have sequenced and analysed the complete coding sequence of three subtype A/IRN/2005 isolates collected in Pakistan in 2006, the complete coding sequence of one subtype A/IRN/2005 isolate collected during the first outbreak in Turkey in 2005 and, in addition, the partial 1D coding sequence derived from 4 epithelium samples and 34 swab-samples from Asian buffaloes or cattle subsequently found to be infected with the A/IRN/2005 subtype. Results The phylogenies of the genome regions encoding for the structural proteins, displayed, with the exception of 1A, distinct, serotype-specific clustering and an evolutionary relationship of the A/IRN/2005 sublineage with the A22 sublineage. Potential recombination events have been detected in parts of the genome region coding for the non-structural proteins of FMDV. In addition, amino acid substitutions have been detected in the deduced VP1 protein sequence, potentially related to clinical or subclinical outcome of FMD. Indications of differential susceptibility for developing a subclinical course of disease between Asian buffaloes and cattle have been detected. Furthermore, hitherto unknown insertions of 2 amino acids before the second start codon, as well as sublineage specific amino acids have been detected in the genome region encoding for the leader proteinase of A/IRN/2005 sublineage. Conclusion Our findings indicate that the A/IRN/2005 sublineage has undergone two different paths of evolution for the structural and non-structural genome regions. The structural genome regions have had their evolutionary starting point in the A22 sublineage. It can be assumed that, due to the quasispecies structure of FMDV populations and the error-prone replication process, advantageous mutations in a changed environment have been fixed and lead to the occurrence of the new A/IRN/2005 sublineage. Together with this mechanism, recombination within the non-structural genome regions, potentially modifying the virulence of the virus, may be involved in the success of this new sublineage. The possible origin of this recombinant virus may be a co-infection with Asia1 and a serotype A precursor of the A/IRN/2005 sublineage potentially within Asian Buffaloes, as these appears to relatively easy become infected, but usually without developing clinical disease and consequently showing not a strong acute inflammatory immune response against a second FMDV infection. PMID:18001482
Saturation of recognition elements blocks evolution of new tRNA identities
Saint-Léger, Adélaïde; Bello, Carla; Dans, Pablo D.; Torres, Adrian Gabriel; Novoa, Eva Maria; Camacho, Noelia; Orozco, Modesto; Kondrashov, Fyodor A.; Ribas de Pouplana, Lluís
2016-01-01
Understanding the principles that led to the current complexity of the genetic code is a central question in evolution. Expansion of the genetic code required the selection of new transfer RNAs (tRNAs) with specific recognition signals that allowed them to be matured, modified, aminoacylated, and processed by the ribosome without compromising the fidelity or efficiency of protein synthesis. We show that saturation of recognition signals blocks the emergence of new tRNA identities and that the rate of nucleotide substitutions in tRNAs is higher in species with fewer tRNA genes. We propose that the growth of the genetic code stalled because a limit was reached in the number of identity elements that can be effectively used in the tRNA structure. PMID:27386510
He, Peng; Huang, Sheng; Xiao, Guanghui; Zhang, Yuzhou; Yu, Jianing
2016-12-01
RNA editing is a posttranscriptional modification process that alters the RNA sequence so that it deviates from the genomic DNA sequence. RNA editing mainly occurs in chloroplasts and mitochondrial genomes, and the number of editing sites varies in terrestrial plants. Why and how RNA editing systems evolved remains a mystery. Ginkgo biloba is one of the oldest seed plants and has an important evolutionary position. Determining the patterns and distribution of RNA editing in the ancient plant provides insights into the evolutionary trend of RNA editing, and helping us to further understand their biological significance. In this paper, we investigated 82 protein-coding genes in the chloroplast genome of G. biloba and identified 255 editing sites, which is the highest number of RNA editing events reported in a gymnosperm. All of the editing sites were C-to-U conversions, which mainly occurred in the second codon position, biased towards to the U_A context, and caused an increase in hydrophobic amino acids. RNA editing could change the secondary structures of 82 proteins, and create or eliminate a transmembrane region in five proteins as determined in silico. Finally, the evolutionary tendencies of RNA editing in different gene groups were estimated using the nonsynonymous-synonymous substitution rate selection mode. The G. biloba chloroplast genome possesses the highest number of RNA editing events reported so far in a seed plant. Most of the RNA editing sites can restore amino acid conservation, increase hydrophobicity, and even influence protein structures. Similar purifying selections constitute the dominant evolutionary force at the editing sites of essential genes, such as the psa, some psb and pet groups, and a positive selection occurred in the editing sites of nonessential genes, such as most ndh and a few psb genes.
CCBuilder 2.0: Powerful and accessible coiled‐coil modeling
Wood, Christopher W.
2017-01-01
Abstract The increased availability of user‐friendly and accessible computational tools for biomolecular modeling would expand the reach and application of biomolecular engineering and design. For protein modeling, one key challenge is to reduce the complexities of 3D protein folds to sets of parametric equations that nonetheless capture the salient features of these structures accurately. At present, this is possible for a subset of proteins, namely, repeat proteins. The α‐helical coiled coil provides one such example, which represents ≈ 3–5% of all known protein‐encoding regions of DNA. Coiled coils are bundles of α helices that can be described by a small set of structural parameters. Here we describe how this parametric description can be implemented in an easy‐to‐use web application, called CCBuilder 2.0, for modeling and optimizing both α‐helical coiled coils and polyproline‐based collagen triple helices. This has many applications from providing models to aid molecular replacement for X‐ray crystallography, in silico model building and engineering of natural and designed protein assemblies, and through to the creation of completely de novo “dark matter” protein structures. CCBuilder 2.0 is available as a web‐based application, the code for which is open‐source and can be downloaded freely. http://coiledcoils.chm.bris.ac.uk/ccbuilder2. Lay Summary We have created CCBuilder 2.0, an easy to use web‐based application that can model structures for a whole class of proteins, the α‐helical coiled coil, which is estimated to account for 3–5% of all proteins in nature. CCBuilder 2.0 will be of use to a large number of protein scientists engaged in fundamental studies, such as protein structure determination, through to more‐applied research including designing and engineering novel proteins that have potential applications in biotechnology. PMID:28836317
NASA Astrophysics Data System (ADS)
Mackiewicz, P.; Gierlik, A.; Kowalczuk, M.; Szczepanik, D.; Dudek, M. R.; Cebrat, S.
1999-12-01
We have analysed protein coding and intergenic sequences in the Borrelia burgdorferi (the Lyme disease bacterium) genome using different kinds of DNA walks. Genes occupying the leading strand of DNA have significantly different nucleotide composition from genes occupying the lagging strand. Nucleotide compositional bias of the two DNA strands reflects the aminoacid composition of proteins. 96% of genes coding for ribosomal proteins lie on the leading DNA strand, which suggests that the positions of these as well as other genes are non-random. In the B. burgdorferi genome, the asymmetry in intergenic DNA sequences is lower than the asymmetry in the third positions in codons. All these characters of the B. burgdorferi genome suggest that both replication-associated mutational pressure and recombination mechanisms have established the specific structure of the genome and now any recombination leading to inversion of a gene in respect to the direction of replication is forbidden. This property of the genome allows us to assume that it is in a steady state, which enables us to fix some parameters for simulations of DNA evolution.
Albertini, A M; Caramori, T; Crabb, W D; Scoffone, F; Galizzi, A
1991-01-01
We cloned and sequenced 8.3 kb of Bacillus subtilis DNA corresponding to the flaA locus involved in flagellar biosynthesis, motility, and chemotaxis. The DNA sequence revealed the presence of 10 complete and 2 incomplete open reading frames. Comparison of the deduced amino acid sequences to data banks showed similarities of nine of the deduced products to a number of proteins of Escherichia coli and Salmonella typhimurium for which a role in flagellar functioning has been directly demonstrated. In particular, the sequence data suggest that the flaA operon codes for the M-ring protein, components of the motor switch, and the distal part of the basal-body rod. The gene order is remarkably similar to that described for region III of the enterobacterial flagellar regulon. One of the open reading frames was translated into a protein with 48% amino acid identity to S. typhimurium FliI and 29% identity to the beta subunit of E. coli ATP synthase. PMID:1828465
Balasco, Nicole; Barone, Daniela; Vitagliano, Luigi
2015-01-01
Recent structural investigations have shown that the C-terminal domain (CTD) of the transcription factor RfaH undergoes unique structural modifications that have a profound impact into its functional properties. These modifications cause a complete change in RfaH(CTD) topology that converts from an α-hairpin to a β-barrel fold. To gain insights into the determinants of this major structural conversion, we here performed computational studies (protein structure prediction and molecular dynamics simulations) on RfaH(CTD). Although these analyses, in line with literature data, suggest that the isolated RfaH(CTD) has a strong preference for the β-barrel fold, they also highlight that a specific region of the protein is endowed with a chameleon conformational behavior. In particular, the Leu-rich region (residues 141-145) has a good propensity to adopt both α-helical and β-structured states. Intriguingly, in the RfaH homolog NusG, whose CTD uniquely adopts the β-barrel fold, the corresponding region is rich in residues as Val or Ile that present a strong preference for the β-structure. On this basis, we suggest that the presence of this Leu-rich element in RfaH(CTD) may be responsible for the peculiar structural behavior of the domain. The analysis of the sequences of RfaH family (PfamA code PF02357) unraveled that other members potentially share the structural properties of RfaH(CTD). These observations suggest that the unusual conformational behavior of RfaH(CTD) may be rare but not unique.
Choudhry, H; Albukhari, A; Morotti, M; Haider, S; Moralli, D; Smythies, J; Schödel, J; Green, C M; Camps, C; Buffa, F; Ratcliffe, P; Ragoussis, J; Harris, A L; Mole, D R
2015-01-01
Activation of cellular transcriptional responses, mediated by hypoxia-inducible factor (HIF), is common in many types of cancer, and generally confers a poor prognosis. Known to induce many hundreds of protein-coding genes, HIF has also recently been shown to be a key regulator of the non-coding transcriptional response. Here, we show that NEAT1 long non-coding RNA (lncRNA) is a direct transcriptional target of HIF in many breast cancer cell lines and in solid tumors. Unlike previously described lncRNAs, NEAT1 is regulated principally by HIF-2 rather than by HIF-1. NEAT1 is a nuclear lncRNA that is an essential structural component of paraspeckles and the hypoxic induction of NEAT1 induces paraspeckle formation in a manner that is dependent upon both NEAT1 and on HIF-2. Paraspeckles are multifunction nuclear structures that sequester transcriptionally active proteins as well as RNA transcripts that have been subjected to adenosine-to-inosine (A-to-I) editing. We show that the nuclear retention of one such transcript, F11R (also known as junctional adhesion molecule 1, JAM1), in hypoxia is dependent upon the hypoxic increase in NEAT1, thereby conferring a novel mechanism of HIF-dependent gene regulation. Induction of NEAT1 in hypoxia also leads to accelerated cellular proliferation, improved clonogenic survival and reduced apoptosis, all of which are hallmarks of increased tumorigenesis. Furthermore, in patients with breast cancer, high tumor NEAT1 expression correlates with poor survival. Taken together, these results indicate a new role for HIF transcriptional pathways in the regulation of nuclear structure and that this contributes to the pro-tumorigenic hypoxia-phenotype in breast cancer. PMID:25417700
Corbi, N; Libri, V; Fanciulli, M; Tinsley, J M; Davies, K E; Passananti, C
2000-06-01
Up-regulation of utrophin gene expression is recognized as a plausible therapeutic approach in the treatment of Duchenne muscular dystrophy (DMD). We have designed and engineered new zinc finger-based transcription factors capable of binding and activating transcription from the promoter of the dystrophin-related gene, utrophin. Using the recognition 'code' that proposes specific rules between zinc finger primary structure and potential DNA binding sites, we engineered a new gene named 'Jazz' that encodes for a three-zinc finger peptide. Jazz belongs to the Cys2-His2 zinc finger type and was engineered to target the nine base pair DNA sequence: 5'-GCT-GCT-GCG-3', present in the promoter region of both the human and mouse utrophin gene. The entire zinc finger alpha-helix region, containing the amino acid positions that are crucial for DNA binding, was specifically chosen on the basis of the contacts more frequently represented in the available list of the 'code'. Here we demonstrate that Jazz protein binds specifically to the double-stranded DNA target, with a dissociation constant of about 32 nM. Band shift and super-shift experiments confirmed the high affinity and specificity of Jazz protein for its DNA target. Moreover, we show that chimeric proteins, named Gal4-Jazz and Sp1-Jazz, are able to drive the transcription of a test gene from the human utrophin promoter.
Long non-coding RNAs and mRNAs profiling during spleen development in pig.
Che, Tiandong; Li, Diyan; Jin, Long; Fu, Yuhua; Liu, Yingkai; Liu, Pengliang; Wang, Yixin; Tang, Qianzi; Ma, Jideng; Wang, Xun; Jiang, Anan; Li, Xuewei; Li, Mingzhou
2018-01-01
Genome-wide transcriptomic studies in humans and mice have become extensive and mature. However, a comprehensive and systematic understanding of protein-coding genes and long non-coding RNAs (lncRNAs) expressed during pig spleen development has not been achieved. LncRNAs are known to participate in regulatory networks for an array of biological processes. Here, we constructed 18 RNA libraries from developing fetal pig spleen (55 days before birth), postnatal pig spleens (0, 30, 180 days and 2 years after birth), and the samples from the 2-year-old Wild Boar. A total of 15,040 lncRNA transcripts were identified among these samples. We found that the temporal expression pattern of lncRNAs was more restricted than observed for protein-coding genes. Time-series analysis showed two large modules for protein-coding genes and lncRNAs. The up-regulated module was enriched for genes related to immune and inflammatory function, while the down-regulated module was enriched for cell proliferation processes such as cell division and DNA replication. Co-expression networks indicated the functional relatedness between protein-coding genes and lncRNAs, which were enriched for similar functions over the series of time points examined. We identified numerous differentially expressed protein-coding genes and lncRNAs in all five developmental stages. Notably, ceruloplasmin precursor (CP), a protein-coding gene participating in antioxidant and iron transport processes, was differentially expressed in all stages. This study provides the first catalog of the developing pig spleen, and contributes to a fuller understanding of the molecular mechanisms underpinning mammalian spleen development.
Sanchez, Robersy; Grau, Ricardo
2005-09-01
A Boolean structure of the genetic code where Boolean deductions have biological and physicochemical meanings was discussed in a previous paper. Now, from these Boolean deductions we propose to define the value of amino acid information in order to consider the genetic information system as a communication system and to introduce the semantic content of information ignored by the conventional information theory. In this proposal, the value of amino acid information is proportional to the molecular weight of amino acids with a proportional constant of about 1.96 x 10(25) bits per kg. In addition to this, for the experimental estimations of the minimum energy dissipation in genetic logic operations, we present two postulates: (1) the energy Ei (i=1,2,...,20) of amino acids in the messages conveyed by proteins is proportional to the value of information, and (2) amino acids are distributed according to their energy Ei so the amino acid population in proteins follows a Boltzmann distribution. Specifically, in the genetic message carried by the DNA from the genomes of living organisms, we found that the minimum energy dissipation in genetic logic operations was close to kTLn(2) joules per bit.
Nelson, Leigh A; Cameron, Stephen L; Yeates, David K
2011-10-01
The monogeneric family Fergusoninidae consists of gall-forming flies that, together with Fergusobia (Tylenchida: Neotylenchidae) nematodes, form the only known mutualistic association between insects and nematodes. In this study, the entire 16,000 bp mitochondrial genome of Fergusonina taylori Nelson and Yeates was sequenced. The circular genome contains one encoding region including 27 genes and one non-coding A+T-rich region. The arrangement of the protein-coding, ribosomal RNA (rRNA) and transfer RNA (tRNA) genes was the same as that found in the ancestral insect. Nucleotide composition is highly A+T biased. All of the protein initiation codons are ATN, except for nad1 which begins with TTT. All 22 tRNA anticodons of F. taylori match those observed in Drosophila yakuba, and all form the typical cloverleaf structure except for tRNA-Ser((AGN)) which lacks a dihydrouridine (DHU) arm. Secondary structural features of the rRNA genes of Fergusonina are similar to those proposed for other insects, with minor modifications. The mitochondrial genome of Fergusonina presented here may prove valuable for resolving the sister group to the Fergusoninidae, and expands the available mtDNA data sources for acalyptrates overall.
A new method to improve network topological similarity search: applied to fold recognition
Lhota, John; Hauptman, Ruth; Hart, Thomas; Ng, Clara; Xie, Lei
2015-01-01
Motivation: Similarity search is the foundation of bioinformatics. It plays a key role in establishing structural, functional and evolutionary relationships between biological sequences. Although the power of the similarity search has increased steadily in recent years, a high percentage of sequences remain uncharacterized in the protein universe. Thus, new similarity search strategies are needed to efficiently and reliably infer the structure and function of new sequences. The existing paradigm for studying protein sequence, structure, function and evolution has been established based on the assumption that the protein universe is discrete and hierarchical. Cumulative evidence suggests that the protein universe is continuous. As a result, conventional sequence homology search methods may be not able to detect novel structural, functional and evolutionary relationships between proteins from weak and noisy sequence signals. To overcome the limitations in existing similarity search methods, we propose a new algorithmic framework—Enrichment of Network Topological Similarity (ENTS)—to improve the performance of large scale similarity searches in bioinformatics. Results: We apply ENTS to a challenging unsolved problem: protein fold recognition. Our rigorous benchmark studies demonstrate that ENTS considerably outperforms state-of-the-art methods. As the concept of ENTS can be applied to any similarity metric, it may provide a general framework for similarity search on any set of biological entities, given their representation as a network. Availability and implementation: Source code freely available upon request Contact: lxie@iscb.org PMID:25717198
1996-01-01
Mutations in the Caenorhabditis elegans gene unc-89 result in nematodes having disorganized muscle structure in which thick filaments are not organized into A-bands, and there are no M-lines. Beginning with a partial cDNA from the C. elegans sequencing project, we have cloned and sequenced the unc-89 gene. An unc-89 allele, st515, was found to contain an 84-bp deletion and a 10-bp duplication, resulting in an in- frame stop codon within predicted unc-89 coding sequence. Analysis of the complete coding sequence for unc-89 predicts a novel 6,632 amino acid polypeptide consisting of sequence motifs which have been implicated in protein-protein interactions. UNC-89 begins with 67 residues of unique sequences, SH3, dbl/CDC24, and PH domains, 7 immunoglobulins (Ig) domains, a putative KSP-containing multiphosphorylation domain, and ends with 46 Ig domains. A polyclonal antiserum raised to a portion of unc-89 encoded sequence reacts to a twitchin-sized polypeptide from wild type, but truncated polypeptides from st515 and from the amber allele e2338. By immunofluorescent microscopy, this antiserum localizes to the middle of A-bands, consistent with UNC-89 being a structural component of the M-line. Previous studies indicate that myofilament lattice assembly begins with positional cues laid down in the basement membrane and muscle cell membrane. We propose that the intracellular protein UNC-89 responds to these signals, localizes, and then participates in assembling an M-line. PMID:8603916
Guo, D; Maiss, E; Adam, G; Casper, R
1995-05-01
The RNA3 of prunus necrotic ringspot ilarvirus (PNRSV) has been cloned and its entire sequence determined. The RNA3 consists of 1943 nucleotides (nt) and possesses two large open reading frames (ORFs) separated by an intergenic region of 74 nt. The 5' proximal ORF is 855 nt in length and codes for a protein of molecular mass 31.4 kDa which has homologies with the putative movement protein of other members of the Bromoviridae. The 3' proximal ORF of 675 nt is the cistron for the coat protein (CP) and has a predicted molecular mass of 24.9 kDa. The sequence of the 3' non-coding region (NCR) of PNRSV RNA3 showed a high degree of similarity with those of tobacco streak virus (TSV), prune dwarf virus (PDV), apple mosaic virus (ApMV) and also alfalfa mosaic virus (AIMV). In addition it contained potential stem-loop structures with interspersed AUGC motifs characteristic for ilar- and alfamoviruses. This conserved primary and secondary structure in all 3' NCRs may be responsible for the interaction with homologous and heterologous CPs and subsequent activation of genome replication. The CP gene of an ApMV isolate (ApMV-G) of 657 nt has also been cloned and sequenced. Although ApMV and PNRSV have a distant serological relationship, the deduced amino acid sequences of their CPs have an identity of only 51.8%. The N termini of PNRSV and ApMV CPs have in common a zinc-finger motif and the potential to form an amphipathic helix.
Crystal structure of AFV3-109, a highly conserved protein from crenarchaeal viruses
Keller, Jenny; Leulliot, Nicolas; Cambillau, Christian; Campanacci, Valérie; Porciero, Stéphanie; Prangishvili, David; Forterre, Patrick; Cortez, Diego; Quevillon-Cheruel, Sophie; van Tilbeurgh, Herman
2007-01-01
The extraordinary morphologies of viruses infecting hyperthermophilic archaea clearly distinguish them from bacterial and eukaryotic viruses. Moreover, their genomes code for proteins that to a large extend have no related sequences in the extent databases. However, a small pool of genes is shared by overlapping subsets of these viruses, and the most conserved gene, exemplified by the ORF109 of the Acidianus Filamentous Virus 3, AFV3, is present on genomes of members of three viral familes, the Lipothrixviridae, Rudiviridae, and "Bicaudaviridae", as well as of the unclassified Sulfolobus Turreted Icosahedral Virus, STIV. We present here the crystal structure of the protein (Mr = 13.1 kD, 109 residues) encoded by the AFV3 ORF 109 in two different crystal forms at 1.5 and 1.3 Å resolution. The structure of AFV3-109 is a five stranded β-sheet with loops on one side and three helices on the other. It forms a dimer adopting the shape of a cradle that encompasses the best conserved regions of the sequence. No protein with a related fold could be identified except for the ortholog from STIV1, whose structure was deposited at the Protein Data Bank. We could clearly identify a well bound glycerol inside the cradle, contacting exclusively totally conserved residues. This interaction was confirmed in solution by fluorescence titration. Although the function of AFV3-109 cannot be deduced directly from its structure, structural homology with the STIV1 protein, and the size and charge distribution of the cavity suggested it could interact with nucleic acids. Fluorescence quenching titrations also showed that AFV3-109 interacts with dsDNA. Genomic sequence analysis revealed bacterial homologs of AFV3-109 as a part of a putative previously unidentified prophage sequences in some Firmicutes. PMID:17241456
Paraspeckles: nuclear bodies built on long noncoding RNA
Bond, Charles S.
2009-01-01
Paraspeckles are ribonucleoprotein bodies found in the interchromatin space of mammalian cell nuclei. These structures play a role in regulating the expression of certain genes in differentiated cells by nuclear retention of RNA. The core paraspeckle proteins (PSF/SFPQ, P54NRB/NONO, and PSPC1 [paraspeckle protein 1]) are members of the DBHS (Drosophila melanogaster behavior, human splicing) family. These proteins, together with the long nonprotein-coding RNA NEAT1 (MEN-ϵ/β), associate to form paraspeckles and maintain their integrity. Given the large numbers of long noncoding transcripts currently being discovered through whole transcriptome analysis, paraspeckles may be a paradigm for a class of subnuclear bodies formed around long noncoding RNA. PMID:19720872
p3d--Python module for structural bioinformatics.
Fufezan, Christian; Specht, Michael
2009-08-21
High-throughput bioinformatic analysis tools are needed to mine the large amount of structural data via knowledge based approaches. The development of such tools requires a robust interface to access the structural data in an easy way. For this the Python scripting language is the optimal choice since its philosophy is to write an understandable source code. p3d is an object oriented Python module that adds a simple yet powerful interface to the Python interpreter to process and analyse three dimensional protein structure files (PDB files). p3d's strength arises from the combination of a) very fast spatial access to the structural data due to the implementation of a binary space partitioning (BSP) tree, b) set theory and c) functions that allow to combine a and b and that use human readable language in the search queries rather than complex computer language. All these factors combined facilitate the rapid development of bioinformatic tools that can perform quick and complex analyses of protein structures. p3d is the perfect tool to quickly develop tools for structural bioinformatics using the Python scripting language.
Moal, Iain H; Barradas-Bautista, Didier; Jiménez-García, Brian; Torchala, Mieczyslaw; van der Velde, Arjan; Vreven, Thom; Weng, Zhiping; Bates, Paul A; Fernández-Recio, Juan
2017-06-15
In order to function, proteins frequently bind to one another and form 3D assemblies. Knowledge of the atomic details of these structures helps our understanding of how proteins work together, how mutations can lead to disease, and facilitates the designing of drugs which prevent or mimic the interaction. Atomic modeling of protein-protein interactions requires the selection of near-native structures from a set of docked poses based on their calculable properties. By considering this as an information retrieval problem, we have adapted methods developed for Internet search ranking and electoral voting into IRaPPA, a pipeline integrating biophysical properties. The approach enhances the identification of near-native structures when applied to four docking methods, resulting in a near-native appearing in the top 10 solutions for up to 50% of complexes benchmarked, and up to 70% in the top 100. IRaPPA has been implemented in the SwarmDock server ( http://bmm.crick.ac.uk/∼SwarmDock/ ), pyDock server ( http://life.bsc.es/pid/pydockrescoring/ ) and ZDOCK server ( http://zdock.umassmed.edu/ ), with code available on request. moal@ebi.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
NASA Astrophysics Data System (ADS)
Dai, Jin; Niemi, Antti J.; He, Jianfeng
2016-07-01
The Landau-Ginzburg-Wilson paradigm is proposed as a framework, to investigate the conformational landscape of intrinsically unstructured proteins. A universal Cα-trace Landau free energy is deduced from general symmetry considerations, with the ensuing all-atom structure modeled using publicly available reconstruction programs Pulchra and Scwrl. As an example, the conformational stability of an amyloid precursor protein intra-cellular domain (AICD) is inspected; the reference conformation is the crystallographic structure with code 3DXC in Protein Data Bank (PDB) that describes a heterodimer of AICD and a nuclear multi-domain adaptor protein Fe65. Those conformations of AICD that correspond to local or near-local minima of the Landau free energy are identified. For this, the response of the original 3DXC conformation to variations in the ambient temperature is investigated, using the Glauber algorithm. The conclusion is that in isolation the AICD conformation in 3DXC must be unstable. A family of degenerate conformations that minimise the Landau free energy is identified, and it is proposed that the native state of an isolated AICD is a superposition of these conformations. The results are fully in line with the presumed intrinsically unstructured character of isolated AICD and should provide a basis for a systematic analysis of AICD structure in future NMR experiments.
MultitaskProtDB-II: an update of a database of multitasking/moonlighting proteins
Franco-Serrano, Luís; Hernández, Sergio; Calvo, Alejandra; Severi, María A; Ferragut, Gabriela; Pérez-Pons, JosepAntoni; Piñol, Jaume; Pich, Òscar; Mozo-Villarias, Ángel; Amela, Isaac
2018-01-01
Abstract Multitasking, or moonlighting, is the capability of some proteins to execute two or more biological functions. MultitaskProtDB-II is a database of multifunctional proteins that has been updated. In the previous version, the information contained was: NCBI and UniProt accession numbers, canonical and additional biological functions, organism, monomeric/oligomeric states, PDB codes and bibliographic references. In the present update, the number of entries has been increased from 288 to 694 moonlighting proteins. MultitaskProtDB-II is continually being curated and updated. The new database also contains the following information: GO descriptors for the canonical and moonlighting functions, three-dimensional structure (for those proteins lacking PDB structure, a model was made using Itasser and Phyre), the involvement of the proteins in human diseases (78% of human moonlighting proteins) and whether the protein is a target of a current drug (48% of human moonlighting proteins). These numbers highlight the importance of these proteins for the analysis and explanation of human diseases and target-directed drug design. Moreover, 25% of the proteins of the database are involved in virulence of pathogenic microorganisms, largely in the mechanism of adhesion to the host. This highlights their importance for the mechanism of microorganism infection and vaccine design. MultitaskProtDB-II is available at http://wallace.uab.es/multitaskII. PMID:29136215
NASA Technical Reports Server (NTRS)
Funderburgh, J. L.; Funderburgh, M. L.; Brown, S. J.; Vergnes, J. P.; Hassell, J. R.; Mann, M. M.; Conrad, G. W.; Spooner, B. S. (Principal Investigator)
1993-01-01
Amino acid sequence from tryptic peptides of three different bovine corneal keratan sulfate proteoglycan (KSPG) core proteins (designated 37A, 37B, and 25) showed similarities to the sequence of a chicken KSPG core protein lumican. Bovine lumican cDNA was isolated from a bovine corneal expression library by screening with chicken lumican cDNA. The bovine cDNA codes for a 342-amino acid protein, M(r) 38,712, containing amino acid sequences identified in the 37B KSPG core protein. The bovine lumican is 68% identical to chicken lumican, with an 83% identity excluding the N-terminal 40 amino acids. Location of 6 cysteine and 4 consensus N-glycosylation sites in the bovine sequence were identical to those in chicken lumican. Bovine lumican had about 50% identity to bovine fibromodulin and 20% identity to bovine decorin and biglycan. About two-thirds of the lumican protein consists of a series of 10 amino acid leucine-rich repeats that occur in regions of calculated high beta-hydrophobic moment, suggesting that the leucine-rich repeats contribute to beta-sheet formation in these proteins. Sequences obtained from 37A and 25 core proteins were absent in bovine lumican, thus predicting a unique primary structure and separate mRNA for each of the three bovine KSPG core proteins.
Exploring the potential impact of an expanded genetic code on protein function
Xiao, Han; Nasertorabi, Fariborz; Choi, Sei -hyun; ...
2015-05-18
With few exceptions, all living organisms encode the same 20 canonical amino acids; however, it remains an open question whether organisms with additional amino acids beyond the common 20 might have an evolutionary advantage. In this paper, we begin to test that notion by making a large library of mutant enzymes in which 10 structurally distinct noncanonical amino acids were substituted at single sites randomly throughout TEM-1 β-lactamase. A screen for growth on the β-lactam antibiotic cephalexin afforded a unique p-acrylamido-phenylalanine (AcrF) mutation at Val-216 that leads to an increase in catalytic efficiency by increasing k cat, but not significantlymore » affecting K M. To understand the structural basis for this enhanced activity, we solved the X-ray crystal structures of the ligand-free mutant enzyme and of the deacylation-defective wild-type and mutant cephalexin acyl-enzyme intermediates. These structures show that the Val-216–AcrF mutation leads to conformational changes in key active site residues—both in the free enzyme and upon formation of the acyl-enzyme intermediate—that lower the free energy of activation of the substrate transacylation reaction. Finally, the functional changes induced by this mutation could not be reproduced by substitution of any of the 20 canonical amino acids for Val-216, indicating that an expanded genetic code may offer novel solutions to proteins as they evolve new activities.« less
Death of a dogma: eukaryotic mRNAs can code for more than one protein.
Mouilleron, Hélène; Delcourt, Vivian; Roucou, Xavier
2016-01-08
mRNAs carry the genetic information that is translated by ribosomes. The traditional view of a mature eukaryotic mRNA is a molecule with three main regions, the 5' UTR, the protein coding open reading frame (ORF) or coding sequence (CDS), and the 3' UTR. This concept assumes that ribosomes translate one ORF only, generally the longest one, and produce one protein. As a result, in the early days of genomics and bioinformatics, one CDS was associated with each protein-coding gene. This fundamental concept of a single CDS is being challenged by increasing experimental evidence indicating that annotated proteins are not the only proteins translated from mRNAs. In particular, mass spectrometry (MS)-based proteomics and ribosome profiling have detected productive translation of alternative open reading frames. In several cases, the alternative and annotated proteins interact. Thus, the expression of two or more proteins translated from the same mRNA may offer a mechanism to ensure the co-expression of proteins which have functional interactions. Translational mechanisms already described in eukaryotic cells indicate that the cellular machinery is able to translate different CDSs from a single viral or cellular mRNA. In addition to summarizing data showing that the protein coding potential of eukaryotic mRNAs has been underestimated, this review aims to challenge the single translated CDS dogma. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Levitin, Anastasia; Yanofsky, Charles
2010-01-01
Tryptophan, phenylalanine, tyrosine, and several other metabolites are all synthesized from a common precursor, chorismic acid. Since tryptophan is a product of an energetically expensive biosynthetic pathway, bacteria have developed sensing mechanisms to downregulate synthesis of the enzymes of tryptophan formation when synthesis of the amino acid is not needed. In Bacillus subtilis and some other Gram-positive bacteria, trp operon expression is regulated by two proteins, TRAP (the tryptophan-activated RNA binding protein) and AT (the anti-TRAP protein). TRAP is activated by bound tryptophan, and AT synthesis is increased upon accumulation of uncharged tRNATrp. Tryptophan-activated TRAP binds to trp operon leader RNA, generating a terminator structure that promotes transcription termination. AT binds to tryptophan-activated TRAP, inhibiting its RNA binding ability. In B. subtilis, AT synthesis is upregulated both transcriptionally and translationally in response to the accumulation of uncharged tRNATrp. In this paper, we focus on explaining the differences in organization and regulatory functions of the at operon's leader peptide-coding region, rtpLP, of B. subtilis and Bacillus licheniformis. Our objective was to correlate the greater growth sensitivity of B. licheniformis to tryptophan starvation with the spacing of the three Trp codons in its at operon leader peptide-coding region. Our findings suggest that the Trp codon location in rtpLP of B. licheniformis is designed to allow a mild charged-tRNATrp deficiency to expose the Shine-Dalgarno sequence and start codon for the AT protein, leading to increased AT synthesis. PMID:20061467
Lee, Woonghee; Kim, Jin Hae; Westler, William M.; Markley, John L.
2011-01-01
Summary: PONDEROSA (Peak-picking Of Noe Data Enabled by Restriction of Shift Assignments) accepts input information consisting of a protein sequence, backbone and sidechain NMR resonance assignments, and 3D-NOESY (13C-edited and/or 15N-edited) spectra, and returns assignments of NOESY crosspeaks, distance and angle constraints, and a reliable NMR structure represented by a family of conformers. PONDEROSA incorporates and integrates external software packages (TALOS+, STRIDE and CYANA) to carry out different steps in the structure determination. PONDEROSA implements internal functions that identify and validate NOESY peak assignments and assess the quality of the calculated three-dimensional structure of the protein. The robustness of the analysis results from PONDEROSA's hierarchical processing steps that involve iterative interaction among the internal and external modules. PONDEROSA supports a variety of input formats: SPARKY assignment table (.shifts) and spectrum file formats (.ucsf), XEASY proton file format (.prot), and NMR-STAR format (.star). To demonstrate the utility of PONDEROSA, we used the package to determine 3D structures of two proteins: human ubiquitin and Escherichia coli iron-sulfur scaffold protein variant IscU(D39A). The automatically generated structural constraints and ensembles of conformers were as good as or better than those determined previously by much less automated means. Availability: The program, in the form of binary code along with tutorials and reference manuals, is available at http://ponderosa.nmrfam.wisc.edu/. Contact: whlee@nmrfam.wisc.edu; markley@nmrfam.wisc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21511715
2014-01-01
Background It is important to predict the quality of a protein structural model before its native structure is known. The method that can predict the absolute local quality of individual residues in a single protein model is rare, yet particularly needed for using, ranking and refining protein models. Results We developed a machine learning tool (SMOQ) that can predict the distance deviation of each residue in a single protein model. SMOQ uses support vector machines (SVM) with protein sequence and structural features (i.e. basic feature set), including amino acid sequence, secondary structures, solvent accessibilities, and residue-residue contacts to make predictions. We also trained a SVM model with two new additional features (profiles and SOV scores) on 20 CASP8 targets and found that including them can only improve the performance when real deviations between native and model are higher than 5Å. The SMOQ tool finally released uses the basic feature set trained on 85 CASP8 targets. Moreover, SMOQ implemented a way to convert predicted local quality scores into a global quality score. SMOQ was tested on the 84 CASP9 single-domain targets. The average difference between the residue-specific distance deviation predicted by our method and the actual distance deviation on the test data is 2.637Å. The global quality prediction accuracy of the tool is comparable to other good tools on the same benchmark. Conclusion SMOQ is a useful tool for protein single model quality assessment. Its source code and executable are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/. PMID:24776231
Cao, Renzhi; Wang, Zheng; Wang, Yiheng; Cheng, Jianlin
2014-04-28
It is important to predict the quality of a protein structural model before its native structure is known. The method that can predict the absolute local quality of individual residues in a single protein model is rare, yet particularly needed for using, ranking and refining protein models. We developed a machine learning tool (SMOQ) that can predict the distance deviation of each residue in a single protein model. SMOQ uses support vector machines (SVM) with protein sequence and structural features (i.e. basic feature set), including amino acid sequence, secondary structures, solvent accessibilities, and residue-residue contacts to make predictions. We also trained a SVM model with two new additional features (profiles and SOV scores) on 20 CASP8 targets and found that including them can only improve the performance when real deviations between native and model are higher than 5Å. The SMOQ tool finally released uses the basic feature set trained on 85 CASP8 targets. Moreover, SMOQ implemented a way to convert predicted local quality scores into a global quality score. SMOQ was tested on the 84 CASP9 single-domain targets. The average difference between the residue-specific distance deviation predicted by our method and the actual distance deviation on the test data is 2.637Å. The global quality prediction accuracy of the tool is comparable to other good tools on the same benchmark. SMOQ is a useful tool for protein single model quality assessment. Its source code and executable are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/.
Gene cloning and prokaryotic expression of recombinant flagellin A from Vibrio parahaemolyticus
NASA Astrophysics Data System (ADS)
Yuan, Ye; Wang, Xiuli; Guo, Sheping; Liu, Yang; Ge, Hui; Qiu, Xuemei
2010-11-01
The Gram-negative Vibrio parahaemolyticus is a common pathogen in humans and marine animals. Bacteria flagellins play an important role during infection and induction of the host immune response. Thus, flagellin proteins are an ideal target for vaccines. We amplified the complete flagellin subunit gene ( flaA) from V. parahaemolyticus ATCC 17802. We then cloned and expressed the gene into Escherichia coli BL21 (DE3) cells. The gene coded for a protein that was 62.78 kDa. We purified and characterized the protein using Ni-NTA affinity chromatography and Anti-His antibody Western blotting, respectively. Our results provide a basis for further studies into the utility of the FlaA protein as a vaccine candidate against infection by Vibrio parahaemolyticus. In addition, the purified FlaA protein can be used for further functional and structural studies.
Omeire, Destiny; Abdin, Shaunte; Brooks, Daniel M; Miranda, Hector C
2015-04-01
The Germain's Peacock-Pheasant Polyplectron germaini (Aves, Galliformes, Phasianidae) is classified as Near Threatened on the IUCN Red List. The complete mitochondrial genome of P. germaini is 16,699 bp, consisting of 13 protein-coding genes, 2 rRNA, 22 tRNA genes and 1 control region. All of the 13 protein-coding genes have ATG as start codon. Eight of the 13 protein-coding genes have TAA as stop codon.
Improved method for predicting protein fold patterns with ensemble classifiers.
Chen, W; Liu, X; Huang, Y; Jiang, Y; Zou, Q; Lin, C
2012-01-27
Protein folding is recognized as a critical problem in the field of biophysics in the 21st century. Predicting protein-folding patterns is challenging due to the complex structure of proteins. In an attempt to solve this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physical-chemical property of proteins and 20-dimensional features were selected using a coupled position-specific scoring matrix. Compared with traditional prediction methods, these methods were superior in terms of prediction accuracy. The 188-dimensional feature-based method achieved 71.2% accuracy in five cross-validations. The accuracy rose to 77% when we used a 20-dimensional feature vector. These methods were used on recent data, with 54.2% accuracy. Source codes and dataset, together with web server and software tools for prediction, are available at: http://datamining.xmu.edu.cn/main/~cwc/ProteinPredict.html.
3D-SURFER: software for high-throughput protein surface comparison and analysis
La, David; Esquivel-Rodríguez, Juan; Venkatraman, Vishwesh; Li, Bin; Sael, Lee; Ueng, Stephen; Ahrendt, Steven; Kihara, Daisuke
2009-01-01
Summary: We present 3D-SURFER, a web-based tool designed to facilitate high-throughput comparison and characterization of proteins based on their surface shape. As each protein is effectively represented by a vector of 3D Zernike descriptors, comparison times for a query protein against the entire PDB take, on an average, only a couple of seconds. The web interface has been designed to be as interactive as possible with displays showing animated protein rotations, CATH codes and structural alignments using the CE program. In addition, geometrically interesting local features of the protein surface, such as pockets that often correspond to ligand binding sites as well as protrusions and flat regions can also be identified and visualized. Availability: 3D-SURFER is a web application that can be freely accessed from: http://dragon.bio.purdue.edu/3d-surfer Contact: dkihara@purdue.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19759195
3D-SURFER: software for high-throughput protein surface comparison and analysis.
La, David; Esquivel-Rodríguez, Juan; Venkatraman, Vishwesh; Li, Bin; Sael, Lee; Ueng, Stephen; Ahrendt, Steven; Kihara, Daisuke
2009-11-01
We present 3D-SURFER, a web-based tool designed to facilitate high-throughput comparison and characterization of proteins based on their surface shape. As each protein is effectively represented by a vector of 3D Zernike descriptors, comparison times for a query protein against the entire PDB take, on an average, only a couple of seconds. The web interface has been designed to be as interactive as possible with displays showing animated protein rotations, CATH codes and structural alignments using the CE program. In addition, geometrically interesting local features of the protein surface, such as pockets that often correspond to ligand binding sites as well as protrusions and flat regions can also be identified and visualized. 3D-SURFER is a web application that can be freely accessed from: http://dragon.bio.purdue.edu/3d-surfer dkihara@purdue.edu Supplementary data are available at Bioinformatics online.
RNA structural constraints in the evolution of the influenza A virus genome NP segment
Gultyaev, Alexander P; Tsyganov-Bodounov, Anton; Spronken, Monique IJ; van der Kooij, Sander; Fouchier, Ron AM; Olsthoorn, René CL
2014-01-01
Conserved RNA secondary structures were predicted in the nucleoprotein (NP) segment of the influenza A virus genome using comparative sequence and structure analysis. A number of structural elements exhibiting nucleotide covariations were identified over the whole segment length, including protein-coding regions. Calculations of mutual information values at the paired nucleotide positions demonstrate that these structures impose considerable constraints on the virus genome evolution. Functional importance of a pseudoknot structure, predicted in the NP packaging signal region, was confirmed by plaque assays of the mutant viruses with disrupted structure and those with restored folding using compensatory substitutions. Possible functions of the conserved RNA folding patterns in the influenza A virus genome are discussed. PMID:25180940
Juozapaitis, Mindaugas; Zvirbliene, Aurelija; Kucinskaite, Indre; Sezaite, Indre; Slibinskas, Rimantas; Coiras, Mayte; de Ory Manchon, Fernando; López-Huertas, María Rosa; Pérez-Breña, Pilar; Staniulis, Juozas; Narkeviciute, Irena; Sasnauskas, Kestutis
2008-05-01
Human parainfluenza virus types 1 and 3 (HPIV1 and HPIV3, respectively), members of the virus family Paramyxoviridae, are common causes of lower respiratory tract infections in infants, young children, the immunocompromised, the chronically ill, and the elderly. In order to synthesize recombinant HPIV1 and HPIV3 nucleocapsid proteins, the coding sequences were cloned into the yeast Saccharomyces cerevisiae expression vector pFGG3 under control of GAL7 promoter. A high level of recombinant virus nucleocapsid proteins expression (20-24 mg l(-1) of yeast culture) was obtained. Electron microscopy demonstrated the assembly of typical herring-bone structures of purified recombinant nucleocapsid proteins, characteristic for other paramyxoviruses. These structures contained host RNA, which was resistant to RNase treatment. The nucleocapsid proteins were stable in yeast and were easily purified by caesium chloride gradient ultracentrifugation. Therefore, this system proved to be simple, efficient and cost-effective, suitable for high-level production of parainfluenza virus nucleocapsids as nucleocapsid-like particles. When used as coating antigens in an indirect ELISA, the recombinant N proteins reacted with sera of patients infected with HPIV1 or 3. Serological assays to detect HPIV-specific antibodies could be designed on this basis.
Structure-based functional annotation: yeast ymr099c codes for a D-hexose-6-phosphate mutarotase.
Graille, Marc; Baltaze, Jean-Pierre; Leulliot, Nicolas; Liger, Dominique; Quevillon-Cheruel, Sophie; van Tilbeurgh, Herman
2006-10-06
Despite the generation of a large amount of sequence information over the last decade, more than 40% of well characterized enzymatic functions still lack associated protein sequences. Assigning protein sequences to documented biochemical functions is an interesting challenge. We illustrate here that structural genomics may be a reasonable approach in addressing these questions. We present the crystal structure of the Saccharomyces cerevisiae YMR099cp, a protein of unknown function. YMR099cp adopts the same fold as galactose mutarotase and shares the same catalytic machinery necessary for the interconversion of the alpha and beta anomers of galactose. The structure revealed the presence in the active site of a sulfate ion attached by an arginine clamp made by the side chain from two strictly conserved arginine residues. This sulfate is ideally positioned to mimic the phosphate group of hexose 6-phosphate. We have subsequently successfully demonstrated that YMR099cp is a hexose-6-phosphate mutarotase with broad substrate specificity. We solved high resolution structures of some substrate enzyme complexes, further confirming our functional hypothesis. The metabolic role of a hexose-6-phosphate mutarotase is discussed. This work illustrates that structural information has been crucial to assign YMR099cp to the orphan EC activity: hexose-phosphate mutarotase.
SMOG 2: A Versatile Software Package for Generating Structure-Based Models.
Noel, Jeffrey K; Levi, Mariana; Raghunathan, Mohit; Lammert, Heiko; Hayes, Ryan L; Onuchic, José N; Whitford, Paul C
2016-03-01
Molecular dynamics simulations with coarse-grained or simplified Hamiltonians have proven to be an effective means of capturing the functionally important long-time and large-length scale motions of proteins and RNAs. Originally developed in the context of protein folding, structure-based models (SBMs) have since been extended to probe a diverse range of biomolecular processes, spanning from protein and RNA folding to functional transitions in molecular machines. The hallmark feature of a structure-based model is that part, or all, of the potential energy function is defined by a known structure. Within this general class of models, there exist many possible variations in resolution and energetic composition. SMOG 2 is a downloadable software package that reads user-designated structural information and user-defined energy definitions, in order to produce the files necessary to use SBMs with high performance molecular dynamics packages: GROMACS and NAMD. SMOG 2 is bundled with XML-formatted template files that define commonly used SBMs, and it can process template files that are altered according to the needs of each user. This computational infrastructure also allows for experimental or bioinformatics-derived restraints or novel structural features to be included, e.g. novel ligands, prosthetic groups and post-translational/transcriptional modifications. The code and user guide can be downloaded at http://smog-server.org/smog2.
Martín, A C; López, R; García, P
1996-06-01
Cp-1, a bacteriophage infecting Streptococcus pneumoniae, has a linear double-stranded DNA genome, with a terminal protein covalently linked to its 5' ends, that replicates by the protein-priming mechanism. We describe here the complete DNA sequence and transcriptional map of the Cp-1 genome. These analyses have led to the firm assignment of 10 genes and the localization of 19 additional open reading frames in the 19,345-bp Cp-1 DNA. Striking similarities and differences between some of these proteins and those of the Bacillus subtilis phage phi 29, a system that also replicates its DNA by the protein-priming mechanism, have been revealed. The genes coding for structural proteins and assembly factors are located in the central part of the Cp-1 genome. Several proteins corresponding to the predicted gene products were identified by in vitro and in vivo expression of the cloned genes. Mature major head protein from the virion particles results from hydrolysis of the primary gene product at the His-49 residue, whereas the phage gene is expressed in Escherichia coli without modification. We have also identified two open reading frames coding for proteins that show high degrees of similarity to the N- and C-terminal regions, respectively, of the single tail protein identified in phi 29. Sequencing and primer extension analysis suggest transcription of a small RNA showing a secondary structure similar to that of the prohead RNA required for the ATP-dependent packaging of phi 29 DNA. On the basis of its temporal expression, transcription of the Cp-1 genome takes place in two stages, early and late. Combined Northern (RNA) blot and primer extension experiments allowed us to map the 5' initiation sites of the transcripts, and we found that only three genes were transcribed from right to left. These analyses reveal that there are also noticeable differences between Cp-l and phi 29 in transcriptional organization. Considered together, the observations reported here provide new tangible evidence on phylogenetic relationships between B. subtilis and S. pneumoniae.
McLysaght, Aoife; Guerzoni, Daniele
2015-09-26
The origin of novel protein-coding genes de novo was once considered so improbable as to be impossible. In less than a decade, and especially in the last five years, this view has been overturned by extensive evidence from diverse eukaryotic lineages. There is now evidence that this mechanism has contributed a significant number of genes to genomes of organisms as diverse as Saccharomyces, Drosophila, Plasmodium, Arabidopisis and human. From simple beginnings, these genes have in some instances acquired complex structure, regulated expression and important functional roles. New genes are often thought of as dispensable late additions; however, some recent de novo genes in human can play a role in disease. Rather than an extremely rare occurrence, it is now evident that there is a relatively constant trickle of proto-genes released into the testing ground of natural selection. It is currently unknown whether de novo genes arise primarily through an 'RNA-first' or 'ORF-first' pathway. Either way, evolutionary tinkering with this pool of genetic potential may have been a significant player in the origins of lineage-specific traits and adaptations. © 2015 The Authors.
The complete nucleotide sequence of RNA beta from the type strain of barley stripe mosaic virus.
Gustafson, G; Armour, S L
1986-01-01
The complete nucleotide sequence of RNA beta from the type strain of barley stripe mosaic virus (BSMV) has been determined. The sequence is 3289 nucleotides in length and contains four open reading frames (ORFs) which code for proteins of Mr 22,147 (ORF1), Mr 58,098 (ORF2), Mr 17,378 (ORF3), and Mr 14,119 (ORF4). The predicted N-terminal amino acid sequence of the polypeptide encoded by the ORF nearest the 5'-end of the RNA (ORF1) is identical (after the initiator methionine) to the published N-terminal amino acid sequence of BSMV coat protein for 29 of the first 30 amino acids. ORF2 occupies the central portion of the coding region of RNA beta and ORF3 is located at the 3'-end. The ORF4 sequence overlaps the 3'-region of ORF2 and the 5'-region of ORF3 and differs in codon usage from the other three RNA beta ORFs. The coding region of RNA beta is followed by a poly(A) tract and a 238 nucleotide tRNA-like structure which are common to all three BSMV genomic RNAs. Images PMID:3754962
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ekiert, Damian C.; Cox, Jeffery S.
Nearly 10% of the coding capacity of the Mycobacterium tuberculosis genome is devoted to two highly expanded and enigmatic protein families called PE and PPE, some of which are important virulence/immunogenicity factors and are secreted during infection via a unique alternative secretory system termed "type VII." How PE-PPE proteins function during infection and how they are translocated to the bacterial surface through the five distinct type VII secretion systems [ESAT-6 secretion system (ESX)] of M. tuberculosis is poorly understood. Here in this paper, we report the crystal structure of a PE-PPE heterodimer bound to ESX secretion-associated protein G (EspG), whichmore » adopts a novel fold. This PE-PPE-EspG complex, along with structures of two additional EspGs, suggests that EspG acts as an adaptor that recognizes specific PE-PPE protein complexes via extensive interactions with PPE domains, and delivers them to ESX machinery for secretion. Surprisingly, secretion of most PE-PPE proteins in M. tuberculosis is likely mediated by EspG from the ESX-5 system, underscoring the importance of ESX-5 in mycobacterial pathogenesis. Furthermore, our results indicate that PE-PPE domains function as cis-acting targeting sequences that are read out by EspGs, revealing the molecular specificity for secretion through distinct ESX pathways.« less
Ekiert, Damian C.; Cox, Jeffery S.
2014-10-01
Nearly 10% of the coding capacity of the Mycobacterium tuberculosis genome is devoted to two highly expanded and enigmatic protein families called PE and PPE, some of which are important virulence/immunogenicity factors and are secreted during infection via a unique alternative secretory system termed "type VII." How PE-PPE proteins function during infection and how they are translocated to the bacterial surface through the five distinct type VII secretion systems [ESAT-6 secretion system (ESX)] of M. tuberculosis is poorly understood. Here in this paper, we report the crystal structure of a PE-PPE heterodimer bound to ESX secretion-associated protein G (EspG), whichmore » adopts a novel fold. This PE-PPE-EspG complex, along with structures of two additional EspGs, suggests that EspG acts as an adaptor that recognizes specific PE-PPE protein complexes via extensive interactions with PPE domains, and delivers them to ESX machinery for secretion. Surprisingly, secretion of most PE-PPE proteins in M. tuberculosis is likely mediated by EspG from the ESX-5 system, underscoring the importance of ESX-5 in mycobacterial pathogenesis. Furthermore, our results indicate that PE-PPE domains function as cis-acting targeting sequences that are read out by EspGs, revealing the molecular specificity for secretion through distinct ESX pathways.« less
Three-dimensional structure of Erwinia carotovora L-asparaginase
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kislitsyn, Yu. A.; Kravchenko, O. V.; Nikonov, S. V.
2006-10-15
Three-dimensional structure of Erwinia carotovora L-asparaginase, which has antitumor activity and is used for the treatment of acute lymphoblastic leukemia, was solved at 3 A resolution and refined to R{sub cryst} = 20% and R{sub free} = 28%. Crystals of recombinant Erwinia carotovora L-asparaginase were grown by the hanging-drop vapor-diffusion method from protein solutions in a HEPES buffer (pH 6.5) and PEG MME 5000 solutions in a cacodylate buffer (pH 6.5) as the precipitant. Three-dimensional X-ray diffraction data were collected up to 3 A resolution from one crystal at room temperature. The structure was solved by the molecular replacement methodmore » using the coordinates of Erwinia chrysanthemi L-asparaginase as the starting model. The coordinates refined with the use of the CNS program package were deposited in the Protein Data Bank (PDB code 1ZCF)« less
Genome-wide Association Study Identifies Loci for the Polled Phenotype in Yak
Wu, Xiaoyun; Wang, Kun; Ding, Xuezhi; Wang, Mingcheng; Chu, Min; Xie, Xiuyue; Qiu, Qiang; Yan, Ping
2016-01-01
The absence of horns, known as the polled phenotype, is an economically important trait in modern yak husbandry, but the genomic structure and genetic basis of this phenotype have yet to be discovered. Here, we conducted a genome-wide association study with a panel of 10 horned and 10 polled yaks using whole genome sequencing. We mapped the POLLED locus to a 200-kb interval, which comprises three protein-coding genes. Further characterization of the candidate region showed recent artificial selection signals resulting from the breeding process. We suggest that expressional variations rather than structural variations in protein probably contribute to the polled phenotype. Our results not only represent the first and important step in establishing the genomic structure of the polled region in yak, but also add to our understanding of the polled trait in bovid species. PMID:27389700
The structure of the human interferon alpha/beta receptor gene.
Lutfalla, G; Gardiner, K; Proudhon, D; Vielh, E; Uzé, G
1992-02-05
Using the cDNA coding for the human interferon alpha/beta receptor (IFNAR), the IFNAR gene has been physically mapped relative to the other loci of the chromosome 21q22.1 region. 32,906 base pairs covering the IFNAR gene have been cloned and sequenced. Primer extension and solution hybridization-ribonuclease protection have been used to determine that the transcription of the gene is initiated in a broad region of 20 base pairs. Some aspects of the polymorphism of the gene, including noncoding sequences, have been analyzed; some are allelic differences in the coding sequence that induce amino acid variations in the resulting protein. The exon structure of the IFNAR gene and of that of the available genes for the receptors of the cytokine/growth hormone/prolactin/interferon receptor family have been compared with the predictions for the secondary structure of those receptors. From this analysis, we postulate a common origin and propose an hypothesis for the divergence from the immunoglobulin superfamily.
GrTEdb: the first web-based database of transposable elements in cotton (Gossypium raimondii).
Xu, Zhenzhen; Liu, Jing; Ni, Wanchao; Peng, Zhen; Guo, Yue; Ye, Wuwei; Huang, Fang; Zhang, Xianggui; Xu, Peng; Guo, Qi; Shen, Xinlian; Du, Jianchang
2017-01-01
Although several diploid and tetroploid Gossypium species genomes have been sequenced, the well annotated web-based transposable elements (TEs) database is lacking. To better understand the roles of TEs in structural, functional and evolutionary dynamics of the cotton genome, a comprehensive, specific, and user-friendly web-based database, Gossypium raimondii transposable elements database (GrTEdb), was constructed. A total of 14 332 TEs were structurally annotated and clearly categorized in G. raimondii genome, and these elements have been classified into seven distinct superfamilies based on the order of protein-coding domains, structures and/or sequence similarity, including 2929 Copia-like elements, 10 368 Gypsy-like elements, 299 L1 , 12 Mutators , 435 PIF-Harbingers , 275 CACTAs and 14 Helitrons . Meanwhile, the web-based sequence browsing, searching, downloading and blast tool were implemented to help users easily and effectively to annotate the TEs or TE fragments in genomic sequences from G. raimondii and other closely related Gossypium species. GrTEdb provides resources and information related with TEs in G. raimondii , and will facilitate gene and genome analyses within or across Gossypium species, evaluating the impact of TEs on their host genomes, and investigating the potential interaction between TEs and protein-coding genes in Gossypium species. http://www.grtedb.org/. © The Author(s) 2017. Published by Oxford University Press.
NegGOA: negative GO annotations selection using ontology structure.
Fu, Guangyuan; Wang, Jun; Yang, Bo; Yu, Guoxian
2016-10-01
Predicting the biological functions of proteins is one of the key challenges in the post-genomic era. Computational models have demonstrated the utility of applying machine learning methods to predict protein function. Most prediction methods explicitly require a set of negative examples-proteins that are known not carrying out a particular function. However, Gene Ontology (GO) almost always only provides the knowledge that proteins carry out a particular function, and functional annotations of proteins are incomplete. GO structurally organizes more than tens of thousands GO terms and a protein is annotated with several (or dozens) of these terms. For these reasons, the negative examples of a protein can greatly help distinguishing true positive examples of the protein from such a large candidate GO space. In this paper, we present a novel approach (called NegGOA) to select negative examples. Specifically, NegGOA takes advantage of the ontology structure, available annotations and potentiality of additional annotations of a protein to choose negative examples of the protein. We compare NegGOA with other negative examples selection algorithms and find that NegGOA produces much fewer false negatives than them. We incorporate the selected negative examples into an efficient function prediction model to predict the functions of proteins in Yeast, Human, Mouse and Fly. NegGOA also demonstrates improved accuracy than these comparing algorithms across various evaluation metrics. In addition, NegGOA is less suffered from incomplete annotations of proteins than these comparing methods. The Matlab and R codes are available at https://sites.google.com/site/guoxian85/neggoa gxyu@swu.edu.cn Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Alam, Tanvir; Medvedeva, Yulia A.; Jia, Hui; ...
2014-10-02
Transcriptional regulation of protein-coding genes is increasingly well-understood on a global scale, yet no comparable information exists for long non-coding RNA (lncRNA) genes, which were recently recognized to be as numerous as protein-coding genes in mammalian genomes. We performed a genome-wide comparative analysis of the promoters of human lncRNA and protein-coding genes, finding global differences in specific genetic and epigenetic features relevant to transcriptional regulation. These two groups of genes are hence subject to separate transcriptional regulatory programs, including distinct transcription factor (TF) proteins that significantly favor lncRNA, rather than coding-gene, promoters. We report a specific signature of promoter-proximal transcriptionalmore » regulation of lncRNA genes, including several distinct transcription factor binding sites (TFBS). Experimental DNase I hypersensitive site profiles are consistent with active configurations of these lncRNA TFBS sets in diverse human cell types. TFBS ChIP-seq datasets confirm the binding events that we predicted using computational approaches for a subset of factors. For several TFs known to be directly regulated by lncRNAs, we find that their putative TFBSs are enriched at lncRNA promoters, suggesting that the TFs and the lncRNAs may participate in a bidirectional feedback loop regulatory network. Accordingly, cells may be able to modulate lncRNA expression levels independently of mRNA levels via distinct regulatory pathways. Our results also raise the possibility that, given the historical reliance on protein-coding gene catalogs to define the chromatin states of active promoters, a revision of these chromatin signature profiles to incorporate expressed lncRNA genes is warranted in the future.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Alam, Tanvir; Medvedeva, Yulia A.; Jia, Hui
Transcriptional regulation of protein-coding genes is increasingly well-understood on a global scale, yet no comparable information exists for long non-coding RNA (lncRNA) genes, which were recently recognized to be as numerous as protein-coding genes in mammalian genomes. We performed a genome-wide comparative analysis of the promoters of human lncRNA and protein-coding genes, finding global differences in specific genetic and epigenetic features relevant to transcriptional regulation. These two groups of genes are hence subject to separate transcriptional regulatory programs, including distinct transcription factor (TF) proteins that significantly favor lncRNA, rather than coding-gene, promoters. We report a specific signature of promoter-proximal transcriptionalmore » regulation of lncRNA genes, including several distinct transcription factor binding sites (TFBS). Experimental DNase I hypersensitive site profiles are consistent with active configurations of these lncRNA TFBS sets in diverse human cell types. TFBS ChIP-seq datasets confirm the binding events that we predicted using computational approaches for a subset of factors. For several TFs known to be directly regulated by lncRNAs, we find that their putative TFBSs are enriched at lncRNA promoters, suggesting that the TFs and the lncRNAs may participate in a bidirectional feedback loop regulatory network. Accordingly, cells may be able to modulate lncRNA expression levels independently of mRNA levels via distinct regulatory pathways. Our results also raise the possibility that, given the historical reliance on protein-coding gene catalogs to define the chromatin states of active promoters, a revision of these chromatin signature profiles to incorporate expressed lncRNA genes is warranted in the future.« less
Pang, Erli; Wu, Xiaomei; Lin, Kui
2016-06-01
Protein evolution plays an important role in the evolution of each genome. Because of their functional nature, in general, most of their parts or sites are differently constrained selectively, particularly by purifying selection. Most previous studies on protein evolution considered individual proteins in their entirety or compared protein-coding sequences with non-coding sequences. Less attention has been paid to the evolution of different parts within each protein of a given genome. To this end, based on PfamA annotation of all human proteins, each protein sequence can be split into two parts: domains or unassigned regions. Using this rationale, single nucleotide polymorphisms (SNPs) in protein-coding sequences from the 1000 Genomes Project were mapped according to two classifications: SNPs occurring within protein domains and those within unassigned regions. With these classifications, we found: the density of synonymous SNPs within domains is significantly greater than that of synonymous SNPs within unassigned regions; however, the density of non-synonymous SNPs shows the opposite pattern. We also found there are signatures of purifying selection on both the domain and unassigned regions. Furthermore, the selective strength on domains is significantly greater than that on unassigned regions. In addition, among all of the human protein sequences, there are 117 PfamA domains in which no SNPs are found. Our results highlight an important aspect of protein domains and may contribute to our understanding of protein evolution.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oker-Blom, C.
The order of translation in vivo of the genes coding for rubella virus structural proteins was studied in infected B-Vero cells. The proteins were sequentially pulse-chase labeled with (/sup 35/S)methionine after synchronization of translation initiation with hypertonic salt treatment. A sequential labeling procedure (window-labeling) to specifically label defined segments of the structural proteins was also used. The labeled proteins were identified by sodium dodecyl sulfate-gel electrophoresis after immunoprecipitation with specific antisera directed against the two virion glycoproteins (E1 and E2a/E2b) and the nucleocapsid (C) protein. The order of translation was found to be NH/sub 2/-C-E2-E1-COOH. We have previously shown thatmore » the structural proteins are synthesized in vitro from a cytoplasmic 24S subgenomic mRNA as a 110,000-dalton (p110) precursor. Here, it is shown that p110 is precipitated with anti-C, anti-E2, and anti-E1 sera, indicating that p110 is the precursor of all three structural proteins. Two major in vitro translation products (M/sub r/s, 66,000 and 62,000) that could represent preterminated polypeptide chains or proteolytic cleavage products were precipitated with anti-C and anti-Es sera, but not with anti-E1 serum, indicating, in conformity with the in vivo results, that the genes for the C and E2 proteins are adjacent to each other. Using these specific antisera, we have also confirmed the identity of the unglycosylated forms of E1 (M/sub r/, 53,000) and E2 (M/sub r/ 30,000) immunoprecipitated from tunicamycin-treated infected cells. 18 references, 6 figures.« less
Van Lent, Sarah; Creasy, Heather Huot; Myers, Garry S A; Vanrompay, Daisy
2016-01-01
Variation is a central trait of the polymorphic membrane protein (Pmp) family. The number of pmp coding sequences differs between Chlamydia species, but it is unknown whether the number of pmp coding sequences is constant within a Chlamydia species. The level of conservation of the Pmp proteins has previously only been determined for Chlamydia trachomatis. As different Pmp proteins might be indispensible for the pathogenesis of different Chlamydia species, this study investigated the conservation of Pmp proteins both within and across C. trachomatis,C. pneumoniae,C. abortus, and C. psittaci. The pmp coding sequences were annotated in 16 C. trachomatis, 6 C. pneumoniae, 2 C. abortus, and 16 C. psittaci genomes. The number and organization of polymorphic membrane coding sequences differed within and across the analyzed Chlamydia species. The length of coding sequences of pmpA,pmpB, and pmpH was conserved among all analyzed genomes, while the length of pmpE/F and pmpG, and remarkably also of the subtype pmpD, differed among the analyzed genomes. PmpD, PmpA, PmpH, and PmpA were the most conserved Pmp in C. trachomatis,C. pneumoniae,C. abortus, and C. psittaci, respectively. PmpB was the most conserved Pmp across the 4 analyzed Chlamydia species. © 2016 S. Karger AG, Basel.
An Amino Acid Code for Irregular and Mixed Protein Packing
Joo, Hyun; Chavan, Archana; Fraga, Keith; Tsai, Jerry
2015-01-01
To advance our understanding of protein tertiary structure, the development of the knob-socket model is completed in an analysis of the packing in irregular coil and turn secondary structure packing as well as between mixed secondary structure. The knob-socket model simplifies packing based on repeated patterns of 2 motifs: a 3 residue socket for packing within 2° structure and a 4 residue knob-socket for 3° packing. For coil and turn secondary structure, knob-sockets allow identification of a correlation between amino acid composition and tertiary arrangements in space. Coil contributes almost as much as α-helices to tertiary packing. Irregular secondary structure involves 3 residue cliques of consecutive contacting residues or XYZ sockets. In irregular sockets, Gly, Pro, Asp and Ser are favored, while Cys, His, Met and Trp are not. For irregular knobs, the preference order is Arg, Asp, Pro, Asn, Thr, Leu, and Gly, while Cys, His, Met and Trp are not. In mixed packing, the knob amino acid preferences are a function of the socket that they are packing into, whereas the amino acid composition of the sockets does not depend on the secondary structure of the knob. A unique motif of a coil knob with an XYZ β-sheet socket may potentially function to inhibit β-sheet extension. In addition, analysis of the preferred crossing angles for strands within a β-sheet and mixed α-helices/β-sheets identifies canonical packing patterns useful in protein design. Lastly, the knob-socket model abstracts the complexity of protein tertiary structure into an intuitive packing surface topology map. PMID:26370334
MAISTAS: a tool for automatic structural evaluation of alternative splicing products.
Floris, Matteo; Raimondo, Domenico; Leoni, Guido; Orsini, Massimiliano; Marcatili, Paolo; Tramontano, Anna
2011-06-15
Analysis of the human genome revealed that the amount of transcribed sequence is an order of magnitude greater than the number of predicted and well-characterized genes. A sizeable fraction of these transcripts is related to alternatively spliced forms of known protein coding genes. Inspection of the alternatively spliced transcripts identified in the pilot phase of the ENCODE project has clearly shown that often their structure might substantially differ from that of other isoforms of the same gene, and therefore that they might perform unrelated functions, or that they might even not correspond to a functional protein. Identifying these cases is obviously relevant for the functional assignment of gene products and for the interpretation of the effect of variations in the corresponding proteins. Here we describe a publicly available tool that, given a gene or a protein, retrieves and analyses all its annotated isoforms, provides users with three-dimensional models of the isoform(s) of his/her interest whenever possible and automatically assesses whether homology derived structural models correspond to plausible structures. This information is clearly relevant. When the homology model of some isoforms of a gene does not seem structurally plausible, the implications are that either they assume a structure unrelated to that of the other isoforms of the same gene with presumably significant functional differences, or do not correspond to functional products. We provide indications that the second hypothesis is likely to be true for a substantial fraction of the cases. http://maistas.bioinformatica.crs4.it/.
Transmembrane protein topology prediction using support vector machines.
Nugent, Timothy; Jones, David T
2009-05-26
Alpha-helical transmembrane (TM) proteins are involved in a wide range of important biological processes such as cell signaling, transport of membrane-impermeable molecules, cell-cell communication, cell recognition and cell adhesion. Many are also prime drug targets, and it has been estimated that more than half of all drugs currently on the market target membrane proteins. However, due to the experimental difficulties involved in obtaining high quality crystals, this class of protein is severely under-represented in structural databases. In the absence of structural data, sequence-based prediction methods allow TM protein topology to be investigated. We present a support vector machine-based (SVM) TM protein topology predictor that integrates both signal peptide and re-entrant helix prediction, benchmarked with full cross-validation on a novel data set of 131 sequences with known crystal structures. The method achieves topology prediction accuracy of 89%, while signal peptides and re-entrant helices are predicted with 93% and 44% accuracy respectively. An additional SVM trained to discriminate between globular and TM proteins detected zero false positives, with a low false negative rate of 0.4%. We present the results of applying these tools to a number of complete genomes. Source code, data sets and a web server are freely available from http://bioinf.cs.ucl.ac.uk/psipred/. The high accuracy of TM topology prediction which includes detection of both signal peptides and re-entrant helices, combined with the ability to effectively discriminate between TM and globular proteins, make this method ideally suited to whole genome annotation of alpha-helical transmembrane proteins.
Mi, Huaiyu; Huang, Xiaosong; Muruganujan, Anushya; Tang, Haiming; Mills, Caitlin; Kang, Diane; Thomas, Paul D
2017-01-04
The PANTHER database (Protein ANalysis THrough Evolutionary Relationships, http://pantherdb.org) contains comprehensive information on the evolution and function of protein-coding genes from 104 completely sequenced genomes. PANTHER software tools allow users to classify new protein sequences, and to analyze gene lists obtained from large-scale genomics experiments. In the past year, major improvements include a large expansion of classification information available in PANTHER, as well as significant enhancements to the analysis tools. Protein subfamily functional classifications have more than doubled due to progress of the Gene Ontology Phylogenetic Annotation Project. For human genes (as well as a few other organisms), PANTHER now also supports enrichment analysis using pathway classifications from the Reactome resource. The gene list enrichment tools include a new 'hierarchical view' of results, enabling users to leverage the structure of the classifications/ontologies; the tools also allow users to upload genetic variant data directly, rather than requiring prior conversion to a gene list. The updated coding single-nucleotide polymorphisms (SNP) scoring tool uses an improved algorithm. The hidden Markov model (HMM) search tools now use HMMER3, dramatically reducing search times and improving accuracy of E-value statistics. Finally, the PANTHER Tree-Attribute Viewer has been implemented in JavaScript, with new views for exploring protein sequence evolution. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
NASA Technical Reports Server (NTRS)
Lacey, J. C., Jr.; Mullins, D. W., Jr.
1983-01-01
A survey is presented of the literature on the experimental evidence for the genetic code assignments and the chemical reactions involved in the process of protein synthesis. In view of the enormous number of theoretical models that have been advanced to explain the origin of the genetic code, attention is confined to experimental studies. Since genetic coding has significance only within the context of protein synthesis, it is believed that the problem of the origin of the code must be dealt with in terms of the origin of the process of protein synthesis. It is contended that the answers must lie in the nature of the molecules, amino acids and nucleotides, the affinities they might have for one another, and the effect that those affinities must have on the chemical reactions that are related to primitive protein synthesis. The survey establishes that for the bulk of amino acids, there is a direct and significant correlation between the hydrophobicity rank of the amino acids and the hydrophobicity rank of their anticodonic dinucleotides.
Yao, Q; Fischer, K P; Tyrrell, D L; Gutfreund, K S
2015-04-01
Programmed death ligand-1 (PD-L1) plays an important role in the attenuation of adaptive immune responses in higher vertebrates. Here, we describe the identification of the Pekin duck PD-L1 orthologue (duPD-L1) and its gene structure. The duPD-L1 cDNA encodes a 311-amino acid protein that has an amino acid identity of 78% and 42% with chicken and human PD-L1, respectively. Mapping of the duPD-L1 cDNA with duck genomic sequences revealed an exonic structure of its coding sequence similar to those of other vertebrates but lacked a noncoding exon 1. Homology modelling of the duPD-L1 extracellular domain was compatible with the tandem IgV-like and IgC-like IgSF domain structure of human PD-L1 (PDB ID: 3BIS). Residues known to be important for receptor binding of human PD-L1 were mostly conserved in duPD-L1 within the N-terminus and the G sheet, and partially conserved within the F sheet but not within sheets C and C'. DuPD-L1 mRNA was constitutively expressed in all tissues examined with highest expression levels in lung and spleen and very low levels of expression in muscle, kidney and brain. Mitogen stimulation of duck peripheral blood mononuclear cells transiently increased duPD-L1 mRNA expression. Our observations demonstrate evolutionary conservation of the exonic structure of its coding sequence, the extracellular domain structure and residues implicated in receptor binding, but the role of the longer cytoplasmic tail in avian PD-L1 proteins remains to be determined. © 2014 John Wiley & Sons Ltd.
The increasing diversity of functions attributed to the SAFB family of RNA-/DNA-binding proteins.
Norman, Michael; Rivers, Caroline; Lee, Youn-Bok; Idris, Jalilah; Uney, James
2016-12-01
RNA-binding proteins play a central role in cellular metabolism by orchestrating the complex interactions of coding, structural and regulatory RNA species. The SAFB (scaffold attachment factor B) proteins (SAFB1, SAFB2 and SAFB-like transcriptional modulator, SLTM), which are highly conserved evolutionarily, were first identified on the basis of their ability to bind scaffold attachment region DNA elements, but attention has subsequently shifted to their RNA-binding and protein-protein interactions. Initial studies identified the involvement of these proteins in the cellular stress response and other aspects of gene regulation. More recently, the multifunctional capabilities of SAFB proteins have shown that they play crucial roles in DNA repair, processing of mRNA and regulatory RNA, as well as in interaction with chromatin-modifying complexes. With the advent of new techniques for identifying RNA-binding sites, enumeration of individual RNA targets has now begun. This review aims to summarise what is currently known about the functions of SAFB proteins. © 2016 The Author(s).
Huang, Chuen-Der; Lin, Chin-Teng; Pal, Nikhil Ranjan
2003-12-01
The structure classification of proteins plays a very important role in bioinformatics, since the relationships and characteristics among those known proteins can be exploited to predict the structure of new proteins. The success of a classification system depends heavily on two things: the tools being used and the features considered. For the bioinformatics applications, the role of appropriate features has not been paid adequate importance. In this investigation we use three novel ideas for multiclass protein fold classification. First, we use the gating neural network, where each input node is associated with a gate. This network can select important features in an online manner when the learning goes on. At the beginning of the training, all gates are almost closed, i.e., no feature is allowed to enter the network. Through the training, gates corresponding to good features are completely opened while gates corresponding to bad features are closed more tightly, and some gates may be partially open. The second novel idea is to use a hierarchical learning architecture (HLA). The classifier in the first level of HLA classifies the protein features into four major classes: all alpha, all beta, alpha + beta, and alpha/beta. And in the next level we have another set of classifiers, which further classifies the protein features into 27 folds. The third novel idea is to induce the indirect coding features from the amino-acid composition sequence of proteins based on the N-gram concept. This provides us with more representative and discriminative new local features of protein sequences for multiclass protein fold classification. The proposed HLA with new indirect coding features increases the protein fold classification accuracy by about 12%. Moreover, the gating neural network is found to reduce the number of features drastically. Using only half of the original features selected by the gating neural network can reach comparable test accuracy as that using all the original features. The gating mechanism also helps us to get a better insight into the folding process of proteins. For example, tracking the evolution of different gates we can find which characteristics (features) of the data are more important for the folding process. And, of course, it also reduces the computation time.
Primary structure and glycosylation of the S-layer protein of Haloferax volcanii.
Sumper, M; Berg, E; Mengele, R; Strobel, I
1990-01-01
The outer surface of the archaebacterium Haloferax volcanii (formerly named Halobacterium volcanii) is covered with a hexagonally packed surface (S) layer. The gene coding for the S-layer protein was cloned and sequenced. The mature polypeptide is composed of 794 amino acids and is preceded by a typical signal sequence of 34 amino acid residues. A highly hydrophobic stretch of 20 amino acids at the C-terminal end probably serves as a transmembrane domain. Clusters of threonine residues are located adjacent to this membrane anchor. The S-layer protein is a glycoprotein containing both N- and O-glycosidic bonds. Glucosyl-(1----2)-galactose disaccharides are linked to threonine residues. The primary structure and the glycosylation pattern of the S-layer glycoproteins from Haloferax volcanii and from Halobacterium halobium were compared and found to exhibit distinct differences, despite the fact that three-dimensional reconstructions from electron micrographs revealed no structural differences at least to the 2.5-nm level attained so far (M. Kessel, I. Wildhaber, S. Cohe, and W. Baumeister, EMBO J. 7:1549-1554, 1988). Images PMID:2123862
Primary structure and glycosylation of the S-layer protein of Haloferax volcanii.
Sumper, M; Berg, E; Mengele, R; Strobel, I
1990-12-01
The outer surface of the archaebacterium Haloferax volcanii (formerly named Halobacterium volcanii) is covered with a hexagonally packed surface (S) layer. The gene coding for the S-layer protein was cloned and sequenced. The mature polypeptide is composed of 794 amino acids and is preceded by a typical signal sequence of 34 amino acid residues. A highly hydrophobic stretch of 20 amino acids at the C-terminal end probably serves as a transmembrane domain. Clusters of threonine residues are located adjacent to this membrane anchor. The S-layer protein is a glycoprotein containing both N- and O-glycosidic bonds. Glucosyl-(1----2)-galactose disaccharides are linked to threonine residues. The primary structure and the glycosylation pattern of the S-layer glycoproteins from Haloferax volcanii and from Halobacterium halobium were compared and found to exhibit distinct differences, despite the fact that three-dimensional reconstructions from electron micrographs revealed no structural differences at least to the 2.5-nm level attained so far (M. Kessel, I. Wildhaber, S. Cohe, and W. Baumeister, EMBO J. 7:1549-1554, 1988).
Ueno, Yutaka; Ito, Shuntaro; Konagaya, Akihiko
2014-12-01
To better understand the behaviors and structural dynamics of proteins within a cell, novel software tools are being developed that can create molecular animations based on the findings of structural biology. This study proposes our method developed based on our prototypes to detect collisions and examine the soft-body dynamics of molecular models. The code was implemented with a software development toolkit for rigid-body dynamics simulation and a three-dimensional graphics library. The essential functions of the target software system included the basic molecular modeling environment, collision detection in the molecular models, and physical simulations of the movement of the model. Taking advantage of recent software technologies such as physics simulation modules and interpreted scripting language, the functions required for accurate and meaningful molecular animation were implemented efficiently.
PconsD: ultra rapid, accurate model quality assessment for protein structure prediction.
Skwark, Marcin J; Elofsson, Arne
2013-07-15
Clustering methods are often needed for accurately assessing the quality of modeled protein structures. Recent blind evaluation of quality assessment methods in CASP10 showed that there is little difference between many different methods as far as ranking models and selecting best model are concerned. When comparing many models, the computational cost of the model comparison can become significant. Here, we present PconsD, a fast, stream-computing method for distance-driven model quality assessment that runs on consumer hardware. PconsD is at least one order of magnitude faster than other methods of comparable accuracy. The source code for PconsD is freely available at http://d.pcons.net/. Supplementary benchmarking data are also available there. arne@bioinfo.se Supplementary data are available at Bioinformatics online.
Biphasic patterns of diversification and the emergence of modules
Mittenthal, Jay; Caetano-Anollés, Derek; Caetano-Anollés, Gustavo
2012-01-01
The intricate molecular and cellular structure of organisms converts energy to work, which builds and maintains structure. Evolving structure implements modules, in which parts are tightly linked. Each module performs characteristic functions. In this work we propose that a module can emerge through two phases of diversification of parts. Early in the first phase of this biphasic pattern, the parts have weak linkage—they interact weakly and associate variously. The parts diversify and compete. Under selection for performance, interactions among the parts increasingly constrain their structure and associations. As many variants are eliminated, parts self-organize into modules with tight linkage. Linkage may increase in response to exogenous stresses as well as endogenous processes. In the second phase of diversification, variants of the module and its functions evolve and become new parts for a new cycle of generation of higher-level modules. This linkage hypothesis can interpret biphasic patterns in the diversification of protein domain structure, RNA and protein shapes, and networks in metabolism, codes, and embryos, and can explain hierarchical levels of structural organization that are widespread in biology. PMID:22891076
Kim, Yu-Jung; Yoo, Won Gi; Lee, Myoung-Ro; Kang, Jung-Mi; Na, Byoung-Kuk; Cho, Shin-Hyeong; Park, Mi-Yeoun; Ju, Jung-Won
2017-03-04
The tegument, representing the membrane-bound outer surface of platyhelminth parasites, plays an important role for the regulation of the host immune response and parasite survival. A comprehensive understanding of tegumental proteins can provide drug candidates for use against helminth-associated diseases, such as clonorchiasis caused by the liver fluke Clonorchis sinensis . However, little is known regarding the physicochemical properties of C. sinensis teguments. In this study, a novel 20.6-kDa tegumental protein of the C. sinensis adult worm (CsTegu20.6) was identified and characterized by molecular and in silico methods. The complete coding sequence of 525 bp was derived from cDNA clones and encodes a protein of 175 amino acids. Homology search using BLASTX showed CsTegu20.6 identity ranging from 29% to 39% with previously-known tegumental proteins in C. sinensis . Domain analysis indicated the presence of a calcium-binding EF-hand domain containing a basic helix-loop-helix structure and a dynein light chain domain exhibiting a ferredoxin fold. We used a modified method to obtain the accurate tertiary structure of the CsTegu20.6 protein because of the unavailability of appropriate templates. The CsTegu20.6 protein sequence was split into two domains based on the disordered region, and then, the structure of each domain was modeled using I-TASSER. A final full-length structure was obtained by combining two structures and refining the whole structure. A refined CsTegu20.6 structure was used to identify a potential CsTegu20.6 inhibitor based on protein structure-compound interaction analysis. The recombinant proteins were expressed in Escherichia coli and purified by nickel-nitrilotriacetic acid affinity chromatography. In C. sinensis , CsTegu20.6 mRNAs were abundant in adult and metacercariae, but not in the egg. Immunohistochemistry revealed that CsTegu20.6 localized to the surface of the tegument in the adult fluke. Collectively, our results contribute to a better understanding of the structural and functional characteristics of CsTegu20.6 and homologs of flukes. One compound is proposed as a putative inhibitor of CsTegu20.6 to facilitate further studies for anthelmintics.
Zade, Amrutraj; Sengupta, Malavi; Kondabagil, Kiran
2015-01-01
Rab GTPases are the key regulators of intracellular membrane trafficking in eukaryotes. Many viruses and intracellular bacterial pathogens have evolved to hijack the host Rab GTPase functions, mainly through activators and effector proteins, for their benefit. Acanthamoeba polyphaga mimivirus (APMV) is one of the largest viruses and belongs to the monophyletic clade of nucleo-cytoplasmic large DNA viruses (NCLDV). The inner membrane lining is integral to the APMV virion structure. APMV assembly involves extensive host membrane modifications, like vesicle budding and fusion, leading to the formation of a membrane sheet that is incorporated into the virion. Intriguingly, APMV and all group I members of the Mimiviridae family code for a putative Rab GTPase protein. APMV is the first reported virus to code for a Rab GTPase (encoded by R214 gene). Our thorough in silico analysis of the subfamily specific (SF) region of Mimiviridae Rab GTPase sequences suggests that they are related to Rab5, a member of the group II Rab GTPases, of lower eukaryotes. Because of their high divergence from the existing three isoforms, A, B, and C of the Rab5-family, we suggest that Mimiviridae Rabs constitute a new isoform, Rab5D. Phylogenetic analysis indicated probable horizontal acquisition from a lower eukaryotic ancestor followed by selection and divergence. Furthermore, interaction network analysis suggests that vps34 (a Class III PI3K homolog, coded by APMV L615), Atg-8 and dynamin (host proteins) are recruited by APMV Rab GTPase during capsid assembly. Based on these observations, we hypothesize that APMV Rab plays a role in the acquisition of inner membrane during virion assembly.
Ion Channel Conductance Measurements on a Silicon-Based Platform
2006-01-01
calculated using the molecular dynamics code, GROMACS . Reasonable agreement is obtained in the simulated versus measured conductance over the range of...measurements of the lipid giga-seal characteristics have been performed, including AC conductance measurements and statistical analysis in order to...Dynamics kernel self-consistently coupled to Poisson equations using a P3M force field scheme and the GROMACS description of protein structure and
Transfer Learning in Integrated Cognitive Systems
2010-09-01
Psychology Press. 2. Falkenhainer, B ., Forbus, K . D., and Gentner, D. (1989). The Structure-mapping Engine: Algorithm and Examples. Artificial...Kaps, A.; Lemcke, K .; Mannhaupt, G.; Pfeiffer, F.; Schuller, C.; Stocker, S. & Weil, B ., "MIPS: A Database for Genomes and Protein Sequences...NAME OF RESPONSIBLE PERSON Deborah A. Cerino a. REPORT U b . ABSTRACT U c. THIS PAGE U 19b. TELEPHONE NUMBER (Include area code) N/A
Comparative architecture of silks, fibrous proteins and their encoding genes in insects and spiders.
Craig, Catherine L; Riekel, Christian
2002-12-01
The known silk fibroins and fibrous glues are thought to be encoded by members of the same gene family. All silk fibroins sequenced to date contain regions of long-range order (crystalline regions) and/or short-range order (non-crystalline regions). All of the sequenced fibroin silks (Flag or silk from flagelliform gland in spiders; Fhc or heavy chain fibroin silks produced by Lepidoptera larvae) are made up of hierarchically organized, repetitive arrays of amino acids. Fhc fibroin genes are characterized by a similar molecular genetic architecture of two exons and one intron, but the organization and size of these units differs. The Flag, Ser (sericin gene) and BR (Balbiani ring genes; both fibrous proteins) genes are made up of multiple exons and introns. Sequences coding for crystalline and non-crystalline protein domains are integrated in the repetitive regions of Fhc and MA exons, but not in the protein glues Ser1 and BR-1. Genetic 'hot-spots' promote recombination errors in Fhc, MA, and Flag. Codon bias, structural constraint, point mutations, and shortened coding arrays may be alternative means of stabilizing precursor mRNA transcripts. Differential regulation of gene expression and selective splicing of the mRNA transcript may allow rapid adaptation of silk functional properties to different physical environments.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Jufeng; Wang, Zhanli; Wei, Fang
2007-08-17
Herpes simplex virus type-1 thymidine kinase (HSV-1TK) and Escherichia coli cytosine deaminase (CD) fusion protein was designed using InsightII software. The structural rationality of the fusion proteins incorporating a series of flexible linker peptide was analyzed, and a suitable linker peptide was chosen for further investigated. The recombinant plasmid containing the coding regions of HSV-1TK and CD cDNA connected by this linker peptide coding sequence was generated and subsequently transfected into the human embryonic kidney 293 cells (HEK293). The Western blotting indicated that the recombinant fusion protein existed as a dimer with a molecular weight of approximately 90 kDa. Themore » toxicity of the prodrug on the recombinant plasmid-transfected human lung cancer cell line NCIH460 was evaluated, which showed that TKglyCD-expressing cells conferred upon cells prodrug sensitivities equivalent to that observed for each enzyme independently. Most noteworthy, cytotoxicity could be enhanced by concurrently treating TKglyCD-expressing cells with prodrugs GCV and 5-FC. The results indicate that we have successfully constructed a HSV-1TKglyCD fusion gene which might have a potential application for cancer gene therapy.« less
Feldhoff, A; Wetzel, T; Peters, D; Kellner, R; Krczal, G
1998-01-01
With the introduction of cutting-grown Petunia x hybrida plants on the European market, a new potyvirus which showed no serological reaction with antisera against any other potyviruses infecting petunias was discovered. Infected leaves contained flexuous rod-shaped virus particles of 750-800 nm in length and inclusion bodies (pinwheel structures) typical for potyviruses in ultrathin leaf sections. The purified coat protein with a Mr of approximately 36 kDa could be detected in Western immunoblots with a specific antibody to the coat protein of the petunia-infecting virus. The 3' end of the viral genome encompassing the 3' non-coding region, the coat protein gene, and part of the NIb gene was amplified from infected leaf material by IC/PCR using degenerate and specific primers. Sequences of PCR-generated cDNA clones were compared to other known sequences of potyviruses. Maximum homology of 56% was found in the 3' non-coding region between the petunia isolate and other potyviruses. A maximum homology of 69% was found between the amino acid sequence of the coat protein of the petunia isolate and corresponding sequences of other potyviruses. These data indicate that the petunia-infecting virus is a previously undescribed potyvirus and the name petunia flower mottle virus (PetFMV) is suggested.
Mathematical fundamentals for the noise immunity of the genetic code.
Fimmel, Elena; Strüngmann, Lutz
2018-02-01
Symmetry is one of the essential and most visible patterns that can be seen in nature. Starting from the left-right symmetry of the human body, all types of symmetry can be found in crystals, plants, animals and nature as a whole. Similarly, principals of symmetry are also some of the fundamental and most useful tools in modern mathematical natural science that play a major role in theory and applications. As a consequence, it is not surprising that the desire to understand the origin of life, based on the genetic code, forces us to involve symmetry as a mathematical concept. The genetic code can be seen as a key to biological self-organisation. All living organisms have the same molecular bases - an alphabet consisting of four letters (nitrogenous bases): adenine, cytosine, guanine, and thymine. Linearly ordered sequences of these bases contain the genetic information for synthesis of proteins in all forms of life. Thus, one of the most fascinating riddles of nature is to explain why the genetic code is as it is. Genetic coding possesses noise immunity which is the fundamental feature that allows to pass on the genetic information from parents to their descendants. Hence, since the time of the discovery of the genetic code, scientists have tried to explain the noise immunity of the genetic information. In this chapter we will discuss recent results in mathematical modelling of the genetic code with respect to noise immunity, in particular error-detection and error-correction. We will focus on two central properties: Degeneracy and frameshift correction. Different amino acids are encoded by different quantities of codons and a connection between this degeneracy and the noise immunity of genetic information is a long standing hypothesis. Biological implications of the degeneracy have been intensively studied and whether the natural code is a frozen accident or a highly optimised product of evolution is still controversially discussed. Symmetries in the structure of degeneracy of the genetic code are essential and give evidence of substantial advantages of the natural code over other possible ones. In the present chapter we will present a recent approach to explain the degeneracy of the genetic code by algorithmic methods from bioinformatics, and discuss its biological consequences. The biologists recognised this problem immediately after the detection of the non-overlapping structure of the genetic code, i.e., coding sequences are to be read in a unique way determined by their reading frame. But how does the reading head of the ribosome recognises an error in the grouping of codons, caused by e.g. insertion or deletion of a base, that can be fatal during the translation process and may result in nonfunctional proteins? In this chapter we will discuss possible solutions to the frameshift problem with a focus on the theory of so-called circular codes that were discovered in large gene populations of prokaryotes and eukaryotes in the early 90s. Circular codes allow to detect a frameshift of one or two positions and recently a beautiful theory of such codes has been developed using statistics, group theory and graph theory. Copyright © 2017 Elsevier B.V. All rights reserved.
Arieti, Fabiana; Gabus, Caroline; Tambalo, Margherita; Huet, Tiphaine; Round, Adam; Thore, Stéphane
2014-06-01
The Split Ends (SPEN) protein was originally discovered in Drosophila in the late 1990s. Since then, homologous proteins have been identified in eukaryotic species ranging from plants to humans. Every family member contains three predicted RNA recognition motifs (RRMs) in the N-terminal region of the protein. We have determined the crystal structure of the region of the human SPEN homolog that contains these RRMs-the SMRT/HDAC1 Associated Repressor Protein (SHARP), at 2.0 Å resolution. SHARP is a co-regulator of the nuclear receptors. We demonstrate that two of the three RRMs, namely RRM3 and RRM4, interact via a highly conserved interface. Furthermore, we show that the RRM3-RRM4 block is the main platform mediating the stable association with the H12-H13 substructure found in the steroid receptor RNA activator (SRA), a long, non-coding RNA previously shown to play a crucial role in nuclear receptor transcriptional regulation. We determine that SHARP association with SRA relies on both single- and double-stranded RNA sequences. The crystal structure of the SHARP-RRM fragment, together with the associated RNA-binding studies, extend the repertoire of nucleic acid binding properties of RRM domains suggesting a new hypothesis for a better understanding of SPEN protein functions. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Staufen1 senses overall transcript secondary structure to regulate translation
Ricci, Emiliano P; Kucukural, Alper; Cenik, Can; Mercier, Blandine C; Singh, Guramrit; Heyer, Erin E; Ashar-Patel, Ami; Peng, Lingtao; Moore, Melissa J
2015-01-01
Human Staufen1 (Stau1) is a double-stranded RNA (dsRNA)-binding protein implicated in multiple post-transcriptional gene-regulatory processes. Here we combined RNA immunoprecipitation in tandem (RIPiT) with RNase footprinting, formaldehyde cross-linking, sonication-mediated RNA fragmentation and deep sequencing to map Staufen1-binding sites transcriptome wide. We find that Stau1 binds complex secondary structures containing multiple short helices, many of which are formed by inverted Alu elements in annotated 3′ untranslated regions (UTRs) or in ‘strongly distal’ 3′ UTRs. Stau1 also interacts with actively translating ribosomes and with mRNA coding sequences (CDSs) and 3′ UTRs in proportion to their GC content and propensity to form internal secondary structure. On mRNAs with high CDS GC content, higher Stau1 levels lead to greater ribosome densities, thus suggesting a general role for Stau1 in modulating translation elongation through structured CDS regions. Our results also indicate that Stau1 regulates translation of transcription-regulatory proteins. PMID:24336223
Standing your Ground to Exoribonucleases: Function of Flavivirus Long Non-coding RNAs
Charley, Phillida A.; Wilusz, Jeffrey
2015-01-01
Members of the Flaviviridae (e.g. Dengue virus, West Nile virus, and Hepatitis C virus) contain a positive-sense RNA genome that encodes a large polyprotein. It is now also clear most if not all of these viruses also produce an abundant subgenomic long non-coding RNA. These non-coding RNAs, which are called subgenomicflavivirus RNAs (sfRNAs) or Xrn1-resistant RNAs (xrRNAs), are stable decay intermediates generated from the viral genomic RNA through the stalling of the cellular exoribonuclease Xrn1 at highly structured regions. Several functions of these flavivirus long non-coding RNAs have been revealed in recent years. The generation of these sfRNAs/xrRNAs from viral transcripts results in the repression of Xrn1 and the dysregulation of cellular mRNA stability. The abundant sfRNAs also serve directly as a decoy for important cellular protein regulators of the interferon and RNA interference antiviral pathways. Thus the generation of long non-coding RNAs from flaviviruses, hepaciviruses and pestiviruses likely disrupts aspects of innate immunity and may directly contribute to viral replication, cytopathology and pathogenesis. PMID:26368052
RNA Nuclear Export: From Neurological Disorders to Cancer.
Hautbergue, Guillaume M
2017-01-01
The presence of a nuclear envelope, also known as nuclear membrane, defines the structural framework of all eukaryotic cells by separating the nucleus, which contains the genetic material, from the cytoplasm where the synthesis of proteins takes place. Translation of proteins in Eukaryotes is thus dependent on the active transport of DNA-encoded RNA molecules through pores embedded within the nuclear membrane. Several mechanisms are involved in this process generally referred to as RNA nuclear export or nucleocytoplasmic transport of RNA. The regulated expression of genes requires the nuclear export of protein-coding messenger RNA molecules (mRNAs) as well as non-coding RNAs (ncRNAs) together with proteins and pre-assembled ribosomal subunits. The nuclear export of mRNAs is intrinsically linked to the co-transcriptional processing of nascent transcripts synthesized by the RNA polymerase II. This functional coupling is essential for the survival of cells allowing for timely nuclear export of fully processed transcripts, which could otherwise cause the translation of abnormal proteins such as the polymeric repeat proteins produced in some neurodegenerative diseases. Alterations of the mRNA nuclear export pathways can also lead to genome instability and to various forms of cancer. This chapter will describe the molecular mechanisms driving the nuclear export of RNAs with a particular emphasis on mRNAs. It will also review their known alterations in neurological disorders and cancer, and the recent opportunities they offer for the potential development of novel therapeutic strategies.
SSMART: Sequence-structure motif identification for RNA-binding proteins.
Munteanu, Alina; Mukherjee, Neelanjan; Ohler, Uwe
2018-06-11
RNA-binding proteins (RBPs) regulate every aspect of RNA metabolism and function. There are hundreds of RBPs encoded in the eukaryotic genomes, and each recognize its RNA targets through a specific mixture of RNA sequence and structure properties. For most RBPs, however, only a primary sequence motif has been determined, while the structure of the binding sites is uncharacterized. We developed SSMART, an RNA motif finder that simultaneously models the primary sequence and the structural properties of the RNA targets sites. The sequence-structure motifs are represented as consensus strings over a degenerate alphabet, extending the IUPAC codes for nucleotides to account for secondary structure preferences. Evaluation on synthetic data showed that SSMART is able to recover both sequence and structure motifs implanted into 3'UTR-like sequences, for various degrees of structured/unstructured binding sites. In addition, we successfully used SSMART on high-throughput in vivo and in vitro data, showing that we not only recover the known sequence motif, but also gain insight into the structural preferences of the RBP. Availability: SSMART is freely available at https://ohlerlab.mdc-berlin.de/software/SSMART_137/. Supplementary data are available at Bioinformatics online.
Termination and read-through proteins encoded by genome segment 9 of Colorado tick fever virus.
Mohd Jaafar, Fauziah; Attoui, Houssam; De Micco, Philippe; De Lamballerie, Xavier
2004-08-01
Genome segment 9 (Seg-9) of Colorado tick fever virus (CTFV) is 1884 bp long and contains a large open reading frame (ORF; 1845 nt in length overall), although a single in-frame stop codon (at nt 1052-1054) reduces the ORF coding capacity by approximately 40 %. However, analyses of highly conserved RNA sequences in the vicinity of the stop codon indicate that it belongs to a class of 'leaky terminators'. The third nucleotide positions in codons situated both before and after the stop codon, shows the highest variability, suggesting that both regions are translated during virus replication. This also suggests that the stop signal is functionally leaky, allowing read-through translation to occur. Indeed, both the truncated 'termination' protein and the full-length 'read-through' protein (VP9 and VP9', respectively) were detected in CTFV-infected cells, in cells transfected with a plasmid expressing only Seg-9 protein products, and in the in vitro translation products from undenatured Seg-9 ssRNA. The ratios of full-length and truncated proteins generated suggest that read-through may be down-regulated by other viral proteins. Western blot analysis of infected cells and purified CTFV showed that VP9 is a structural component of the virion, while VP9' is a non-structural protein.
Babina, Arianne M; Parker, Darren J; Li, Gene-Wei; Meyer, Michelle M
2018-06-20
In many bacteria, ribosomal proteins autogenously repress their own expression by interacting with RNA structures typically located in the 5'-UTRs of their mRNA transcripts. This regulation is necessary to maintain a balance between ribosomal proteins and rRNA to ensure proper ribosome production. Despite advances in non-coding RNA discovery and validation of RNA-protein regulatory interactions, the selective pressures that govern the formation and maintenance of such RNA cis-regulators in the context of an organism remain largely undetermined. To examine the impact disruptions to this regulation have on bacterial fitness, we introduced point mutations that abolish ribosomal protein binding and regulation into the RNA structure that controls expression of ribosomal proteins L20 and L35 within the Bacillus subtilis genome. Our studies indicate that removing this regulation results in reduced log phase growth, improper rRNA maturation, and the accumulation of a kinetically trapped or mis-assembled ribosomal particle at low temperatures, suggesting defects in ribosome synthesis. Such work emphasizes the important role regulatory RNAs play in the stoichiometric production of ribosomal components for proper ribosome composition and overall organism viability and reinforces the potential of targeting ribosomal protein production and ribosome assembly with novel antimicrobials. Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Li, Na; Yan, Yunhuan; Zhang, Angke; Gao, Jiming; Zhang, Chong; Wang, Xue; Hou, Gaopeng; Zhang, Gaiping; Jia, Jinbu; Zhou, En-Min; Xiao, Shuqi
2016-12-13
Many viruses encode microRNAs (miRNAs) that are small non-coding single-stranded RNAs which play critical roles in virus-host interactions. Porcine reproductive and respiratory syndrome virus (PRRSV) is one of the most economically impactful viruses in the swine industry. The present study sought to determine whether PRRSV encodes miRNAs that could regulate PRRSV replication. Four viral small RNAs (vsRNAs) were mapped to the stem-loop structures in the ORF1a, ORF1b and GP2a regions of the PRRSV genome by bioinformatics prediction and experimental verification. Of these, the structures with the lowest minimum free energy (MFE) values predicted for PRRSV-vsRNA1 corresponded to typical stem-loop, hairpin structures. Inhibition of PRRSV-vsRNA1 function led to significant increases in viral replication. Transfection with PRRSV-vsRNA1 mimics significantly inhibited PRRSV replication in primary porcine alveolar macrophages (PAMs). The time-dependent increase in the abundance of PRRSV-vsRNA1 mirrored the gradual upregulation of PRRSV RNA expression. Knockdown of proteins associated with cellular miRNA biogenesis demonstrated that Drosha and Argonaute (Ago2) are involved in PRRSV-vsRNA1 biogenesis. Moreover, PRRSV-vsRNA1 bound specifically to the nonstructural protein 2 (NSP2)-coding sequence of PRRSV genome RNA. Collectively, the results reveal that PRRSV encodes a functional PRRSV-vsRNA1 which auto-regulates PRRSV replication by directly targeting and suppressing viral NSP2 gene expression. These findings not only provide new insights into the mechanism of the pathogenesis of PRRSV, but also explore a potential avenue for controlling PRRSV infection using viral small RNAs.
Narayanan, Ajit; Chen, Yi; Pang, Shaoning; Tao, Ban
2013-01-01
The continuous growth of malware presents a problem for internet computing due to increasingly sophisticated techniques for disguising malicious code through mutation and the time required to identify signatures for use by antiviral software systems (AVS). Malware modelling has focused primarily on semantics due to the intended actions and behaviours of viral and worm code. The aim of this paper is to evaluate a static structure approach to malware modelling using the growing malware signature databases now available. We show that, if malware signatures are represented as artificial protein sequences, it is possible to apply standard sequence alignment techniques in bioinformatics to improve accuracy of distinguishing between worm and virus signatures. Moreover, aligned signature sequences can be mined through traditional data mining techniques to extract metasignatures that help to distinguish between viral and worm signatures. All bioinformatics and data mining analysis were performed on publicly available tools and Weka.
The Effects of Different Representations on Static Structure Analysis of Computer Malware Signatures
Narayanan, Ajit; Chen, Yi; Pang, Shaoning; Tao, Ban
2013-01-01
The continuous growth of malware presents a problem for internet computing due to increasingly sophisticated techniques for disguising malicious code through mutation and the time required to identify signatures for use by antiviral software systems (AVS). Malware modelling has focused primarily on semantics due to the intended actions and behaviours of viral and worm code. The aim of this paper is to evaluate a static structure approach to malware modelling using the growing malware signature databases now available. We show that, if malware signatures are represented as artificial protein sequences, it is possible to apply standard sequence alignment techniques in bioinformatics to improve accuracy of distinguishing between worm and virus signatures. Moreover, aligned signature sequences can be mined through traditional data mining techniques to extract metasignatures that help to distinguish between viral and worm signatures. All bioinformatics and data mining analysis were performed on publicly available tools and Weka. PMID:23983644
Delcourt, Vivian; Lucier, Jean-François; Gagnon, Jules; Beaudoin, Maxime C; Vanderperre, Benoît; Breton, Marc-André; Motard, Julie; Jacques, Jean-François; Brunelle, Mylène; Gagnon-Arsenault, Isabelle; Fournier, Isabelle; Ouangraoua, Aida; Hunting, Darel J; Cohen, Alan A; Landry, Christian R; Scott, Michelle S
2017-01-01
Recent functional, proteomic and ribosome profiling studies in eukaryotes have concurrently demonstrated the translation of alternative open-reading frames (altORFs) in addition to annotated protein coding sequences (CDSs). We show that a large number of small proteins could in fact be coded by these altORFs. The putative alternative proteins translated from altORFs have orthologs in many species and contain functional domains. Evolutionary analyses indicate that altORFs often show more extreme conservation patterns than their CDSs. Thousands of alternative proteins are detected in proteomic datasets by reanalysis using a database containing predicted alternative proteins. This is illustrated with specific examples, including altMiD51, a 70 amino acid mitochondrial fission-promoting protein encoded in MiD51/Mief1/SMCR7L, a gene encoding an annotated protein promoting mitochondrial fission. Our results suggest that many genes are multicoding genes and code for a large protein and one or several small proteins. PMID:29083303
Kumamaru, Toshihiro; Uemura, Yuji; Inoue, Yoshimi; Takemoto, Yoko; Siddiqui, Sadar Uddin; Ogawa, Masahiro; Hara-Nishimura, Ikuko; Satoh, Hikaru
2010-01-01
To identify the function of genes that regulate the processing of proglutelin, we performed an analysis of glup3 mutants, which accumulates excess amounts of proglutelin and lack the vacuolar processing enzyme (VPE). VPE activity in developing seeds from glup3 lines was reduced remarkably compared with the wild type. DNA sequencing of the VPE gene in glup3 mutants revealed either amino acid substitutions or the appearance of a stop codon within the coding region. Microscopic observations showed that alpha-globulin and proglutelin were distributed homogeneously within glup3 protein storage vacuoles (PSVs), and that glup3 PSVs lacked the crystalline lattice structure typical of wild-type PSVs. This suggests that the processing of proglutelin by VPE in rice is essential for proper PSV structure and compartmentalization of storage proteins. Growth retardation in glup3 seedlings was also observed, indicating that the processing of proglutelin influences early seedling development. These findings indicate that storage of glutelin in its mature form as a crystalline structure in PSVs is required for the rapid use of glutelin as a source of amino acids during early seedling development. In conclusion, VPE plays an important role in the formation of protein crystalline structures in PSVs.
López-Carrasco, Amparo; Flores, Ricardo
2017-07-01
Avocado sunblotch viroid (ASBVd), the type member of the family Avsunviroidae, replicates and accumulates in chloroplasts. Whether this minimal non-protein-coding circular RNA of 246-250 nt exists in vivo as a free nucleic acid or closely associated with host proteins remains unknown. To tackle this issue, the secondary structures of the monomeric circular (mc) (+) and (-) strands of ASBVd have been examined in silico by searching those of minimal free energy, and in vitro at single-nucleotide resolution by selective 2'-hydroxyl acylation analysed by primer extension (SHAPE). Both approaches resulted in predominant rod-like secondary structures without tertiary interactions, with the mc (+) RNA being more compact than its (-) counterpart as revealed by non-denaturing polyacryamide gel electrophoresis. Moreover, in vivo SHAPE showed that the mc ASBVd (+) form accumulates in avocado leaves as a free RNA adopting a similar rod-shaped conformation unprotected by tightly bound host proteins. Hence, the mc ASBVd (+) RNA behaves in planta like the previously studied mc (+) RNA of potato spindle tuber viroid, the type member of nuclear viroids (family Pospiviroidae), indicating that two different viroids replicating and accumulating in distinct subcellular compartments, have converged into a common structural solution. Circularity and compact secondary structures confer to these RNAs, and probably to all viroids, the intrinsic stability needed to survive in their natural habitats. However, in vivo SHAPE has not revealed the (possibly transient or loose) interactions of the mc ASBVd (+) RNA with two host proteins observed previously by UV irradiation of infected avocado leaves.
Codon usage affects the structure and function of the Drosophila circadian clock protein PERIOD.
Fu, Jingjing; Murphy, Katherine A; Zhou, Mian; Li, Ying H; Lam, Vu H; Tabuloc, Christine A; Chiu, Joanna C; Liu, Yi
2016-08-01
Codon usage bias is a universal feature of all genomes, but its in vivo biological functions in animal systems are not clear. To investigate the in vivo role of codon usage in animals, we took advantage of the sensitivity and robustness of the Drosophila circadian system. By codon-optimizing parts of Drosophila period (dper), a core clock gene that encodes a critical component of the circadian oscillator, we showed that dper codon usage is important for circadian clock function. Codon optimization of dper resulted in conformational changes of the dPER protein, altered dPER phosphorylation profile and stability, and impaired dPER function in the circadian negative feedback loop, which manifests into changes in molecular rhythmicity and abnormal circadian behavioral output. This study provides an in vivo example that demonstrates the role of codon usage in determining protein structure and function in an animal system. These results suggest a universal mechanism in eukaryotes that uses a codon usage "code" within genetic codons to regulate cotranslational protein folding. © 2016 Fu et al.; Published by Cold Spring Harbor Laboratory Press.
EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation
Amidi, Afshine; Megalooikonomou, Vasileios; Paragios, Nikos
2018-01-01
During the past decade, with the significant progress of computational power as well as ever-rising data availability, deep learning techniques became increasingly popular due to their excellent performance on computer vision problems. The size of the Protein Data Bank (PDB) has increased more than 15-fold since 1999, which enabled the expansion of models that aim at predicting enzymatic function via their amino acid composition. Amino acid sequence, however, is less conserved in nature than protein structure and therefore considered a less reliable predictor of protein function. This paper presents EnzyNet, a novel 3D convolutional neural networks classifier that predicts the Enzyme Commission number of enzymes based only on their voxel-based spatial structure. The spatial distribution of biochemical properties was also examined as complementary information. The two-layer architecture was investigated on a large dataset of 63,558 enzymes from the PDB and achieved an accuracy of 78.4% by exploiting only the binary representation of the protein shape. Code and datasets are available at https://github.com/shervinea/enzynet. PMID:29740518
EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation.
Amidi, Afshine; Amidi, Shervine; Vlachakis, Dimitrios; Megalooikonomou, Vasileios; Paragios, Nikos; Zacharaki, Evangelia I
2018-01-01
During the past decade, with the significant progress of computational power as well as ever-rising data availability, deep learning techniques became increasingly popular due to their excellent performance on computer vision problems. The size of the Protein Data Bank (PDB) has increased more than 15-fold since 1999, which enabled the expansion of models that aim at predicting enzymatic function via their amino acid composition. Amino acid sequence, however, is less conserved in nature than protein structure and therefore considered a less reliable predictor of protein function. This paper presents EnzyNet, a novel 3D convolutional neural networks classifier that predicts the Enzyme Commission number of enzymes based only on their voxel-based spatial structure. The spatial distribution of biochemical properties was also examined as complementary information. The two-layer architecture was investigated on a large dataset of 63,558 enzymes from the PDB and achieved an accuracy of 78.4% by exploiting only the binary representation of the protein shape. Code and datasets are available at https://github.com/shervinea/enzynet.
NASA Astrophysics Data System (ADS)
Weigt, Martin
Over the last years, biological research has been revolutionized by experimental high-throughput techniques, in particular by next-generation sequencing technology. Unprecedented amounts of data are accumulating, and there is a growing request for computational methods unveiling the information hidden in raw data, thereby increasing our understanding of complex biological systems. Statistical-physics models based on the maximum-entropy principle have, in the last few years, played an important role in this context. To give a specific example, proteins and many non-coding RNA show a remarkable degree of structural and functional conservation in the course of evolution, despite a large variability in amino acid sequences. We have developed a statistical-mechanics inspired inference approach - called Direct-Coupling Analysis - to link this sequence variability (easy to observe in sequence alignments, which are available in public sequence databases) to bio-molecular structure and function. In my presentation I will show, how this methodology can be used (i) to infer contacts between residues and thus to guide tertiary and quaternary protein structure prediction and RNA structure prediction, (ii) to discriminate interacting from non-interacting protein families, and thus to infer conserved protein-protein interaction networks, and (iii) to reconstruct mutational landscapes and thus to predict the phenotypic effect of mutations. References [1] M. Figliuzzi, H. Jacquier, A. Schug, O. Tenaillon and M. Weigt ''Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase TEM-1'', Mol. Biol. Evol. (2015), doi: 10.1093/molbev/msv211 [2] E. De Leonardis, B. Lutz, S. Ratz, S. Cocco, R. Monasson, A. Schug, M. Weigt ''Direct-Coupling Analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction'', Nucleic Acids Research (2015), doi: 10.1093/nar/gkv932 [3] F. Morcos, A. Pagnani, B. Lunt, A. Bertolino, D. Marks, C. Sander, R. Zecchina, J.N. Onuchic, T. Hwa, M. Weigt, ''Direct-coupling analysis of residue co-evolution captures native contacts across many protein families'', Proc. Natl. Acad. Sci. 108, E1293-E1301 (2011).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ploos van Amstel, H.; Reitsma, P.H.; van der Logt, C.P.
The human protein S locus on chromosome 3 consists of two protein S genes, PS{alpha} and PS{beta}. Here the authors report the cloning and characterization of both genes. Fifteen exons of the PS{alpha} gene were identified that together code for protein S mRNA as derived from the reported protein S cDNAs. Analysis by primer extension of liver protein S mRNA, however, reveals the presence of two mRNA forms that differ in the length of their 5{prime}-noncoding region. Both transcripts contain a 5{prime}-noncoding region longer than found in the protein S cDNAs. The two products may arise from alternative splicing ofmore » an additional intron in this region or from the usage of two start sites for transcription. The intron-exon organization of the PS{alpha} gene fully supports the hypothesis that the protein S gene is the product of an evolutional assembling process in which gene modules coding for structural/functional protein units also found in other coagulation proteins have been put upstream of the ancestral gene of a steroid hormone binding protein. The PS{beta} gene is identified as a pseudogene. It contains a large variety of detrimental aberrations, viz., the absence of exon I, a splice site mutation, three stop codons, and a frame shift mutation. Overall the two genes PS{alpha} and PS{beta} show between their exonic sequences 96.5% homology. Southern analysis of primate DNA showed that the duplication of the ancestral protein S gene has occurred after the branching of the orangutan from the African apes. A nonsense mutation that is present in the pseudogene of man also could be identified in one of the two protein S genes of both chimpanzee and gorilla. This implicates that silencing of one of the two protein S genes must have taken place before the divergence of the three African apes.« less
Effect of poly and mono-unsaturated fatty acids on stability and structure of recombinant S100A8/A9.
Asghari, Hamideh; Chegini, Koorosh Goodarzvand; Amini, Abbas; Gheibi, Nematollah
2016-03-01
Recombinant pET 15b vectors containing the coding sequences S100A8 and S100A9 are expressed in Escherichia coli BL21 (DE3) and purified using Ni-NTA affinity chromatography. The structural changes of S100A8/A9 complex are analyzed upon interaction with poly/mono-unsaturated fatty acids (UFAs). The thermodynamic values, Gibbs free energy and the protein melting point, are obtained through thermal denaturation of protein both with and without UFAs by thermal scanning of protein emission using the fluorescence spectroscopy technique. The far-ultraviolet circular dichroism spectra show that all studied unsaturated fatty acids, including arachidonic, linoleic, alpha-linolenic and oleic acids, induce changes in the secondary structure of S100A8/A9 by reducing the α-helix and β-sheet structures. The tertiary structure of S100A8/A9 has fluctuations in the fluorescence emission spectra after the incubation of protein with UFAs. The blue-shift of emission maximum wavelength and the increase in fluorescence intensity of anilino naphthalene-8-sulfonic acid confirm that the partial unfolding is caused by the conformational changes in the tertiary structure in the presence of UFAs. The structural changes in S100A8/A9 and its lower stability in the presence of UFAs may be necessary for S100A8/A9 to play a biological role in the inflammatory milieu. Copyright © 2015 Elsevier B.V. All rights reserved.
The human fatty acid-binding protein family: Evolutionary divergences and functions
2011-01-01
Fatty acid-binding proteins (FABPs) are members of the intracellular lipid-binding protein (iLBP) family and are involved in reversibly binding intracellular hydrophobic ligands and trafficking them throughout cellular compartments, including the peroxisomes, mitochondria, endoplasmic reticulum and nucleus. FABPs are small, structurally conserved cytosolic proteins consisting of a water-filled, interior-binding pocket surrounded by ten anti-parallel beta sheets, forming a beta barrel. At the superior surface, two alpha-helices cap the pocket and are thought to regulate binding. FABPs have broad specificity, including the ability to bind long-chain (C16-C20) fatty acids, eicosanoids, bile salts and peroxisome proliferators. FABPs demonstrate strong evolutionary conservation and are present in a spectrum of species including Drosophila melanogaster, Caenorhabditis elegans, mouse and human. The human genome consists of nine putatively functional protein-coding FABP genes. The most recently identified family member, FABP12, has been less studied. PMID:21504868
Turning limited experimental information into 3D models of RNA.
Flores, Samuel Coulbourn; Altman, Russ B
2010-09-01
Our understanding of RNA functions in the cell is evolving rapidly. As for proteins, the detailed three-dimensional (3D) structure of RNA is often key to understanding its function. Although crystallography and nuclear magnetic resonance (NMR) can determine the atomic coordinates of some RNA structures, many 3D structures present technical challenges that make these methods difficult to apply. The great flexibility of RNA, its charged backbone, dearth of specific surface features, and propensity for kinetic traps all conspire with its long folding time, to challenge in silico methods for physics-based folding. On the other hand, base-pairing interactions (either in runs to form helices or isolated tertiary contacts) and motifs are often available from relatively low-cost experiments or informatics analyses. We present RNABuilder, a novel code that uses internal coordinate mechanics to satisfy user-specified base pairing and steric forces under chemical constraints. The code recapitulates the topology and characteristic L-shape of tRNA and obtains an accurate noncrystallographic structure of the Tetrahymena ribozyme P4/P6 domain. The algorithm scales nearly linearly with molecule size, opening the door to the modeling of significantly larger structures.
Links, Matthew G; Chaban, Bonnie; Hemmingsen, Sean M; Muirhead, Kevin; Hill, Janet E
2013-08-15
Formation of operational taxonomic units (OTU) is a common approach to data aggregation in microbial ecology studies based on amplification and sequencing of individual gene targets. The de novo assembly of OTU sequences has been recently demonstrated as an alternative to widely used clustering methods, providing robust information from experimental data alone, without any reliance on an external reference database. Here we introduce mPUMA (microbial Profiling Using Metagenomic Assembly, http://mpuma.sourceforge.net), a software package for identification and analysis of protein-coding barcode sequence data. It was developed originally for Cpn60 universal target sequences (also known as GroEL or Hsp60). Using an unattended process that is independent of external reference sequences, mPUMA forms OTUs by DNA sequence assembly and is capable of tracking OTU abundance. mPUMA processes microbial profiles both in terms of the direct DNA sequence as well as in the translated amino acid sequence for protein coding barcodes. By forming OTUs and calculating abundance through an assembly approach, mPUMA is capable of generating inputs for several popular microbiota analysis tools. Using SFF data from sequencing of a synthetic community of Cpn60 sequences derived from the human vaginal microbiome, we demonstrate that mPUMA can faithfully reconstruct all expected OTU sequences and produce compositional profiles consistent with actual community structure. mPUMA enables analysis of microbial communities while empowering the discovery of novel organisms through OTU assembly.
Deciphering functional glycosaminoglycan motifs in development.
Townley, Robert A; Bülow, Hannes E
2018-03-23
Glycosaminoglycans (GAGs) such as heparan sulfate, chondroitin/dermatan sulfate, and keratan sulfate are linear glycans, which when attached to protein backbones form proteoglycans. GAGs are essential components of the extracellular space in metazoans. Extensive modifications of the glycans such as sulfation, deacetylation and epimerization create structural GAG motifs. These motifs regulate protein-protein interactions and are thereby repsonsible for many of the essential functions of GAGs. This review focusses on recent genetic approaches to characterize GAG motifs and their function in defined signaling pathways during development. We discuss a coding approach for GAGs that would enable computational analyses of GAG sequences such as alignments and the computation of position weight matrices to describe GAG motifs. Copyright © 2018 Elsevier Ltd. All rights reserved.
Kekenes-Huskey, P. M.; Gillette, A.; Hake, J.; McCammon, J. A.
2012-01-01
We introduce a computational pipeline and suite of software tools for the approximation of diffusion-limited binding based on a recently developed theoretical framework. Our approach handles molecular geometries generated from high-resolution structural data and can account for active sites buried within the protein or behind gating mechanisms. Using tools from the FEniCS library and the APBS solver, we implement a numerical code for our method and study two Ca2+-binding proteins: Troponin C and the Sarcoplasmic Reticulum Ca2+ ATPase (SERCA). We find that a combination of diffusional encounter and internal ‘buried channel’ descriptions provide superior descriptions of association rates, improving estimates by orders of magnitude. PMID:23293662
Kekenes-Huskey, P M; Gillette, A; Hake, J; McCammon, J A
2012-10-31
We introduce a computational pipeline and suite of software tools for the approximation of diffusion-limited binding based on a recently developed theoretical framework. Our approach handles molecular geometries generated from high-resolution structural data and can account for active sites buried within the protein or behind gating mechanisms. Using tools from the FEniCS library and the APBS solver, we implement a numerical code for our method and study two Ca(2+)-binding proteins: Troponin C and the Sarcoplasmic Reticulum Ca(2+) ATPase (SERCA). We find that a combination of diffusional encounter and internal 'buried channel' descriptions provide superior descriptions of association rates, improving estimates by orders of magnitude.
NASA Astrophysics Data System (ADS)
Kekenes-Huskey, P. M.; Gillette, A.; Hake, J.; McCammon, J. A.
2012-01-01
We introduce a computational pipeline and suite of software tools for the approximation of diffusion-limited binding based on a recently developed theoretical framework. Our approach handles molecular geometries generated from high-resolution structural data and can account for active sites buried within the protein or behind gating mechanisms. Using tools from the FEniCS library and the APBS solver, we implement a numerical code for our method and study two Ca2+-binding proteins: troponin C and the sarcoplasmic reticulum Ca2+ ATPase. We find that a combination of diffusional encounter and internal ‘buried channel’ descriptions provides superior descriptions of association rates, improving estimates by orders of magnitude.
Comprehensive comparison of two protein family of P-ATPases (13A1 and 13A3) in insects.
Seddigh, Samin
2017-06-01
The P-type ATPases (P-ATPases) are present in all living cells where they mediate ion transport across membranes on the expense of ATP hydrolysis. Different ions which are transported by these pumps are protons like calcium, sodium, potassium, and heavy metals such as manganese, iron, copper, and zinc. Maintenance of the proper gradients for essential ions across cellular membranes makes P-ATPases crucial for cell survival. In this study, characterization of two families of P-ATPases including P-ATPase 13A1 and P-ATPase 13A3 protein was compared in two different insect species from different orders. According to the conserved motifs found with MEME, nine motifs were shared by insects of 13A1 family but eight in 13A3 family. Seven different insect species from 13A1 and five samples from 13A3 family were selected as the representative samples for functional and structural analyses. The structural and functional analyses were performed with ProtParam, SOPMA, SignalP 4.1, TMHMM 2.0, ProtScale and ProDom tools in the ExPASy database. The tertiary structure of Bombus terrestris as a sample of each family of insects were predicted by the Phyre2 and TM-score servers and their similarities were verified by SuperPose server. The tertiary structures were predicted via the "c3b9bA" model (PDB Accession Code: 3B9B) in P-ATPase 13A1 family and "c2zxeA" model (PDB Accession Code: 2ZXE) in P-ATPase 13A3 family. A phylogenetic tree was constructed with MEGA 6.06 software using the Neighbor-joining method. According to the results, there was a high identity of P-ATPase families so that they should be derived from a common ancestor however they belonged to separate groups. In protein-protein interaction analysis by STRING 10.0, six common enriched pathways of KEGG were identified in B. terrestris in both families. The obtained data provide a background for bioinformatic studies of the function and evolution of other insects and organisms. Copyright © 2017 Elsevier Ltd. All rights reserved.
Plaga, W; Lottspeich, F; Oesterhelt, D
1992-04-01
An improved purification procedure, including nickel chelate affinity chromatography, is reported which resulted in a crystallizable pyruvate:ferredoxin oxidoreductase preparation from Halobacterium halobium. Crystals of the enzyme were obtained using potassium citrate as the precipitant. The genes coding for pyruvate:ferredoxin oxidoreductase were cloned and their nucleotide sequences determined. The genes of both subunits were adjacent to one another on the halobacterial genome. The derived amino acid sequences were confirmed by partial primary structure analysis of the purified protein. The structural motif of thiamin-diphosphate-binding enzymes was unequivocally located in the deduced amino acid sequence of the small subunit.
Parsy, Christophe; Alexandre, François-René; Brandt, Guillaume; Caillet, Catherine; Cappelle, Sylvie; Chaves, Dominique; Convard, Thierry; Derock, Michel; Gloux, Damien; Griffon, Yann; Lallos, Lisa; Leroy, Frédéric; Liuzzi, Michel; Loi, Anna-Giulia; Moulat, Laure; Musiu, Chiara; Rahali, Houcine; Roques, Virginie; Seifer, Maria; Standring, David; Surleraux, Dominique
2014-09-15
Structural homology between thrombin inhibitors and the early tetrapeptide HCV protease inhibitor led to the bioisosteric replacement of the P2 proline by a 2,4-disubstituted azetidine within the macrocyclic β-strand mimic. Molecular modeling guided the design of the series. This approach was validated by the excellent activity and selectivity in biochemical and cell based assays of this novel series and confirmed by the co-crystal structure of the inhibitor with the NS3/4A protein (PDB code: 4TYD). Copyright © 2014 Elsevier Ltd. All rights reserved.
Complete Mitochondrial Genome of Eruca sativa Mill. (Garden Rocket)
Yang, Qing; Chang, Shengxin; Chen, Jianmei; Hu, Maolong; Guan, Rongzhan
2014-01-01
Eruca sativa (Cruciferae family) is an ancient crop of great economic and agronomic importance. Here, the complete mitochondrial genome of Eruca sativa was sequenced and annotated. The circular molecule is 247 696 bp long, with a G+C content of 45.07%, containing 33 protein-coding genes, three rRNA genes, and 18 tRNA genes. The Eruca sativa mitochondrial genome may be divided into six master circles and four subgenomic molecules via three pairwise large repeats, resulting in a more dynamic structure of the Eruca sativa mtDNA compared with other cruciferous mitotypes. Comparison with the Brassica napus MtDNA revealed that most of the genes with known function are conserved between these two mitotypes except for the ccmFN2 and rrn18 genes, and 27 point mutations were scattered in the 14 protein-coding genes. Evolutionary relationships analysis suggested that Eruca sativa is more closely related to the Brassica species and to Raphanus sativus than to Arabidopsis thaliana. PMID:25157569
Akhter, Yusuf; Ehebauer, Matthias T; Mukhopadhyay, Sangita; Hasnain, Seyed E
2012-01-01
The PE/PPE multigene family codes for approximately 10% of the Mycobacterium tuberculosis proteome and is encoded by 176 open reading frames. These proteins possess, and have been named after, the conserved proline-glutamate (PE) or proline-proline-glutamate (PPE) motifs at their N-terminus. Their genes have a conserved structure and repeat motifs that could be a potential source of antigenic variation in M. tuberculosis. PE/PPE genes are scattered throughout the genome and PE/PPE pairs are usually encoded in bicistronic operons although this is not universally so. This gene family has evolved by specific gene duplication events. PE/PPE proteins are either secreted or localized to the cell surface. Several are thought to be virulence factors, which participate in evasion of the host immune response. This review summarizes the current knowledge about the gene family in order to better understand its biological function. Copyright © 2011 Elsevier Masson SAS. All rights reserved.
The complete mitochondrial genome of the Longnose skate: Raja rhina (Rajiformes, Rajidae).
Jeong, Dageum; Lee, Youn-Ho
2015-02-01
The complete sequence of mitochondrial DNA of a longnose skate, Raja rhina was determined for the first time. It is 16,910 bp in length containing 2 rRNA, 22 tRNA and 13 protein coding genes with the same gene order and structure as those of other Rajidae species. The nucleotide of L-strand is composed of 30.1% A, 27.2% C, 28.5% T and 14.2% G, showing a slight A + T bias. The G is the least used base and markedly lower at the third codon position (5.4%). Twelve of the 13 protein coding genes use ATG as their start codon while the COX1 starts with GTG. As for stop codon, only ND4 shows incomplete stop codon TA. This mitogenome is the first report for a species of the genus Raja, and providing a valuable resource of genetic information for understanding the phylogenetic relationship and the evolution of the genus Raja as well as the family, Rajidae.
Complete mitochondrial genome of the Yellownose skate: Zearaja chilensis (Rajiformes, Rajidae).
Jeong, Dageum; Lee, Youn-Ho
2016-01-01
The complete sequence of mitochondrial DNA of a Yellownose skate, Zearaja chilensis was determined for the first time. It is 16,909 bp in length covering 2 rRNA, 22 tRNA and 13 protein coding genes with the identical gene order and structure as those of other Rajidae species. The nucleotide of L-strand is composed of low G (14.3%), and slightly high A + T (58.9%) nucleotides. The strong codon usage bias against the use of G (6.0%) is found at the third codon positions. Twelve of the 13 protein coding genes use ATG as the start codon while COX1 starts with GTG. As for the stop codon, only ND4 shows an incomplete stop codon TA. This is the first report of the mitogenome for a species in the genus Zearaja, providing a valuable source of genetic information on the evolution of the family Rajidae and the genus Zearaja as well as for establishment of a sustainble fishery management plan of the species.
Complete mitochondrial genome of yellow meal worm(Tenebrio molitor)
LIU, Li-Na; WANG, Cheng-Ye
2014-01-01
The yellow meal worm(Tenebrio molitor L.) is an important resource insect typically used as animal feed additive. It is also widely used for biological research. The first complete mitochondrial genome of T. molitor was determined for the first time by long PCR and conserved primer walking approaches. The results showed that the entire mitogenome of T. molitor was 15 785 bp long, with 72.35% A+T content [deposited in GenBank with accession number KF418153]. The gene order and orientation were the same as the most common type suggested as ancestral for insects. Two protein-coding genes used atypical start codons(CTA in ND2 and AAT in COX1), and the remaining 11 protein-coding genes started with a typical insect initiation codon ATN. All tRNAs showed standard clover-leaf structure, except for tRNASer(AGN), which lacked a dihydrouridine(DHU) arm. The newly added T. molitor mitogenome could provide information for future studies on yellow meal worm. PMID:25465087
Complete mitochondrial genome of yellow meal worm (Tenebrio molitor).
Liu, Li-Na; Wang, Cheng-Ye
2014-11-18
The yellow meal worm (Tenebrio molitor L.) is an important resource insect typically used as animal feed additive. It is also widely used for biological research. The first complete mitochondrial genome of T. molitor was determined for the first time by long PCR and conserved primer walking approaches. The results showed that the entire mitogenome of T. molitor was 15 785 bp long, with 72.35% A+T content [deposited in GenBank with accession number KF418153]. The gene order and orientation were the same as the most common type suggested as ancestral for insects. Two protein-coding genes used atypical start codons (CTA in ND2 and AAT in COX1), and the remaining 11 protein-coding genes started with a typical insect initiation codon ATN. All tRNAs showed standard clover-leaf structure, except for tRNA(Ser) (AGN), which lacked a dihydrouridine (DHU) arm. The newly added T. molitor mitogenome could provide information for future studies on yellow meal worm.
The complete mitochondrial genome of Hydra vulgaris (Hydroida: Hydridae).
Pan, Hong-Chun; Fang, Hong-Yan; Li, Shi-Wei; Liu, Jun-Hong; Wang, Ying; Wang, An-Tai
2014-12-01
The complete mitochondrial genome of Hydra vulgaris (Hydroida: Hydridae) is composed of two linear DNA molecules. The mitochondrial DNA (mtDNA) molecule 1 is 8010 bp long and contains six protein-coding genes, large subunit rRNA, methionine and tryptophan tRNAs, two pseudogenes consisting respectively of a partial copy of COI, and terminal sequences at two ends of the linear mtDNA, while the mtDNA molecule 2 is 7576 bp long and contains seven protein-coding genes, small subunit rRNA, methionine tRNA, a pseudogene consisting of a partial copy of COI and terminal sequences at two ends of the linear mtDNA. COI gene begins with GTG as start codon, whereas other 12 protein-coding genes start with a typical ATG initiation codon. In addition, all protein-coding genes are terminated with TAA as stop codon.
GCView: the genomic context viewer for protein homology searches
Grin, Iwan; Linke, Dirk
2011-01-01
Genomic neighborhood can provide important insights into evolution and function of a protein or gene. When looking at operons, changes in operon structure and composition can only be revealed by looking at the operon as a whole. To facilitate the analysis of the genomic context of a query in multiple organisms we have developed Genomic Context Viewer (GCView). GCView accepts results from one or multiple protein homology searches such as BLASTp as input. For each hit, the neighboring protein-coding genes are extracted, the regions of homology are labeled for each input and the results are presented as a clear, interactive graphical output. It is also possible to add more searches to iteratively refine the output. GCView groups outputs by the hits for different proteins. This allows for easy comparison of different operon compositions and structures. The tool is embedded in the framework of the Bioinformatics Toolkit of the Max-Planck Institute for Developmental Biology (MPI Toolkit). Job results from the homology search tools inside the MPI Toolkit can be forwarded to GCView and results can be subsequently analyzed by sequence analysis tools. Results are stored online, allowing for later reinspection. GCView is freely available at http://toolkit.tuebingen.mpg.de/gcview. PMID:21609955
Nafissi, Maryam; Chau, Jeannette; Xu, Jimin
2012-01-01
Synthesis of the Fis nucleoid protein rapidly increases in response to nutrient upshifts, and Fis is one of the most abundant DNA binding proteins in Escherichia coli under nutrient-rich growth conditions. Previous work has shown that control of Fis synthesis occurs at transcription initiation of the dusB-fis operon. We show here that while translation of the dihydrouridine synthase gene dusB is low, unusual mechanisms operate to enable robust translation of fis. At least two RNA sequence elements located within the dusB coding region are responsible for high fis translation. The most important is an AU element centered 35 nucleotides (nt) upstream of the fis AUG, which may function as a binding site for ribosomal protein S1. In addition, a 44-nt segment located upstream of the AU element and predicted to form a stem-loop secondary structure plays a prominent role in enhancing fis translation. On the other hand, mutations close to the AUG, including over a potential Shine-Dalgarno sequence, have little effect on Fis protein levels. The AU element and stem-loop regions are phylogenetically conserved within dusB-fis operons of representative enteric bacteria. PMID:22389479
Genetic Code Expansion of Mammalian Cells with Unnatural Amino Acids.
Brown, Kalyn A; Deiters, Alexander
2015-09-01
The expansion of the genetic code of mammalian cells enables the incorporation of unnatural amino acids into proteins. This is achieved by adding components to the protein biosynthetic machinery, specifically an engineered aminoacyl-tRNA synthetase/tRNA pair. The unnatural amino acids are chemically synthesized and supplemented to the growth medium. Using this methodology, fundamental new chemistries can be added to the functional repertoire of the genetic code of mammalian cells. This protocol outlines the steps necessary to incorporate a photocaged lysine into proteins and showcases its application in the optical triggering of protein translocation to the nucleus. Copyright © 2015 John Wiley & Sons, Inc.
Novotny, Peter; Tang, Xiaojia; Kalari, Krishna R.; Gorodkin, Jan
2014-01-01
Traditional mutation assessment methods generally focus on predicting disruptive changes in protein-coding regions rather than non-coding regulatory regions like untranslated regions (UTRs) of mRNAs. The UTRs, however, are known to have many sequence and structural motifs that can regulate translational and transcriptional efficiency and stability of mRNAs through interaction with RNA-binding proteins and other non-coding RNAs like microRNAs (miRNAs). In a recent study, transcriptomes of tumor cells harboring mutant and wild-type KRAS (V-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog) genes in patients with non-small cell lung cancer (NSCLC) have been sequenced to identify single nucleotide variations (SNVs). About 40% of the total SNVs (73,717) identified were mapped to UTRs, but omitted in the previous analysis. To meet this obvious demand for analysis of the UTRs, we designed a comprehensive pipeline to predict the effect of SNVs on two major regulatory elements, secondary structure and miRNA target sites. Out of 29,290 SNVs in 6462 genes, we predict 472 SNVs (in 408 genes) affecting local RNA secondary structure, 490 SNVs (in 447 genes) affecting miRNA target sites and 48 that do both. Together these disruptive SNVs were present in 803 different genes, out of which 188 (23.4%) were previously known to be cancer-associated. Notably, this ratio is significantly higher (one-sided Fisher's exact test p-value = 0.032) than the ratio (20.8%) of known cancer-associated genes (n = 1347) in our initial data set (n = 6462). Network analysis shows that the genes harboring disruptive SNVs were involved in molecular mechanisms of cancer, and the signaling pathways of LPS-stimulated MAPK, IL-6, iNOS, EIF2 and mTOR. In conclusion, we have found hundreds of SNVs which are highly disruptive with respect to changes in the secondary structure and miRNA target sites within UTRs. These changes hold the potential to alter the expression of known cancer genes or genes linked to cancer-associated pathways. PMID:24416147
Sabarinathan, Radhakrishnan; Wenzel, Anne; Novotny, Peter; Tang, Xiaojia; Kalari, Krishna R; Gorodkin, Jan
2014-01-01
Traditional mutation assessment methods generally focus on predicting disruptive changes in protein-coding regions rather than non-coding regulatory regions like untranslated regions (UTRs) of mRNAs. The UTRs, however, are known to have many sequence and structural motifs that can regulate translational and transcriptional efficiency and stability of mRNAs through interaction with RNA-binding proteins and other non-coding RNAs like microRNAs (miRNAs). In a recent study, transcriptomes of tumor cells harboring mutant and wild-type KRAS (V-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog) genes in patients with non-small cell lung cancer (NSCLC) have been sequenced to identify single nucleotide variations (SNVs). About 40% of the total SNVs (73,717) identified were mapped to UTRs, but omitted in the previous analysis. To meet this obvious demand for analysis of the UTRs, we designed a comprehensive pipeline to predict the effect of SNVs on two major regulatory elements, secondary structure and miRNA target sites. Out of 29,290 SNVs in 6462 genes, we predict 472 SNVs (in 408 genes) affecting local RNA secondary structure, 490 SNVs (in 447 genes) affecting miRNA target sites and 48 that do both. Together these disruptive SNVs were present in 803 different genes, out of which 188 (23.4%) were previously known to be cancer-associated. Notably, this ratio is significantly higher (one-sided Fisher's exact test p-value = 0.032) than the ratio (20.8%) of known cancer-associated genes (n = 1347) in our initial data set (n = 6462). Network analysis shows that the genes harboring disruptive SNVs were involved in molecular mechanisms of cancer, and the signaling pathways of LPS-stimulated MAPK, IL-6, iNOS, EIF2 and mTOR. In conclusion, we have found hundreds of SNVs which are highly disruptive with respect to changes in the secondary structure and miRNA target sites within UTRs. These changes hold the potential to alter the expression of known cancer genes or genes linked to cancer-associated pathways.
Freije, J M; Díez-Itza, I; Balbín, M; Sánchez, L M; Blasco, R; Tolivia, J; López-Otín, C
1994-06-17
A cDNA coding for a new human matrix metalloproteinase (MMP) has been cloned from a cDNA library derived from a breast tumor. The isolated cDNA contains an open reading frame coding for a polypeptide of 471 amino acids. The predicted protein sequence displays extensive similarity to the previously known MMPs and presents all the structural features characteristic of the members of this protein family, including the well conserved PRCGXPD motif, involved in the latency of the enzyme and the zinc-binding domain (HEXGHXXXXXHS). In addition, this novel human MMP contains in its amino acid sequence several residues specific to the collagenase subfamily (Tyr-214, Asp-235, and Gly-237) and lacks the 9-residue insertion present in the stromelysins. According to these structural characteristics, the MMP described herein has been tentatively called collagenase-3, since it represents the third member of this subfamily, composed at present of fibroblast and neutrophil collagenases. The collagenase-3 cDNA was expressed in a vaccinia virus system, and the recombinant protein was able to degrade fibrillar collagens, providing support to the hypothesis that the isolated cDNA codes for an authentic collagenase. Northern blot analysis of RNA from normal and pathological tissues demonstrated the existence in breast tumors of three different mRNA species, which seem to be the result of the utilization of different polyadenylation sites present in the 3'-noncoding region of the gene. By contrast, no collagenase-3 mRNA was detected either by Northern blot or RNA polymerase chain reaction analysis with RNA from other human tissues, including normal breast, mammary fibroadenomas, liver, placenta, ovary, uterus, prostate, and parotid gland. On the basis of the increased expression of collagenase-3 in breast carcinomas and the absence of detectable expression in normal tissues, a possible role for this metalloproteinase in the tumoral process is proposed.
Schwizer, Sarah; Tasara, Taurai; Zurfluh, Katrin; Stephan, Roger; Lehner, Angelika
2013-02-15
Cronobacter spp. are opportunistic pathogens that can cause septicemia and infections of the central nervous system primarily in premature, low-birth weight and/or immune-compromised neonates. Serum resistance is a crucial virulence factor for the development of systemic infections, including bacteremia. It was the aim of the current study to identify genes involved in serum tolerance in a selected Cronobacter sakazakii strain of clinical origin. Screening of 2749 random transposon knock out mutants of a C. sakazakii ES 5 library for modified serum tolerance (compared to wild type) revealed 10 mutants showing significantly increased/reduced resistance to serum killing. Identification of the affected sites in mutants displaying reduced serum resistance revealed genes encoding for surface and membrane proteins as well as regulatory elements or chaperones. By this approach, the involvement of the yet undescribed Wzy_C superfamily domain containing coding region in serum tolerance was observed and experimentally confirmed. Additionally, knock out mutants with enhanced serum tolerance were observed. Examination of respective transposon insertion loci revealed regulatory (repressor) elements, coding regions for chaperones and efflux systems as well as the coding region for the protein YbaJ. Real time expression analysis experiments revealed, that knock out of the gene for this protein negatively affects the expression of the fimA gene, which is a key structural component of the formation of fimbriae. Fimbriae are structures of high immunogenic potential and it is likely that absence/truncation of the ybaJ gene resulted in a non-fimbriated phenotype accounting for the enhanced survival of this mutant in human serum. By using a transposon knock out approach we were able to identify genes involved in both increased and reduced serum tolerance in Cronobacter sakazakii ES5. This study reveals first insights in the complex nature of serum tolerance of Cronobacter spp.
Whole mitochondrial genome sequence for an osteoarthritis model of Guinea pig (Caviidae; Cavia).
Cui, Xin-Gang; Liu, Cheng-Yao; Wei, Bo; Zhao, Wen-Jian; Zhang, Wen-Feng
2016-11-01
Animal models played an important role in osteoarthritis studies. Here, the complete mitochondrial genome sequence of the Guinea pig was reported for the first time. The total length of the mitogenome was 16,797 bp. It contained the typical structure, including two ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes and one non-coding control region (D-loop region). The overall composition of the mitogenome was estimated to be 34.9% for A, 26.1% for T, 26.0% for C and 13.0% for G showing an A-T (61.0%)-rich feature. This mitochondrial genome sequence will provide new genetic resource into osteoarthritis disease.
The complete mitochondrial genome of Pholis nebulosus (Perciformes: Pholidae).
Wang, Zhongquan; Qin, Kaili; Liu, Jingxi; Song, Na; Han, Zhiqiang; Gao, Tianxiang
2016-11-01
In this study, the complete mitochondrial genome (mitogenome) sequence of Pholis nebulosus has been determined by long polymerase chain reaction and primer-walking methods. The mitogenome is a circular molecule of 16 524 bp in length, including the typical structure of 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and 2 non-coding regions (L-strand replication origin and control region), the gene contents of which are identical to those observed in most bony fishes. Within the control region, we identified the termination-associated sequence domain (TAS), and the conserved sequence block domain (CSB-F, CSB-E, CSB-D, CSB-C, CSB-B, CSB-A, CSB-1, CSB-2, CSB-3).
Complete mitochondrial genome of the Tyto longimembris (Strigiformes: Tytonidae).
Xu, Peng; Li, Yankuo; Miao, Lujun; Xie, Guangyong; Huang, Yan
2016-07-01
The complete mitochondrial genome of Tyto longimembris has been determined in this study. It is 18,466 bp in length and consists of 13 protein-coding genes, 22 transfer RNA (tRNA) genes, 2 ribosomal RNA (rRNA) genes and a non-coding control region (D-loop). The overall base composition of the heavy strand of the T. longimembris mitochondrial genome is A: 30.1%, T: 23.5%, C: 31.8% and G: 14.6%. The structure of control region should be characterized by a region containing tandem repeats as two definitely separated clusters of tandem repeats were found. This study provided an important data set for phylogenetic and taxonomic analyses of Tyto species.
The complete mitochondrial genome of the sandbar shark Carcharhinus plumbeus.
Blower, Dean C; Ovenden, Jennifer R
2016-01-01
The sandbar shark, Carcharhinus plumbeus, a major representative species in shark fisheries worldwide is now considered vulnerable to overfishing. A pool of 774,234 Roche 454 shotgun sequences from one individual were assembled into a 16,706 bp mitogenome with 33× average coverage depth. It comprised 13 protein coding genes, 22 transfer RNA's, 2 ribosomal genes and 2 non-coding regions, typical of a vertebrate mitogenome. As expected for sharks, an A-T nucleotide bias was evident. This adds to rapidly growing number of mitogenome assemblies for the economically important Carcharhinidae family. The C. plumbeus mitogenome will assist researchers, fisheries and conservation managers interested in shark molecular systematics, phylogeography, conservation genetics, population and stock structure.
Arbitrariness is not enough: towards a functional approach to the genetic code.
Lacková, Ľudmila; Matlach, Vladimír; Faltýnek, Dan
2017-12-01
Arbitrariness in the genetic code is one of the main reasons for a linguistic approach to molecular biology: the genetic code is usually understood as an arbitrary relation between amino acids and nucleobases. However, from a semiotic point of view, arbitrariness should not be the only condition for definition of a code, consequently it is not completely correct to talk about "code" in this case. Yet we suppose that there exist a code in the process of protein synthesis, but on a higher level than the nucleic bases chains. Semiotically, a code should be always associated with a function and we propose to define the genetic code not only relationally (in basis of relation between nucleobases and amino acids) but also in terms of function (function of a protein as meaning of the code). Even if the functional definition of meaning in the genetic code has been discussed in the field of biosemiotics, its further implications have not been considered. In fact, if the function of a protein represents the meaning of the genetic code (the sign's object), then it is crucial to reconsider the notion of its expression (the sign) as well. In our contribution, we will show that the actual model of the genetic code is not the only possible and we will propose a more appropriate model from a semiotic point of view.
Rahpeyma, Mehdi; Fotouhi, Fatemeh; Makvandi, Manouchehr; Ghadiri, Ata; Samarbaf-Zadeh, Alireza
2015-11-01
Crimean-Congo hemorrhagic fever virus (CCHFV) is a member of the nairovirus, a genus in the Bunyaviridae family, which causes a life threatening disease in human. Currently, there is no vaccine against CCHFV and detailed structural analysis of CCHFV proteins remains undefined. The CCHFV M RNA segment encodes two viral surface glycoproteins known as Gn and Gc. Viral glycoproteins can be considered as key targets for vaccine development. The current study aimed to investigate structural bioinformatics of CCHFV Gn protein and design a construct to make a recombinant bacmid to express by baculovirus system. To express the Gn protein in insect cells that can be used as antigen in animal model vaccine studies. Bioinformatic analysis of CCHFV Gn protein was performed and designed a construct and cloned into pFastBacHTb vector and a recombinant Gn-bacmid was generated by Bac to Bac system. Primary, secondary, and 3D structure of CCHFV Gn were obtained and PCR reaction with M13 forward and reverse primers confirmed the generation of recombinant bacmid DNA harboring Gn coding region under polyhedron promoter. Characterization of the detailed structure of CCHFV Gn by bioinformatics software provides the basis for development of new experiments and construction of a recombinant bacmid harboring CCHFV Gn, which is valuable for designing a recombinant vaccine against deadly pathogens like CCHFV.
Guo, Emily Z.; Xu, Zhaohui
2015-01-01
The endosomal sorting complex required for transport (ESCRT) machinery is responsible for membrane remodeling in a number of biological processes including multivesicular body biogenesis, cytokinesis, and enveloped virus budding. In mammalian cells, efficient abscission during cytokinesis requires proper function of the ESCRT-III protein IST1, which binds to the microtubule interacting and trafficking (MIT) domains of VPS4, LIP5, and Spartin via its C-terminal MIT-interacting motif (MIM). Here, we studied the molecular interactions between IST1 and the three MIT domain-containing proteins to understand the structural basis that governs pairwise MIT-MIM interaction. Crystal structures of the three molecular complexes revealed that IST1 binds to the MIT domains of VPS4, LIP5, and Spartin using two different mechanisms (MIM1 mode versus MIM3 mode). Structural comparison revealed that structural features in both MIT and MIM contribute to determine the specific binding mechanism. Within the IST1 MIM sequence, two phenylalanine residues were shown to be important in discriminating MIM1 versus MIM3 binding. These observations enabled us to deduce a preliminary binding code, which we applied to provide CHMP2A, a protein that normally only binds the MIT domain in the MIM1 mode, the additional ability to bind the MIT domain of Spartin in the MIM3 mode. PMID:25657007
An Amino Acid Code to Define a Protein’s Tertiary Packing Surface
Fraga, Keith J.; Joo, Hyun; Tsai, Jerry
2015-01-01
One difficult aspect of the protein-folding problem is characterizing the non-specific interactions that define packing in protein tertiary structure. To better understand tertiary structure, this work extends the knob-socket model by classifying the interactions of a single knob residue packed into a set of contiguous sockets, or a pocket made up of 4 or more residues. The knob-socket construct allows for a symbolic two-dimensional mapping of pockets. The two-dimensional mapping of pockets provides a simple method to investigate the variety of pocket shapes in order to understand the geometry of protein tertiary surfaces. The diversity of pocket geometries can be organized into groups of pockets that share a common core, which suggests that some interactions in pockets are ancillary to packing. Further analysis of pocket geometries displays a preferred configuration that is right-handed in α-helices and left-handed in β-sheets. The amino acid composition of pockets illustrates the importance of non-polar amino acids in packing as well as position specificity. As expected, all pocket shapes prefer to pack with hydrophobic knobs; however, knobs are not selective for the pockets they pack. Investigating side-chain rotamer preferences for certain pocket shapes uncovers no strong correlations. These findings allow a simple vocabulary based on knobs and sockets to describe protein tertiary packing that supports improved analysis, design and prediction of protein structure. PMID:26575337
Frameshifting in alphaviruses: a diversity of 3' stimulatory structures.
Chung, Betty Y-W; Firth, Andrew E; Atkins, John F
2010-03-26
Programmed ribosomal frameshifting allows the synthesis of alternative, N-terminally coincident, C-terminally distinct proteins from the same RNA. Many viruses utilize frameshifting to optimize the coding potential of compact genomes, to circumvent the host cell's canonical rule of one functional protein per mRNA, or to express alternative proteins in a fixed ratio. Programmed frameshifting is also used in the decoding of a small number of cellular genes. Recently, specific ribosomal -1 frameshifting was discovered at a conserved U_UUU_UUA motif within the sequence encoding the alphavirus 6K protein. In this case, frameshifting results in the synthesis of an additional protein, termed TF (TransFrame). This new case of frameshifting is unusual in that the -1 frame ORF is very short and completely embedded within the sequence encoding the overlapping polyprotein. The present work shows that there is remarkable diversity in the 3' sequences that are functionally important for efficient frameshifting at the U_UUU_UUA motif. While many alphavirus species utilize a 3' RNA structure such as a hairpin or pseudoknot, some species (such as Semliki Forest virus) apparently lack any intra-mRNA stimulatory structure, yet just 20 nt 3'-adjacent to the shift site stimulates up to 10% frameshifting. The analysis, both experimental and bioinformatic, significantly expands the known repertoire of -1 frameshifting stimulators in mammalian and insect systems.
Fort, Joana; de la Ballina, Laura R; Burghardt, Hans E; Ferrer-Costa, Carles; Turnay, Javier; Ferrer-Orta, Cristina; Usón, Isabel; Zorzano, Antonio; Fernández-Recio, Juan; Orozco, Modesto; Lizarbe, María Antonia; Fita, Ignacio; Palacín, Manuel
2007-10-26
4F2hc (CD98hc) is a multifunctional type II membrane glycoprotein involved in amino acid transport and cell fusion, adhesion, and transformation. The structure of the ectodomain of human 4F2hc has been solved using monoclinic (Protein Data Bank code 2DH2) and orthorhombic (Protein Data Bank code 2DH3) crystal forms at 2.1 and 2.8 A, respectively. It is composed of a (betaalpha)(8) barrel and an antiparallel beta(8) sandwich related to bacterial alpha-glycosidases, although lacking key catalytic residues and consequently catalytic activity. 2DH3 is a dimer with Zn(2+) coordination at the interface. Human 4F2hc expressed in several cell types resulted in cell surface and Cys(109) disulfide bridge-linked homodimers with major architectural features of the crystal dimer, as demonstrated by cross-linking experiments. 4F2hc has no significant hydrophobic patches at the surface. Monomer and homodimer have a polarized charged surface. The N terminus of the solved structure, including the position of Cys(109) residue located four residues apart from the transmembrane domain, is adjacent to the positive face of the ectodomain. This location of the N terminus and the Cys(109)-intervening disulfide bridge imposes space restrictions sufficient to support a model for electrostatic interaction of the 4F2hc ectodomain with membrane phospholipids. These results provide the first crystal structure of heteromeric amino acid transporters and suggest a dynamic interaction of the 4F2hc ectodomain with the plasma membrane.
Watanabe, H; Narai, A; Shimizu, M
1999-06-01
A new protein that decreases transepithelial electrical resistance (TEER) in the human intestinal Caco-2 cell monolayer was found in a water-soluble fraction of the mushroom Flammulina velutipes. This protein, termed TEER-decreasing protein (TDP), is not cytotoxic and does not induce cell detachment, but rapidly increases the tight junctional permeability for water-soluble marker substances such as Lucifer Yellow CH (Mr 457) through the paracellular pathway. TDP was isolated and purified from the aqueous extract of F. velutipes by chromatographic means. Purified TDP was found to be a simple, nonglycosylated protein without intermolecular disulfide bonds, and the apparent molecular mass as estimated by SDS/PAGE and gel filtration is 30 kDa. It was revealed that the N-terminal amino-acid sequence of purified TDP is identical to the recently reported N-terminal sequence of flammutoxin, a membrane-perturbing hemolytic protein, for which the complete primary structure has not yet been reported [Tomita, T., Ishikawa, D., Noguchi, T., Katayama, E., and Hashimoto, Y. (1998) Biochem. J. 333, 24794-24799]. The cDNA coding for TDP was cloned by 5' and 3' rapid amplification of cDNA ends. The ORF encodes a protein with 272 amino-acid residues showing no homology to known proteins. Relevant studies using TDP cDNA will provide insight into the structure-function relationships of membrane pore-forming toxins.
USDA-ARS?s Scientific Manuscript database
Deletions within the 3A coding region of foot-and-mouth disease virus (FMDV) are associated with decreased virulence in cattle; however, the mechanisms are unknown. We compared experimental infection of cattle with virulent FMDV O1Campos (O1Ca) and a mutant derivative (O1Ca-delta3A) lacking residues...
Kim, Sanghee; Lim, Byung-Jin; Min, Gi-Sik; Choi, Han-Gu
2013-05-10
Copepoda is the most diverse and abundant group of crustaceans, but its phylogenetic relationships are ambiguous. Mitochondrial (mt) genomes are useful for studying evolutionary history, but only six complete Copepoda mt genomes have been made available and these have extremely rearranged genome structures. This study determined the mt genome of Calanus hyperboreus, making it the first reported Arctic copepod mt genome and the first complete mt genome of a calanoid copepod. The mt genome of C. hyperboreus is 17,910 bp in length and it contains the entire set of 37 mt genes, including 13 protein-coding genes, 2 rRNAs, and 22 tRNAs. It has a very unusual gene structure, including the longest control region reported for a crustacean, a large tRNA gene cluster, and reversed GC skews in 11 out of 13 protein-coding genes (84.6%). Despite the unusual features, comparing this genome to published copepod genomes revealed retained pan-crustacean features, as well as a conserved calanoid-specific pattern. Our data provide a foundation for exploring the calanoid pattern and the mechanisms of mt gene rearrangement in the evolutionary history of the copepod mt genome. Copyright © 2012 Elsevier B.V. All rights reserved.
Using the NCBI Genome Databases to Compare the Genes for Human & Chimpanzee Beta Hemoglobin
ERIC Educational Resources Information Center
Offner, Susan
2010-01-01
The beta hemoglobin protein is identical in humans and chimpanzees. In this tutorial, students see that even though the proteins are identical, the genes that code for them are not. There are many more differences in the introns than in the exons, which indicates that coding regions of DNA are more highly conserved than non-coding regions.
FPGA acceleration of rigid-molecule docking codes
Sukhwani, B.; Herbordt, M.C.
2011-01-01
Modelling the interactions of biological molecules, or docking, is critical both to understanding basic life processes and to designing new drugs. The field programmable gate array (FPGA) based acceleration of a recently developed, complex, production docking code is described. The authors found that it is necessary to extend their previous three-dimensional (3D) correlation structure in several ways, most significantly to support simultaneous computation of several correlation functions. The result for small-molecule docking is a 100-fold speed-up of a section of the code that represents over 95% of the original run-time. An additional 2% is accelerated through a previously described method, yielding a total acceleration of 36× over a single core and 10× over a quad-core. This approach is found to be an ideal complement to graphics processing unit (GPU) based docking, which excels in the protein–protein domain. PMID:21857870
Structural and sequence features of two residue turns in beta-hairpins.
Madan, Bharat; Seo, Sung Yong; Lee, Sun-Gu
2014-09-01
Beta-turns in beta-hairpins have been implicated as important sites in protein folding. In particular, two residue β-turns, the most abundant connecting elements in beta-hairpins, have been a major target for engineering protein stability and folding. In this study, we attempted to investigate and update the structural and sequence properties of two residue turns in beta-hairpins with a large data set. For this, 3977 beta-turns were extracted from 2394 nonhomologous protein chains and analyzed. First, the distribution, dihedral angles and twists of two residue turn types were determined, and compared with previous data. The trend of turn type occurrence and most structural features of the turn types were similar to previous results, but for the first time Type II turns in beta-hairpins were identified. Second, sequence motifs for the turn types were devised based on amino acid positional potentials of two-residue turns, and their distributions were examined. From this study, we could identify code-like sequence motifs for the two residue beta-turn types. Finally, structural and sequence properties of beta-strands in the beta-hairpins were analyzed, which revealed that the beta-strands showed no specific sequence and structural patterns for turn types. The analytical results in this study are expected to be a reference in the engineering or design of beta-hairpin turn structures and sequences. © 2014 Wiley Periodicals, Inc.