conserved protein coding: Topics by Science.gov

Sample records for conserved protein coding

A subset of conserved mammalian long non-coding RNAs are fossils of ancestral protein-coding genes.

PubMed

Hezroni, Hadas; Ben-Tov Perry, Rotem; Meir, Zohar; Housman, Gali; Lubelsky, Yoav; Ulitsky, Igor

2017-08-30

Only a small portion of human long non-coding RNAs (lncRNAs) appear to be conserved outside of mammals, but the events underlying the birth of new lncRNAs in mammals remain largely unknown. One potential source is remnants of protein-coding genes that transitioned into lncRNAs. We systematically compare lncRNA and protein-coding loci across vertebrates, and estimate that up to 5% of conserved mammalian lncRNAs are derived from lost protein-coding genes. These lncRNAs have specific characteristics, such as broader expression domains, that set them apart from other lncRNAs. Fourteen lncRNAs have sequence similarity with the loci of the contemporary homologs of the lost protein-coding genes. We propose that selection acting on enhancer sequences is mostly responsible for retention of these regions. As an example of an RNA element from a protein-coding ancestor that was retained in the lncRNA, we describe in detail a short translated ORF in the JPX lncRNA that was derived from an upstream ORF in a protein-coding gene and retains some of its functionality. We estimate that ~ 55 annotated conserved human lncRNAs are derived from parts of ancestral protein-coding genes, and loss of coding potential is thus a non-negligible source of new lncRNAs. Some lncRNAs inherited regulatory elements influencing transcription and translation from their protein-coding ancestors and those elements can influence the expression breadth and functionality of these lncRNAs.
The Number, Organization, and Size of Polymorphic Membrane Protein Coding Sequences as well as the Most Conserved Pmp Protein Differ within and across Chlamydia Species.

PubMed

Van Lent, Sarah; Creasy, Heather Huot; Myers, Garry S A; Vanrompay, Daisy

2016-01-01

Variation is a central trait of the polymorphic membrane protein (Pmp) family. The number of pmp coding sequences differs between Chlamydia species, but it is unknown whether the number of pmp coding sequences is constant within a Chlamydia species. The level of conservation of the Pmp proteins has previously only been determined for Chlamydia trachomatis. As different Pmp proteins might be indispensable for the pathogenesis of different Chlamydia species, this study investigated the conservation of Pmp proteins both within and across C. trachomatis, C. pneumoniae, C. abortus, and C. psittaci. The pmp coding sequences were annotated in 16 C. trachomatis, 6 C. pneumoniae, 2 C. abortus, and 16 C. psittaci genomes. The number and organization of polymorphic membrane coding sequences differed within and across the analyzed Chlamydia species. The length of coding sequences of pmpA, pmpB, and pmpH was conserved among all analyzed genomes, while the length of pmpE/F and pmpG, and remarkably also of the subtype pmpD, differed among the analyzed genomes. PmpD, PmpA, PmpH, and PmpA were the most conserved Pmp in C. trachomatis, C. pneumoniae, C. abortus, and C. psittaci, respectively. PmpB was the most conserved Pmp across the 4 analyzed Chlamydia species. © 2016 S. Karger AG, Basel.
Identification of a Conserved Non-Protein-Coding Genomic Element that Plays an Essential Role in Alphabaculovirus Pathogenesis

PubMed Central

Kikhno, Irina

2014-01-01

Highly homologous sequences 154–157 bp in length grouped under the name of “conserved non-protein-coding element” (CNE) were revealed in all of the sequenced genomes of baculoviruses belonging to the genus Alphabaculovirus. A CNE alignment led to the detection of a set of highly conserved nucleotide clusters that occupy strictly conserved positions in the CNE sequence. The significant length of the CNE and conservation of both its length and cluster architecture were identified as a combination of characteristics that make this CNE different from known viral non-coding functional sequences. The essential role of the CNE in the Alphabaculovirus life cycle was demonstrated through the use of a CNE-knockout Autographa californica multiple nucleopolyhedrovirus (AcMNPV) bacmid. It was shown that the essential function of the CNE was not mediated by the presumed expression activities of the protein- and non-protein-coding genes that overlap the AcMNPV CNE. On the basis of the presented data, the AcMNPV CNE was categorized as a complex-structured, polyfunctional genomic element involved in an essential DNA transaction that is associated with an undefined function of the baculovirus genome. PMID:24740153
Delineating slowly and rapidly evolving fractions of the Drosophila genome.

PubMed

Keith, Jonathan M; Adams, Peter; Stephen, Stuart; Mattick, John S

2008-05-01

Evolutionary conservation is an important indicator of function and a major component of bioinformatic methods to identify non-protein-coding genes. We present a new Bayesian method for segmenting pairwise alignments of eukaryotic genomes while simultaneously classifying segments into slowly and rapidly evolving fractions. We also describe an information criterion similar to the Akaike Information Criterion (AIC) for determining the number of classes. Working with pairwise alignments enables detection of differences in conservation patterns among closely related species. We analyzed three whole-genome and three partial-genome pairwise alignments among eight Drosophila species. Three distinct classes of conservation level were detected. Sequences comprising the most slowly evolving component were consistent across a range of species pairs, and constituted approximately 62-66% of the D. melanogaster genome. Almost all (>90%) of the aligned protein-coding sequence is in this fraction, suggesting much of it (comprising the majority of the Drosophila genome, including approximately 56% of non-protein-coding sequences) is functional. The size and content of the most rapidly evolving component was species dependent, and varied from 1.6% to 4.8%. This fraction is also enriched for protein-coding sequence (while containing significant amounts of non-protein-coding sequence), suggesting it is under positive selection. We also classified segments according to conservation and GC content simultaneously. This analysis identified numerous sub-classes of those identified on the basis of conservation alone, but was nevertheless consistent with that classification. Software, data, and results available at www.maths.qut.edu.au/-keithj/. Genomic segments comprising the conservation classes available in BED format.
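The model-selection step mentioned in this abstract can be illustrated with the standard Akaike Information Criterion, AIC = 2k - 2 ln L, where k is the number of free parameters and L the maximized likelihood. The following is a minimal sketch under stated assumptions: it uses the textbook AIC, not the authors' own criterion (the abstract only says theirs is similar to AIC), and the function names and log-likelihood values are hypothetical placeholders.

```python
def aic(log_likelihood, n_params):
    """Standard AIC: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_class_count(fits):
    """fits maps a candidate number of classes to (log_likelihood, n_params)."""
    return min(fits, key=lambda k: aic(*fits[k]))


# Hypothetical fits for illustration only; not values from the paper.
example_fits = {
    2: (-1050.0, 4),
    3: (-1000.0, 6),
    4: (-999.5, 8),
}
print(best_class_count(example_fits))  # prints 3
```

In this toy example the fourth class improves the likelihood only slightly, so the parameter penalty leaves three classes as the preferred model.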
Comparative sequence analysis of acid sensitive/resistance proteins in Escherichia coli and Shigella flexneri

PubMed Central

Manikandan, Selvaraj; Balaji, Seetharaaman; Kumar, Anil; Kumar, Rita

2007-01-01

The molecular basis for the survival of bacteria under extreme conditions in which growth is inhibited is a question of great current interest. A preliminary study was carried out to determine residue pattern conservation among the antiporters of enteric bacteria responsible for extreme acid sensitivity, especially in Escherichia coli and Shigella flexneri. Here we found molecular evidence for the relationship between E. coli and S. flexneri. Multiple sequence alignment of the gadC-coded acid-sensitive antiporter showed many conserved residue patterns at regular intervals in the N-terminal region. The number of conserved residues decreases as the alignment approaches the C-terminus, indicating that the N-terminal region of this protein has a much more active role than the carboxyl terminal. The motif FHLVFFLLLGG is well conserved at the amino terminal of all gadC-coded proteins. The motif is also partially conserved among other antiporters that are not coded by gadC but are involved in the acid sensitivity/resistance mechanism. Phylogenetic cluster analysis proves the relationship of Escherichia coli and Shigella flexneri. The gadC-coded proteins converge as a clade and diverge from other antiporters that belong to the amino acid-polyamine-organocation (APC) superfamily. PMID:21670792
Next generation sequencing and analysis of a conserved transcriptome of New Zealand's kiwi.

PubMed

Subramanian, Sankar; Huynen, Leon; Millar, Craig D; Lambert, David M

2010-12-15

Kiwi is a highly distinctive, flightless and endangered ratite bird endemic to New Zealand. To understand the patterns of molecular evolution of the nuclear protein-coding genes in brown kiwi (Apteryx australis mantelli) and to determine the timescale of avian history we sequenced a transcriptome obtained from a kiwi embryo using next generation sequencing methods. We then assembled the conserved protein-coding regions using the chicken proteome as a scaffold. Using 1,543 conserved protein coding genes we estimated the neutral evolutionary divergence between the kiwi and chicken to be ~45%, which is approximately equal to the divergence computed for the human-mouse pair using the same set of genes. A large fraction of genes was found to be under high selective constraint, as most of the expressed genes appeared to be involved in developmental gene regulation. Our study suggests a significant relationship between gene expression levels and protein evolution. Using sequences from over 700 nuclear genes we estimated the divergence between the two basal avian groups, Palaeognathae and Neognathae to be 132 million years, which is consistent with previous studies using mitochondrial genes. The results of this investigation revealed patterns of mutation and purifying selection in conserved protein coding regions in birds. Furthermore this study suggests a relatively cost-effective way of obtaining a glimpse into the fundamental molecular evolutionary attributes of a genome, particularly when no closely related genomic sequence is available.
Using the NCBI Genome Databases to Compare the Genes for Human & Chimpanzee Beta Hemoglobin

ERIC Educational Resources Information Center

Offner, Susan

2010-01-01

The beta hemoglobin protein is identical in humans and chimpanzees. In this tutorial, students see that even though the proteins are identical, the genes that code for them are not. There are many more differences in the introns than in the exons, which indicates that coding regions of DNA are more highly conserved than non-coding regions.
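The exon-versus-intron comparison described in this tutorial can be sketched in a few lines. This is only an illustration under stated assumptions: the two sequences and the region coordinates below are invented toy data (not real beta hemoglobin sequences), the helper names are hypothetical, and the sequences are assumed to be already aligned with equal length and no gaps.

```python
def count_differences(seq_a, seq_b):
    """Count mismatched positions between two equal-length aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def differences_by_region(seq_a, seq_b, regions):
    """regions maps a label to a (start, end) half-open slice of the alignment."""
    return {
        label: count_differences(seq_a[start:end], seq_b[start:end])
        for label, (start, end) in regions.items()
    }


# Toy aligned sequences and coordinates, invented for illustration;
# they are not real HBB data.
human = "ATGGTGCACCTGGTAAGTTCACTGAG"
chimp = "ATGGTGCACCTGGTAAGCTTACTGAG"
regions = {"exon": (0, 12), "intron": (12, 26)}
print(differences_by_region(human, chimp, regions))  # {'exon': 0, 'intron': 2}
```

With real data, the sequences would first be aligned, and the exon and intron coordinates would come from the gene annotation.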
Cell cycle, oncogenic and tumor suppressor pathways regulate numerous long and macro non-protein-coding RNAs

PubMed Central

2014-01-01

Background: The genome is pervasively transcribed but most transcripts do not code for proteins, constituting non-protein-coding RNAs. Despite increasing numbers of functional reports of individual long non-coding RNAs (lncRNAs), assessing the extent of functionality among the non-coding transcriptional output of mammalian cells remains intricate. In the protein-coding world, transcripts differentially expressed in the context of processes essential for the survival of multicellular organisms have been instrumental in the discovery of functionally relevant proteins and their deregulation is frequently associated with diseases. We therefore systematically identified lncRNAs expressed differentially in response to oncologically relevant processes and cell-cycle, p53 and STAT3 pathways, using tiling arrays. Results: We found that up to 80% of the pathway-triggered transcriptional responses are non-coding. Among these we identified very large macroRNAs with pathway-specific expression patterns and demonstrated that these are likely continuous transcripts. MacroRNAs contain elements conserved in mammals and sauropsids, which in part exhibit conserved RNA secondary structure. Comparing evolutionary rates of a macroRNA to adjacent protein-coding genes suggests a local action of the transcript. Finally, in different grades of astrocytoma, a tumor disease unrelated to the initially used cell lines, macroRNAs are differentially expressed. Conclusions: It has been shown previously that the majority of expressed non-ribosomal transcripts are non-coding. We now conclude that differential expression triggered by signaling pathways gives rise to a similar abundance of non-coding content. It is thus unlikely that the prevalence of non-coding transcripts in the cell is a trivial consequence of leaky or random transcription events. PMID:24594072
A conserved predicted pseudoknot in the NS2A-encoding sequence of West Nile and Japanese encephalitis flaviviruses suggests NS1' may derive from ribosomal frameshifting

PubMed Central

Firth, Andrew E; Atkins, John F

2009-01-01

Japanese encephalitis, West Nile, Usutu and Murray Valley encephalitis viruses form a tight subgroup within the larger Flavivirus genus. These viruses utilize a single-polyprotein expression strategy, resulting in ~10 mature proteins. Plotting the conservation at synonymous sites along the polyprotein coding sequence reveals strong conservation peaks at the very 5' end of the coding sequence, and also at the 5' end of the sequence encoding the NS2A protein. Such peaks are generally indicative of functionally important non-coding sequence elements. The second peak corresponds to a predicted stable pseudoknot structure whose biological importance is supported by compensatory mutations that preserve the structure. The pseudoknot is preceded by a conserved slippery heptanucleotide (Y CCU UUU), thus forming a classical stimulatory motif for -1 ribosomal frameshifting. We hypothesize, therefore, that the functional importance of the pseudoknot is to stimulate a portion of ribosomes to shift -1 nt into a short (45 codon), conserved, overlapping open reading frame, termed foo. Since cleavage at the NS1-NS2A boundary is known to require synthesis of NS2A in cis, the resulting transframe fusion protein is predicted to be NS1-NS2AN-term-FOO. We hypothesize that this may explain the origin of the previously identified NS1 'extension' protein in JEV-group flaviviruses, known as NS1'. PMID:19196463
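The slippery heptanucleotide named in this abstract, Y CCU UUU (Y being a pyrimidine, C or U), is simple enough to search for with a regular expression. The sketch below is a hedged illustration rather than the authors' method: the function name and the toy sequence are hypothetical, and it only locates the heptanucleotide without checking for the downstream pseudoknot that the abstract describes as part of the frameshift signal.

```python
import re

# Y CCU UUU, where Y is a pyrimidine (C or U); the lookahead allows
# overlapping hits to be reported.
SLIPPERY = re.compile(r"(?=([CU]CCUUUU))")


def find_slippery_sites(rna):
    """Return 0-based start positions of Y CCU UUU motifs in an RNA string."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return [m.start() for m in SLIPPERY.finditer(rna)]


# Toy sequence invented for illustration; not a real flavivirus genome.
toy = "GGAUCCUUUUGACGCCCUUUUAA"
print(find_slippery_sites(toy))  # [3, 14]
```

A fuller analysis would also test each hit for a stable downstream structure and for an open reading frame in the -1 frame.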
Long non-coding RNA discovery across the genus anopheles reveals conserved secondary structures within and beyond the Gambiae complex.

PubMed

Jenkins, Adam M; Waterhouse, Robert M; Muskavitch, Marc A T

2015-04-23

Long non-coding RNAs (lncRNAs) have been defined as mRNA-like transcripts longer than 200 nucleotides that lack significant protein-coding potential, and many of them constitute scaffolds for ribonucleoprotein complexes with critical roles in epigenetic regulation. Various lncRNAs have been implicated in the modulation of chromatin structure, transcriptional and post-transcriptional gene regulation, and regulation of genomic stability in mammals, Caenorhabditis elegans, and Drosophila melanogaster. The purpose of this study is to identify the lncRNA landscape in the malaria vector An. gambiae and assess the evolutionary conservation of lncRNAs and their secondary structures across the Anopheles genus. Using deep RNA sequencing of multiple Anopheles gambiae life stages, we have identified 2,949 lncRNAs and more than 300 previously unannotated putative protein-coding genes. The lncRNAs exhibit differential expression profiles across life stages and adult genders. We find that across the genus Anopheles, lncRNAs display much lower sequence conservation than protein-coding genes. Additionally, we find that lncRNA secondary structure is highly conserved within the Gambiae complex, but diverges rapidly across the rest of the genus Anopheles. This study offers one of the first lncRNA secondary structure analyses in vector insects. Our description of lncRNAs in An. gambiae offers the most comprehensive genome-wide insights to date into lncRNAs in this vector mosquito, and defines a set of potential targets for the development of vector-based interventions that may further curb the human malaria burden in disease-endemic countries.
Transcriptional landscapes of Axolotl (Ambystoma mexicanum).

PubMed

Caballero-Pérez, Juan; Espinal-Centeno, Annie; Falcon, Francisco; García-Ortega, Luis F; Curiel-Quesada, Everardo; Cruz-Hernández, Andrés; Bako, Laszlo; Chen, Xuemei; Martínez, Octavio; Alberto Arteaga-Vázquez, Mario; Herrera-Estrella, Luis; Cruz-Ramírez, Alfredo

2018-01-15

The axolotl (Ambystoma mexicanum) is the vertebrate model system with the highest regeneration capacity. Experimental tools established over the past 100 years have been fundamental to start unraveling the cellular and molecular basis of tissue and limb regeneration. In the absence of a reference genome for the Axolotl, transcriptomic analysis becomes fundamental to understand the genetic basis of regeneration. Here we present one of the most diverse transcriptomic data sets for Axolotl by profiling coding and non-coding RNAs from diverse tissues. We reconstructed a population of 115,906 putative protein coding mRNAs as full ORFs (including isoforms). We also identified 352 conserved miRNAs and 297 novel putative mature miRNAs. Systematic enrichment analysis of gene expression allowed us to identify tissue-specific protein-coding transcripts. We also found putative novel and conserved microRNAs which potentially target mRNAs which are reported as important disease candidates in heart and liver. Copyright © 2017 Elsevier Inc. All rights reserved.
Diversity and Divergence of Dinoflagellate Histone Proteins

PubMed Central

Marinov, Georgi K.; Lynch, Michael

2015-01-01

Histone proteins and the nucleosomal organization of chromatin are near-universal eukaryotic features, with the exception of dinoflagellates. Previous studies have suggested that histones do not play a major role in the packaging of dinoflagellate genomes, although several genomic and transcriptomic surveys have detected a full set of core histone genes. Here, transcriptomic and genomic sequence data from multiple dinoflagellate lineages are analyzed, and the diversity of histone proteins and their variants characterized, with particular focus on their potential post-translational modifications and the conservation of the histone code. In addition, the set of putative epigenetic mark readers and writers, chromatin remodelers and histone chaperones are examined. Dinoflagellates clearly express the most derived set of histones among all autonomous eukaryote nuclei, consistent with a combination of relaxation of sequence constraints imposed by the histone code and the presence of numerous specialized histone variants. The histone code itself appears to have diverged significantly in some of its components, yet others are conserved, implying conservation of the associated biochemical processes. Specifically, and with major implications for the function of histones in dinoflagellates, the results presented here strongly suggest that transcription through nucleosomal arrays happens in dinoflagellates. Finally, the plausible roles of histones in dinoflagellate nuclei are discussed. PMID:26646152
Automated conserved non-coding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses

PubMed Central

Turco, Gina; Schnable, James C.; Pedersen, Brent; Freeling, Michael

2013-01-01

Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of arabidopsis, Japonica rice, foxtail millet, sorghum, brachypodium, and maize. PMID:23874343
Conserved thioredoxin fold is present in Pisum sativum L. sieve element occlusion-1 protein

PubMed Central

Umate, Pavan; Tuteja, Renu

2010-01-01

A homology-based three-dimensional model for the Pisum sativum sieve element occlusion 1 (Ps.SEO1) (forisome) protein was constructed. A stretch of amino acids (residues 320 to 456) which is well conserved in all known members of the forisome proteins was used to model the 3D structure of Ps.SEO1. The structural prediction was done using the Protein Homology/analogY Recognition Engine (PHYRE) web server. Based on studies of local sequence alignment, the thioredoxin-fold containing protein [Structural Classification of Proteins (SCOP) code d1o73a_], a member of the glutathione peroxidase family, was selected as a template for modeling the spatial structure of Ps.SEO1. Selection was based on comparison of primary sequence, higher match quality and alignment accuracy. Motif 1 (EVF) is conserved in Ps.SEO1, Vicia faba (Vf.For1) and Medicago truncatula (MT.SEO3); motif 2 (KKED) is well conserved across all forisome proteins and motif 3 (IGYIGNP) is conserved in Ps.SEO1 and Vf.For1. PMID:20404566
Nmf9 Encodes a Highly Conserved Protein Important to Neurological Function in Mice and Flies.

PubMed

Zhang, Shuxiao; Ross, Kevin D; Seidner, Glen A; Gorman, Michael R; Poon, Tiffany H; Wang, Xiaobo; Keithley, Elizabeth M; Lee, Patricia N; Martindale, Mark Q; Joiner, William J; Hamilton, Bruce A

2015-07-01

Many protein-coding genes identified by genome sequencing remain without functional annotation or biological context. Here we define a novel protein-coding gene, Nmf9, based on a forward genetic screen for neurological function. ENU-induced and genome-edited null mutations in mice produce deficits in vestibular function, fear learning and circadian behavior, which correlated with Nmf9 expression in inner ear, amygdala, and suprachiasmatic nuclei. Homologous genes from unicellular organisms and invertebrate animals predict interactions with small GTPases, but the corresponding domains are absent in mammalian Nmf9. Intriguingly, homozygotes for null mutations in the Drosophila homolog, CG45058, show profound locomotor defects and premature death, while heterozygotes show striking effects on sleep and activity phenotypes. These results link a novel gene orthology group to discrete neurological functions, and show conserved requirement across wide phylogenetic distance and domain level structural changes.
Deep transcriptome annotation enables the discovery and functional characterization of cryptic small proteins

PubMed Central

Delcourt, Vivian; Lucier, Jean-François; Gagnon, Jules; Beaudoin, Maxime C; Vanderperre, Benoît; Breton, Marc-André; Motard, Julie; Jacques, Jean-François; Brunelle, Mylène; Gagnon-Arsenault, Isabelle; Fournier, Isabelle; Ouangraoua, Aida; Hunting, Darel J; Cohen, Alan A; Landry, Christian R; Scott, Michelle S

2017-01-01

Recent functional, proteomic and ribosome profiling studies in eukaryotes have concurrently demonstrated the translation of alternative open-reading frames (altORFs) in addition to annotated protein coding sequences (CDSs). We show that a large number of small proteins could in fact be coded by these altORFs. The putative alternative proteins translated from altORFs have orthologs in many species and contain functional domains. Evolutionary analyses indicate that altORFs often show more extreme conservation patterns than their CDSs. Thousands of alternative proteins are detected in proteomic datasets by reanalysis using a database containing predicted alternative proteins. This is illustrated with specific examples, including altMiD51, a 70 amino acid mitochondrial fission-promoting protein encoded in MiD51/Mief1/SMCR7L, a gene encoding an annotated protein promoting mitochondrial fission. Our results suggest that many genes are multicoding genes and code for a large protein and one or several small proteins. PMID:29083303
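The idea of alternative ORFs sitting alongside an annotated CDS can be illustrated with a generic open-reading-frame scan. This is a minimal sketch and not the pipeline used in the study, which the abstract does not describe: it scans only the forward strand, treats every ATG-to-stop stretch as a candidate, and the function names, the `min_codons` threshold, and the toy sequence are all hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Yield (start, end, frame) for ATG-to-stop ORFs on the forward strand.

    Coordinates are 0-based and half-open; end includes the stop codon.
    """
    seq = seq.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    yield (start, i + 3, frame)
                start = None


def alt_orfs(seq, cds_start, cds_end, min_codons=30):
    """Return ORFs other than the annotated CDS (half-open coordinates)."""
    return [
        orf for orf in find_orfs(seq, min_codons)
        if (orf[0], orf[1]) != (cds_start, cds_end)
    ]


# Toy transcript invented for illustration: the annotated CDS spans 0..18
# in frame 0, and a short overlapping ORF sits in frame 2.
toy = "ATGGCATGCCCTGAATAA"
print(alt_orfs(toy, 0, 18, min_codons=2))  # [(5, 14, 2)]
```

Real annotation efforts would add the reverse strand, spliced transcript models, and supporting evidence such as conservation or proteomic detection before calling a candidate an alternative protein.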
Mutant phenotypes for thousands of bacterial genes of unknown function

DOE PAGES

Price, Morgan N.; Wetmore, Kelly M.; Waters, R. Jordan; ...

2018-05-16

One-third of all protein-coding genes from bacterial genomes cannot be annotated with a function. Here, to investigate the functions of these genes, we present genome-wide mutant fitness data from 32 diverse bacteria across dozens of growth conditions. We identified mutant phenotypes for 11,779 protein-coding genes that had not been annotated with a specific function. Many genes could be associated with a specific condition because the gene affected fitness only in that condition, or with another gene in the same bacterium because they had similar mutant phenotypes. Of the poorly annotated genes, 2,316 had associations that have high confidence because they are conserved in other bacteria. By combining these conserved associations with comparative genomics, we identified putative DNA repair proteins; in addition, we propose specific functions for poorly annotated enzymes and transporters and for uncharacterized protein families. Lastly, our study demonstrates the scalability of microbial genetics and its utility for improving gene annotations.
Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana

PubMed Central

Itoh, Takeshi; Tanaka, Tsuyoshi; Barrero, Roberto A.; Yamasaki, Chisato; Fujii, Yasuyuki; Hilton, Phillip B.; Antonio, Baltazar A.; Aono, Hideo; Apweiler, Rolf; Bruskiewich, Richard; Bureau, Thomas; Burr, Frances; Costa de Oliveira, Antonio; Fuks, Galina; Habara, Takuya; Haberer, Georg; Han, Bin; Harada, Erimi; Hiraki, Aiko T.; Hirochika, Hirohiko; Hoen, Douglas; Hokari, Hiroki; Hosokawa, Satomi; Hsing, Yue; Ikawa, Hiroshi; Ikeo, Kazuho; Imanishi, Tadashi; Ito, Yukiyo; Jaiswal, Pankaj; Kanno, Masako; Kawahara, Yoshihiro; Kawamura, Toshiyuki; Kawashima, Hiroaki; Khurana, Jitendra P.; Kikuchi, Shoshi; Komatsu, Setsuko; Koyanagi, Kanako O.; Kubooka, Hiromi; Lieberherr, Damien; Lin, Yao-Cheng; Lonsdale, David; Matsumoto, Takashi; Matsuya, Akihiro; McCombie, W. Richard; Messing, Joachim; Miyao, Akio; Mulder, Nicola; Nagamura, Yoshiaki; Nam, Jongmin; Namiki, Nobukazu; Numa, Hisataka; Nurimoto, Shin; O’Donovan, Claire; Ohyanagi, Hajime; Okido, Toshihisa; OOta, Satoshi; Osato, Naoki; Palmer, Lance E.; Quetier, Francis; Raghuvanshi, Saurabh; Saichi, Naomi; Sakai, Hiroaki; Sakai, Yasumichi; Sakata, Katsumi; Sakurai, Tetsuya; Sato, Fumihiko; Sato, Yoshiharu; Schoof, Heiko; Seki, Motoaki; Shibata, Michie; Shimizu, Yuji; Shinozaki, Kazuo; Shinso, Yuji; Singh, Nagendra K.; Smith-White, Brian; Takeda, Jun-ichi; Tanino, Motohiko; Tatusova, Tatiana; Thongjuea, Supat; Todokoro, Fusano; Tsugane, Mika; Tyagi, Akhilesh K.; Vanavichit, Apichart; Wang, Aihui; Wing, Rod A.; Yamaguchi, Kaori; Yamamoto, Mayu; Yamamoto, Naoyuki; Yu, Yeisoo; Zhang, Hao; Zhao, Qiang; Higo, Kenichi; Burr, Benjamin; Gojobori, Takashi; Sasaki, Takuji

2007-01-01

We present here the annotation of the complete genome of rice Oryza sativa L. ssp. japonica cultivar Nipponbare. All functional annotations for proteins and non-protein-coding RNA (npRNA) candidates were manually curated. Functions were identified or inferred in 19,969 (70%) of the proteins, and 131 possible npRNAs (including 58 antisense transcripts) were found. Almost 5000 annotated protein-coding genes were found to be disrupted in insertional mutant lines, which will accelerate future experimental validation of the annotations. The rice loci were determined by using cDNA sequences obtained from rice and other representative cereals. Our conservative estimate based on these loci and an extrapolation suggested that the gene number of rice is ∼32,000, which is smaller than previous estimates. We conducted comparative analyses between rice and Arabidopsis thaliana and found that both genomes possessed several lineage-specific genes, which might account for the observed differences between these species, while they had similar sets of predicted functional domains among the protein sequences. A system to control translational efficiency seems to be conserved across large evolutionary distances. Moreover, the evolutionary process of protein-coding genes was examined. Our results suggest that natural selection may have played a role for duplicated genes in both species, so that duplication was suppressed or favored in a manner that depended on the function of a gene. PMID:17210932
Conserved Curvature of RNA Polymerase I Core Promoter Beyond rRNA Genes: The Case of the Tritryps

PubMed Central

Smircich, Pablo; Duhagon, María Ana; Garat, Beatriz

2015-01-01

In trypanosomatids, the RNA polymerase I (RNAPI)-dependent promoters controlling the ribosomal RNA (rRNA) genes have been well identified. Although the RNAPI transcription machinery recognizes the DNA conformation instead of the DNA sequence of promoters, no conformational study has been reported for these promoters. Here we present the in silico analysis of the intrinsic DNA curvature of the rRNA gene core promoters in Trypanosoma brucei, Trypanosoma cruzi, and Leishmania major. We found that, in spite of the absence of sequence conservation, these promoters hold conformational properties similar to other eukaryotic rRNA promoters. Our results also indicated that the intrinsic DNA curvature pattern is conserved within the Leishmania genus and also among strains of T. cruzi and T. brucei. Furthermore, we analyzed the impact of point mutations on the intrinsic curvature and on the promoter activity. In addition, we found that the core promoters of protein-coding genes transcribed by RNAPI in T. brucei show the same conserved conformational characteristics. Overall, our results indicate that DNA intrinsic curvature of the rRNA gene core promoters is conserved in these ancient eukaryotes and such conserved curvature might be a requirement of RNAPI machinery for transcription of not only rRNA genes but also protein-coding genes. PMID:26718450

Comparative analysis of human protein-coding and noncoding RNAs between brain and 10 mixed cell lines by RNA-Seq.

PubMed

Chen, Geng; Yin, Kangping; Shi, Leming; Fang, Yuanzhang; Qi, Ya; Li, Peng; Luo, Jian; He, Bing; Liu, Mingyao; Shi, Tieliu

2011-01-01

In their expression process, different genes can generate diverse functional products, including various protein-coding or noncoding RNAs. Here, we investigated the protein-coding capacities and the expression levels of the isoforms of known human genes, as well as the conservation and disease association of long noncoding RNAs (ncRNAs), using two transcriptome sequencing datasets from human brain tissues and 10 mixed cell lines. Comparative analysis revealed that about two-thirds of the genes expressed between brain and cell lines are the same, but less than one-third of their isoforms are identical. Besides those genes specifically expressed in brain or cell lines, about 66% of genes expressed in common encoded different isoforms. Moreover, most genes dominantly expressed one isoform and some genes only generated protein-coding (or noncoding) RNAs in one sample but not in another. We found that 282 human genes could encode both protein-coding and noncoding RNAs through alternative splicing in the two samples. We also identified more than 1,000 long ncRNAs, and most of those long ncRNAs contain conserved elements across either 46 vertebrates or 33 placental mammals or 10 primates. Further analysis showed that some long ncRNAs were differentially expressed in human breast cancer or lung cancer, and several of those differentially expressed long ncRNAs were validated by RT-PCR. In addition, those validated differentially expressed long ncRNAs were found to be significantly correlated with certain breast cancer or lung cancer related genes, indicating the biological relevance of long ncRNAs to human cancers. Our findings reveal that the differences in gene expression profile between samples mainly result from the expressed gene isoforms, and highlight the importance of studying genes at the isoform level for completely illustrating the intricate transcriptome.
Expanding and reprogramming the genetic code.

PubMed

Chin, Jason W

2017-10-04

Nature uses a limited, conservative set of amino acids to synthesize proteins. The ability to genetically encode an expanded set of building blocks with new chemical and physical properties is transforming the study, manipulation and evolution of proteins, and is enabling diverse applications, including approaches to probe, image and control protein function, and to precisely engineer therapeutics. Underpinning this transformation are strategies to engineer and rewire translation. Emerging strategies aim to reprogram the genetic code so that noncanonical biopolymers can be synthesized and evolved, and to test the limits of our ability to engineer the translational machinery and systematically recode genomes.
NetCoffee: a fast and accurate global alignment approach to identify functionally conserved proteins in multiple networks.

PubMed

Hu, Jialu; Kehr, Birte; Reinert, Knut

2014-02-15

Owing to recent advancements in high-throughput technologies, protein-protein interaction networks of more and more species are becoming available in public databases. The question of how to identify functionally conserved proteins across species attracts a lot of attention in computational biology. Network alignments provide a systematic way to solve this problem. However, most existing alignment tools encounter limitations in tackling this problem. Therefore, the demand for faster and more efficient alignment tools is growing. We present a fast and accurate algorithm, NetCoffee, which finds a global alignment of multiple protein-protein interaction networks. NetCoffee searches for a global alignment by maximizing a target function using simulated annealing on a set of weighted bipartite graphs that are constructed using a triplet approach similar to T-Coffee. To assess its performance, NetCoffee was applied to four real datasets. Our results suggest that NetCoffee remedies several limitations of previous algorithms, outperforms all existing alignment tools in terms of speed and nevertheless identifies biologically meaningful alignments. The source code and data are freely available for download under the GNU GPL v3 license at https://code.google.com/p/netcoffee/.
Mitochondrial genome of Pteronotus personatus (Chiroptera: Mormoopidae): comparison with selected bats and phylogenetic considerations.

PubMed

López-Wilchis, Ricardo; Del Río-Portilla, Miguel Ángel; Guevara-Chumacero, Luis Manuel

2017-02-01

We described the complete mitochondrial genome (mitogenome) of Wagner's mustached bat, Pteronotus personatus, a species belonging to the family Mormoopidae, and compared it with other published mitogenomes of bats (Chiroptera). The mitogenome of P. personatus was 16,570 bp long and contained a typically conserved structure including 13 protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes, and one control region (D-loop). Most of the genes were encoded on the H-strand, except for eight tRNA and the ND6 genes. The order of protein-coding and rRNA genes was highly conserved in all mitogenomes. All protein-coding genes started with an ATG codon, except for ND2, ND3, and ND5, which initiated with ATA, and terminated with the typical stop codon TAA/TAG or the codon AGA. Phylogenetic trees constructed using Maximum Parsimony, Maximum Likelihood, and Bayesian inference methods showed an identical topology and indicated the monophyly of different families of bats (Mormoopidae, Phyllostomidae, Vespertilionidae, Rhinolophidae, and Pteropodidae) and the existence of two major clades corresponding to the suborders Yangochiroptera and Yinpterochiroptera. The mitogenome sequence provided here will be useful for further phylogenetic analyses and population genetic studies in mormoopid bats.
Comparative Proteomics Reveals a Significant Bias Toward Alternative Protein Isoforms with Conserved Structure and Function

PubMed Central

Ezkurdia, Iakes; del Pozo, Angela; Frankish, Adam; Rodriguez, Jose Manuel; Harrow, Jennifer; Ashman, Keith; Valencia, Alfonso; Tress, Michael L.

2012-01-01

Advances in high-throughput mass spectrometry are making proteomics an increasingly important tool in genome annotation projects. Peptides detected in mass spectrometry experiments can be used to validate gene models and verify the translation of putative coding sequences (CDSs). Here, we have identified peptides that cover 35% of the genes annotated by the GENCODE consortium for the human genome as part of a comprehensive analysis of experimental spectra from two large publicly available mass spectrometry databases. We detected the translation to protein of “novel” and “putative” protein-coding transcripts as well as transcripts annotated as pseudogenes and nonsense-mediated decay targets. We provide a detailed overview of the population of alternatively spliced protein isoforms that are detectable by peptide identification methods. We found that 150 genes expressed multiple alternative protein isoforms. This constitutes the largest set of reliably confirmed alternatively spliced proteins yet discovered. Three groups of genes were highly overrepresented. We detected alternative isoforms for 10 of the 25 possible heterogeneous nuclear ribonucleoproteins, proteins with a key role in the splicing process. Alternative isoforms generated from interchangeable homologous exons and from short indels were also significantly enriched, both in human experiments and in parallel analyses of mouse and Drosophila proteomics experiments. Our results show that a surprisingly high proportion (almost 25%) of the detected alternative isoforms are only subtly different from their constitutive counterparts. Many of the alternative splicing events that give rise to these alternative isoforms are conserved in mouse. It was striking that very few of these conserved splicing events broke Pfam functional domains or would damage globular protein structures. This evidence of a strong bias toward subtle differences in CDS and likely conserved cellular function and structure is remarkable and strongly suggests that the translation of alternative transcripts may be subject to selective constraints. PMID:22446687
Dynamic and Widespread lncRNA Expression in a Sponge and the Origin of Animal Complexity

PubMed Central

Gaiti, Federico; Fernandez-Valverde, Selene L.; Nakanishi, Nagayasu; Calcino, Andrew D.; Yanai, Itai; Tanurdzic, Milos; Degnan, Bernard M.

2015-01-01

Long noncoding RNAs (lncRNAs) are important developmental regulators in bilaterian animals. A correlation has been claimed between the lncRNA repertoire expansion and morphological complexity in vertebrate evolution. However, this claim has not been tested by examining morphologically simple animals. Here, we undertake a systematic investigation of lncRNAs in the demosponge Amphimedon queenslandica, a morphologically simple, early-branching metazoan. We combine RNA-Seq data across multiple developmental stages of Amphimedon with a filtering pipeline to conservatively predict 2,935 lncRNAs. These include intronic overlapping lncRNAs, exonic antisense overlapping lncRNAs, long intergenic nonprotein coding RNAs, and precursors for small RNAs. Sponge lncRNAs are remarkably similar to their bilaterian counterparts in being relatively short with few exons and having low primary sequence conservation relative to protein-coding genes. As in bilaterians, a majority of sponge lncRNAs exhibit typical hallmarks of regulatory molecules, including high temporal specificity and dynamic developmental expression. Specific lncRNA expression profiles correlate tightly with conserved protein-coding genes likely involved in a range of developmental and physiological processes, such as the Wnt signaling pathway. Although the majority of Amphimedon lncRNAs appears to be taxonomically restricted with no identifiable orthologs, we find a few cases of conservation between demosponges in lncRNAs that are antisense to coding sequences. Based on the high similarity in the structure, organization, and dynamic expression of sponge lncRNAs to their bilaterian counterparts, we propose that these noncoding RNAs are an ancient feature of the metazoan genome. These results are consistent with lncRNAs regulating the development of animals, regardless of their level of morphological complexity. PMID:25976353
The LINE-1 DNA sequences in four mammalian orders predict proteins that conserve homologies to retrovirus proteins.

PubMed Central

Fanning, T; Singer, M

1987-01-01

Recent work suggests that one or more members of the highly repeated LINE-1 (L1) DNA family found in all mammals may encode one or more proteins. Here we report the sequence of a portion of an L1 cloned from the domestic cat (Felis catus). These data permit comparison of the L1 sequences in four mammalian orders (Carnivore, Lagomorph, Rodent and Primate) and the comparison supports the suggested coding potential. In two separate, noncontiguous regions in the carboxy terminal half of the proteins predicted from the DNA sequences, there are several strongly conserved segments. In one region, these share homology with known or suspected reverse transcriptases, as described by others in rodents and primates. In the second region, closer to the carboxy terminus, the strongly conserved segments are over 90% homologous among the four orders. One of the latter segments is cysteine rich and resembles the putative metal binding domains of nucleic acid binding proteins, including those of TFIIIA and retroviruses. PMID:3562227
Genomewide analysis of Drosophila circular RNAs reveals their structural and sequence properties and age-dependent neural accumulation

PubMed Central

Westholm, Jakub O.; Miura, Pedro; Olson, Sara; Shenker, Sol; Joseph, Brian; Sanfilippo, Piero; Celniker, Susan E.; Graveley, Brenton R.; Lai, Eric C.

2014-01-01

Circularization was recently recognized to broadly expand transcriptome complexity. Here, we exploit massive Drosophila total RNA-sequencing data, >5 billion paired-end reads from >100 libraries covering diverse developmental stages, tissues and cultured cells, to rigorously annotate >2500 fruitfly circular RNAs. These mostly derive from back-splicing of protein-coding genes and lack poly(A) tails, and circularization of hundreds of genes is conserved across multiple Drosophila species. We elucidate structural and sequence properties of Drosophila circular RNAs, which exhibit commonalities and distinctions from mammalian circles. Notably, Drosophila circular RNAs harbor >1000 well-conserved canonical miRNA seed matches, especially within coding regions, and coding conserved miRNA sites reside preferentially within circularized exons. Finally, we analyze the developmental and tissue specificity of circular RNAs, and note their preferred derivation from neural genes and enhanced accumulation in neural tissues. Interestingly, circular isoforms increase dramatically relative to linear isoforms during CNS aging, and constitute a novel aging biomarker. PMID:25544350
Characterization of the complete mitochondrial genome of the hybrid Epinephelus moara♀ × Epinephelus lanceolatus♂, and phylogenetic analysis in subfamily epinephelinae

NASA Astrophysics Data System (ADS)

Gao, Fengtao; Wei, Min; Zhu, Ying; Guo, Hua; Chen, Songlin; Yang, Guanpin

2017-06-01

This study presents the complete mitochondrial genome of the hybrid Epinephelus moara♀× Epinephelus lanceolatus♂. The genome is 16886 bp in length, and contains 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes, a light-strand replication origin and a control region. Additionally, phylogenetic analysis based on the nucleotide sequences of 13 conserved protein-coding genes using the maximum likelihood method indicated that the mitochondrial genome is maternally inherited. This study presents genomic data for studying phylogenetic relationships and breeding of hybrid Epinephelinae.
Unraveling the molecular mechanisms of nitrogenase conformational protection against oxygen in diazotrophic bacteria.

PubMed

Lery, Letícia M S; Bitar, Mainá; Costa, Mauricio G S; Rössle, Shaila C S; Bisch, Paulo M

2010-12-22

G. diazotrophicus and A. vinelandii are aerobic nitrogen-fixing bacteria. Although oxygen is essential for the survival of these organisms, it irreversibly inhibits nitrogenase, the complex responsible for nitrogen fixation. Both microorganisms deal with this paradox through compensatory mechanisms. In A. vinelandii a conformational protection mechanism occurs through the interaction between the nitrogenase complex and the FeSII protein. Previous studies suggested the existence of a similar system in G. diazotrophicus, but the putative protein involved has not yet been described. This study intends to identify the protein coding gene in the recently sequenced genome of G. diazotrophicus and also provide detailed structural information of nitrogenase conformational protection in both organisms. Genomic analysis of G. diazotrophicus sequences revealed a protein coding ORF (Gdia0615) enclosing a conserved "fer2" domain, typical of the ferredoxin family and found in A. vinelandii FeSII. Comparative models of both FeSII and Gdia0615 disclosed a conserved beta-grasp fold. Cysteine residues that coordinate the 2[Fe-S] cluster are in conserved positions towards the metallocluster. Analysis of solvent accessible residues and electrostatic surfaces unveiled a hydrophobic dimerization interface. Dimers assembled by molecular docking presented a stable behaviour and a proper accommodation of regions possibly involved in binding of FeSII to nitrogenase throughout molecular dynamics simulations in aqueous solution. Molecular modeling of the nitrogenase complex of G. diazotrophicus was performed and models were compared to the crystal structure of A. vinelandii nitrogenase. Docking experiments of FeSII and Gdia0615 with their corresponding nitrogenase complexes pointed out in both systems a putative binding site presenting shape and charge complementarities at the Fe-protein/MoFe-protein complex interface. The identification of the putative FeSII coding gene in the G. diazotrophicus genome represents a large step towards the understanding of the conformational protection mechanism of nitrogenase against oxygen. In addition, this is the first study regarding the structural complementarities of FeSII-nitrogenase interactions in diazotrophic bacteria. The combination of bioinformatic tools for genome analysis, comparative protein modeling, docking calculations and molecular dynamics provided a powerful strategy for the elucidation of molecular mechanisms and structural features of FeSII-nitrogenase interaction.
LncRNA, a new component of expanding RNA-protein regulatory network important for animal sperm development.

PubMed

Zhang, Chenwang; Gao, Liuze; Xu, Eugene Yujun

2016-11-01

Spermatogenesis is one of the fundamental processes of sexual reproduction, present in almost all metazoan animals. Like many other reproductive traits, developmental features and traits of spermatogenesis are under strong selective pressure to change, both at morphological and underlying molecular levels. Yet evidence suggests that some fundamental features of spermatogenesis may be ancient and conserved among metazoan species. Identifying the underlying conserved molecular mechanisms could reveal core components of metazoan spermatogenic machinery and provide novel insight into causes of human infertility. Conserved RNA-binding proteins and their interacting RNA network are emerging as a common theme important for animal sperm development. We review research on the recent addition to the RNA family - long non-coding RNA (lncRNA) and its roles in spermatogenesis in the context of the expanding RNA-protein network. Copyright © 2016 Elsevier Ltd. All rights reserved.
The mitochondrial genome of the phytopathogenic basidiomycete Moniliophthora perniciosa is 109 kb in size and contains a stable integrated plasmid.

PubMed

Formighieri, Eduardo F; Tiburcio, Ricardo A; Armas, Eduardo D; Medrano, Francisco J; Shimo, Hugo; Carels, Nicolas; Góes-Neto, Aristóteles; Cotomacci, Carolina; Carazzolle, Marcelo F; Sardinha-Pinto, Naiara; Thomazella, Daniela P T; Rincones, Johana; Digiampietri, Luciano; Carraro, Dirce M; Azeredo-Espin, Ana M; Reis, Sérgio F; Deckmann, Ana C; Gramacho, Karina; Gonçalves, Marilda S; Moura Neto, José P; Barbosa, Luciana V; Meinhardt, Lyndel W; Cascardo, Júlio C M; Pereira, Gonçalo A G

2008-10-01

We present here the sequence of the mitochondrial genome of the basidiomycete phytopathogenic hemibiotrophic fungus Moniliophthora perniciosa, causal agent of the Witches' Broom Disease in Theobroma cacao. The DNA is a circular molecule of 109,103 base pairs, with 31.9% GC, and is the largest sequenced so far. This size is due essentially to the presence of numerous non-conserved hypothetical ORFs. It contains the 14 genes coding for proteins involved in the oxidative phosphorylation, the two rRNA genes, one ORF coding for a ribosomal protein (rps3), and a set of 26 tRNA genes that recognize codons for all amino acids. Seven homing endonucleases are located inside introns. Except atp8, all conserved known genes are in the same orientation. Phylogenetic analysis based on the cox genes agrees with the commonly accepted fungal taxonomy. An uncommon feature of this mitochondrial genome is the presence of a region that contains a set of four, relatively small, nested, inverted repeats enclosing two genes coding for polymerases with an invertron-type structure and three conserved hypothetical genes interpreted as the stable integration of a mitochondrial linear plasmid. The integration of this plasmid seems to be a recent evolutionary event that could have implications in fungal biology. This sequence is available under GenBank accession number AY376688.
Specific and Modular Binding Code for Cytosine Recognition in Pumilio/FBF (PUF) RNA-binding Domains

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dong, Shuyun; Wang, Yang; Cassidy-Amstutz, Caleb

2011-10-28

Pumilio/fem-3 mRNA-binding factor (PUF) proteins possess a recognition code for bases A, U, and G, allowing designed RNA sequence specificity of their modular Pumilio (PUM) repeats. However, recognition side chains in a PUM repeat for cytosine are unknown. Here we report identification of a cytosine-recognition code by screening random amino acid combinations at conserved RNA recognition positions using a yeast three-hybrid system. This C-recognition code is specific and modular as specificity can be transferred to different positions in the RNA recognition sequence. A crystal structure of a modified PUF domain reveals specific contacts between an arginine side chain and the cytosine base. We applied the C-recognition code to design PUF domains that recognize targets with multiple cytosines and to generate engineered splicing factors that modulate alternative splicing. Finally, we identified a divergent yeast PUF protein, Nop9p, that may recognize natural target RNAs with cytosine. This work deepens our understanding of natural PUF protein target recognition and expands the ability to engineer PUF domains to recognize any RNA sequence.
Protein Kinases in Mammary Gland Development and Carcinogenesis

DTIC Science & Technology

1999-09-01

studies identical at the amino acid level to calcium/calmodulin-dependent may provide insight into mechanisms of growth control and DNA protein kinase I...human homologues of these kinases (19, 20). Amino acid conservation in the coding region between mouse and human Hunk is greater than 90% identical. While...genes (13, 14). Over the past 4 years, several of the mRNA and protein levels (39-46). These findings clearly dem- these breast cancer susceptibility
Genetic evidence for conserved non-coding element function across species–the ears have it

PubMed Central

Turner, Eric E.; Cox, Timothy C.

2014-01-01

Comparison of genomic sequences from diverse vertebrate species has revealed numerous highly conserved regions that do not appear to encode proteins or functional RNAs. Often these “conserved non-coding elements,” or CNEs, can direct gene expression to specific tissues in transgenic models, demonstrating they have regulatory function. CNEs are frequently found near “developmental” genes, particularly transcription factors, implying that these elements have essential regulatory roles in development. However, actual examples demonstrating CNE regulatory functions across species have been few, and recent loss-of-function studies of several CNEs in mice have shown relatively minor effects. In this Perspectives article, we discuss new findings in “fancy” rats and Highland cattle demonstrating that function of a CNE near the Hmx1 gene is crucial for normal external ear development and when disrupted can mimic loss-of-function Hmx1 coding mutations in mice and humans. These findings provide important support for conserved developmental roles of CNEs in divergent species, and reinforce the concept that CNEs should be examined systematically in the ongoing search for genetic causes of human developmental disorders in the era of genome-scale sequencing. PMID:24478720
The complete mitochondrial genome and phylogenetic analysis of the giant panda (Ailuropoda melanoleuca).

PubMed

Peng, Rui; Zeng, Bo; Meng, Xiuxiang; Yue, Bisong; Zhang, Zhihe; Zou, Fangdong

2007-08-01

The complete mitochondrial genome sequence of the giant panda, Ailuropoda melanoleuca, was determined by the long and accurate polymerase chain reaction (LA-PCR) with conserved primers and primer walking sequence methods. The complete mitochondrial DNA is 16,805 nucleotides in length and contains two ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes and one control region. The total length of the 13 protein-coding genes is longer than that of the American black bear, brown bear and polar bear by 3 amino acids at the end of the ND5 gene. The codon usage also followed the typical vertebrate pattern except for an unusual ATT start codon, which initiates the NADH dehydrogenase subunit 5 (ND5) gene. The molecular phylogenetic analysis was performed on the sequences of 12 concatenated heavy-strand encoded protein-coding genes, and suggested that the giant panda is most closely related to bears.
Single Amino Acid Repeats in the Proteome World: Structural, Functional, and Evolutionary Insights

PubMed Central

Kumar, Amitha Sampath; Sowpati, Divya Tej; Mishra, Rakesh K.

2016-01-01

Microsatellites or simple sequence repeats (SSR) are abundant, highly diverse stretches of short DNA repeats present in all genomes. Tandem mono/tri/hexanucleotide repeats in the coding regions contribute to single amino acid repeats (SAARs) in the proteome. While SSRs in the coding region always result in amino acid repeats, a majority of SAARs arise due to a combination of various codons representing the same amino acid and not as a consequence of SSR events. Certain amino acids are abundant in repeat regions, indicating a positive selection pressure behind the accumulation of SAARs. By analysing 22 proteomes including the human proteome, we explored the functional and structural relationship of amino acid repeats in an evolutionary context. Only ~15% of repeats are present in any known functional domain, while ~74% of repeats are present in the disordered regions, suggesting that SAARs add to the functionality of proteins by providing flexibility and stability and by acting as linker elements between domains. Comparison of SAAR-containing proteins across species reveals that while shorter repeats are conserved among orthologs, proteins with longer repeats, >15 amino acids, are unique to the respective organism. Lysine repeats are well conserved among orthologs with respect to their length and number of occurrences in a protein. Repeats of other amino acids, such as glutamic acid, proline, serine and alanine, are generally conserved among the orthologs with varying repeat lengths. These findings suggest that SAARs have accumulated in the proteome under positive selection pressure and that they provide flexibility for optimal folding of functional/structural domains of proteins. The insights gained from our observations can help in effective designing and engineering of proteins with novel features. PMID:27893794
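The distinction drawn in this abstract, between amino acid repeats encoded by one repeated codon (SSR-like) and repeats encoded by a mixture of synonymous codons, can be illustrated with a short Python sketch. This is not the authors' pipeline: the function names, the minimum run length of 5, and the toy sequences are assumptions chosen only for illustration.

```python
import re

# Assumed threshold (not from the abstract): a run of at least
# 5 identical residues is reported as a single amino acid repeat.
MIN_RUN = 5


def find_saars(protein, min_run=MIN_RUN):
    """Return (residue, start, length) for each run of identical residues."""
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_run - 1))
    return [(m.group(1), m.start(), len(m.group(0)))
            for m in pattern.finditer(protein)]


def codons_for_run(cds, start, length):
    """Codons of the coding sequence that encode protein[start:start+length]."""
    return [cds[3 * i:3 * i + 3] for i in range(start, start + length)]


def is_pure_codon_repeat(cds, start, length):
    """True if the run is encoded by one codon repeated (SSR-like)."""
    return len(set(codons_for_run(cds, start, length))) == 1


# Toy example: a lysine run from one codon, a glutamine run from two codons.
protein = "MKKKKKKAQQQQQ"
cds = "ATG" + "AAA" * 6 + "GCT" + "CAG" * 3 + "CAA" * 2

for residue, start, length in find_saars(protein):
    kind = "single codon" if is_pure_codon_repeat(cds, start, length) else "mixed codons"
    print(residue, start, length, kind)
```

In the toy input the lysine run is a pure trinucleotide repeat, while the glutamine run mixes CAG and CAA, which is the kind of repeat the abstract describes as not arising from SSR events.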
The complete mitochondrial genome of the sandbar shark Carcharhinus plumbeus.

PubMed

Blower, Dean C; Ovenden, Jennifer R

2016-01-01

The sandbar shark, Carcharhinus plumbeus, a major representative species in shark fisheries worldwide, is now considered vulnerable to overfishing. A pool of 774,234 Roche 454 shotgun sequences from one individual was assembled into a 16,706 bp mitogenome with 33× average coverage depth. It comprised 13 protein-coding genes, 22 transfer RNAs, 2 ribosomal genes and 2 non-coding regions, typical of a vertebrate mitogenome. As expected for sharks, an A-T nucleotide bias was evident. This adds to the rapidly growing number of mitogenome assemblies for the economically important Carcharhinidae family. The C. plumbeus mitogenome will assist researchers, fisheries and conservation managers interested in shark molecular systematics, phylogeography, conservation genetics, population and stock structure.
Rapid functional diversification in the structurally conserved ELAV family of neuronal RNA binding proteins

PubMed Central

Samson, Marie-Laure

2008-01-01

Background: The Drosophila gene embryonic lethal abnormal visual system (elav) is the prototype of a gene family present in all metazoans. Its members encode structurally conserved neuronal proteins with three RNA Recognition Motifs (RRM) but they paradoxically act at diverse levels of post-transcriptional regulation. In an attempt to understand the history of this family, we searched for orthologs in eleven completely sequenced genomes, including those of humans, D. melanogaster and C. elegans, for which cDNAs are available. Results: We analyzed 23 orthologs/paralogs of elav, and found evidence of gain/loss of gene copy number. For one set of genes, including elav itself, the coding sequences are free of introns and their products most resemble ELAV. The remaining genes show remarkable conservation of their exon organization, and their products most resemble FNE and RBP9, proteins encoded by the two elav paralogs of Drosophila. Remarkably, three of the conserved exon junctions are both close to structural elements, involved respectively in protein-RNA interactions and in the regulation of sub-cellular localization, and in the vicinity of diverse sequence variations. Conclusion: The data indicate that the essential elav gene of Drosophila is newly emerged, restricted to dipterans and of retrotransposed origin. We propose that the conserved exon junctions constitute potential sites for sequence/function modifications, and that RRM binding proteins, whose function relies upon plastic RNA-protein interactions, may have played an important role in brain evolution. PMID:18715504
CombAlign: a code for generating a one-to-many sequence alignment from a set of pairwise structure-based sequence alignments.

PubMed

Zhou, Carol L Ecale

2015-01-01

In order to better define regions of similarity among related protein structures, it is useful to identify the residue-residue correspondences among proteins. Few codes exist for constructing a one-to-many multiple sequence alignment derived from a set of structure or sequence alignments, and a need was evident for creating such a tool for combining pairwise structure alignments that would allow for insertion of gaps in the reference structure. This report describes a new Python code, CombAlign, which takes as input a set of pairwise sequence alignments (which may be structure based) and generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA). The use and utility of CombAlign was demonstrated by generating gapped MSSAs using sets of pairwise structure-based sequence alignments between structure models of the matrix protein (VP40) and pre-small/secreted glycoprotein (sGP) of Reston Ebolavirus and the corresponding proteins of several other filoviruses. The gapped MSSAs revealed structure-based residue-residue correspondences, which enabled identification of structurally similar versus differing regions in the Reston proteins compared to each of the other corresponding proteins. CombAlign is a new Python code that generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA) given a set of pairwise sequence alignments (which may be structure based). CombAlign has utility in assisting the user in distinguishing structurally conserved versus divergent regions on a reference protein structure relative to other closely related proteins. CombAlign was developed in Python 2.6, and the source code is available for download from the GitHub code repository.
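The idea of combining pairwise alignments into a one-to-many alignment with a gapped reference can be sketched as follows. This is a minimal illustration written for Python 3, not the CombAlign source code: the function name `merge_pairwise`, the input format (pairs of equal-length gapped strings) and the toy sequences are all assumptions made for the example.

```python
def merge_pairwise(pairs):
    """Merge pairwise alignments that share one reference sequence.

    pairs: list of (ref_aln, other_aln) strings of equal length, '-' for gaps.
    Returns a list of rows: the gapped reference first, then one row per pair.
    """
    ref = pairs[0][0].replace("-", "")
    n = len(ref)
    parsed = []
    for ref_aln, other_aln in pairs:
        if len(ref_aln) != len(other_aln) or ref_aln.replace("-", "") != ref:
            raise ValueError("every pair must align against the same reference")
        inserts = [""] * (n + 1)  # residues inserted before reference position i
        matched = ["-"] * n       # residue aligned to reference position i
        i = 0
        for r, o in zip(ref_aln, other_aln):
            if r == "-":
                inserts[i] += o   # insertion relative to the reference
            else:
                matched[i] = o
                i += 1
        parsed.append((inserts, matched))

    # Each insertion slot is as wide as the longest insertion seen there.
    widths = [max(len(ins[i]) for ins, _ in parsed) for i in range(n + 1)]

    def build(inserts, matched):
        out = []
        for i in range(n + 1):
            out.append(inserts[i].ljust(widths[i], "-"))
            if i < n:
                out.append(matched[i])
        return "".join(out)

    reference_row = build([""] * (n + 1), list(ref))
    return [reference_row] + [build(ins, m) for ins, m in parsed]


# Toy input: three pairwise alignments against the reference "ACDEF".
rows = merge_pairwise([
    ("ACD-EF", "ACDGEF"),
    ("ACDEF", "A-DEF"),
    ("AC--DEF", "ACKKDEF"),
])
for row in rows:
    print(row)
```

Gaps are opened in the reference wherever any of the other sequences carries an insertion, so every row ends up the same length and residue-residue correspondences can be read column by column.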

The human fatty acid-binding protein family: Evolutionary divergences and functions

PubMed Central

2011-01-01

Fatty acid-binding proteins (FABPs) are members of the intracellular lipid-binding protein (iLBP) family and are involved in reversibly binding intracellular hydrophobic ligands and trafficking them throughout cellular compartments, including the peroxisomes, mitochondria, endoplasmic reticulum and nucleus. FABPs are small, structurally conserved cytosolic proteins consisting of a water-filled, interior-binding pocket surrounded by ten anti-parallel beta sheets, forming a beta barrel. At the superior surface, two alpha-helices cap the pocket and are thought to regulate binding. FABPs have broad specificity, including the ability to bind long-chain (C16-C20) fatty acids, eicosanoids, bile salts and peroxisome proliferators. FABPs demonstrate strong evolutionary conservation and are present in a spectrum of species including Drosophila melanogaster, Caenorhabditis elegans, mouse and human. The human genome consists of nine putatively functional protein-coding FABP genes. The most recently identified family member, FABP12, has been less studied. PMID:21504868
Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution.

PubMed

2004-12-09

We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.
Forty-four novel protein-coding loci discovered using a proteomics informed by transcriptomics (PIT) approach in rat male germ cells.

PubMed

Chocu, Sophie; Evrard, Bertrand; Lavigne, Régis; Rolland, Antoine D; Aubry, Florence; Jégou, Bernard; Chalmel, Frédéric; Pineau, Charles

2014-11-01

Spermatogenesis is a complex process, dependent upon the successive activation and/or repression of thousands of gene products, and ends with the production of haploid male gametes. RNA sequencing of male germ cells in the rat identified thousands of novel testicular unannotated transcripts (TUTs). Although such RNAs are usually annotated as long noncoding RNAs (lncRNAs), it is possible that some of these TUTs code for protein. To test this possibility, we used a "proteomics informed by transcriptomics" (PIT) strategy combining RNA sequencing data with shotgun proteomics analyses of spermatocytes and spermatids in the rat. Among 3559 TUTs and 506 lncRNAs found in meiotic and postmeiotic germ cells, 44 encoded at least one peptide. We showed that these novel high-confidence protein-coding loci exhibit several genomic features intermediate between those of lncRNAs and mRNAs. We experimentally validated the testicular expression pattern of two of these novel protein-coding gene candidates, both highly conserved in mammals: one for a vesicle-associated membrane protein we named VAMP-9, and the other for an enolase domain-containing protein. This study confirms the potential of PIT approaches for the discovery of protein-coding transcripts initially thought to be untranslated or unknown transcripts. Our results contribute to the understanding of spermatogenesis by characterizing two novel proteins, implicated by their strong expression in germ cells. The mass spectrometry proteomics data have been deposited with the ProteomeXchange Consortium under the data set identifier PXD000872. © 2014 by the Society for the Study of Reproduction, Inc.
Ferritin gene organization: differences between plants and animals suggest possible kingdom-specific selective constraints.

PubMed

Proudhon, D; Wei, J; Briat, J; Theil, E C

1996-03-01

Ferritin, a protein widespread in nature, concentrates iron approximately 10(11)-10(12)-fold above the solubility within a spherical shell of 24 subunits; it derives in plants and animals from a common ancestor (based on sequence) but displays a cytoplasmic location in animals compared to the plastid in contemporary plants. Ferritin gene regulation in plants and animals is altered by development, hormones, and excess iron; iron signals target DNA in plants but mRNA in animals. Evolution has thus conserved the two end points of ferritin gene expression, the physiological signals and the protein structure, while allowing some divergence of the genetic mechanisms. Comparison of ferritin gene organization in plants and animals, made possible by the cloning of a dicot (soybean) ferritin gene presented here and the recent cloning of two monocot (maize) ferritin genes, shows evolutionary divergence in ferritin gene organization between plants and animals but conservation among plants or among animals; divergence in the genetic mechanism for iron regulation is reflected by the absence in all three plant genes of the IRE, a highly conserved, noncoding sequence in vertebrate animal ferritin mRNA. In plant ferritin genes, the number of introns (n = 7) is higher than in animals (n = 3). Second, no intron positions are conserved when ferritin genes of plants and animals are compared, although all ferritin gene introns are in the coding region; within kingdoms, the intron positions in ferritin genes are conserved. Finally, secondary protein structure has no apparent relationship to intron/exon boundaries in plant ferritin genes, whereas in animal ferritin genes the correspondence is high. 
The structural differences in introns/exons among phylogenetically related ferritin coding sequences and the high conservation of the gene structure within plant or animal kingdoms suggest that kingdom-specific functional constraints may exist to maintain a particular intron/exon pattern within ferritin genes. In the case of plants, where ferritin gene intron placement is unrelated to triplet codons or protein structure, and where ferritin is targeted to the plastid, the selection pressure on gene organization may relate to RNA function and plastid/nuclear signaling.
The PE/PPE multigene family codes for virulence factors and is a possible source of mycobacterial antigenic variation: perhaps more?

PubMed

Akhter, Yusuf; Ehebauer, Matthias T; Mukhopadhyay, Sangita; Hasnain, Seyed E

2012-01-01

The PE/PPE multigene family codes for approximately 10% of the Mycobacterium tuberculosis proteome and is encoded by 176 open reading frames. These proteins possess, and have been named after, the conserved proline-glutamate (PE) or proline-proline-glutamate (PPE) motifs at their N-terminus. Their genes have a conserved structure and repeat motifs that could be a potential source of antigenic variation in M. tuberculosis. PE/PPE genes are scattered throughout the genome and PE/PPE pairs are usually encoded in bicistronic operons although this is not universally so. This gene family has evolved by specific gene duplication events. PE/PPE proteins are either secreted or localized to the cell surface. Several are thought to be virulence factors, which participate in evasion of the host immune response. This review summarizes the current knowledge about the gene family in order to better understand its biological function. Copyright © 2011 Elsevier Masson SAS. All rights reserved.
Evolutionary conservation of the polyproline II conformation surrounding intrinsically disordered phosphorylation sites.

PubMed

Elam, W Austin; Schrank, Travis P; Campagnolo, Andrew J; Hilser, Vincent J

2013-04-01

Intrinsically disordered (ID) proteins function in the absence of a unique stable structure and appear to challenge the classic structure-function paradigm. The extent to which ID proteins take advantage of subtle conformational biases to perform functions, and whether signals for such mechanism can be identified in proteome-wide studies is not well understood. Of particular interest is the polyproline II (PII) conformation, suggested to be highly populated in unfolded proteins. We experimentally determine a complete calorimetric propensity scale for the PII conformation. Projection of the scale into representative eukaryotic proteomes reveals significant PII bias in regions coding for ID proteins. Importantly, enrichment of PII in ID proteins, or protein segments, is also captured by other PII scales, indicating that this enrichment is robustly encoded and universally detectable regardless of the method of PII propensity determination. Gene ontology (GO) terms obtained using our PII scale and other scales demonstrate a consensus for molecular functions performed by high PII proteins across the proteome. Perhaps the most striking result of the GO analysis is conserved enrichment (P < 10(-8) ) of phosphorylation sites in high PII regions found by all PII scales. Subsequent conformational analysis reveals a phosphorylation-dependent modulation of PII, suggestive of a conserved "tunability" within these regions. In summary, the application of an experimentally determined polyproline II (PII) propensity scale to proteome-wide sequence analysis and gene ontology reveals an enrichment of PII bias near disordered phosphorylation sites that is conserved throughout eukaryotes. Copyright © 2013 The Protein Society.
Molecular Evolution of the Non-Coding Eosinophil Granule Ontogeny Transcript

PubMed Central

Rose, Dominic; Stadler, Peter F.

2011-01-01

Eukaryotic genomes are pervasively transcribed. A large fraction of the transcriptional output consists of long, mRNA-like, non-protein-coding transcripts (mlncRNAs). The evolutionary history of mlncRNAs is still largely uncharted territory. In this contribution, we explore in detail the evolutionary traces of the eosinophil granule ontogeny transcript (EGOT), an experimentally confirmed representative of an abundant class of totally intronic non-coding transcripts (TINs). EGOT is located antisense to an intron of the ITPR1 gene. We computationally identify putative EGOT orthologs in the genomes of 32 different amniotes, including orthologs from primates, rodents, ungulates, carnivores, afrotherians, and xenarthrans, as well as putative candidates from basal amniotes, such as opossum or platypus. We investigate the EGOT gene phylogeny, analyze patterns of sequence conservation, and the evolutionary conservation of the EGOT gene structure. We show that EGO-B, the spliced isoform, may be present throughout the placental mammals, but most likely dates back even further. We demonstrate here for the first time that the whole EGOT locus is highly structured, containing several evolutionary conserved, and thermodynamic stable secondary structures. Our analyses allow us to postulate novel functional roles of a hitherto poorly understood region at the intron of EGO-B which is highly conserved at the sequence level. The region contains a novel ITPR1 exon and also conserved RNA secondary structures together with a conserved TATA-like element, which putatively acts as a promoter of an independent regulatory element. PMID:22303364
The kinetoplast DNA of the Australian trypanosome, Trypanosoma copemani, shares features with Trypanosoma cruzi and Trypanosoma lewisi.

PubMed

Botero, Adriana; Kapeller, Irit; Cooper, Crystal; Clode, Peta L; Shlomai, Joseph; Thompson, R C Andrew

2018-05-17

Kinetoplast DNA (kDNA) is the mitochondrial genome of trypanosomatids. It consists of a few dozen maxicircles and several thousand minicircles, all catenated topologically to form a two-dimensional DNA network. Minicircles are heterogeneous in size and sequence among species. They present one or several conserved regions that contain three highly conserved sequence blocks. CSB-1 (10 bp sequence) and CSB-2 (8 bp sequence) present lower interspecies homology, while CSB-3 (12 bp sequence) or the Universal Minicircle Sequence is conserved within most trypanosomatids. The Universal Minicircle Sequence is located at the replication origin of the minicircles, and is the binding site for the UMS binding protein, a protein involved in trypanosomatid survival and virulence. Here, we describe the structure and organisation of the kDNA of Trypanosoma copemani, a parasite that has been shown to infect mammalian cells and has been associated with the drastic decline of the endangered Australian marsupial, the woylie (Bettongia penicillata). Deep genomic sequencing showed that T. copemani presents two classes of minicircles that share sequence identity and organisation in the conserved sequence blocks with those of Trypanosoma cruzi and Trypanosoma lewisi. A 19,257 bp partial region of the maxicircle of T. copemani that contained the entire coding region was obtained. Comparative analysis of the T. copemani entire maxicircle coding region with the coding regions of T. cruzi and T. lewisi showed they share 71.05% and 71.28% identity, respectively. The shared features in the maxicircle/minicircle organisation and sequence between T. copemani and T. cruzi/T. lewisi suggest similarities in their process of kDNA replication, and are of significance in understanding the evolution of Australian trypanosomes. Copyright © 2018 The Authors. Published by Elsevier Ltd.. All rights reserved.
A two-dimensional proteome reference map of Herbaspirillum seropedicae proteins.

PubMed

Chaves, Daniela Fojo Seixas; Ferrer, Pércio Pereira; de Souza, Emanuel Maltempi; Gruz, Leonardo Magalhães; Monteiro, Rose Adele; de Oliveira Pedrosa, Fábio

2007-10-01

Herbaspirillum seropedicae is an endophytic diazotroph associated with economically important crops such as rice, sugarcane, and wheat. Here, we present a 2-D reference map for H. seropedicae. Using MALDI-TOF-MS we identified 205 spots representing 173 different proteins with a calculated average of 1.18 proteins/gene. Seventeen hypothetical or conserved hypothetical ORFs were shown to code for true gene products. These data will support the genome annotation process and provide a basis on which to undertake comparative proteomic studies.
Hidden Structural Codes in Protein Intrinsic Disorder.

PubMed

Borkosky, Silvia S; Camporeale, Gabriela; Chemes, Lucía B; Risso, Marikena; Noval, María Gabriela; Sánchez, Ignacio E; Alonso, Leonardo G; de Prat Gay, Gonzalo

2017-10-17

Intrinsic disorder is a major structural category in biology, accounting for more than 30% of coding regions across the domains of life, yet consists of conformational ensembles in equilibrium, a major challenge in protein chemistry. Anciently evolved papillomavirus genomes constitute an unparalleled case for sequence to structure-function correlation in cases in which there are no folded structures. E7, the major transforming oncoprotein of human papillomaviruses, is a paradigmatic example among the intrinsically disordered proteins. Analysis of a large number of sequences of the same viral protein allowed for the identification of a handful of residues with absolute conservation, scattered along the sequence of its N-terminal intrinsically disordered domain, which intriguingly are mostly leucine residues. Mutation of these led to a pronounced increase in both α-helix and β-sheet structural content, reflected by drastic effects on equilibrium propensities and oligomerization kinetics, and uncovers the existence of local structural elements that oppose canonical folding. These folding relays suggest the existence of yet undefined hidden structural codes behind intrinsic disorder in this model protein. Thus, evolution pinpoints conformational hot spots that could have not been identified by direct experimental methods for analyzing or perturbing the equilibrium of an intrinsically disordered protein ensemble.
JDet: interactive calculation and visualization of function-related conservation patterns in multiple sequence alignments and structures.

PubMed

Muth, Thilo; García-Martín, Juan A; Rausell, Antonio; Juan, David; Valencia, Alfonso; Pazos, Florencio

2012-02-15

We have implemented in a single package all the features required for extracting, visualizing and manipulating fully conserved positions as well as those with a family-dependent conservation pattern in multiple sequence alignments. The program allows, among other things, to run different methods for extracting these positions, combine the results and visualize them in protein 3D structures and sequence spaces. JDet is a multiplatform application written in Java. It is freely available, including the source code, at http://csbg.cnb.csic.es/JDet. The package includes two of our recently developed programs for detecting functional positions in protein alignments (Xdet and S3Det), and support for other methods can be added as plug-ins. A help file and a guided tutorial for JDet are also available.
Protein structure and the sequential structure of mRNA: alpha-helix and beta-sheet signals at the nucleotide level.

PubMed

Brunak, S; Engelbrecht, J

1996-06-01

A direct comparison of experimentally determined protein structures and their corresponding protein coding mRNA sequences has been performed. We examine whether real world data support the hypothesis that clusters of rare codons correlate with the location of structural units in the resulting protein. The degeneracy of the genetic code allows for a biased selection of codons which may control the translational rate of the ribosome, and may thus in vivo have a catalyzing effect on the folding of the polypeptide chain. A complete search for GenBank nucleotide sequences coding for structural entries in the Brookhaven Protein Data Bank produced 719 protein chains with matching mRNA sequence, amino acid sequence, and secondary structure assignment. By neural network analysis, we found strong signals in mRNA sequence regions surrounding helices and sheets. These signals do not originate from the clustering of rare codons, but from the similarity of codons coding for very abundant amino acid residues at the N- and C-termini of helices and sheets. No correlation between the positioning of rare codons and the location of structural units was found. The mRNA signals were also compared with conserved nucleotide features of 16S-like ribosomal RNA sequences and related to mechanisms for maintaining the correct reading frame by the ribosome.
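
To make the rare-codon-cluster hypothesis tested in this record concrete, the sketch below shows one simple way such clusters could be located in a coding sequence before comparing them with structural units. It is not the authors' method (they report a neural network analysis); the function names, the frequency threshold `rare_below`, and the window parameters are all illustrative assumptions.

```python
from collections import Counter


def split_codons(cds):
    """Split a coding sequence into codons, ignoring any trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def codon_frequencies(cds_list):
    """Relative frequency of each codon over a collection of coding sequences."""
    counts = Counter(c for cds in cds_list for c in split_codons(cds.upper()))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def rare_codon_windows(cds, freqs, rare_below=0.01, window=5, min_rare=3):
    """Return (start_codon_index, rare_count) for each window holding >= min_rare rare codons.

    A codon counts as rare when its frequency in `freqs` is below `rare_below`
    (an arbitrary cutoff chosen for this example).
    """
    codons = split_codons(cds.upper())
    is_rare = [freqs.get(c, 0.0) < rare_below for c in codons]
    hits = []
    for start in range(len(codons) - window + 1):
        n_rare = sum(is_rare[start:start + window])
        if n_rare >= min_rare:
            hits.append((start, n_rare))
    return hits
```

The window start indices are codon positions, so they map directly onto residue positions in the encoded protein and can be compared with helix or sheet boundaries from a secondary structure assignment.
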
Crystal structure of Bacillus subtilis YabJ, a purine regulatory protein and member of the highly conserved YjgF family

PubMed Central

Sinha, Sangita; Rappu, Pekka; Lange, S. C.; Mäntsälä, Pekka; Zalkin, Howard; Smith, Janet L.

1999-01-01

The yabJ gene in Bacillus subtilis is required for adenine-mediated repression of purine biosynthetic genes in vivo and codes for an acid-soluble, 14-kDa protein. The molecular mechanism of YabJ is unknown. YabJ is a member of a large, widely distributed family of proteins of unknown biochemical function. The 1.7-Å crystal structure of YabJ reveals a trimeric organization with extensive buried hydrophobic surface and an internal water-filled cavity. The most important finding in the structure is a deep, narrow cleft between subunits lined with nine side chains that are invariant among the 25 most similar homologs. This conserved site is proposed to be a binding or catalytic site for a ligand or substrate that is common to YabJ and other members of the YER057c/YjgF/UK114 family of proteins. PMID:10557275
Genome analysis of the platypus reveals unique signatures of evolution.

PubMed

Warren, Wesley C; Hillier, LaDeana W; Marshall Graves, Jennifer A; Birney, Ewan; Ponting, Chris P; Grützner, Frank; Belov, Katherine; Miller, Webb; Clarke, Laura; Chinwalla, Asif T; Yang, Shiaw-Pyng; Heger, Andreas; Locke, Devin P; Miethke, Pat; Waters, Paul D; Veyrunes, Frédéric; Fulton, Lucinda; Fulton, Bob; Graves, Tina; Wallis, John; Puente, Xose S; López-Otín, Carlos; Ordóñez, Gonzalo R; Eichler, Evan E; Chen, Lin; Cheng, Ze; Deakin, Janine E; Alsop, Amber; Thompson, Katherine; Kirby, Patrick; Papenfuss, Anthony T; Wakefield, Matthew J; Olender, Tsviya; Lancet, Doron; Huttley, Gavin A; Smit, Arian F A; Pask, Andrew; Temple-Smith, Peter; Batzer, Mark A; Walker, Jerilyn A; Konkel, Miriam K; Harris, Robert S; Whittington, Camilla M; Wong, Emily S W; Gemmell, Neil J; Buschiazzo, Emmanuel; Vargas Jentzsch, Iris M; Merkel, Angelika; Schmitz, Juergen; Zemann, Anja; Churakov, Gennady; Kriegs, Jan Ole; Brosius, Juergen; Murchison, Elizabeth P; Sachidanandam, Ravi; Smith, Carly; Hannon, Gregory J; Tsend-Ayush, Enkhjargal; McMillan, Daniel; Attenborough, Rosalind; Rens, Willem; Ferguson-Smith, Malcolm; Lefèvre, Christophe M; Sharp, Julie A; Nicholas, Kevin R; Ray, David A; Kube, Michael; Reinhardt, Richard; Pringle, Thomas H; Taylor, James; Jones, Russell C; Nixon, Brett; Dacheux, Jean-Louis; Niwa, Hitoshi; Sekita, Yoko; Huang, Xiaoqiu; Stark, Alexander; Kheradpour, Pouya; Kellis, Manolis; Flicek, Paul; Chen, Yuan; Webber, Caleb; Hardison, Ross; Nelson, Joanne; Hallsworth-Pepin, Kym; Delehaunty, Kim; Markovic, Chris; Minx, Pat; Feng, Yucheng; Kremitzki, Colin; Mitreva, Makedonka; Glasscock, Jarret; Wylie, Todd; Wohldmann, Patricia; Thiru, Prathapan; Nhan, Michael N; Pohl, Craig S; Smith, Scott M; Hou, Shunfeng; Nefedov, Mikhail; de Jong, Pieter J; Renfree, Marilyn B; Mardis, Elaine R; Wilson, Richard K

2008-05-08

We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation.
Genome analysis of the platypus reveals unique signatures of evolution

PubMed Central

Warren, Wesley C.; Hillier, LaDeana W.; Marshall Graves, Jennifer A.; Birney, Ewan; Ponting, Chris P.; Grützner, Frank; Belov, Katherine; Miller, Webb; Clarke, Laura; Chinwalla, Asif T.; Yang, Shiaw-Pyng; Heger, Andreas; Locke, Devin P.; Miethke, Pat; Waters, Paul D.; Veyrunes, Frédéric; Fulton, Lucinda; Fulton, Bob; Graves, Tina; Wallis, John; Puente, Xose S.; López-Otín, Carlos; Ordóñez, Gonzalo R.; Eichler, Evan E.; Chen, Lin; Cheng, Ze; Deakin, Janine E.; Alsop, Amber; Thompson, Katherine; Kirby, Patrick; Papenfuss, Anthony T.; Wakefield, Matthew J.; Olender, Tsviya; Lancet, Doron; Huttley, Gavin A.; Smit, Arian F. A.; Pask, Andrew; Temple-Smith, Peter; Batzer, Mark A.; Walker, Jerilyn A.; Konkel, Miriam K.; Harris, Robert S.; Whittington, Camilla M.; Wong, Emily S. W.; Gemmell, Neil J.; Buschiazzo, Emmanuel; Vargas Jentzsch, Iris M.; Merkel, Angelika; Schmitz, Juergen; Zemann, Anja; Churakov, Gennady; Kriegs, Jan Ole; Brosius, Juergen; Murchison, Elizabeth P.; Sachidanandam, Ravi; Smith, Carly; Hannon, Gregory J.; Tsend-Ayush, Enkhjargal; McMillan, Daniel; Attenborough, Rosalind; Rens, Willem; Ferguson-Smith, Malcolm; Lefèvre, Christophe M.; Sharp, Julie A.; Nicholas, Kevin R.; Ray, David A.; Kube, Michael; Reinhardt, Richard; Pringle, Thomas H.; Taylor, James; Jones, Russell C.; Nixon, Brett; Dacheux, Jean-Louis; Niwa, Hitoshi; Sekita, Yoko; Huang, Xiaoqiu; Stark, Alexander; Kheradpour, Pouya; Kellis, Manolis; Flicek, Paul; Chen, Yuan; Webber, Caleb; Hardison, Ross; Nelson, Joanne; Hallsworth-Pepin, Kym; Delehaunty, Kim; Markovic, Chris; Minx, Pat; Feng, Yucheng; Kremitzki, Colin; Mitreva, Makedonka; Glasscock, Jarret; Wylie, Todd; Wohldmann, Patricia; Thiru, Prathapan; Nhan, Michael N.; Pohl, Craig S.; Smith, Scott M.; Hou, Shunfeng; Renfree, Marilyn B.; Mardis, Elaine R.; Wilson, Richard K.

2009-01-01

We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation. PMID:18464734
Evolution and Diversity of the Human Hepatitis D Virus Genome

PubMed Central

Huang, Chi-Ruei; Lo, Szecheng J.

2010-01-01

Human hepatitis delta virus (HDV) is the RNA virus with the smallest genome. The HDV genome is divided into a viroid-like sequence and a protein-coding sequence, which could have originated from different sources, and the HDV genome was eventually constituted through RNA recombination. The genome subsequently diversified through the accumulation of mutations, selected by interactions of the mutated RNA and proteins with host factors so as to successfully form infectious virions. Therefore, we propose that the conservation of the HDV nucleotide sequence is closely related to its functionality. Genome analysis of known HDV isolates shows that the C-terminal coding sequences of the large delta antigen (LDAg) are more diverse than other regions of the protein-coding sequence, yet variants that retain the biological ability to interact with the heavy chain of clathrin can be selected and maintained. Since viruses interact with many host factors, including when escaping the host immune response, designing a program to predict RNA genome evolution remains a great challenge. PMID:20204073
Evolution of the alternative AQP2 gene: Acquisition of a novel protein-coding sequence in dolphins.

PubMed

Kishida, Takushi; Suzuki, Miwa; Takayama, Asuka

2018-01-01

Taxon-specific de novo protein-coding sequences are thought to be important for taxon-specific environmental adaptation. A recent study revealed that bottlenose dolphins acquired a novel isoform of aquaporin 2 generated by alternative splicing (alternative AQP2), which helps dolphins to live in hyperosmotic seawater. The AQP2 gene consists of four exons, but the alternative AQP2 gene lacks the fourth exon and instead has a longer third exon that includes the original third exon and a part of the original third intron. Here, we show that the latter half of the third exon of the alternative AQP2 arose from a non-protein-coding sequence. An intact ORF of this de novo sequence is shared not by all cetaceans, but only by delphinoids. However, this sequence is conserved in all modern cetaceans, implying that this de novo sequence potentially plays important roles for marine adaptation in cetaceans. Copyright © 2017 Elsevier Inc. All rights reserved.
Evolution of coding and non-coding genes in HOX clusters of a marsupial.

PubMed

Yu, Hongshi; Lindsay, James; Feng, Zhi-Ping; Frankenberg, Stephen; Hu, Yanqiu; Carone, Dawn; Shaw, Geoff; Pask, Andrew J; O'Neill, Rachel; Papenfuss, Anthony T; Renfree, Marilyn B

2012-06-18

The HOX gene clusters are thought to be highly conserved amongst mammals and other vertebrates, but the long non-coding RNAs have only been studied in detail in human and mouse. The sequencing of the kangaroo genome provides an opportunity to use comparative analyses to compare the HOX clusters of a mammal with a distinct body plan to those of other mammals. Here we report a comparative analysis of HOX gene clusters between an Australian marsupial of the kangaroo family and the eutherians. There was a strikingly high level of conservation of HOX gene sequence and structure and non-protein coding genes including the microRNAs miR-196a, miR-196b, miR-10a and miR-10b and the long non-coding RNAs HOTAIR, HOTAIRM1 and HOXA11AS that play critical roles in regulating gene expression and controlling development. By microRNA deep sequencing and comparative genomic analyses, two conserved microRNAs (miR-10a and miR-10b) were identified and one new candidate microRNA with typical hairpin precursor structure that is expressed in both fibroblasts and testes was found. The prediction of microRNA target analysis showed that several known microRNA targets, such as miR-10, miR-414 and miR-464, were found in the tammar HOX clusters. In addition, several novel and putative miRNAs were identified that originated from elsewhere in the tammar genome and that target the tammar HOXB and HOXD clusters. This study confirms that the emergence of known long non-coding RNAs in the HOX clusters clearly predate the marsupial-eutherian divergence 160 Ma ago. It also identified a new potentially functional microRNA as well as conserved miRNAs. These non-coding RNAs may participate in the regulation of HOX genes to influence the body plan of this marsupial.
Evolution of coding and non-coding genes in HOX clusters of a marsupial

PubMed Central

2012-01-01

Background The HOX gene clusters are thought to be highly conserved amongst mammals and other vertebrates, but the long non-coding RNAs have only been studied in detail in human and mouse. The sequencing of the kangaroo genome provides an opportunity to use comparative analyses to compare the HOX clusters of a mammal with a distinct body plan to those of other mammals. Results Here we report a comparative analysis of HOX gene clusters between an Australian marsupial of the kangaroo family and the eutherians. There was a strikingly high level of conservation of HOX gene sequence and structure and non-protein coding genes including the microRNAs miR-196a, miR-196b, miR-10a and miR-10b and the long non-coding RNAs HOTAIR, HOTAIRM1 and HOXA11AS that play critical roles in regulating gene expression and controlling development. By microRNA deep sequencing and comparative genomic analyses, two conserved microRNAs (miR-10a and miR-10b) were identified and one new candidate microRNA with typical hairpin precursor structure that is expressed in both fibroblasts and testes was found. The prediction of microRNA target analysis showed that several known microRNA targets, such as miR-10, miR-414 and miR-464, were found in the tammar HOX clusters. In addition, several novel and putative miRNAs were identified that originated from elsewhere in the tammar genome and that target the tammar HOXB and HOXD clusters. Conclusions This study confirms that the emergence of known long non-coding RNAs in the HOX clusters clearly predate the marsupial-eutherian divergence 160 Ma ago. It also identified a new potentially functional microRNA as well as conserved miRNAs. These non-coding RNAs may participate in the regulation of HOX genes to influence the body plan of this marsupial. PMID:22708672
Sounds of silence: synonymous nucleotides as a key to biological regulation and complexity

PubMed Central

Shabalina, Svetlana A.; Spiridonov, Nikolay A.; Kashina, Anna

2013-01-01

Messenger RNA is a key component of an intricate regulatory network of its own. It accommodates numerous nucleotide signals that overlap protein coding sequences and are responsible for multiple levels of regulation and generation of biological complexity. A wealth of structural and regulatory information, which mRNA carries in addition to the encoded amino acid sequence, raises the question of how these signals and overlapping codes are delineated along non-synonymous and synonymous positions in protein coding regions, especially in eukaryotes. Silent or synonymous codon positions, which do not determine amino acid sequences of the encoded proteins, define mRNA secondary structure and stability and affect the rate of translation, folding and post-translational modifications of nascent polypeptides. The RNA level selection is acting on synonymous sites in both prokaryotes and eukaryotes and is more common than previously thought. Selection pressure on the coding gene regions follows three-nucleotide periodic pattern of nucleotide base-pairing in mRNA, which is imposed by the genetic code. Synonymous positions of the coding regions have a higher level of hybridization potential relative to non-synonymous positions, and are multifunctional in their regulatory and structural roles. Recent experimental evidence and analysis of mRNA structure and interspecies conservation suggest that there is an evolutionary tradeoff between selective pressure acting at the RNA and protein levels. Here we provide a comprehensive overview of the studies that define the role of silent positions in regulating RNA structure and processing that exert downstream effects on proteins and their functions. PMID:23293005

Computer analysis of protein functional sites projection on exon structure of genes in Metazoa.

PubMed

Medvedeva, Irina V; Demenkov, Pavel S; Ivanisenko, Vladimir A

2015-01-01

Study of the relationship between the structural and functional organization of proteins and their coding genes is necessary for an understanding of the evolution of molecular systems and can provide new knowledge for many applications, including the design of proteins with improved medical and biological properties. It is well known that the functional properties of proteins are determined by their functional sites. Functional sites are usually represented by a small number of amino acid residues that are distantly located from each other in the amino acid sequence. They are highly conserved within their functional group and vary significantly in structure between such groups. Given these facts, analysis of the general properties of the structural organization of functional sites, both at the protein level and at the level of the exon-intron structure of the coding gene, remains a relevant problem. One approach to this analysis is the projection of the amino acid residue positions of the functional sites, along with the exon boundaries, onto the gene structure. In this paper, we examined the discontinuity of the functional sites in the exon-intron structure of genes and the distribution of lengths and phases of the functional-site-encoding exons in vertebrate genes. We have shown that the DNA fragments coding the functional sites were in the same exons, or in close exons. The observed tendency of exons that code functional sites to cluster suggests that such clusters could be considered a unit of protein evolution. We studied the characteristics of the structure of the exon boundaries that code, and do not code, functional sites in 11 Metazoa species. This is accompanied by a reduced frequency of intercodon gaps (phase 0) in exons encoding the amino acid residue functional site, which may be evidence of the existence of evolutionary limitations to exon shuffling. These results characterize the features of the coding exon-intron structure that affect the functionality of the encoded protein and allow a better understanding of the emergence of biological diversity.
RNAi mediates post-transcriptional repression of gene expression in fission yeast Schizosaccharomyces pombe

DOE Office of Scientific and Technical Information (OSTI.GOV)

Smialowska, Agata, E-mail: smialowskaa@gmail.com; School of Life Sciences, Södertörn Högskola, Huddinge 141-89; Djupedal, Ingela

Highlights: • Protein coding genes accumulate anti-sense sRNAs in fission yeast S. pombe. • RNAi represses protein-coding genes in S. pombe. • RNAi-mediated gene repression is post-transcriptional. Abstract: RNA interference (RNAi) is a gene silencing mechanism conserved from fungi to mammals. Small interfering RNAs are products and mediators of the RNAi pathway and act as specificity factors in recruiting effector complexes. The Schizosaccharomyces pombe genome encodes one of each of the core RNAi proteins, Dicer, Argonaute and RNA-dependent RNA polymerase (dcr1, ago1, rdp1). Even though the function of RNAi in heterochromatin assembly in S. pombe is established, its role in controlling gene expression is elusive. Here, we report the identification of small RNAs mapped anti-sense to protein coding genes in fission yeast. We demonstrate that these genes are up-regulated at the protein level in RNAi mutants, while their mRNA levels are not significantly changed. We show that the repression by RNAi is not a result of heterochromatin formation. Thus, we conclude that RNAi is involved in post-transcriptional gene silencing in S. pombe.
A Novel Family in Medicago truncatula Consisting of More Than 300 Nodule-Specific Genes Coding for Small, Secreted Polypeptides with Conserved Cysteine Motifs

PubMed Central

Mergaert, Peter; Nikovics, Krisztina; Kelemen, Zsolt; Maunoury, Nicolas; Vaubert, Danièle; Kondorosi, Adam; Kondorosi, Eva

2003-01-01

Transcriptome analysis of Medicago truncatula nodules has led to the discovery of a gene family named NCR (nodule-specific cysteine rich) with more than 300 members. The encoded polypeptides were short (60–90 amino acids), carried a conserved signal peptide, and, except for a conserved cysteine motif, displayed otherwise extensive sequence divergence. Family members were found in pea (Pisum sativum), broad bean (Vicia faba), white clover (Trifolium repens), and Galega orientalis but not in other plants, including other legumes, suggesting that the family might be specific for galegoid legumes forming indeterminate nodules. Gene expression of all family members was restricted to nodules except for two, also expressed in mycorrhizal roots. NCR genes exhibited distinct temporal and spatial expression patterns in nodules and, thus, were coupled to different stages of development. The signal peptide targeted the polypeptides in the secretory pathway, as shown by green fluorescent protein fusions expressed in onion (Allium cepa) epidermal cells. Coregulation of certain NCR genes with genes coding for a potentially secreted calmodulin-like protein and for a signal peptide peptidase suggests a concerted action in nodule development. Potential functions of the NCR polypeptides in cell-to-cell signaling and creation of a defense system are discussed. PMID:12746522
Antisense Transcription Is Pervasive but Rarely Conserved in Enteric Bacteria

PubMed Central

Raghavan, Rahul; Sloan, Daniel B.; Ochman, Howard

2012-01-01

ABSTRACT Noncoding RNAs, including antisense RNAs (asRNAs) that originate from the complementary strand of protein-coding genes, are involved in the regulation of gene expression in all domains of life. Recent application of deep-sequencing technologies has revealed that the transcription of asRNAs occurs genome-wide in bacteria. Although the role of the vast majority of asRNAs remains unknown, it is often assumed that their presence implies important regulatory functions, similar to those of other noncoding RNAs. Alternatively, many antisense transcripts may be produced by chance transcription events from promoter-like sequences that result from the degenerate nature of bacterial transcription factor binding sites. To investigate the biological relevance of antisense transcripts, we compared genome-wide patterns of asRNA expression in closely related enteric bacteria, Escherichia coli and Salmonella enterica serovar Typhimurium, by performing strand-specific transcriptome sequencing. Although antisense transcripts are abundant in both species, less than 3% of asRNAs are expressed at high levels in both species, and only about 14% appear to be conserved among species. And unlike the promoters of protein-coding genes, asRNA promoters show no evidence of sequence conservation between, or even within, species. Our findings suggest that many or even most bacterial asRNAs are nonadaptive by-products of the cell’s transcription machinery. PMID:22872780
DNA-binding proteins from marine bacteria expand the known sequence diversity of TALE-like repeats

PubMed Central

de Lange, Orlando; Wolf, Christina; Thiel, Philipp; Krüger, Jens; Kleusch, Christian; Kohlbacher, Oliver; Lahaye, Thomas

2015-01-01

Transcription Activator-Like Effectors (TALEs) of Xanthomonas bacteria are programmable DNA binding proteins with unprecedented target specificity. Comparative studies into TALE repeat structure and function are hindered by the limited sequence variation among TALE repeats. More sequence-diverse TALE-like proteins are known from Ralstonia solanacearum (RipTALs) and Burkholderia rhizoxinica (Bats), but RipTAL and Bat repeats are conserved with those of TALEs around the DNA-binding residue. We study two novel marine-organism TALE-like proteins (MOrTL1 and MOrTL2), the first to date of non-terrestrial origin. We have assessed their DNA-binding properties and modelled repeat structures. We found that repeats from these proteins mediate sequence specific DNA binding conforming to the TALE code, despite low sequence similarity to TALE repeats, and with novel residues around the BSR. However, MOrTL1 repeats show greater sequence discriminating power than MOrTL2 repeats. Sequence alignments show that there are only three residues conserved between repeats of all TALE-like proteins including the two new additions. This conserved motif could prove useful as an identifier for future TALE-likes. Additionally, comparing MOrTL repeats with those of other TALE-likes suggests a common evolutionary origin for the TALEs, RipTALs and Bats. PMID:26481363
Detection of hyper-conserved regions in hepatitis B virus X gene potentially useful for gene therapy.

PubMed

González, Carolina; Tabernero, David; Cortese, Maria Francesca; Gregori, Josep; Casillas, Rosario; Riveiro-Barciela, Mar; Godoy, Cristina; Sopena, Sara; Rando, Ariadna; Yll, Marçal; Lopez-Martinez, Rosa; Quer, Josep; Esteban, Rafael; Buti, Maria; Rodríguez-Frías, Francisco

2018-05-21

To detect hyper-conserved regions in the hepatitis B virus (HBV) X gene (HBX) 5' region that could be candidates for gene therapy. The study included 27 chronic hepatitis B treatment-naive patients in various clinical stages (from chronic infection to cirrhosis and hepatocellular carcinoma, both HBeAg-negative and HBeAg-positive), and infected with HBV genotypes A-F and H. In a serum sample from each patient with viremia > 3.5 log IU/mL, the HBX 5' end region [nucleotide (nt) 1255-1611] was PCR-amplified and submitted to next-generation sequencing (NGS). We assessed genotype variants by phylogenetic analysis, and evaluated conservation of this region by calculating the information content of each nucleotide position in a multiple alignment of all unique sequences (haplotypes) obtained by NGS. Conservation at the HBx protein amino acid (aa) level was also analyzed. NGS yielded 1333069 sequences from the 27 samples, with a median of 4578 sequences/sample (2487-9279, IQR 2817). In 14/27 patients (51.8%), phylogenetic analysis of viral nucleotide haplotypes showed a complex mixture of genotypic variants. Analysis of the information content in the haplotype multiple alignments detected 2 hyper-conserved nucleotide regions, one in the HBX upstream non-coding region (nt 1255-1286) and the other in the 5' end coding region (nt 1519-1603). This last region coded for a conserved amino acid region (aa 63-76) that partially overlaps a Kunitz-like domain. Two hyper-conserved regions detected in the HBX 5' end may be of value for targeted gene therapy, regardless of the patients' clinical stage or HBV genotype.
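The per-position information content mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code assumes the common definition for nucleotides (2 bits minus the Shannon entropy of the column), with no small-sample correction and no weighting of haplotypes by read count. The function name `column_information` and the toy haplotypes are hypothetical.

```python
import math
from collections import Counter

def column_information(alignment):
    """Information content in bits for each column of a nucleotide alignment.

    Assumes IC = 2 - H(column); gaps and ambiguous symbols are ignored.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    info = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment if seq[i] in "ACGT")
        total = sum(counts.values())
        if total == 0:
            info.append(0.0)  # column holds only gaps or ambiguous symbols
            continue
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        info.append(2.0 - entropy)
    return info

# Toy alignment of four made-up haplotypes (not data from the study).
haplotypes = ["ACGT", "ACGA", "ACTC", "ACAG"]
scores = column_information(haplotypes)
```

Columns where every haplotype carries the same base score the maximum of 2 bits, so runs of near-maximal columns would be the candidates for hyper-conserved regions under this definition.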
Transcriptome interrogation of human myometrium identifies differentially expressed sense-antisense pairs of protein-coding and long non-coding RNA genes in spontaneous labor at term.

PubMed

Romero, Roberto; Tarca, Adi L; Chaemsaithong, Piya; Miranda, Jezid; Chaiworapongsa, Tinnakorn; Jia, Hui; Hassan, Sonia S; Kalita, Cynthia A; Cai, Juan; Yeo, Lami; Lipovich, Leonard

2014-09-01

To identify differentially expressed long non-coding RNA (lncRNA) genes in human myometrium in women with spontaneous labor at term. Myometrium was obtained from women undergoing cesarean deliveries who were not in labor (n = 19) and women in spontaneous labor at term (n = 20). RNA was extracted and profiled using an Illumina® microarray platform. We have used computational approaches to bound the extent of long non-coding RNA representation on this platform, and to identify co-differentially expressed and correlated pairs of long non-coding RNA genes and protein-coding genes sharing the same genomic loci. We identified co-differential expression and correlation at two genomic loci that contain coding-lncRNA gene pairs: SOCS2-AK054607 and LMCD1-NR_024065 in women in spontaneous labor at term. This co-differential expression and correlation was validated by qRT-PCR, an experimental method completely independent of the microarray analysis. Intriguingly, one of the two lncRNA genes differentially expressed in term labor had a key genomic structure element, a splice site, that lacked evolutionary conservation beyond primates. We provide, for the first time, evidence for coordinated differential expression and correlation of cis-encoded antisense lncRNAs and protein-coding genes with known as well as novel roles in pregnancy in the myometrium of women in spontaneous labor at term.
Diversity of Antisense and Other Non-Coding RNAs in Archaea Revealed by Comparative Small RNA Sequencing in Four Pyrobaculum Species

PubMed Central

Bernick, David L.; Dennis, Patrick P.; Lui, Lauren M.; Lowe, Todd M.

2012-01-01

A great diversity of small, non-coding RNA (ncRNA) molecules with roles in gene regulation and RNA processing have been intensely studied in eukaryotic and bacterial model organisms, yet our knowledge of possible parallel roles for small RNAs (sRNA) in archaea is limited. We employed RNA-seq to identify novel sRNA across multiple species of the hyperthermophilic genus Pyrobaculum, known for unusual RNA gene characteristics. By comparing transcriptional data collected in parallel among four species, we were able to identify conserved RNA genes fitting into known and novel families. Among our findings, we highlight three novel cis-antisense sRNAs encoded opposite to key regulatory (ferric uptake regulator), metabolic (triose-phosphate isomerase), and core transcriptional apparatus genes (transcription factor B). We also found a large increase in the number of conserved C/D box sRNA genes over what had been previously recognized; many of these genes are encoded antisense to protein coding genes. The conserved opposition to orthologous genes across the Pyrobaculum genus suggests similarities to other cis-antisense regulatory systems. Furthermore, the genus-specific nature of these sRNAs indicates they are relatively recent, stable adaptations. PMID:22783241
The Arabidopsis TOR Kinase Specifically Regulates the Expression of Nuclear Genes Coding for Plastidic Ribosomal Proteins and the Phosphorylation of the Cytosolic Ribosomal Protein S6

PubMed Central

Dobrenel, Thomas; Mancera-Martínez, Eder; Forzani, Céline; Azzopardi, Marianne; Davanture, Marlène; Moreau, Manon; Schepetilnikov, Mikhail; Chicher, Johana; Langella, Olivier; Zivy, Michel; Robaglia, Christophe; Ryabova, Lyubov A.; Hanson, Johannes; Meyer, Christian

2016-01-01

Protein translation is an energy consuming process that has to be fine-tuned at both the cell and organism levels to match the availability of resources. The target of rapamycin kinase (TOR) is a key regulator of a large range of biological processes in response to environmental cues. In this study, we have investigated the effects of TOR inactivation on the expression and regulation of Arabidopsis ribosomal proteins at different levels of analysis, namely from transcriptomic to phosphoproteomic. TOR inactivation resulted in a coordinated down-regulation of the transcription and translation of nuclear-encoded mRNAs coding for plastidic ribosomal proteins, which could explain the chlorotic phenotype of the TOR silenced plants. We have identified in the 5′ untranslated regions (UTRs) of this set of genes a conserved sequence related to the 5′ terminal oligopyrimidine motif, which is known to confer translational regulation by the TOR kinase in other eukaryotes. Furthermore, the phosphoproteomic analysis of the ribosomal fraction following TOR inactivation revealed a lower phosphorylation of the conserved Ser240 residue in the C-terminal region of the 40S ribosomal protein S6 (RPS6). These results were confirmed by Western blot analysis using an antibody that specifically recognizes phosphorylated Ser240 in RPS6. Finally, this antibody was used to follow TOR activity in plants. Our results thus uncover a multi-level regulation of plant ribosomal genes and proteins by the TOR kinase. PMID:27877176
Antisense transcription is pervasive but rarely conserved in enteric bacteria.

PubMed

Raghavan, Rahul; Sloan, Daniel B; Ochman, Howard

2012-01-01

Noncoding RNAs, including antisense RNAs (asRNAs) that originate from the complementary strand of protein-coding genes, are involved in the regulation of gene expression in all domains of life. Recent application of deep-sequencing technologies has revealed that the transcription of asRNAs occurs genome-wide in bacteria. Although the role of the vast majority of asRNAs remains unknown, it is often assumed that their presence implies important regulatory functions, similar to those of other noncoding RNAs. Alternatively, many antisense transcripts may be produced by chance transcription events from promoter-like sequences that result from the degenerate nature of bacterial transcription factor binding sites. To investigate the biological relevance of antisense transcripts, we compared genome-wide patterns of asRNA expression in closely related enteric bacteria, Escherichia coli and Salmonella enterica serovar Typhimurium, by performing strand-specific transcriptome sequencing. Although antisense transcripts are abundant in both species, less than 3% of asRNAs are expressed at high levels in both species, and only about 14% appear to be conserved among species. And unlike the promoters of protein-coding genes, asRNA promoters show no evidence of sequence conservation between, or even within, species. Our findings suggest that many or even most bacterial asRNAs are nonadaptive by-products of the cell's transcription machinery. IMPORTANCE Application of high-throughput methods has revealed the expression throughout bacterial genomes of transcripts encoded on the strand complementary to protein-coding genes. Because transcription is costly, it is usually assumed that these transcripts, termed antisense RNAs (asRNAs), serve some function; however, the role of most asRNAs is unclear, raising questions about their relevance in cellular processes. 
Because natural selection conserves functional elements, comparisons between related species provide a method for assessing functionality genome-wide. Applying such an approach, we assayed all transcripts in two closely related bacteria, Escherichia coli and Salmonella enterica serovar Typhimurium, and demonstrate that, although the levels of genome-wide antisense transcription are similarly high in both bacteria, only a small fraction of asRNAs are shared across species. Moreover, the promoters associated with asRNAs show no evidence of sequence conservation between, or even within, species. These findings indicate that despite the genome-wide transcription of asRNAs, many of these transcripts are likely nonfunctional.
Mistranslation: from adaptations to applications.

PubMed

Hoffman, Kyle S; O'Donoghue, Patrick; Brandl, Christopher J

2017-11-01

The conservation of the genetic code indicates that there was a single origin, but like all genetic material, the cell's interpretation of the code is subject to evolutionary pressure. Single nucleotide variations in tRNA sequences can modulate codon assignments by altering codon-anticodon pairing or tRNA charging. Either can increase translation errors and even change the code. The frozen accident hypothesis argued that changes to the code would destabilize the proteome and reduce fitness. In studies of model organisms, mistranslation often acts as an adaptive response. These studies reveal evolutionary conserved mechanisms to maintain proteostasis even during high rates of mistranslation. This review discusses the evolutionary basis of altered genetic codes, how mistranslation is identified, and how deviations to the genetic code are exploited. We revisit early discoveries of genetic code deviations and provide examples of adaptive mistranslation events in nature. Lastly, we highlight innovations in synthetic biology to expand the genetic code. The genetic code is still evolving. Mistranslation increases proteomic diversity that enables cells to survive stress conditions or suppress a deleterious allele. Genetic code variants have been identified by genome and metagenome sequence analyses, suppressor genetics, and biochemical characterization. Understanding the mechanisms of translation and genetic code deviations enables the design of new codes to produce novel proteins. Engineering the translation machinery and expanding the genetic code to incorporate non-canonical amino acids are valuable tools in synthetic biology that are impacting biomedical research. This article is part of a Special Issue entitled "Biochemistry of Synthetic Biology - Recent Developments" Guest Editor: Dr. Ilka Heinemann and Dr. Patrick O'Donoghue. Copyright © 2017 Elsevier B.V. All rights reserved.
Shannon Entropy of the Canonical Genetic Code

NASA Astrophysics Data System (ADS)

Nemzer, Louis

The probability that a non-synonymous point mutation in DNA will adversely affect the functionality of the resultant protein is greatly reduced if the substitution is conservative. In that case, the amino acid coded by the mutated codon has similar physico-chemical properties to the original. Many simplified alphabets, which group the 20 common amino acids into families, have been proposed. To evaluate these schema objectively, we introduce a novel, quantitative method based on the inherent redundancy in the canonical genetic code. By calculating the Shannon information entropy carried by 1- or 2-bit messages, groupings that best leverage the robustness of the code are identified. The relative importance of properties related to protein folding - like hydropathy and size - and function, including side-chain acidity, can also be estimated. In addition, this approach allows us to quantify the average information value of nucleotide codon positions, and explore the physiological basis for distinguishing between transition and transversion mutations. Supported by NSU PFRDG Grant #335347.
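As a rough illustration of the redundancy this abstract refers to, the sketch below computes the Shannon entropy of the amino acid (and stop) labels in the canonical genetic code, assuming all 64 codons are equally likely. This is only a baseline calculation, not the author's method for scoring simplified alphabets; the uniform-codon assumption and the names `CODON_TABLE` and `shannon_entropy` are illustrative.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code in the conventional TCAG ordering ('*' marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def shannon_entropy(labels):
    """Shannon entropy in bits of the empirical distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

codon_entropy = math.log2(len(CODON_TABLE))              # 6 bits for 64 equiprobable codons
product_entropy = shannon_entropy(CODON_TABLE.values())  # entropy of the encoded labels
redundancy = codon_entropy - product_entropy             # bits absorbed by synonymous codons
```

Applying `shannon_entropy` to codons relabeled by a simplified alphabet (for example, hydrophobic versus polar) would be one way to compare groupings, although the scoring used in the abstract may differ.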
Dissecting non-coding RNA mechanisms in cellulo by single-molecule high-resolution localization and counting

PubMed Central

Pitchiaya, Sethuramasundaram; Krishnan, Vishalakshi; Custer, Thomas C.; Walter, Nils G.

2013-01-01

Non-coding RNAs (ncRNAs) recently were discovered to outnumber their protein-coding counterparts, yet their diverse functions are still poorly understood. Here we report on a method for the intracellular Single-molecule High Resolution Localization and Counting (iSHiRLoC) of microRNAs (miRNAs), a conserved, ubiquitous class of regulatory ncRNAs that controls the expression of over 60% of all mammalian protein coding genes post-transcriptionally, by a mechanism shrouded by seemingly contradictory observations. We present protocols to execute single particle tracking (SPT) and single-molecule counting of functional microinjected, fluorophore-labeled miRNAs and thereby extract diffusion coefficients and molecular stoichiometries of micro-ribonucleoprotein (miRNP) complexes from living and fixed cells, respectively. This probing of miRNAs at the single molecule level sheds new light on the intracellular assembly/disassembly of miRNPs, thus beginning to unravel the dynamic nature of this important gene regulatory pathway and facilitating the development of a parsimonious model for their obscured mechanism of action. PMID:23820309
[Studies on antigenicity of human immunodeficiency virus type 1 (HIV-1) external glycoprotein as well as its expression in Pichia pastoris].

PubMed

Zhao, Li-Hui; Yu, Xiang-Hui; Jiang, Chun-Lai; Wu, Yong-Ge; Shen, Jia-Cong; Kong, Wei

2007-05-01

Based on computer simulation, we analyzed the hydrophobicity and potential epitopes of the Env protein (851 amino acids) of recombinant HIV-1 subtypes from Guangxi, China. After comparison with conserved peptides in the Env proteins of other subtypes, three sequences (469-511aa, 538-674aa, 700-734aa) were selected and recombined into a chimeric gene that codes for three conserved epitope peptides with stronger antigenicity; this gene was cloned into the yeast expression plasmid pPICZB. Chimeric proteins were expressed in Pichia pastoris under methanol induction and were analyzed by SDS-PAGE and Western blot. The results showed that fusion proteins of the three-segment antigen were expressed in Pichia pastoris and that the specific protein band at 40 kD was the target protein, which reacted with HIV-1 serum. The target proteins were purified on metal Ni-Sepharose 4B and were shown by ELISA to possess good antigenic specificity. This chimeric antigen may be used in research and developed into HIV diagnostic reagents.
Analysis of the complete genome of the first Irkut virus isolate from China: comparison across the Lyssavirus genus.

PubMed

Liu, Ye; Li, Nan; Zhang, Shoufeng; Zhang, Fei; Lian, Hai; Wang, Ying; Zhang, Jinxia; Hu, Rongliang

2013-12-01

The genome of Irkut virus, isolate IRKV-THChina12, the first non-rabies lyssavirus from China (of bat origin), has been completely sequenced. In general, coding and non-coding regions of this viral genome are similar to those of other lyssaviruses. However, alignment of the deduced amino acid sequences of the structural proteins of IRKV-THChina12 with those of other lyssavirus representatives revealed significant variability between viral species. The nucleoprotein and matrix protein were found to be the most conserved, followed by the large protein, glycoprotein and phosphoprotein. Differences in the antigenic sites in glycoprotein may result in only partial protection of the available rabies biologics against Irkut virus, which is of particular concern for pre- and post-exposure rabies prophylaxis. Copyright © 2013 Elsevier Inc. All rights reserved.
An unusual internal ribosomal entry site of inverted symmetry directs expression of a potato leafroll polerovirus replication-associated protein

PubMed Central

Jaag, Hannah Miriam; Kawchuk, Lawrence; Rohde, Wolfgang; Fischer, Rainer; Emans, Neil; Prüfer, Dirk

2003-01-01

Potato leafroll polerovirus (PLRV) genomic RNA acts as a polycistronic mRNA for the production of proteins P0, P1, and P2 translated from the 5′-proximal half of the genome. Within the P1 coding region we identified a 5-kDa replication-associated protein 1 (Rap1) essential for viral multiplication. An internal ribosome entry site (IRES) with unusual structure and location was identified that regulates Rap1 translation. Core structural elements for internal ribosome entry include a conserved AUG codon and a downstream GGAGAGAGAGG motif with inverted symmetry. Reporter gene expression in potato protoplasts confirmed the internal ribosome entry function. Unlike known IRES motifs, the PLRV IRES is located completely within the coding region of Rap1 at the center of the PLRV genome. PMID:12835413
Genome-Wide Discovery of Long Non-Coding RNAs in Rainbow Trout.

PubMed

Al-Tobasei, Rafet; Paneru, Bam; Salem, Mohamed

2016-01-01

The ENCODE project revealed that ~70% of the human genome is transcribed. While only 1-2% of the RNAs encode proteins, the rest are non-coding RNAs. Long non-coding RNAs (lncRNAs) form a diverse class of non-coding RNAs that are longer than 200 nt. Emerging evidence indicates that lncRNAs play critical roles in various cellular processes including regulation of gene expression. LncRNAs show low levels of gene expression and sequence conservation, which make their computational identification in genomes difficult. In this study, more than two billion Illumina sequence reads were mapped to the genome reference using the TopHat and Cufflinks software. Transcripts shorter than 200 nt, with an ORF longer than 83-100 amino acids, or with significant homology to the NCBI nr protein database were removed. In addition, a computational pipeline was used to filter the remaining transcripts based on a protein-coding-score test. Depending on the filtering stringency conditions, between 31,195 and 54,503 lncRNAs were identified, with only 421 matching known lncRNAs in other species. A digital gene expression atlas revealed 2,935 tissue-specific and 3,269 ubiquitously-expressed lncRNAs. This study annotates lncRNAs in the rainbow trout genome and provides a valuable resource for functional genomics research in salmonids.
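The first two filters described in this abstract (minimum transcript length and maximum ORF length) can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it scans only the forward strand, treats an ORF as an ATG-to-stop stretch, and omits the homology search and protein-coding-score test. The function names and the default threshold of 100 amino acids (the upper end of the 83-100 range quoted) are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf_aa(seq):
    """Length in amino acids of the longest ATG-to-stop ORF on the forward strand."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                             # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)    # residues encoded, stop excluded
                start = None
    return best

def is_candidate_lncrna(seq, min_length=200, max_orf_aa=100):
    """Keep transcripts that are long enough and lack a long ORF."""
    return len(seq) >= min_length and longest_orf_aa(seq) <= max_orf_aa

# Made-up sequences for illustration only.
too_short = "ATG" + "GCT" * 10 + "TAA"      # 36 nt: removed by the length filter
no_orf = "ACGT" * 60                        # 240 nt with no ATG in any frame
coding_like = "ATG" + "GCT" * 150 + "TAA"   # 456 nt with a 151-residue ORF
```

A fuller version would also scan the reverse complement and then pass the surviving transcripts to the homology and coding-potential steps.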
Genome-wide Analysis of Drosophila Circular RNAs Reveals Their Structural and Sequence Properties and Age-Dependent Neural Accumulation

DOE PAGES

Westholm, Jakub O.; Miura, Pedro; Olson, Sara; ...

2014-11-26

Circularization was recently recognized to broadly expand transcriptome complexity. Here, we exploit massive Drosophila total RNA-sequencing data, >5 billion paired-end reads from >100 libraries covering diverse developmental stages, tissues, and cultured cells, to rigorously annotate >2,500 fruit fly circular RNAs. These mostly derive from back-splicing of protein-coding genes and lack poly(A) tails, and the circularization of hundreds of genes is conserved across multiple Drosophila species. We elucidate structural and sequence properties of Drosophila circular RNAs, which exhibit commonalities and distinctions from mammalian circles. Notably, Drosophila circular RNAs harbor >1,000 well-conserved canonical miRNA seed matches, especially within coding regions, and coding conserved miRNA sites reside preferentially within circularized exons. Finally, we analyze the developmental and tissue specificity of circular RNAs and note their preferred derivation from neural genes and enhanced accumulation in neural tissues. Interestingly, circular isoforms increase substantially relative to linear isoforms during CNS aging and constitute an aging biomarker.
Genome-wide Analysis of Drosophila Circular RNAs Reveals Their Structural and Sequence Properties and Age-Dependent Neural Accumulation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Westholm, Jakub O.; Miura, Pedro; Olson, Sara

Circularization was recently recognized to broadly expand transcriptome complexity. Here, we exploit massive Drosophila total RNA-sequencing data, >5 billion paired-end reads from >100 libraries covering diverse developmental stages, tissues, and cultured cells, to rigorously annotate >2,500 fruit fly circular RNAs. These mostly derive from back-splicing of protein-coding genes and lack poly(A) tails, and the circularization of hundreds of genes is conserved across multiple Drosophila species. We elucidate structural and sequence properties of Drosophila circular RNAs, which exhibit commonalities and distinctions from mammalian circles. Notably, Drosophila circular RNAs harbor >1,000 well-conserved canonical miRNA seed matches, especially within coding regions, and coding conserved miRNA sites reside preferentially within circularized exons. Finally, we analyze the developmental and tissue specificity of circular RNAs and note their preferred derivation from neural genes and enhanced accumulation in neural tissues. Interestingly, circular isoforms increase substantially relative to linear isoforms during CNS aging and constitute an aging biomarker.
Effective comparative analysis of protein-protein interaction networks by measuring the steady-state network flow using a Markov model.

PubMed

Jeong, Hyundoo; Qian, Xiaoning; Yoon, Byung-Jun

2016-10-06

Comparative analysis of protein-protein interaction (PPI) networks provides an effective means of detecting conserved functional network modules across different species. Such modules typically consist of orthologous proteins with conserved interactions, which can be exploited to computationally predict the modules through network comparison. In this work, we propose a novel probabilistic framework for comparing PPI networks and effectively predicting the correspondence between proteins, represented as network nodes, that belong to conserved functional modules across the given PPI networks. The basic idea is to estimate the steady-state network flow between nodes that belong to different PPI networks based on a Markov random walk model. The random walker is designed to make random moves to adjacent nodes within a PPI network as well as cross-network moves between potential orthologous nodes with high sequence similarity. Based on this Markov random walk model, we estimate the steady-state network flow - or the long-term relative frequency of the transitions that the random walker makes - between nodes in different PPI networks, which can be used as a probabilistic score measuring their potential correspondence. Subsequently, the estimated scores can be used for detecting orthologous proteins in conserved functional modules through network alignment. Through evaluations based on multiple real PPI networks, we demonstrate that the proposed scheme leads to improved alignment results that are biologically more meaningful at reduced computational cost, outperforming the current state-of-the-art algorithms. The source code and datasets can be downloaded from http://www.ece.tamu.edu/~bjyoon/CUFID .
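The steady-state network flow idea can be sketched on a toy example: two small networks are joined by similarity-weighted cross-network edges, a random walk runs on the combined graph, and the long-run frequency of each cross-network transition serves as a correspondence score. This is not the CUFID implementation; the graph, the weights, the lazy-walk update (used here only to guarantee convergence), and all function names are assumptions for illustration.

```python
def add_edge(adjacency, u, v, weight=1.0):
    """Add an undirected weighted edge to a dict-of-dicts adjacency map."""
    adjacency.setdefault(u, {})[v] = weight
    adjacency.setdefault(v, {})[u] = weight

def stationary_distribution(adjacency, iterations=1000, tol=1e-12):
    """Power iteration for a lazy random walk; returns (pi, transition probabilities)."""
    nodes = sorted(adjacency)
    # Row-normalise edge weights into transition probabilities.
    P = {}
    for u in nodes:
        total = sum(adjacency[u].values())
        P[u] = {v: w / total for v, w in adjacency[u].items()}
    pi = {u: 1.0 / len(nodes) for u in nodes}
    for _ in range(iterations):
        nxt = {u: 0.5 * pi[u] for u in nodes}   # stay put with probability 1/2
        for u in nodes:
            for v, p in P[u].items():
                nxt[v] += 0.5 * pi[u] * p       # otherwise move along an edge
        delta = max(abs(nxt[u] - pi[u]) for u in nodes)
        pi = nxt
        if delta < tol:
            break
    return pi, P

# Two toy path networks (a1-a2-a3 and b1-b2-b3) with made-up similarity weights.
graph = {}
add_edge(graph, "a1", "a2")
add_edge(graph, "a2", "a3")
add_edge(graph, "b1", "b2")
add_edge(graph, "b2", "b3")
add_edge(graph, "a1", "b1", 2.0)   # high sequence similarity
add_edge(graph, "a2", "b2", 1.0)
add_edge(graph, "a3", "b3", 0.5)   # low sequence similarity

pi, P = stationary_distribution(graph)
# Steady-state flow on a cross-network edge: long-run frequency of that transition.
flow = {(u, v): pi[u] * P[u][v] for u, v in [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]}
```

In this symmetric toy graph the flow on an edge is simply proportional to its weight; the score becomes informative when the walk is restricted or reweighted by network structure, as the framework in the abstract presumably does.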

Evidence for ribosomal frameshifting and a novel overlapping gene in the genomes of insect-specific flaviviruses

DOE Office of Scientific and Technical Information (OSTI.GOV)

Firth, Andrew E., E-mail: a.firth@ucc.i; Blitvich, Bradley J., E-mail: blitvich@iastate.ed; Wills, Norma M., E-mail: nwills@genetics.utah.ed

2010-03-30

Flaviviruses have a positive-sense, single-stranded RNA genome of approximately 11 kb, encoding a large polyprotein that is cleaved to produce approximately 10 mature proteins. Cell fusing agent virus, Kamiti River virus, Culex flavivirus and several recently discovered flaviviruses have no known vertebrate host and apparently infect only insects. We present compelling bioinformatic evidence for a 253-295 codon overlapping gene (designated fifo) conserved throughout these insect-specific flaviviruses and immunofluorescent detection of its product. Fifo overlaps the NS2A/NS2B coding sequence in the -1/+2 reading frame and is most likely expressed as a trans-frame fusion protein via ribosomal frameshifting at a conserved GGAUUUY slippery heptanucleotide with 3'-adjacent RNA secondary structure (which stimulates efficient frameshifting in vitro). The discovery bears striking parallels to the recently discovered ribosomal frameshifting site in the NS2A coding sequence of the Japanese encephalitis serogroup of flaviviruses and suggests that programmed ribosomal frameshifting may be more widespread in flaviviruses than currently realized.
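Locating candidate slippery heptanucleotides of the form GGAUUUY (Y is a pyrimidine, C or U) is a simple pattern search. The sketch below is only an illustration: the function name and the toy sequence are hypothetical, and it does not test for the 3'-adjacent RNA secondary structure or the reading frame that the abstract describes.

```python
import re

# Y (pyrimidine) expands to C or U; the lookahead also reports overlapping hits.
SLIPPERY = re.compile(r"(?=(GGAUUU[CU]))")


def find_slippery_sites(seq: str) -> list[tuple[int, str]]:
    """Return (0-based position, heptanucleotide) for each GGAUUUY match."""
    rna = seq.upper().replace("T", "U")  # accept DNA-style input as well
    return [(m.start(), m.group(1)) for m in SLIPPERY.finditer(rna)]


# Invented toy sequence, not taken from any flavivirus genome.
TOY_RNA = "AUGGCCGGAUUUUCGAGGAUUUCAA"

if __name__ == "__main__":
    print(find_slippery_sites(TOY_RNA))
```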
L-GRAAL: Lagrangian graphlet-based network aligner.

PubMed

Malod-Dognin, Noël; Pržulj, Nataša

2015-07-01

Discovering and understanding patterns in networks of protein-protein interactions (PPIs) is a central problem in systems biology. Alignments between these networks aid functional understanding as they uncover important information, such as evolutionarily conserved pathways, protein complexes and functional orthologs. A few methods have been proposed for global PPI network alignments, but because of the NP-completeness of the underlying sub-graph isomorphism problem, producing topologically and biologically accurate alignments remains a challenge. We introduce a novel global network alignment tool, Lagrangian GRAphlet-based ALigner (L-GRAAL), which directly optimizes both the protein and the interaction functional conservations, using a novel alignment search heuristic based on integer programming and Lagrangian relaxation. We compare L-GRAAL with the state-of-the-art network aligners on the largest available PPI networks from BioGRID and observe that L-GRAAL uncovers the largest common sub-graphs between the networks, as measured by edge-correctness and symmetric sub-structures scores, which allow transferring more functional information across networks. We assess the biological quality of the protein mappings using the semantic similarity of their Gene Ontology annotations and observe that L-GRAAL best uncovers functionally conserved proteins. Furthermore, we introduce for the first time a measure of the semantic similarity of the mapped interactions and show that L-GRAAL also best uncovers functionally conserved interactions. In addition, we illustrate on the PPI networks of baker's yeast and human the ability of L-GRAAL to predict new PPIs. Finally, L-GRAAL's results are the first to show that topological information is more important than sequence information for uncovering functionally conserved interactions. L-GRAAL is coded in C++. Software is available at: http://bio-nets.doc.ic.ac.uk/L-GRAAL/.
Contact: n.malod-dognin@imperial.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
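Edge correctness, one of the topological scores named in this abstract, is commonly defined as the fraction of edges in the smaller network that an alignment maps onto edges of the larger network. The sketch below assumes that standard definition; the function name, toy networks, and mapping are hypothetical and unrelated to the L-GRAAL code base.

```python
def edge_correctness(edges1, edges2, mapping):
    """Fraction of edges in network 1 whose images under `mapping` are edges in network 2."""
    target_edges = {frozenset(edge) for edge in edges2}
    conserved = sum(
        1 for u, v in edges1
        if frozenset((mapping[u], mapping[v])) in target_edges
    )
    return conserved / len(edges1)


# Toy example: a triangle aligned onto a four-node path.
EDGES_SMALL = [("a", "b"), ("b", "c"), ("a", "c")]
EDGES_LARGE = [("x", "y"), ("y", "z"), ("z", "w")]
ALIGNMENT = {"a": "x", "b": "y", "c": "z"}

if __name__ == "__main__":
    # Two of the three triangle edges land on path edges.
    print(edge_correctness(EDGES_SMALL, EDGES_LARGE, ALIGNMENT))
```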
GATA: A graphic alignment tool for comparative sequence analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nix, David A.; Eisen, Michael B.

2005-01-01

Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution yet the enhancer retains its activity. Dotplot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.
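To make the dot plot limitation concrete, here is a minimal word-match dot plot. It only reports coordinate pairs where two sequences share an identical word, without aligning anything or handling insertions and deletions, which is the shortcoming the abstract points out. The word size `k`, the function name, and the toy sequences are assumptions for illustration; this is not the GATA algorithm.

```python
from collections import defaultdict


def dot_plot(seq_a: str, seq_b: str, k: int = 3) -> list[tuple[int, int]]:
    """Return (i, j) pairs where seq_a[i:i+k] equals seq_b[j:j+k]."""
    index = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)
    return [
        (i, j)
        for i in range(len(seq_a) - k + 1)
        for j in index.get(seq_a[i:i + k], [])
    ]


if __name__ == "__main__":
    # The same words occur in a different order in the two toy sequences,
    # so the dots do not lie on a single diagonal.
    print(dot_plot("ACGTAC", "TACGT"))
```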
Organization patterns of the AGFG genes: an evolutionary study.

PubMed

Panaro, Maria Antonietta; Acquafredda, Angela; Calvello, Rosa; Lisi, Sabrina; Dragone, Teresa; Cianciulli, Antonia

2011-03-01

A number of proteins which are needed for the building of new immunodeficiency virus type 1 virions can only be translated from unspliced virus-derived pre-mRNAs. These unspliced mRNAs are shuttled through the nuclear pores reaching the cytosol when bound to the viral protein Rev. However, as a cellular co-factor Rev requires a Rev-binding protein of the AGFG family (nucleoporin-related Arf-GAP domain and FG repeats-containing proteins). In this article we address the evolution of the AGFGs by analyzing the first section of the coding mRNAs. This contains a "core module" which can be traced from Drosophilae to fish, amphibia, birds, and mammals, including man. In the subfamily of AGFG1 molecules the estimated conservation from Drosophilae to primates is 67% (with limited gaps). In some Drosophilae the core module is preceded by a long stretch of more than 300 coding nucleotides, but this additional module is absent in other Drosophilae and in all AGFG1s of other species. The AGFG2 molecules emerged later in evolution, possibly deriving from a duplication of AGFG1s. AGFG2s, present in mammals only, exhibit an additional module of about 50 coding nucleotides ahead of the core module, which is significantly less conserved (54%, with more remarkable gaps). This additional module does not seem to have homologies with the additional module of Drosophilae nor with the precoding section of AGFG1s. Interestingly, in birds a highly re-edited form of the AGFG1 core module (Gallus gallus, Galliformes) coexists with a typical form of the AGFG1 core module (Taeniopygia guttata, Passeriformes).
Cloning and expression analysis of a small HSP26 gene of Pacific abalone (Haliotis discus hannai).

PubMed

Park, Eun Mi; Kim, Young Ok; Nam, Bo Hye; Kong, Hee Jeong; Kim, Woo Jin; Lee, Sang Jun; Jee, Young Ju; Kong, In Soo; Choi, Tae Jin

2008-07-01

Heat shock proteins (HSPs) are evolutionarily conserved from microorganisms to mammals and play important roles in many biological processes including thermal tolerance. We isolated a homologue of small HSP26 (sHSP26) from a subtracted cDNA library of heat shock-treated abalone (Haliotis discus hannai). The abalone sHSP26 encompassed 793 nt, including a coding region of 501 nt. The deduced amino acid sequence of the abalone sHSP26 contained a well-conserved alpha-crystallin domain and showed overall identities of 27-31% with sHSP proteins of other species. The abalone sHSP26 transcript was induced by heat shock treatment, but not by cold shock treatment.
A novel TaqMan® assay for Nosema ceranae quantification in honey bee, based on the protein coding gene Hsp70.

PubMed

Cilia, Giovanni; Cabbri, Riccardo; Maiorana, Giacomo; Cardaio, Ilaria; Dall'Olio, Raffaele; Nanetti, Antonio

2018-04-01

Nosema ceranae is now a widespread honey bee pathogen with high incidence in apiculture. Rapid and reliable detection and quantification methods are a matter of concern for the research community, which nowadays relies mainly on biomolecular techniques such as PCR, RT-PCR or HRMA. The aim of this technical paper is to provide a new qPCR assay, based on the highly conserved protein coding gene Hsp70, to detect and quantify the microsporidian Nosema ceranae affecting the western honey bee Apis mellifera. The validation steps to assess efficiency, sensitivity, specificity and robustness of the assay are also described. Copyright © 2018 Elsevier GmbH. All rights reserved.
PUF Proteins: Cellular Functions and Potential Applications.

PubMed

Kiani, Seyed Jalal; Taheri, Tahereh; Rafati, Sima; Samimi-Rad, Katayoun

2017-01-01

RNA-binding proteins play critical roles in the regulation of gene expression. Among several families of RNA-binding proteins, PUF (Pumilio and FBF) proteins have been the subject of extensive investigations, as they can bind RNA in a sequence-specific manner and they are evolutionarily conserved among a wide range of organisms. The outstanding feature of these proteins is a highly conserved RNA-binding domain, known as the Pumilio-homology domain (PUM-HD), that mostly consists of eight tandem repeats. Each repeat recognizes an RNA base with a simple three-letter code that can be programmed in order to change the sequence-specificity of the protein. Using this tailored architecture, researchers have been able to change the specificity of the PUM-HD and target desired transcripts in the cell, even in subcellular compartments. The potential applications of this versatile tool in molecular cell biology seem unbounded and the use of these factors in pharmaceutics might be an interesting field of study in the near future. Copyright © Bentham Science Publishers.
Molecular characterization of the pL40 protein in Leptospira interrogans.

PubMed

Zhao, Wei; Chen, Chun-Yan; Zhang, Xiang-Yan; Lai, Wei-Qiang; Hu, Bao-Yu; Zhao, Guo-Ping; Qin, Jin-Hong; Guo, Xiao-Kui

2009-06-01

Leptospirosis is a widespread zoonotic disease caused by pathogenic leptospires. The identification of outer membrane proteins (OMPs) conserved among pathogenic leptospires, which are exposed on the leptospiral surface and expressed during mammalian infection, has become a major focus of leptospirosis research. pL40, a 40 kDa protein coded by the LA3744 gene in Leptospira interrogans, was found to be unique to Leptospira. Triton X-114 fractionation and flow cytometry analyses indicate that pL40 is a component of the leptospiral outer membrane. The conservation of pL40 among Leptospira strains prevalent in China was confirmed by both Western blotting and PCR screening. Furthermore, the pL40 antigen could be recognized by sera from guinea pigs and mice infected with low-passage L. interrogans. These findings indicate that pL40 may serve as a useful serodiagnostic antigen and vaccine candidate for L. interrogans.
Methylated glycans as conserved targets of animal and fungal innate defense

PubMed Central

Wohlschlager, Therese; Butschi, Alex; Grassi, Paola; Sutov, Grigorij; Gauss, Robert; Hauck, Dirk; Schmieder, Stefanie S.; Knobel, Martin; Titz, Alexander; Dell, Anne; Haslam, Stuart M.; Hengartner, Michael O.; Aebi, Markus; Künzler, Markus

2014-01-01

Effector proteins of innate immune systems recognize specific non-self epitopes. Tectonins are a family of β-propeller lectins conserved from bacteria to mammals that have been shown to bind bacterial lipopolysaccharide (LPS). We present experimental evidence that two Tectonins of fungal and animal origin have a specificity for O-methylated glycans. We show that Tectonin 2 of the mushroom Laccaria bicolor (Lb-Tec2) agglutinates Gram-negative bacteria and exerts toxicity toward the model nematode Caenorhabditis elegans, suggesting a role in fungal defense against bacteria and nematodes. Biochemical and genetic analysis of these interactions revealed that both bacterial agglutination and nematotoxicity of Lb-Tec2 depend on the recognition of methylated glycans, namely O-methylated mannose and fucose residues, as part of bacterial LPS and nematode cell-surface glycans. In addition, a C. elegans gene, termed samt-1, coding for a candidate membrane transport protein for the presumptive donor substrate of glycan methylation, S-adenosyl-methionine, from the cytoplasm to the Golgi was identified. Intriguingly, limulus lectin L6, a structurally related antibacterial protein of the Japanese horseshoe crab Tachypleus tridentatus, showed properties identical to the mushroom lectin. These results suggest that O-methylated glycans constitute a conserved target of the fungal and animal innate immune system. The broad phylogenetic distribution of O-methylated glycans increases the spectrum of potential antagonists recognized by Tectonins, rendering this conserved protein family a universal defense armor. PMID:24879441
DNA-binding proteins from marine bacteria expand the known sequence diversity of TALE-like repeats.

PubMed

de Lange, Orlando; Wolf, Christina; Thiel, Philipp; Krüger, Jens; Kleusch, Christian; Kohlbacher, Oliver; Lahaye, Thomas

2015-11-16

Transcription Activator-Like Effectors (TALEs) of Xanthomonas bacteria are programmable DNA binding proteins with unprecedented target specificity. Comparative studies into TALE repeat structure and function are hindered by the limited sequence variation among TALE repeats. More sequence-diverse TALE-like proteins are known from Ralstonia solanacearum (RipTALs) and Burkholderia rhizoxinica (Bats), but RipTAL and Bat repeats are conserved with those of TALEs around the DNA-binding residue. We study two novel marine-organism TALE-like proteins (MOrTL1 and MOrTL2), the first to date of non-terrestrial origin. We have assessed their DNA-binding properties and modelled repeat structures. We found that repeats from these proteins mediate sequence specific DNA binding conforming to the TALE code, despite low sequence similarity to TALE repeats, and with novel residues around the BSR. However, MOrTL1 repeats show greater sequence discriminating power than MOrTL2 repeats. Sequence alignments show that there are only three residues conserved between repeats of all TALE-like proteins including the two new additions. This conserved motif could prove useful as an identifier for future TALE-likes. Additionally, comparing MOrTL repeats with those of other TALE-likes suggests a common evolutionary origin for the TALEs, RipTALs and Bats. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Genomic positional conservation identifies topological anchor point RNAs linked to developmental loci.

PubMed

Amaral, Paulo P; Leonardi, Tommaso; Han, Namshik; Viré, Emmanuelle; Gascoigne, Dennis K; Arias-Carrasco, Raúl; Büscher, Magdalena; Pandolfini, Luca; Zhang, Anda; Pluchino, Stefano; Maracaja-Coutinho, Vinicius; Nakaya, Helder I; Hemberg, Martin; Shiekhattar, Ramin; Enright, Anton J; Kouzarides, Tony

2018-03-15

The mammalian genome is transcribed into large numbers of long noncoding RNAs (lncRNAs), but the definition of functional lncRNA groups has proven difficult, partly due to their low sequence conservation and lack of identified shared properties. Here we consider promoter conservation and positional conservation as indicators of functional commonality. We identify 665 conserved lncRNA promoters in mouse and human that are preserved in genomic position relative to orthologous coding genes. These positionally conserved lncRNA genes are primarily associated with developmental transcription factor loci with which they are coexpressed in a tissue-specific manner. Over half of positionally conserved RNAs in this set are linked to chromatin organization structures, overlapping binding sites for the CTCF chromatin organiser and located at chromatin loop anchor points and borders of topologically associating domains (TADs). We define these RNAs as topological anchor point RNAs (tapRNAs). Characterization of these noncoding RNAs and their associated coding genes shows that they are functionally connected: they regulate each other's expression and influence the metastatic phenotype of cancer cells in vitro in a similar fashion. Furthermore, we find that tapRNAs contain conserved sequence domains that are enriched in motifs for zinc finger domain-containing RNA-binding proteins and transcription factors, whose binding sites are found mutated in cancers. This work leverages positional conservation to identify lncRNAs with potential importance in genome organization, development and disease. The evidence that many developmental transcription factors are physically and functionally connected to lncRNAs represents an exciting stepping-stone to further our understanding of genome regulation.
Unraveling patterns of site-to-site synonymous rates variation and associated gene properties of protein domains and families.

PubMed

Dimitrieva, Slavica; Anisimova, Maria

2014-01-01

In protein-coding genes, synonymous mutations are often thought not to affect fitness and therefore not to be subject to natural selection. Yet increasingly, cases of non-neutral evolution at certain synonymous sites have been reported over the last decade. To evaluate the extent and the nature of site-specific selection on synonymous codons, we computed the site-to-site synonymous rate variation (SRV) and identified gene properties that make SRV more likely in a large database of protein-coding gene families and protein domains. To our knowledge, this is the first study that explores the determinants and patterns of the SRV in real data. We show that the SRV is widespread in the evolution of protein-coding sequences, putting in doubt the validity of the synonymous rate as a standard neutral proxy. While protein domains rarely undergo adaptive evolution, the SRV appears to play an important role in optimizing the domain function at the level of DNA. In contrast, protein families are more likely to evolve by positive selection, but are less likely to exhibit SRV. Stronger SRV was detected in genes with stronger codon bias and tRNA reusage, those coding for proteins with a larger number of interactions or forming a larger number of structures, those located in intracellular components, and those involved in typically conserved complex processes and functions. Genes with extreme SRV show higher expression levels in nearly all tissues. This indicates that codon bias in a gene, which often correlates with gene expression, may often be a site-specific phenomenon regulating the speed of translation along the sequence, consistent with the co-translational folding hypothesis. Strikingly, genes with SRV were strongly overrepresented for metabolic pathways and those associated with several genetic diseases, particularly cancers and diabetes.
Comparative Mitogenomics of Plant Bugs (Hemiptera: Miridae): Identifying the AGG Codon Reassignments between Serine and Lysine

PubMed Central

Wang, Pei; Song, Fan; Cai, Wanzhi

2014-01-01

Insect mitochondrial genomes are very important for understanding molecular evolution as well as for phylogenetic and phylogeographic studies of insects. The Miridae are the largest family of Heteroptera, encompassing more than 11,000 described species, and are of great economic importance. To better understand the diversity and the evolution of plant bugs, we sequenced five new mitochondrial genomes and present the first comparative analysis of the nine mitochondrial genomes of mirids available to date. Our results showed that gene content, gene arrangement, base composition and sequences of mitochondrial transcription termination factor were conserved in plant bugs. Intra-genus species shared more conserved genomic characteristics, such as nucleotide and amino acid composition of protein-coding genes, secondary structure and anticodon mutations of tRNAs, and non-coding sequences. The control region possessed several distinct characteristics, including variable size, abundant tandem repetitions, and intra-genus conservation, and was useful in evolutionary and population genetic studies. The AGG codon reassignments between serine and lysine were investigated in the genus Adelphocoris and other cimicomorphans. Our analysis revealed correlated evolution between reassignments of the AGG codon and specific point mutations at the anticodons of tRNALys and tRNASer(AGN). Phylogenetic analysis indicated that mitochondrial genome sequences were useful in resolving family level relationships of Cimicomorpha. Comparative evolutionary analysis of plant bug mitochondrial genomes allowed the identification of previously neglected coding genes or non-coding regions as potential molecular markers. The finding of the AGG codon reassignments between serine and lysine indicated the parallel evolution of the genetic code in Hemiptera mitochondrial genomes. PMID:24988409
Predicting Gene Structure Changes Resulting from Genetic Variants via Exon Definition Features.

PubMed

Majoros, William H; Holt, Carson; Campbell, Michael S; Ware, Doreen; Yandell, Mark; Reddy, Timothy E

2018-04-25

Genetic variation that disrupts gene function by altering gene splicing between individuals can substantially influence traits and disease. In those cases, accurately predicting the effects of genetic variation on splicing can be highly valuable for investigating the mechanisms underlying those traits and diseases. While methods have been developed to generate high quality computational predictions of gene structures in reference genomes, the same methods perform poorly when used to predict the potentially deleterious effects of genetic changes that alter gene splicing between individuals. Underlying that discrepancy in predictive ability are the common assumptions by reference gene finding algorithms that genes are conserved, well-formed, and produce functional proteins. We describe a probabilistic approach for predicting recent changes to gene structure that may or may not conserve function. The model is applicable to both coding and noncoding genes, and can be trained on existing gene annotations without requiring curated examples of aberrant splicing. We apply this model to the problem of predicting altered splicing patterns in the genomes of individual humans, and we demonstrate that performing gene-structure prediction without relying on conserved coding features is feasible. The model predicts an unexpected abundance of variants that create de novo splice sites, an observation supported by both simulations and empirical data from RNA-seq experiments. While these de novo splice variants are commonly misinterpreted by other tools as coding or noncoding variants of little or no effect, we find that in some cases they can have large effects on splicing activity and protein products, and we propose that they may commonly act as cryptic factors in disease. The software is available from geneprediction.org/SGRF. Contact: bmajoros@duke.edu. Supplementary information is available at Bioinformatics online.
Splicing factor SFRS1 recognizes a functionally diverse landscape of RNA transcripts.

PubMed

Sanford, Jeremy R; Wang, Xin; Mort, Matthew; Vanduyn, Natalia; Cooper, David N; Mooney, Sean D; Edenberg, Howard J; Liu, Yunlong

2009-03-01

Metazoan genes are encrypted with at least two superimposed codes: the genetic code to specify the primary structure of proteins and the splicing code to expand their proteomic output via alternative splicing. Here, we define the specificity of a central regulator of pre-mRNA splicing, the conserved, essential splicing factor SFRS1. Cross-linking immunoprecipitation and high-throughput sequencing (CLIP-seq) identified 23,632 binding sites for SFRS1 in the transcriptome of cultured human embryonic kidney cells. SFRS1 was found to engage many different classes of functionally distinct transcripts including mRNA, miRNA, snoRNAs, ncRNAs, and conserved intergenic transcripts of unknown function. The majority of these diverse transcripts share a purine-rich consensus motif corresponding to the canonical SFRS1 binding site. The consensus site was not only enriched in exons cross-linked to SFRS1 in vivo, but was also enriched in close proximity to splice sites. mRNAs encoding RNA processing factors were significantly overrepresented, suggesting that SFRS1 may broadly influence the post-transcriptional control of gene expression in vivo. Finally, a search for the SFRS1 consensus motif within the Human Gene Mutation Database identified 181 mutations in 82 different genes that disrupt predicted SFRS1 binding sites. This comprehensive analysis substantially expands the known roles of human SR proteins in the regulation of a diverse array of RNA transcripts.
Novel promoters and coding first exons in DLG2 linked to developmental disorders and intellectual disability.

PubMed

Reggiani, Claudio; Coppens, Sandra; Sekhara, Tayeb; Dimov, Ivan; Pichon, Bruno; Lufin, Nicolas; Addor, Marie-Claude; Belligni, Elga Fabia; Digilio, Maria Cristina; Faletra, Flavio; Ferrero, Giovanni Battista; Gerard, Marion; Isidor, Bertrand; Joss, Shelagh; Niel-Bütschi, Florence; Perrone, Maria Dolores; Petit, Florence; Renieri, Alessandra; Romana, Serge; Topa, Alexandra; Vermeesch, Joris Robert; Lenaerts, Tom; Casimir, Georges; Abramowicz, Marc; Bontempi, Gianluca; Vilain, Catheline; Deconinck, Nicolas; Smits, Guillaume

2017-07-19

Tissue-specific integrative omics has the potential to reveal new genic elements important for developmental disorders. Two pediatric patients with global developmental delay and intellectual disability phenotype underwent array-CGH genetic testing, both showing a partial deletion of the DLG2 gene. From independent human and murine omics datasets, we combined copy number variations, histone modifications, developmental tissue-specific regulation, and protein data to explore the molecular mechanism at play. Integrating genomics, transcriptomics, and epigenomics data, we describe two novel DLG2 promoters and coding first exons expressed in human fetal brain. Their murine conservation and protein-level evidence allowed us to produce new DLG2 gene models for human and mouse. These new genic elements are deleted in 90% of 29 patients (public and in-house) showing partial deletion of the DLG2 gene. The patients' clinical characteristics expand the neurodevelopmental phenotypic spectrum linked to DLG2 gene disruption to cognitive and behavioral categories. While protein-coding genes are regarded as well known, our work shows that integration of multiple omics datasets can unveil novel coding elements. From a clinical perspective, our work demonstrates that two new DLG2 promoters and exons are crucial for the neurodevelopmental phenotypes associated with this gene. In addition, our work brings evidence for the lack of cross-annotation in human versus mouse reference genomes and nucleotide versus protein databases.
Computer analysis of protein functional sites projection on exon structure of genes in Metazoa

PubMed Central

2015-01-01

Background: Study of the relationship between the structural and functional organization of proteins and their coding genes is necessary for an understanding of the evolution of molecular systems and can provide new knowledge for many applications in designing proteins with improved medical and biological properties. It is well known that the functional properties of proteins are determined by their functional sites. Functional sites are usually represented by a small number of amino acid residues that are distantly located from each other in the amino acid sequence. They are highly conserved within their functional group and vary significantly in structure between such groups. Given these facts, analysis of the general properties of the structural organization of functional sites at the protein level and at the level of the exon-intron structure of the coding gene remains a relevant problem.
Results: One approach to this analysis is the projection of amino acid residue positions of the functional sites along with the exon boundaries to the gene structure. In this paper, we examined the discontinuity of the functional sites in the exon-intron structure of genes and the distribution of lengths and phases of the functional site encoding exons in vertebrate genes. We have shown that the DNA fragments coding the functional sites were in the same exons, or in close exons. The exons that code functional sites tend to cluster, and such clusters could be considered units of protein evolution. We studied the characteristics of the structure of the exon boundaries that code, and do not code, functional sites in 11 Metazoa species. This is accompanied by a reduced frequency of intercodon gaps (phase 0) in exons encoding the amino acid residue functional site, which may be evidence of the existence of evolutionary limitations to the exon shuffling.
Conclusions: These results characterize the features of the coding exon-intron structure that affect the functionality of the encoded protein and allow a better understanding of the emergence of biological diversity. PMID:26693737
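Projecting residue positions onto exon structure, and classifying exon boundaries by phase, can be sketched as follows. The code assumes the usual convention that an intron has phase 0 when it falls between two codons (the "intercodon gap" of the abstract) and phase 1 or 2 when it splits a codon after the first or second base. The function names and the toy exon lengths are hypothetical, and only the coding portions of exons are considered.

```python
from bisect import bisect_right
from itertools import accumulate


def intron_phases(cds_exon_lengths: list[int]) -> list[int]:
    """Phase of the intron following each coding exon except the last."""
    ends = list(accumulate(cds_exon_lengths))
    return [end % 3 for end in ends[:-1]]


def residue_exon(residue: int, cds_exon_lengths: list[int]) -> int:
    """0-based index of the exon holding the first base of a 1-based residue's codon."""
    ends = list(accumulate(cds_exon_lengths))
    first_base = 3 * (residue - 1)  # 0-based coordinate within the CDS
    return bisect_right(ends, first_base)


if __name__ == "__main__":
    exons = [90, 100, 62]  # invented coding lengths, 252 nt = 84 codons
    print(intron_phases(exons))
    # Exons that hold the residues of a hypothetical functional site.
    print([residue_exon(r, exons) for r in (30, 31, 64, 65)])
```

In this toy gene the first intron is an intercodon gap, while the second splits codon 64 after its first base, so that residue is encoded across two exons.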
Molecular Evolution of Aminoacyl tRNA Synthetase Proteins in the Early History of Life

NASA Astrophysics Data System (ADS)

Fournier, Gregory P.; Andam, Cheryl P.; Alm, Eric J.; Gogarten, J. Peter

2011-12-01

Aminoacyl-tRNA synthetases (aaRS) consist of several families of functionally conserved proteins essential for translation and protein synthesis. Like nearly all components of the translation machinery, most aaRS families are universally distributed across cellular life, being inherited from the time of the Last Universal Common Ancestor (LUCA). However, unlike the rest of the translation machinery, aaRS have undergone numerous ancient horizontal gene transfers, with several independent events detected between domains, and some possibly involving lineages diverging before the time of LUCA. These transfers reveal the complexity of molecular evolution at this early time, and the chimeric nature of genomes within cells that gave rise to the major domains. Additionally, given the role of these protein families in defining the amino acids used for protein synthesis, sequence reconstruction of their pre-LUCA ancestors can reveal the evolutionary processes at work in the origin of the genetic code. In particular, sequence reconstructions of the paralog ancestors of isoleucyl- and valyl- RS provide strong empirical evidence that at least for this divergence, the genetic code did not co-evolve with the aaRSs; rather, both amino acids were already part of the genetic code before their cognate aaRSs diverged from their common ancestor. The implications of this observation for the early evolution of RNA-directed protein biosynthesis are discussed.
Improving the genome annotation of the acarbose producer Actinoplanes sp. SE50/110 by sequencing enriched 5'-ends of primary transcripts.

PubMed

Schwientek, Patrick; Neshat, Armin; Kalinowski, Jörn; Klein, Andreas; Rückert, Christian; Schneiker-Bekel, Susanne; Wendler, Sergej; Stoye, Jens; Pühler, Alfred

2014-11-20

Actinoplanes sp. SE50/110 is the producer of the alpha-glucosidase inhibitor acarbose, which is an economically relevant and potent drug in the treatment of type-2 diabetes mellitus. In this study, we present the detection of transcription start sites on this genome by sequencing enriched 5'-ends of primary transcripts. Altogether, 1427 putative transcription start sites were initially identified. With the help of the annotated genome sequence, 661 transcription start sites were found to belong to the leader region of protein-coding genes, with the surprising result that roughly 20% of these genes rank among the class of leaderless transcripts. Next, conserved promoter motifs were identified for protein-coding genes with and without leader sequences. The mapped transcription start sites were finally used to improve the annotation of the Actinoplanes sp. SE50/110 genome sequence. Concerning protein-coding genes, 41 translation start sites were corrected and 9 novel protein-coding genes could be identified. In addition to this, 122 previously undetermined non-coding RNA (ncRNA) genes of Actinoplanes sp. SE50/110 were defined. Focusing on antisense transcription start sites located within coding genes or their leader sequences, it was discovered that 96 of those ncRNA genes belong to the class of antisense RNA (asRNA) genes. The remaining 26 ncRNA genes were found outside of known protein-coding genes. Four chosen examples of prominent ncRNA genes, namely the transfer messenger RNA gene ssrA, the ribonuclease P class A RNA gene rnpB, the cobalamin riboswitch RNA gene cobRS, and the selenocysteine-specific tRNA gene selC, are presented in more detail. This study demonstrates that sequencing of enriched 5'-ends of primary transcripts and the identification of transcription start sites are valuable tools for advanced genome annotation of Actinoplanes sp. SE50/110 and most probably also for other bacteria. Copyright © 2014 Elsevier B.V. All rights reserved.
Are plant formins integral membrane proteins?

PubMed

Cvrcková, F

2000-01-01

The formin family of proteins has been implicated in signaling pathways of cellular morphogenesis in both animals and fungi; in the latter case, at least, they participate in communication between the actin cytoskeleton and the cell surface. Nevertheless, they appear to be cytoplasmic or nuclear proteins, and it is not clear whether they communicate with the plasma membrane, and if so, how. Because nothing is known about formin function in plants, I performed a systematic search for putative Arabidopsis thaliana formin homologs. I found eight putative formin-coding genes in the publicly available part of the Arabidopsis genome sequence and analyzed their predicted protein sequences. Surprisingly, some of them lack parts of the conserved formin-homology 2 (FH2) domain and the majority of them seem to have signal sequences and putative transmembrane segments that are not found in yeast or animal formins. Plant formins define a distinct subfamily. The presence in most Arabidopsis formins of sequence motifs typical of transmembrane proteins suggests a mechanism of membrane attachment that may be specific to plant formins, and indicates an unexpected evolutionary flexibility of the conserved formin domain.

The Mitochondrial Cytochrome Oxidase Subunit I Gene Occurs on a Minichromosome with Extensive Heteroplasmy in Two Species of Chewing Lice, Geomydoecus aurei and Thomomydoecus minor

PubMed Central

Pietan, Lucas L.; Spradling, Theresa A.

2016-01-01

In animals, mitochondrial DNA (mtDNA) typically occurs as a single circular chromosome with 13 protein-coding genes and 22 tRNA genes. The various species of lice examined previously, however, have shown mitochondrial genome rearrangements with a range of chromosome sizes and numbers. Our research demonstrates that the mitochondrial genomes of two species of chewing lice found on pocket gophers, Geomydoecus aurei and Thomomydoecus minor, are fragmented with the 1,536 base-pair (bp) cytochrome-oxidase subunit I (cox1) gene occurring as the only protein-coding gene on a 1,916–1,964 bp minicircular chromosome in the two species, respectively. The cox1 gene of T. minor begins with an atypical start codon, while that of G. aurei does not. Components of the non-protein coding sequence of G. aurei and T. minor include a tRNA (isoleucine) gene, inverted repeat sequences consistent with origins of replication, and an additional non-coding region that is smaller than the non-coding sequence of other lice with such fragmented mitochondrial genomes. Sequences of cox1 minichromosome clones for each species reveal extensive length and sequence heteroplasmy in both coding and noncoding regions. The highly variable non-gene regions of G. aurei and T. minor have little sequence similarity with one another except for a 19-bp region of phylogenetically conserved sequence with unknown function. PMID:27589589
The Big Entity of New RNA World: Long Non-Coding RNAs in Microvascular Complications of Diabetes.

PubMed

Raut, Satish K; Khullar, Madhu

2018-01-01

A major part of the genome is known to be transcribed into non-protein coding RNAs (ncRNAs), such as microRNA and long non-coding RNA (lncRNA). The importance of ncRNAs is being increasingly recognized in physiological and pathological processes. lncRNAs are a novel class of ncRNAs that do not code for proteins and are important regulators of gene expression. In the past, these molecules were thought to be transcriptional "noise" with low levels of evolutionary conservation. However, recent studies provide strong evidence indicating that lncRNAs (i) are regulated during various cellular processes, (ii) exhibit cell type-specific expression, (iii) localize to specific organelles, and (iv) are associated with human diseases. Emerging evidence indicates an aberrant expression of lncRNAs in diabetes and diabetes-related microvascular complications. In the present review, we discuss the current state of knowledge of lncRNAs, their genesis from the genome, and the mechanism of action of individual lncRNAs in the pathogenesis of microvascular complications of diabetes, as well as therapeutic approaches.
NeuCode Labeling in Nematodes: Proteomic and Phosphoproteomic Impact of Ascaroside Treatment in Caenorhabditis elegans

PubMed Central

Rhoads, Timothy W.; Prasad, Aman; Kwiecien, Nicholas W.; Merrill, Anna E.; Zawack, Kelson; Westphall, Michael S.; Schroeder, Frank C.; Kimble, Judith; Coon, Joshua J.

2015-01-01

The nematode Caenorhabditis elegans is an important model organism for biomedical research. We previously described NeuCode stable isotope labeling by amino acids in cell culture (SILAC), a method for accurate proteome quantification with potential for multiplexing beyond the limits of traditional stable isotope labeling by amino acids in cell culture. Here we apply NeuCode SILAC to profile the proteomic and phosphoproteomic response of C. elegans to two potent members of the ascaroside family of nematode pheromones. By consuming labeled E. coli as part of their diet, C. elegans nematodes quickly and easily incorporate the NeuCode heavy lysine isotopologues by the young adult stage. Using this approach, we report, at high confidence, one of the largest proteomic and phosphoproteomic data sets to date in C. elegans: 6596 proteins at a false discovery rate ≤ 1% and 6620 phosphorylation isoforms with localization probability ≥75%. Our data reveal a post-translational signature of pheromone sensing that includes many conserved proteins implicated in longevity and response to stress. PMID:26392051
RNA editing differently affects protein-coding genes in D. melanogaster and H. sapiens.

PubMed

Grassi, Luigi; Leoni, Guido; Tramontano, Anna

2015-07-14

When an RNA editing event occurs within a coding sequence it can lead to a different encoded amino acid. The biological significance of these events remains an open question: they can modulate protein functionality, increase the complexity of transcriptomes or arise from a loose specificity of the involved enzymes. We analysed the editing events in coding regions that do or do not produce a change in the encoded amino acid (nonsynonymous and synonymous events, respectively) in D. melanogaster and in H. sapiens and compared them with the appropriate random models. Interestingly, our results show that the phenomenon has rather different characteristics in the two organisms. For example, we confirm the observation that editing events occur more frequently in non-coding than in coding regions, and report that this effect is much more evident in H. sapiens. Additionally, in this latter organism, editing events tend to affect less conserved residues. The less frequently occurring editing events in Drosophila tend to avoid drastic amino acid changes. Interestingly, we find that, in Drosophila, changes from less frequently used codons to more frequently used ones are favoured, while this is not the case in H. sapiens.
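The synonymous versus nonsynonymous distinction in this abstract can be illustrated with a minimal sketch. It assumes A-to-I editing (read as A-to-G), which the abstract does not state, and the standard genetic code; `classify_edit` is a hypothetical helper written for illustration, not the authors' method.

```python
# Standard genetic code, with codons ordered by bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_edit(codon, position, edited_base="G"):
    """Classify a single-base edit within one codon.

    Hypothetical helper: the default edited_base="G" encodes the assumption
    that an edited adenosine (inosine) is read as guanosine.
    """
    codon = codon.upper().replace("U", "T")
    edited = codon[:position] + edited_base + codon[position + 1:]
    unchanged = CODON_TABLE[codon] == CODON_TABLE[edited]
    return "synonymous" if unchanged else "nonsynonymous"


print(classify_edit("AAA", 2))  # AAA (Lys) -> AAG (Lys): synonymous
print(classify_edit("AAA", 0))  # AAA (Lys) -> GAA (Glu): nonsynonymous
```

Comparing the observed counts of the two classes against a random model, as the abstract describes, would require the actual editing-site data and is not attempted here.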
Trade-off between Transcriptome Plasticity and Genome Evolution in Cephalopods.

PubMed

Liscovitch-Brauer, Noa; Alon, Shahar; Porath, Hagit T; Elstein, Boaz; Unger, Ron; Ziv, Tamar; Admon, Arie; Levanon, Erez Y; Rosenthal, Joshua J C; Eisenberg, Eli

2017-04-06

RNA editing, a post-transcriptional process, allows the diversification of proteomes beyond the genomic blueprint; however it is infrequently used among animals for this purpose. Recent reports suggesting increased levels of RNA editing in squids thus raise the question of the nature and effects of these events. We here show that RNA editing is particularly common in behaviorally sophisticated coleoid cephalopods, with tens of thousands of evolutionarily conserved sites. Editing is enriched in the nervous system, affecting molecules pertinent for excitability and neuronal morphology. The genomic sequence flanking editing sites is highly conserved, suggesting that the process confers a selective advantage. Due to the large number of sites, the surrounding conservation greatly reduces the number of mutations and genomic polymorphisms in protein-coding regions. This trade-off between genome evolution and transcriptome plasticity highlights the importance of RNA recoding as a strategy for diversifying proteins, particularly those associated with neural function. Copyright © 2017 Elsevier Inc. All rights reserved.
APPRIS 2017: principal isoforms for multiple gene sets

PubMed Central

Rodriguez-Rivas, Juan; Di Domenico, Tomás; Vázquez, Jesús; Valencia, Alfonso

2018-01-01

The APPRIS database (http://appris-tools.org) uses protein structural and functional features and information from cross-species conservation to annotate splice isoforms in protein-coding genes. APPRIS selects a single protein isoform, the ‘principal’ isoform, as the reference for each gene based on these annotations. A single main splice isoform reflects the biological reality for most protein coding genes and APPRIS principal isoforms are the best predictors of these main protein isoforms. Here, we present the updates to the database, new developments that include the addition of three new species (chimpanzee, Drosophila melanogaster and Caenorhabditis elegans), the expansion of APPRIS to cover the RefSeq gene set and the UniProtKB proteome for six species and refinements in the core methods that make up the annotation pipeline. In addition APPRIS now provides a measure of reliability for individual principal isoforms and updates with each release of the GENCODE/Ensembl and RefSeq reference sets. The individual GENCODE/Ensembl, RefSeq and UniProtKB reference gene sets for six organisms have been merged to produce common sets of splice variants. PMID:29069475
Cloning of the cDNA for U1 small nuclear ribonucleoprotein particle 70K protein from Arabidopsis thaliana

NASA Technical Reports Server (NTRS)

Reddy, A. S.; Czernik, A. J.; An, G.; Poovaiah, B. W.

1992-01-01

We cloned and sequenced a plant cDNA that encodes U1 small nuclear ribonucleoprotein (snRNP) 70K protein. The plant U1 snRNP 70K protein cDNA is not full length and lacks the coding region for 68 amino acids in the amino-terminal region as compared to human U1 snRNP 70K protein. Comparison of the deduced amino acid sequence of the plant U1 snRNP 70K protein with the amino acid sequence of animal and yeast U1 snRNP 70K protein showed a high degree of homology. The plant U1 snRNP 70K protein is more closely related to the human counterpart than to the yeast 70K protein. The carboxy-terminal half is less well conserved but, like the vertebrate 70K proteins, is rich in charged amino acids. Northern analysis with the RNA isolated from different parts of the plant indicates that the snRNP 70K gene is expressed in all of the parts tested. Southern blotting of genomic DNA using the cDNA indicates that the U1 snRNP 70K protein is coded by a single gene.
Functional amyloid in Pseudomonas.

PubMed

Dueholm, Morten S; Petersen, Steen V; Sønderkær, Mads; Larsen, Poul; Christiansen, Gunna; Hein, Kim L; Enghild, Jan J; Nielsen, Jeppe L; Nielsen, Kåre L; Nielsen, Per H; Otzen, Daniel E

2010-08-01

Amyloids are highly abundant in many microbial biofilms and may play an important role in their architecture. Nevertheless, little is known of the amyloid proteins. We report the discovery of a novel functional amyloid expressed by a Pseudomonas strain of the P. fluorescens group. The amyloid protein was purified and the amyloid-like structure verified. Partial sequencing by MS/MS combined with full genomic sequencing of the Pseudomonas strain identified the gene coding for the major subunit of the amyloid fibril, termed fapC. FapC contains a thrice repeated motif that differs from those previously found in curli fimbrins and prion proteins. The lack of aromatic residues in the repeat shows that aromatic side chains are not needed for efficient amyloid formation. In contrast, glutamine and asparagine residues seem to play a major role in amyloid formation as these are highly conserved in curli, prion proteins and FapC. fapC is conserved in many Pseudomonas strains including the opportunistic pathogen P. aeruginosa and is situated in a conserved operon containing six genes, of which one encodes a fapC homologue. Heterologous expression of the fapA-F operon in Escherichia coli BL21(DE3) resulted in a highly aggregative phenotype, showing that the operon is involved in biofilm formation. © 2010 Blackwell Publishing Ltd.
Molecular characterisation of Atlantic salmon paramyxovirus (ASPV): A novel paramyxovirus associated with proliferative gill inflammation

USGS Publications Warehouse

Falk, K.; Batts, W.N.; Kvellestad, A.; Kurath, G.; Wiik-Nielsen, J.; Winton, J.R.

2008-01-01

Atlantic salmon paramyxovirus (ASPV) was isolated in 1995 from gills of farmed Atlantic salmon suffering from proliferative gill inflammation. The complete genome sequence of ASPV was determined, revealing a genome 16,968 nucleotides in length consisting of six non-overlapping genes coding for the nucleo- (N), phospho- (P), matrix- (M), fusion- (F), haemagglutinin-neuraminidase- (HN) and large polymerase (L) proteins in the order 3′-N-P-M-F-HN-L-5′. The various conserved features related to virus replication found in most paramyxoviruses were also found in ASPV. These include: conserved and complementary leader and trailer sequences, tri-nucleotide intergenic regions and highly conserved transcription start and stop signal sequences. The P gene expression strategy of ASPV was like that of the respiro-, morbilli- and henipaviruses, which express the P and C proteins from the primary transcript and edit a portion of the mRNA to encode V and W proteins. Sequence similarities among various features related to virus replication, pairwise comparisons of all deduced ASPV protein sequences with homologous regions from other members of the family Paramyxoviridae, and phylogenetic analyses of these amino acid sequences suggested that ASPV was a novel member of the sub-family Paramyxovirinae, most closely related to the respiroviruses. © 2008 Elsevier B.V. All rights reserved.
Bovine adipose triglyceride lipase is not altered and adipocyte fatty acid binding protein is increased by dietary flaxseed

USDA-ARS's Scientific Manuscript database

In this paper, we report the full-length coding sequence of bovine ATGL cDNA and analyze its expression in bovine tissues. Similar to human, mouse, and pig ATGL sequences, bovine ATGL has a highly conserved patatin domain that is necessary for lipolytic function in mice and humans. Thi...
The origins and evolutionary history of human non-coding RNA regulatory networks.

PubMed

Sherafatian, Masih; Mowla, Seyed Javad

2017-04-01

The evolutionary history and origin of the regulatory function of animal non-coding RNAs are not well understood. The lack of conservation of long non-coding RNAs and the small size of microRNAs have been major obstacles in their phylogenetic analysis. In this study, we tried to shed more light on the evolution of ncRNA regulatory networks by changing our phylogenetic strategy to focus on the evolutionary pattern of their protein coding targets. We used available target databases of miRNAs and lncRNAs to find their protein coding targets in human. We were able to recognize evolutionary hallmarks of ncRNA targets by phylostratigraphic analysis. We found the conventional 3'-UTR and lesser known 5'-UTR targets of miRNAs to be enriched at three consecutive phylostrata. Firstly, in the eukaryata phylostratum, corresponding to the emergence of miRNAs, our study revealed that miRNA targets function primarily in cell cycle processes. Moreover, the same overrepresentation of the targets observed in the next two consecutive phylostrata, opisthokonta and eumetazoa, corresponded to the expansion periods of miRNAs in animal evolution. Coding sequence targets of miRNAs showed a delayed rise at the opisthokonta phylostratum, compared to the 3' and 5' UTR targets of miRNAs. The lncRNA regulatory network was the latest to evolve, at eumetazoa.
A Bioinformatics-Based Alternative mRNA Splicing Code that May Explain Some Disease Mutations Is Conserved in Animals.

PubMed

Qu, Wen; Cingolani, Pablo; Zeeberg, Barry R; Ruden, Douglas M

2017-01-01

Deep sequencing of cDNAs made from spliced mRNAs indicates that most coding genes in many animals and plants have pre-mRNA transcripts that are alternatively spliced. In pre-mRNAs, in addition to invariant exons that are present in almost all mature mRNA products, there are at least 6 additional types of exons, such as exons from alternative promoters or with alternative polyA sites, mutually exclusive exons, skipped exons, or exons with alternative 5' or 3' splice sites. Our bioinformatics-based hypothesis is that, in analogy to the genetic code, there is an "alternative-splicing code" in introns and flanking exon sequences that directs alternative splicing of many of the 36 types of introns. In humans, we identified 42 different consensus sequences that are each present in at least 100 human introns. 37 of the 42 top consensus sequences are significantly enriched or depleted in at least one of the 36 types of introns. We further supported our hypothesis by showing that 96 out of 96 analyzed human disease mutations that affect RNA splicing, and change alternative splicing from one class to another, can be partially explained by a mutation altering a consensus sequence from one type of intron to that of another type of intron. Some of the alternative splicing consensus sequences, and presumably their small-RNA or protein targets, are evolutionarily conserved from 50 plant to animal species. We also noticed the set of introns within a gene usually share the same splicing codes, thus arguing that one sub-type of spliceosome might process all (or most) of the introns in a given gene. Our work sheds new light on a possible mechanism for generating the tremendous diversity in protein structure by alternative splicing of pre-mRNAs.
Crystal structure of AFV3-109, a highly conserved protein from crenarchaeal viruses

PubMed Central

Keller, Jenny; Leulliot, Nicolas; Cambillau, Christian; Campanacci, Valérie; Porciero, Stéphanie; Prangishvili, David; Forterre, Patrick; Cortez, Diego; Quevillon-Cheruel, Sophie; van Tilbeurgh, Herman

2007-01-01

The extraordinary morphologies of viruses infecting hyperthermophilic archaea clearly distinguish them from bacterial and eukaryotic viruses. Moreover, their genomes code for proteins that to a large extent have no related sequences in the extant databases. However, a small pool of genes is shared by overlapping subsets of these viruses, and the most conserved gene, exemplified by the ORF109 of the Acidianus Filamentous Virus 3, AFV3, is present on genomes of members of three viral families, the Lipothrixviridae, Rudiviridae, and "Bicaudaviridae", as well as of the unclassified Sulfolobus Turreted Icosahedral Virus, STIV. We present here the crystal structure of the protein (Mr = 13.1 kD, 109 residues) encoded by the AFV3 ORF 109 in two different crystal forms at 1.5 and 1.3 Å resolution. The structure of AFV3-109 is a five stranded β-sheet with loops on one side and three helices on the other. It forms a dimer adopting the shape of a cradle that encompasses the best conserved regions of the sequence. No protein with a related fold could be identified except for the ortholog from STIV1, whose structure was deposited at the Protein Data Bank. We could clearly identify a well bound glycerol inside the cradle, contacting exclusively totally conserved residues. This interaction was confirmed in solution by fluorescence titration. Although the function of AFV3-109 cannot be deduced directly from its structure, structural homology with the STIV1 protein, and the size and charge distribution of the cavity suggested it could interact with nucleic acids. Fluorescence quenching titrations also showed that AFV3-109 interacts with dsDNA. Genomic sequence analysis revealed bacterial homologs of AFV3-109 as part of putative, previously unidentified prophage sequences in some Firmicutes. PMID:17241456
Structure of Thermotoga maritima Stationary Phase Survival Protein SurE: A Novel Acid Phosphatase

PubMed Central

Zhang, R.-G.; Skarina, T.; Katz, J.E.; Beasley, S.; Khachatryan, A.; Vyas, S.; Arrowsmith, C.H.; Clarke, S.; Edwards, A.; Joachimiak, A.; Savchenko, A.

2009-01-01

Background: The rpoS, nlpD, pcm, and surE genes are among many whose expression is induced during the stationary phase of bacterial growth. rpoS codes for the stationary-phase RNA polymerase σ subunit, and nlpD codes for a lipoprotein. The pcm gene product repairs damaged proteins by converting the atypical isoaspartyl residues back to L-aspartyls. The physiological and biochemical functions of surE are unknown, but its importance in stress is supported by the duplication of the surE gene in E. coli subjected to high-temperature growth. The pcm and surE genes are highly conserved in bacteria, archaea, and plants. Results: The structure of SurE from Thermotoga maritima was determined at 2.0 Å. The SurE monomer is composed of two domains: a conserved N-terminal domain (a Rossmann fold) and a C-terminal oligomerization domain (a new fold). Monomers form a dimer that assembles into a tetramer. Biochemical analysis suggests that SurE is an acid phosphatase, with an optimum pH of 5.5–6.2. The active site was identified in the N-terminal domain through analysis of conserved residues. Structure-based site-directed point mutations abolished phosphatase activity. T. maritima SurE intra- and inter-subunit salt bridges were identified that may explain the SurE thermostability. Conclusions: The structure of SurE provided information about the protein’s fold, oligomeric state, and active site. The protein possessed magnesium-dependent acid phosphatase activity, but the physiologically relevant substrate(s) remains to be identified. The importance of three of the assigned active site residues in catalysis was confirmed by site-directed mutagenesis. PMID:11709173
The complete mitogenome of the Australian tadpole shrimp Triops australiensis (Spencer & Hall, 1895) (Crustacea: Branchiopoda: Notostraca).

PubMed

Gan, Han Ming; Tan, Mun Hua; Lee, Yin Peng; Austin, Christopher M

2016-05-01

The mitochondrial genome sequence of the Australian tadpole shrimp, Triops australiensis, is presented (GenBank Accession Number: NC_024439) and compared with other Triops species. Triops australiensis has a mitochondrial genome of 15,125 base pairs consisting of 13 protein-coding genes, 2 ribosomal subunit genes, 22 transfer RNAs, and a non-coding AT-rich region. The T. australiensis mitogenome is composed of 36.4% A, 16.1% C, 12.3% G and 35.1% T. The mitogenome gene order conforms to the primitive arrangement for branchiopod crustaceans, which is also conserved within the Pancrustacea.
Domain Organization and Evolution of the Highly Divergent 5′ Coding Region of Genomes of Arteriviruses, Including the Novel Possum Nidovirus

PubMed Central

Gulyaeva, Anastasia; Hoogendoorn, Erik; Giles, Julia; Samborskiy, Dmitry

2017-01-01

In five experimentally characterized arterivirus species, the 5′-end genome coding region encodes the most divergent nonstructural proteins (nsp's), nsp1 and nsp2, which include papain-like proteases (PLPs) and other poorly characterized domains. These are involved in regulation of transcription, polyprotein processing, and virus-host interaction. Here we present results of a bioinformatics analysis of this region of 14 arterivirus species, including that of the most distantly related virus, wobbly possum disease virus (WPDV), determined by a modified 5′ rapid amplification of cDNA ends (RACE) protocol. By combining profile-profile comparisons and phylogeny reconstruction, we identified an association of the four distinct domain layouts of nsp1-nsp2 with major phylogenetic lineages, implicating domain gain, including duplication, and loss in the early nsp1 evolution. Specifically, WPDV encodes highly divergent homologs of PLP1a, PLP1b, PLP1c, and PLP2, with PLP1a lacking the catalytic Cys residue, but does not encode nsp1 Zn finger (ZnF) and “nuclease” domains, which are conserved in other arteriviruses. Unexpectedly, our analysis revealed that the only catalytically active nsp1 PLP of equine arteritis virus (EAV), known as PLP1b, is most similar to PLP1c and thus is likely to be a PLP1b paralog. In all non-WPDV arteriviruses, PLP1b/c and PLP1a show contrasting patterns of conservation, with the N- and C-terminal subdomains, respectively, being enriched with conserved residues, which is indicative of different functional specializations. The least conserved domain of nsp2, the hypervariable region (HVR), has its size varied 5-fold and includes up to four copies of a novel PxPxPR motif that is potentially recognized by SH3 domain-containing proteins. Apparently, only EAV lacks the signal that directs −2 ribosomal frameshifting in the nsp2 coding region.
IMPORTANCE: Arteriviruses comprise a family of mammalian enveloped positive-strand RNA viruses that include some of the most economically important pathogens of swine. Most of our knowledge about this family has been obtained through characterization of viruses from five species: Equine arteritis virus, Simian hemorrhagic fever virus, Lactate dehydrogenase-elevating virus, Porcine respiratory and reproductive syndrome virus 1, and Porcine respiratory and reproductive syndrome virus 2. Here we present the results of comparative genomics analyses of viruses from all known 14 arterivirus species, including the most distantly related virus, WPDV, whose genome sequence was completed in this study. Our analysis focused on the multifunctional 5′-end genome coding region that encodes multidomain nonstructural proteins 1 and 2. Using diverse bioinformatics techniques, we identified many patterns of evolutionary conservation that are specific to members of distinct arterivirus species, both characterized and novel, or their groups. They are likely associated with structural and functional determinants important for virus replication and virus-host interaction. PMID:28053107
Molecular characterization of amino acid deletion in VP1 (1D) protein and novel amino acid substitutions in 3D polymerase protein of foot and mouth disease virus subtype A/Iran87.

PubMed

Esmaelizad, Majid; Jelokhani-Niaraki, Saber; Hashemnejad, Khadije; Kamalzadeh, Morteza; Lotfi, Mohsen

2011-12-01

The nucleotide sequence of the VP1 (1D) and partial 3D polymerase (3D(pol)) coding regions of the foot and mouth disease virus (FMDV) vaccine strain A/Iran87, a highly passaged isolate (~150 passages), was determined and aligned with previously published FMDV serotype A sequences. Overall analysis of the amino acid substitutions revealed that the partial 3D(pol) coding region contained four amino acid alterations. Amino acid sequence comparison of the VP1 coding region of the field isolates revealed deletions in the highly passaged Iranian isolate (A/Iran87). The prominent G-H loop of the FMDV VP1 protein contains the conserved arginine-glycine-aspartic acid (RGD) tripeptide, which is a well-known ligand for a specific cell surface integrin. Despite losing the RGD sequence of the VP1 protein and an Asp(26)→Glu substitution in a beta sheet located within a small groove of the 3D(pol) protein, the virus grew in BHK 21 suspension cell cultures. Since this strain has been used as a vaccine strain, it may be inferred that the RGD deletion has no critical role in virus attachment to the cell during the initiation of infection. It is probable that this FMDV subtype can utilize other pathways for cell attachment.
A novel bioinformatics pipeline to discover genes related to arbuscular mycorrhizal symbiosis based on their evolutionary conservation pattern among higher plants.

PubMed

Favre, Patrick; Bapaume, Laure; Bossolini, Eligio; Delorenzi, Mauro; Falquet, Laurent; Reinhardt, Didier

2014-12-03

Genes involved in arbuscular mycorrhizal (AM) symbiosis have been identified primarily by mutant screens, followed by identification of the mutated genes (forward genetics). In addition, a number of AM-related genes has been identified by their AM-related expression patterns, and their function has subsequently been elucidated by knock-down or knock-out approaches (reverse genetics). However, genes that are members of functionally redundant gene families, or genes that have a vital function and therefore result in lethal mutant phenotypes, are difficult to identify. If such genes are constitutively expressed and therefore escape differential expression analyses, they remain elusive. The goal of this study was to systematically search for AM-related genes with a bioinformatics strategy that is insensitive to these problems. The central element of our approach is based on the fact that many AM-related genes are conserved only among AM-competent species. Our approach involves genome-wide comparisons at the proteome level of AM-competent host species with non-mycorrhizal species. Using a clustering method we first established orthologous/paralogous relationships and subsequently identified protein clusters that contain members only of the AM-competent species. Proteins of these clusters were then analyzed in an extended set of 16 plant species and ranked based on their relatedness among AM-competent monocot and dicot species, relative to non-mycorrhizal species. In addition, we combined the information on the protein-coding sequence with gene expression data and with promoter analysis. As a result we present a list of yet uncharacterized proteins that show a strongly AM-related pattern of sequence conservation, indicating that the respective genes may have been under selection for a function in AM. 
Among the top candidates are three genes that encode a small family of similar receptor-like kinases that are related to the S-locus receptor kinases involved in sporophytic self-incompatibility. We present a new systematic strategy of gene discovery based on conservation of the protein-coding sequence that complements classical forward and reverse genetics. This strategy can be applied to diverse other biological phenomena if species with established genome sequences fall into distinguished groups that differ in a defined functional trait of interest.
Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences]

NASA Technical Reports Server (NTRS)

Gatlin, L. L.

1974-01-01

Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.
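The two randomness measures named in this abstract (unequal residue frequencies and unequal residue-pair frequencies) can be sketched as entropy differences. The formulation below is one common way to write a Shannon redundancy and is an assumption, not necessarily the exact parameters of the report: `D1` is the departure from equiprobable residues, `D2` the departure from independence of adjacent residues, and `R` their sum normalized by the maximum entropy. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of the distribution given by a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def redundancy_measures(seq, alphabet_size=20):
    """Return D1, D2 and R for one sequence (illustrative formulation).

    Assumptions: a first-order (adjacent-pair) model and a 20-letter
    amino acid alphabet unless alphabet_size is overridden.
    """
    h_max = math.log2(alphabet_size)
    h1 = entropy(Counter(seq))
    # H(next | current) = H(pair) - H(current), estimated from adjacent pairs.
    h_cond = entropy(Counter(zip(seq, seq[1:]))) - entropy(Counter(seq[:-1]))
    d1 = h_max - h1   # some residues occur more often than others
    d2 = h1 - h_cond  # some residue pairs occur more often than others
    return {"D1": d1, "D2": d2, "R": (d1 + d2) / h_max}


print(redundancy_measures("ACACACAC", alphabet_size=2))
```

A Monte Carlo baseline of the kind the abstract mentions could be approximated by applying the same function to many shuffled copies of a sequence, although short chains give noisy estimates.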
The Complete Mitochondrial DNA Sequence of Scenedesmus obliquus Reflects an Intermediate Stage in the Evolution of the Green Algal Mitochondrial Genome

PubMed Central

Nedelcu, Aurora M.; Lee, Robert W.; Lemieux, Claude; Gray, Michael W.; Burger, Gertraud

2000-01-01

Two distinct mitochondrial genome types have been described among the green algal lineages investigated to date: a reduced–derived, Chlamydomonas-like type and an ancestral, Prototheca-like type. To determine if this unexpected dichotomy is real or is due to insufficient or biased sampling and to define trends in the evolution of the green algal mitochondrial genome, we sequenced and analyzed the mitochondrial DNA (mtDNA) of Scenedesmus obliquus. This genome is 42,919 bp in size and encodes 42 conserved genes (i.e., large and small subunit rRNA genes, 27 tRNA and 13 respiratory protein-coding genes), four additional free-standing open reading frames with no known homologs, and an intronic reading frame with endonuclease/maturase similarity. No 5S rRNA or ribosomal protein-coding genes have been identified in Scenedesmus mtDNA. The standard protein-coding genes feature a deviant genetic code characterized by the use of UAG (normally a stop codon) to specify leucine, and the unprecedented use of UCA (normally a serine codon) as a signal for termination of translation. The mitochondrial genome of Scenedesmus combines features of both green algal mitochondrial genome types: the presence of a more complex set of protein-coding and tRNA genes is shared with the ancestral type, whereas the lack of 5S rRNA and ribosomal protein-coding genes as well as the presence of fragmented and scrambled rRNA genes are shared with the reduced–derived type of mitochondrial genome organization. Furthermore, the gene content and the fragmentation pattern of the rRNA genes suggest that this genome represents an intermediate stage in the evolutionary process of mitochondrial genome streamlining in green algae. [The sequence data described in this paper have been submitted to the GenBank data library under accession no. AF204057.] PMID:10854413
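The deviant genetic code described in this abstract can be illustrated with a short translation sketch. Only the two reassignments stated above are taken from the source (UAG read as leucine, UCA read as a stop); treating every other codon as in the standard code is an assumption, and `build_table` and `translate` are hypothetical helper names.

```python
BASES = "TCAG"
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_table(reassignments=None):
    """Standard codon table (DNA alphabet) with optional codon reassignments."""
    table = {
        b1 + b2 + b3: STANDARD_AAS[16 * i + 4 * j + k]
        for i, b1 in enumerate(BASES)
        for j, b2 in enumerate(BASES)
        for k, b3 in enumerate(BASES)
    }
    table.update(reassignments or {})
    return table


def translate(dna, table):
    """Translate from position 0, stopping at the first stop codon ('*')."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for start in range(0, len(dna) - len(dna) % 3, 3):
        residue = table[dna[start:start + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


STANDARD = build_table()
# Reassignments from the abstract: UAG -> Leu, UCA -> stop.
SCENEDESMUS_MT = build_table({"TAG": "L", "TCA": "*"})

orf = "ATGTAGTCATTT"
print(translate(orf, STANDARD))        # M  (TAG is read as a stop)
print(translate(orf, SCENEDESMUS_MT))  # ML (TAG is Leu, TCA terminates)
```

The same open reading frame therefore yields different products under the two tables, which is why annotation of such genomes depends on choosing the correct code.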

Genomics of Clostridium taeniosporum, an organism which forms endospores with ribbon-like appendages

PubMed Central

Cambridge, Joshua M.; Blinkova, Alexandra L.; Salvador Rocha, Erick I.; Bode Hernández, Addys; Moreno, Maday; Ginés-Candelaria, Edwin; Goetz, Benjamin M.; Hunicke-Smith, Scott; Satterwhite, Ed; Tucker, Haley O.

2018-01-01

Clostridium taeniosporum, a non-pathogenic anaerobe closely related to the C. botulinum Group II members, was isolated from Crimean lake silt about 60 years ago. Its endospores are surrounded by an encasement layer which forms a trunk at one spore pole to which about 12–14 large, ribbon-like appendages are attached. The genome consists of one 3,264,813 bp, circular chromosome (with 26.6% GC) and three plasmids. The chromosome contains 2,892 potential protein coding sequences: 2,124 have specific functions, 147 have general functions, 228 are conserved but without known function and 393 are hypothetical based on the fact that no statistically significant orthologs were found. The chromosome also contains 101 genes for stable RNAs, including 7 rRNA clusters. Over 84% of the protein coding sequences and 96% of the stable RNA coding regions are oriented in the same direction as replication. The three known appendage genes are located within a single cluster with five other genes, the protein products of which are closely related, in terms of sequence, to the known appendage proteins. The relatedness of the deduced protein products suggests that all or some of the closely related genes might code for minor appendage proteins or assembly factors. The appendage genes might be unique among the known clostridia; no statistically significant orthologs were found within other clostridial genomes for which sequence data are available. The C. taeniosporum chromosome contains two functional prophages, one Siphoviridae and one Myoviridae, and one defective prophage. Three plasmids of 5.9, 69.7 and 163.1 Kbp are present. These data are expected to contribute to future studies of developmental, structural and evolutionary biology and to potential industrial applications of this organism. PMID:29293521
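As a quick consistency check on the annotation figures above, the four functional categories can be tallied against the reported total of 2,892 coding sequences. This is only an arithmetic sketch using the numbers quoted in the abstract; the variable names and the base-pairs-per-CDS figure are illustrative choices, not quantities reported by the authors.

```python
# Annotation categories for the chromosomal protein coding sequences, as quoted in the abstract.
cds_categories = {
    "specific function": 2124,
    "general function": 147,
    "conserved, unknown function": 228,
    "hypothetical": 393,
}
reported_total = 2892
chromosome_bp = 3_264_813

total_cds = sum(cds_categories.values())
for name, count in cds_categories.items():
    print(f"{name}: {count} ({100 * count / total_cds:.1f}%)")

# The category counts should add up to the reported total.
print("categories match reported total:", total_cds == reported_total)

# Rough gene density: chromosome length divided by the number of coding sequences.
bp_per_cds = chromosome_bp / total_cds
print(f"about {bp_per_cds:.0f} bp of chromosome per coding sequence")
```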
Genomics of Clostridium taeniosporum, an organism which forms endospores with ribbon-like appendages.

PubMed

Cambridge, Joshua M; Blinkova, Alexandra L; Salvador Rocha, Erick I; Bode Hernández, Addys; Moreno, Maday; Ginés-Candelaria, Edwin; Goetz, Benjamin M; Hunicke-Smith, Scott; Satterwhite, Ed; Tucker, Haley O; Walker, James R

2018-01-01

Clostridium taeniosporum, a non-pathogenic anaerobe closely related to the C. botulinum Group II members, was isolated from Crimean lake silt about 60 years ago. Its endospores are surrounded by an encasement layer which forms a trunk at one spore pole to which about 12-14 large, ribbon-like appendages are attached. The genome consists of one 3,264,813 bp, circular chromosome (with 26.6% GC) and three plasmids. The chromosome contains 2,892 potential protein coding sequences: 2,124 have specific functions, 147 have general functions, 228 are conserved but without known function and 393 are hypothetical based on the fact that no statistically significant orthologs were found. The chromosome also contains 101 genes for stable RNAs, including 7 rRNA clusters. Over 84% of the protein coding sequences and 96% of the stable RNA coding regions are oriented in the same direction as replication. The three known appendage genes are located within a single cluster with five other genes, the protein products of which are closely related, in terms of sequence, to the known appendage proteins. The relatedness of the deduced protein products suggests that all or some of the closely related genes might code for minor appendage proteins or assembly factors. The appendage genes might be unique among the known clostridia; no statistically significant orthologs were found within other clostridial genomes for which sequence data are available. The C. taeniosporum chromosome contains two functional prophages, one Siphoviridae and one Myoviridae, and one defective prophage. Three plasmids of 5.9, 69.7 and 163.1 Kbp are present. These data are expected to contribute to future studies of developmental, structural and evolutionary biology and to potential industrial applications of this organism.
Pseudo-polyprotein translated from the full-length ORF1 of capillovirus is important for pathogenicity, but a truncated ORF1 protein without variable and CP regions is sufficient for replication.

PubMed

Hirata, Hisae; Yamaji, Yasuyuki; Komatsu, Ken; Kagiwada, Satoshi; Oshima, Kenro; Okano, Yukari; Takahashi, Shuichiro; Ugaki, Masashi; Namba, Shigetou

2010-09-01

The first open-reading frame (ORF) of the genus Capillovirus encodes an apparently chimeric polyprotein containing conserved regions for replicase (Rep) and coat protein (CP), while other viruses in the family Flexiviridae have separate ORFs encoding these proteins. To investigate the role of the full-length ORF1 polyprotein of capillovirus, we generated truncation mutants of ORF1 of apple stem grooving virus by inserting a termination codon into the variable region located between the putative Rep- and CP-coding regions. These mutants were capable of systemic infection, although their pathogenicity was attenuated. In vitro translation of ORF1 produced both the full-length polyprotein and the smaller Rep protein. The results of in vivo reporter assays suggested that the mechanism of this early termination is a ribosomal -1 frame-shift occurring downstream from the conserved Rep domains. The mechanism of capillovirus gene expression and the very close evolutionary relationship between the genera Capillovirus and Trichovirus are discussed. Copyright (c) 2010. Published by Elsevier B.V.
Structural and Functional Characterization of Ribosomal Protein Gene Introns in Sponges

PubMed Central

Perina, Drago; Korolija, Marina; Mikoč, Andreja; Roller, Maša; Pleše, Bruna; Imešek, Mirna; Morrow, Christine; Batel, Renato; Ćetković, Helena

2012-01-01

Ribosomal protein genes (RPGs) are a powerful tool for studying intron evolution. They exist in all three domains of life and are highly conserved. Accumulating genomic data suggest that RPG introns in many organisms abound with non-protein-coding-RNAs (ncRNAs). These ancient ncRNAs are small nucleolar RNAs (snoRNAs) essential for ribosome assembly. They are also mobile genetic elements and therefore probably important in diversification and enrichment of transcriptomes through various mechanisms such as intron/exon gain/loss. snoRNAs in basal metazoans are poorly characterized. We examined 449 RPG introns, in total, from four demosponges: Amphimedon queenslandica, Suberites domuncula, Suberites ficus and Suberites pagurorum and showed that RPG introns from A. queenslandica share position conservancy and some structural similarity with “higher” metazoans. Moreover, our study indicates that mobile element insertions play an important role in the evolution of their size. In four sponges 51 snoRNAs were identified. The analysis showed discrepancies between the snoRNA pools of orthologous RPG introns between S. domuncula and A. queenslandica. Furthermore, these two sponges show as much conservancy of RPG intron positions between each other as between themselves and human. Sponges from the Suberites genus show consistency in RPG intron position conservation. However, significant differences in some of the orthologous RPG introns of closely related sponges were observed. This indicates that RPG introns are dynamic even on these shorter evolutionary time scales. PMID:22880015
Structural and functional characterization of ribosomal protein gene introns in sponges.

PubMed

Perina, Drago; Korolija, Marina; Mikoč, Andreja; Roller, Maša; Pleše, Bruna; Imešek, Mirna; Morrow, Christine; Batel, Renato; Ćetković, Helena

2012-01-01

Ribosomal protein genes (RPGs) are a powerful tool for studying intron evolution. They exist in all three domains of life and are highly conserved. Accumulating genomic data suggest that RPG introns in many organisms abound with non-protein-coding-RNAs (ncRNAs). These ancient ncRNAs are small nucleolar RNAs (snoRNAs) essential for ribosome assembly. They are also mobile genetic elements and therefore probably important in diversification and enrichment of transcriptomes through various mechanisms such as intron/exon gain/loss. snoRNAs in basal metazoans are poorly characterized. We examined 449 RPG introns, in total, from four demosponges: Amphimedon queenslandica, Suberites domuncula, Suberites ficus and Suberites pagurorum and showed that RPG introns from A. queenslandica share position conservancy and some structural similarity with "higher" metazoans. Moreover, our study indicates that mobile element insertions play an important role in the evolution of their size. In four sponges 51 snoRNAs were identified. The analysis showed discrepancies between the snoRNA pools of orthologous RPG introns between S. domuncula and A. queenslandica. Furthermore, these two sponges show as much conservancy of RPG intron positions between each other as between themselves and human. Sponges from the Suberites genus show consistency in RPG intron position conservation. However, significant differences in some of the orthologous RPG introns of closely related sponges were observed. This indicates that RPG introns are dynamic even on these shorter evolutionary time scales.
Two rapidly evolving genes contribute to male fitness in Drosophila

PubMed Central

Reinhardt, Josephine A; Jones, Corbin D

2013-01-01

Purifying selection often results in conservation of gene sequence and function. The most functionally conserved genes are also thought to be among the most biologically essential. These observations have led to the use of sequence conservation as a proxy for functional conservation. Here we describe two genes that are exceptions to this pattern. We show that lack of sequence conservation among orthologs of CG15460 and CG15323 – herein named jean-baptiste (jb) and karr respectively – does not necessarily predict lack of functional conservation. These two Drosophila melanogaster genes are among the most rapidly evolving protein-coding genes in this species, being nearly as diverged from their D. yakuba orthologs as random sequences are. jb and karr are both expressed at an elevated level in larval males and adult testes, but they are not accessory gland proteins and their loss does not affect male fertility. Instead, knockdown of these genes in D. melanogaster via RNA interference caused male-biased viability defects. These viability effects occur prior to the third instar for jb and during late pupation for karr. We show that putative orthologs to jb and karr are also expressed strongly in the testes of other Drosophila species and have similar gene structure across species despite low levels of sequence conservation. While standard molecular evolution tests could not reject neutrality, other data hint at a role for natural selection. Together these data provide a clear case where a lack of sequence conservation does not imply a lack of conservation of expression or function. PMID:24221639
Highly tissue specific expression of Sphinx supports its male courtship related role in Drosophila melanogaster.

PubMed

Chen, Ying; Dai, Hongzheng; Chen, Sidi; Zhang, Luoying; Long, Manyuan

2011-04-26

Sphinx is a lineage-specific non-coding RNA gene involved in regulating courtship behavior in Drosophila melanogaster. The 5' flanking region of the gene is conserved across Drosophila species, with the proximal 300 bp being conserved out to D. virilis and a further 600 bp region being conserved amongst the melanogaster subgroup (D. melanogaster, D. simulans, D. sechellia, D. yakuba, and D. erecta). Using a green fluorescence protein transformation system, we demonstrated that a 253 bp region of the highly conserved segment was sufficient to drive sphinx expression in male accessory gland. GFP signals were also observed in brain, wing hairs and leg bristles. An additional ∼800 bp upstream region was able to enhance expression specifically in proboscis, suggesting the existence of enhancer elements. Using anti-GFP staining, we identified putative sphinx expression signal in the brain antennal lobe and inner antennocerebral tract, suggesting that sphinx might be involved in olfactory neuron mediated regulation of male courtship behavior. Whole genome expression profiling of the sphinx knockout mutation identified significant up-regulated gene categories related to accessory gland protein function and odor perception, suggesting sphinx might be a negative regulator of its target genes.
Highly Tissue Specific Expression of Sphinx Supports Its Male Courtship Related Role in Drosophila melanogaster

PubMed Central

Chen, Sidi; Zhang, Luoying; Long, Manyuan

2011-01-01

Sphinx is a lineage-specific non-coding RNA gene involved in regulating courtship behavior in Drosophila melanogaster. The 5′ flanking region of the gene is conserved across Drosophila species, with the proximal 300 bp being conserved out to D. virilis and a further 600 bp region being conserved amongst the melanogaster subgroup (D. melanogaster, D. simulans, D. sechellia, D. yakuba, and D. erecta). Using a green fluorescence protein transformation system, we demonstrated that a 253 bp region of the highly conserved segment was sufficient to drive sphinx expression in male accessory gland. GFP signals were also observed in brain, wing hairs and leg bristles. An additional ∼800 bp upstream region was able to enhance expression specifically in proboscis, suggesting the existence of enhancer elements. Using anti-GFP staining, we identified putative sphinx expression signal in the brain antennal lobe and inner antennocerebral tract, suggesting that sphinx might be involved in olfactory neuron mediated regulation of male courtship behavior. Whole genome expression profiling of the sphinx knockout mutation identified significant up-regulated gene categories related to accessory gland protein function and odor perception, suggesting sphinx might be a negative regulator of its target genes. PMID:21541324
Identification and characterization of proteins involved in rice urea and arginine catabolism.

PubMed

Cao, Feng-Qiu; Werner, Andrea K; Dahncke, Kathleen; Romeis, Tina; Liu, Lai-Hua; Witte, Claus-Peter

2010-09-01

Rice (Oryza sativa) production relies strongly on nitrogen (N) fertilization with urea, but the proteins involved in rice urea metabolism have not yet been characterized. Coding sequences for rice arginase, urease, and the urease accessory proteins D (UreD), F (UreF), and G (UreG) involved in urease activation were identified and cloned. The functionality of urease and the urease accessory proteins was demonstrated by complementing corresponding Arabidopsis (Arabidopsis thaliana) mutants and by multiple transient coexpression of the rice proteins in Nicotiana benthamiana. Secondary structure models of rice (plant) UreD and UreF proteins revealed a possible functional conservation to bacterial orthologs, especially for UreF. Using amino-terminally StrepII-tagged urease accessory proteins, an interaction between rice UreD and urease could be shown. Prokaryotic and eukaryotic urease activation complexes seem conserved despite limited protein sequence conservation for UreF and UreD. In plant metabolism, urea is generated by the arginase reaction. Rice arginase was transiently expressed as a carboxyl-terminally StrepII-tagged fusion protein in N. benthamiana, purified, and biochemically characterized (K_m = 67 mM, k_cat = 490 s^-1). The activity depended on the presence of manganese (K_d = 1.3 µM). In physiological experiments, urease and arginase activities were not influenced by the external N source, but sole urea nutrition imbalanced the plant amino acid profile, leading to the accumulation of asparagine and glutamine in the roots. Our data indicate that reduced plant performance with urea as N source is not a direct result of insufficient urea metabolism but may in part be caused by an imbalance of N distribution.
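The kinetic constants reported for rice arginase can be put to work in a small calculation. The sketch below assumes simple Michaelis-Menten kinetics and takes the reported K_m as 67 millimolar and k_cat as 490 per second; the function name and the chosen substrate concentrations are illustrative assumptions, not values from the study.

```python
K_M_MILLIMOLAR = 67.0   # Michaelis constant, taken as 67 mM
K_CAT_PER_S = 490.0     # turnover number, 490 s^-1


def turnover_rate(substrate_mm, k_cat=K_CAT_PER_S, k_m=K_M_MILLIMOLAR):
    """Rate per enzyme molecule (s^-1) under the Michaelis-Menten model:
    v / [E] = k_cat * [S] / (K_m + [S])."""
    return k_cat * substrate_mm / (k_m + substrate_mm)


# Catalytic efficiency k_cat / K_m, with K_m converted from mM to M.
catalytic_efficiency = K_CAT_PER_S / (K_M_MILLIMOLAR / 1000.0)

# Hypothetical substrate concentrations in mM, chosen only for illustration.
for s in (1.0, 10.0, 67.0, 500.0):
    print(f"[S] = {s:6.1f} mM -> {turnover_rate(s):6.1f} s^-1")
print(f"k_cat / K_m = {catalytic_efficiency:.0f} M^-1 s^-1")
```

At a substrate concentration equal to K_m the rate is half of k_cat by definition, which gives a convenient sanity check for the function.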
Cloning and sequence analysis of Hemonchus contortus HC58cDNA.

PubMed

Muleke, Charles I; Ruofeng, Yan; Lixin, Xu; Xinwen, Bo; Xiangrui, Li

2007-06-01

The complete coding sequence of Hemonchus contortus HC58cDNA was generated by rapid amplification of cDNA ends and polymerase chain reaction using primers based on the 5' and 3' ends of the parasite mRNA, accession no. AF305964. The HC58cDNA gene was 851 bp long, with an open reading frame of 717 bp encoding a precursor of 239 amino acids with a predicted mass of approximately 27 kDa. Analysis of the amino acid sequence revealed conserved residues of cysteine, histidine, asparagine, occluding loop pattern, hemoglobinase motif and glutamine of the oxyanion hole characteristic of cathepsin B-like proteases (CBL). Comparison of the predicted amino acid sequences showed the protein shared 33.5-58.7% identity to cathepsin B homologues in the papain clan CA family (family C1). Phylogenetic analysis revealed close evolutionary proximity of the protein sequence to counterpart sequences in the CBL, suggesting that HC58cDNA was a member of the papain family.
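The relationship between the reported lengths can be checked with a few lines of arithmetic. The sketch assumes three nucleotides per codon and an average residue mass of about 110 Da, which is a common rule of thumb and not a figure from the abstract; the variable names are illustrative.

```python
CDNA_LENGTH_BP = 851        # full cDNA length reported in the abstract
ORF_LENGTH_BP = 717         # open reading frame length reported in the abstract
AVG_RESIDUE_MASS_DA = 110   # assumed average amino acid residue mass (rule of thumb)

# An ORF must be a whole number of codons.
assert ORF_LENGTH_BP % 3 == 0
codons = ORF_LENGTH_BP // 3

# Sequence outside the ORF (5' and 3' untranslated regions combined).
untranslated_bp = CDNA_LENGTH_BP - ORF_LENGTH_BP

# Rough mass estimate in kDa for a chain of `codons` residues.
estimated_mass_kda = codons * AVG_RESIDUE_MASS_DA / 1000

print(f"{codons} codons, {untranslated_bp} bp untranslated, about {estimated_mass_kda:.1f} kDa")
```

The estimate lands close to the approximately 27 kDa quoted in the abstract, which is as much agreement as a rule-of-thumb residue mass can be expected to give.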
A resource of vectors and ES cells for targeted deletion of microRNAs in mice

PubMed Central

Prosser, Haydn M.; Koike-Yusa, Hiroko; Cooper, James D.; Law, Frances C.; Bradley, Allan

2011-01-01

The 21-23 nucleotide single-stranded RNAs classified as microRNAs (miRNA) perform fundamental roles in a wide range of cellular and developmental processes. miRNAs regulate protein expression through sequence-specific base pairing with target messenger RNAs (mRNA) reducing both their stability and the process of protein translation [1, 2]. At least 30% of protein coding genes appear to be conserved targets for miRNAs [1]. In contrast to the protein coding genes [3, 4], no public resource of miRNA mouse mutant alleles exists. We have generated a library of highly germ-line transmissible C57BL/6N mouse mutant embryonic stem (ES) cells with targeted deletions for the majority of miRNA genes currently annotated within the miRBase registry [5]. These alleles have been designed to be highly adaptable research tools that can be efficiently altered to create reporter, conditional and other allelic variants. This ES cell resource can be searched electronically and is available from ES cell repositories for distribution to the scientific community [6]. PMID:21822254
A gene family for acidic ribosomal proteins in Schizosaccharomyces pombe: two essential and two nonessential genes.

PubMed Central

Beltrame, M; Bianchi, M E

1990-01-01

We have cloned the genes for small acidic ribosomal proteins (A-proteins) of the fission yeast Schizosaccharomyces pombe. S. pombe contains four transcribed genes for small A-proteins per haploid genome, as is the case for Saccharomyces cerevisiae. In contrast, multicellular eucaryotes contain two transcribed genes per haploid genome. The four proteins of S. pombe, besides sharing a high overall similarity, form two pairs of nearly identical sequences. Their corresponding genes have a highly conserved structure and are transcribed to a similar level. Surprisingly, of each pair of genes coding for nearly identical proteins, one is essential for cell growth, whereas the other is not. We suggest that the unequal importance of the four small A-proteins for cell survival is related to their physical organization in 60S ribosomal subunits. PMID:2325655
The Arabidopsis KIN17 and its homolog KLP mediate different aspects of plant growth and development.

PubMed

Garcia-Molina, Antoni; Xing, Shuping; Huijser, Peter

2014-01-01

Proteins harboring the kin17 domain (KIN17) constitute a family of well-conserved eukaryotic nuclear proteins involved in nucleic acid metabolism. In mammals, KIN17 orthologs contribute to DNA replication, RNA splicing, and DNA integrity maintenance. Recently, we reported a functional characterization of an Arabidopsis thaliana KIN17 homolog (AtKIN17) that uncovered a role for this protein in tuning physiological responses during copper (Cu) deficiency and oxidative stress. However, functions similar to those described in mammals may also be expected in plants given the conservation of functional domains in KIN17 orthologs. Here, we provide additional data consistent with the participation of AtKIN17 in controlling general plant growth and development, as well as in response to UV radiation. Furthermore, the Arabidopsis genome codes for a second homolog of KIN17, which we refer to as KIN17-like-protein (KLP). KLP loss-of-function lines exhibited a reduced inhibition of root growth in response to copper excess and relatively elongated hypocotyls in etiolated seedlings. Altogether, our experimental data point to a general function of the kin17 domain proteins in plant growth and development.
The Arabidopsis KIN17 and its homolog KLP mediate different aspects of plant growth and development

PubMed Central

Garcia-Molina, Antoni; Xing, Shuping; Huijser, Peter

2014-01-01

Proteins harboring the kin17 domain (KIN17) constitute a family of well-conserved eukaryotic nuclear proteins involved in nucleic acid metabolism. In mammals, KIN17 orthologs contribute to DNA replication, RNA splicing, and DNA integrity maintenance. Recently, we reported a functional characterization of an Arabidopsis thaliana KIN17 homolog (AtKIN17) that uncovered a role for this protein in tuning physiological responses during copper (Cu) deficiency and oxidative stress. However, functions similar to those described in mammals may also be expected in plants given the conservation of functional domains in KIN17 orthologs. Here, we provide additional data consistent with the participation of AtKIN17 in controlling general plant growth and development, as well as in response to UV radiation. Furthermore, the Arabidopsis genome codes for a second homolog of KIN17, which we refer to as KIN17-LIKE-PROTEIN (KLP). KLP loss-of-function lines exhibited a reduced inhibition of root growth in response to copper excess and relatively elongated hypocotyls in etiolated seedlings. Altogether, our experimental data point to a general function of the kin17 domain proteins in plant growth and development. PMID:24713636
ChIP-seq Identification of Weakly Conserved Heart Enhancers

PubMed Central

Blow, Matthew J.; McCulley, David J.; Li, Zirong; Zhang, Tao; Akiyama, Jennifer A.; Holt, Amy; Plajzer-Frick, Ingrid; Shoukry, Malak; Wright, Crystal; Chen, Feng; Afzal, Veena; Bristow, James; Ren, Bing; Black, Brian L.; Rubin, Edward M.; Visel, Axel; Pennacchio, Len A.

2011-01-01

Accurate control of tissue-specific gene expression plays a pivotal role in heart development, but few cardiac transcriptional enhancers have thus far been identified. Extreme non-coding sequence conservation successfully predicts enhancers active in many tissues, but fails to identify substantial numbers of heart enhancers. Here we used ChIP-seq with the enhancer-associated protein p300 from mouse embryonic day 11.5 heart tissue to identify over three thousand candidate heart enhancers genome-wide. Compared to other tissues studied at this time-point, most candidate heart enhancers are less deeply conserved in vertebrate evolution. Nevertheless, the testing of 130 candidate regions in a transgenic mouse assay revealed that most of them reproducibly function as enhancers active in the heart, irrespective of their degree of evolutionary constraint. These results provide evidence for a large population of poorly conserved heart enhancers and suggest that the evolutionary constraint of embryonic enhancers can vary depending on tissue type. PMID:20729851
Genomic assessment of the evolution of the prion protein gene family in vertebrates.

PubMed

Harrison, Paul M; Khachane, Amit; Kumar, Manish

2010-05-01

Prion diseases are devastating neurological disorders caused by the propagation of particles containing an alternative beta-sheet-rich form of the prion protein (PrP). Genes paralogous to PrP, called Doppel and Shadoo, have been identified, that also have neuropathological relevance. To aid in the further functional characterization of PrP and its relatives, we annotated completely the PrP gene family (PrP-GF), in the genomes of 42 vertebrates, through combined strategic application of gene prediction programs and advanced remote homology detection techniques (such as HMMs, PSI-TBLASTN and pGenThreader). We have uncovered several previously undescribed paralogous genes and pseudogenes. We find that current high-quality genomic evidence indicates that the PrP relative Doppel, was likely present in the last common ancestor of present-day Tetrapoda, but was lost in the bird lineage, since its divergence from reptiles. Using the new gene annotations, we have defined the consensus of structural features that are characteristic of the PrP and Doppel structures, across diverse Tetrapoda clades. Furthermore, we describe in detail a transcribed pseudogene derived from Shadoo that is conserved across primates, and that overlaps the meiosis gene, SYCE1, thus possibly regulating its expression. In addition, we analysed the locus of PRNP/PRND for significant conservation across the genomic DNA of eleven mammals, and determined the phylogenetic penetration of non-coding exons. The genomic evidence indicates that the second PRNP non-coding exon found in even-toed ungulates and rodents, is conserved in all high-coverage genome assemblies of primates (human, chimp, orang utan and macaque), and is, at least, likely to have fallen out of use during primate speciation. Furthermore, we have demonstrated that the PRNT gene (at the PRNP human locus) is conserved across at least sixteen mammals, and evolves like a long non-coding RNA, fashioned from fragments of ancient, long, interspersed elements. These annotations and evolutionary analyses will be of further use for functional characterisation of the PrP-GF, and will be updatable in a semi-automated fashion as more genomes accumulate. Copyright 2010 Elsevier Inc. All rights reserved.
Behind the curtain of non-coding RNAs; long non-coding RNAs regulating hepatocarcinogenesis

PubMed Central

El Khodiry, Aya; Afify, Menna; El Tayebi, Hend M

2018-01-01

Hepatocellular carcinoma (HCC) is one of the most common and aggressive cancers worldwide. HCC is the fifth most common malignancy in the world and the second leading cause of cancer death in Asia. Long non-coding RNAs (lncRNAs) are RNAs with a length greater than 200 nucleotides that do not encode proteins. lncRNAs can regulate gene expression and protein synthesis in several ways by interacting with DNA, RNA and proteins in a sequence specific manner. They could regulate cellular and developmental processes through either gene inhibition or gene activation. Many studies have shown that dysregulation of lncRNAs is related to many human diseases such as cardiovascular diseases, genetic disorders, neurological diseases, immune mediated disorders and cancers. However, the study of lncRNAs is challenging as they are poorly conserved between species, their expression levels are not as high as those of mRNAs, and they show great interpatient variation. The study of lncRNA expression in cancers has been a breakthrough as it unveils potential biomarkers and drug targets for cancer therapy and helps understand the mechanism of pathogenesis. This review discusses many long non-coding RNAs and their contribution in HCC, their role in development, metastasis, and prognosis of HCC and how to regulate and target these lncRNAs as a therapeutic tool in HCC treatment in the future. PMID:29434445
The complete mitochondrial genome of Lota lota (Gadiformes: Gadidae) from the Burqin River in China.

PubMed

Lu, Zhichuang; Zhang, Nan; Song, Na; Gao, Tianxiang

2016-05-01

In this study, the complete mitochondrial genome (mitogenome) sequence of Lota lota has been determined by long polymerase chain reaction and primer walking methods. The mitogenome is a circular molecule of 16,519 bp in length and contains 37 mitochondrial genes, including 13 protein-coding genes, 2 ribosomal RNA (rRNA) genes and 22 transfer RNA (tRNA) genes, and a control region, as in other bony fishes. Within the control region, we identified the termination-associated sequence domain (TAS), the central conserved sequence block domains (CSB-F and CSB-D), and the conserved sequence block domains (CSB-1, CSB-2 and CSB-3).
Structural architecture of the human long non-coding RNA, steroid receptor RNA activator

PubMed Central

Novikova, Irina V.; Hennelly, Scott P.; Sanbonmatsu, Karissa Y.

2012-01-01

While functional roles of several long non-coding RNAs (lncRNAs) have been determined, the molecular mechanisms are not well understood. Here, we report the first experimentally derived secondary structure of a human lncRNA, the steroid receptor RNA activator (SRA), 0.87 kB in size. The SRA RNA is a non-coding RNA that coactivates several human sex hormone receptors and is strongly associated with breast cancer. Coding isoforms of SRA are also expressed to produce proteins, making the SRA gene a unique bifunctional system. Our experimental findings (SHAPE, in-line, DMS and RNase V1 probing) reveal that this lncRNA has a complex structural organization, consisting of four domains, with a variety of secondary structure elements. We examine the coevolution of the SRA gene at the RNA structure and protein structure levels using comparative sequence analysis across vertebrates. Rapid evolutionary stabilization of RNA structure, combined with frame-disrupting mutations in conserved regions, suggests that evolutionary pressure preserves the RNA structural core rather than its translational product. We perform similar experiments on alternatively spliced SRA isoforms to assess their structural features. PMID:22362738
Genomic Sequence around Butterfly Wing Development Genes: Annotation and Comparative Analysis

PubMed Central

Conceição, Inês C.; Long, Anthony D.; Gruber, Jonathan D.; Beldade, Patrícia

2011-01-01

Background: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. Methodology/Principal Findings: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). Conclusions: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high conservation of non-coding sequence around the genes wingless and Ecdysone receptor, both involved in multiple developmental processes including wing pattern formation. PMID:21909358

1H, 13C, and 15N resonance assignments for the protein coded by gene locus BB0938 of Bordetella bronchiseptica

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rossi, Paolo; Ramelot, Theresa A.; Xiao, Rong

2005-11-01

The product of gene locus BB0938 from Bordetella bronchiseptica (Swiss-Prot ID: Q7WNU7-BORBR; NESG target ID: BoR11; Wunderlich et al., 2004; Pfam ID: PF03476) is a 128-residue protein of unknown function. This broadly conserved protein family is found in eubacteria and eukaryotes. Using triple resonance NMR techniques, we have determined 98% of backbone and 94% of side chain 1H, 13C, and 15N resonance assignments. The chemical shift and 3J(HN-Hα) scalar coupling data reveal a β topology with a seven-residue helical insert, ??????????. BMRB deposit with accession number 6693. Reference: Wunderlich et al. (2004) Proteins, 56, 181-187.
Characterization of a Gene Coding for the Complement System Component FB from Loxosceles laeta Spider Venom Glands.

PubMed

Myamoto, Daniela Tiemi; Pidde-Queiroz, Giselle; Gonçalves-de-Andrade, Rute Maria; Pedroso, Aurélio; van den Berg, Carmen W; Tambourgi, Denise V

2016-01-01

The human complement system is composed of more than 30 proteins and many of these have conserved domains that allow tracing the phylogenetic evolution. The complement system seems to be initiated with the appearance of C3 and factor B (FB), the only components found in some protostomes and cnidarians, suggesting that the alternative pathway is the most ancient. Here, we present the characterization of an arachnid homologue of the human complement component FB from the spider Loxosceles laeta. This homologue, named Lox-FB, was identified from a total RNA L. laeta spider venom gland library and was amplified using RACE-PCR techniques and specific primers. Analysis of the deduced amino acid sequence and the domain structure showed significant similarity to the vertebrate and invertebrate FB/C2 family proteins. Lox-FB has a classical domain organization composed of a complement control protein domain (CCP), a von Willebrand Factor domain (vWFA), and a serine protease domain (SP). The amino acids involved in Mg2+ metal ion dependent adhesion site (MIDAS) found in the vWFA domain in the vertebrate C2/FB proteins are well conserved; however, the classic catalytic triad present in the serine protease domain is not conserved in Lox-FB. Similarity and phylogenetic analyses indicated that Lox-FB shares a major identity (43%) and has a close evolutionary relationship with the third isoform of FB-like protein (FB-3) from the jumping spider Hasarius adansoni belonging to the Family Salticidae.
Characterization of a Gene Coding for the Complement System Component FB from Loxosceles laeta Spider Venom Glands

PubMed Central

Myamoto, Daniela Tiemi; Pidde-Queiroz, Giselle; Gonçalves-de-Andrade, Rute Maria; Pedroso, Aurélio; van den Berg, Carmen W.; Tambourgi, Denise V.

2016-01-01

The human complement system is composed of more than 30 proteins and many of these have conserved domains that allow tracing the phylogenetic evolution. The complement system seems to be initiated with the appearance of C3 and factor B (FB), the only components found in some protostomes and cnidarians, suggesting that the alternative pathway is the most ancient. Here, we present the characterization of an arachnid homologue of the human complement component FB from the spider Loxosceles laeta. This homologue, named Lox-FB, was identified from a total RNA L. laeta spider venom gland library and was amplified using RACE-PCR techniques and specific primers. Analysis of the deduced amino acid sequence and the domain structure showed significant similarity to the vertebrate and invertebrate FB/C2 family proteins. Lox-FB has a classical domain organization composed of a complement control protein domain (CCP), a von Willebrand Factor domain (vWFA), and a serine protease domain (SP). The amino acids involved in Mg2+ metal ion dependent adhesion site (MIDAS) found in the vWFA domain in the vertebrate C2/FB proteins are well conserved; however, the classic catalytic triad present in the serine protease domain is not conserved in Lox-FB. Similarity and phylogenetic analyses indicated that Lox-FB shares a major identity (43%) and has a close evolutionary relationship with the third isoform of FB-like protein (FB-3) from the jumping spider Hasarius adansoni belonging to the Family Salticidae. PMID:26771533
A conserved genetic module that encodes the major virion components in both the coliphage T4 and the marine cyanophage S-PM2

PubMed Central

Hambly, Emma; Tétart, Francoise; Desplats, Carine; Wilson, William H.; Krisch, Henry M.; Mann, Nicholas H.

2001-01-01

Sequence analysis of a 10-kb region of the genome of the marine cyanomyovirus S-PM2 reveals a homology to coliphage T4 that extends as a contiguous block from gene (g)18 to g23. The order of the S-PM2 genes in this region is similar to that of T4, but there are insertions and deletions of small ORFs of unknown function. In T4, g18 codes for the tail sheath, g19, the tail tube, g20, the head portal protein, g21, the prohead core protein, g22, a scaffolding protein, and g23, the major capsid protein. Thus, the entire module that determines the structural components of the phage head and contractile tail is conserved between T4 and this cyanophage. The significant differences in the morphology of these phages must reflect the considerable divergence of the amino acid sequence of their homologous virion proteins, which uniformly exceeds 50%. We suggest that their enormous diversity in the sea could be a result of genetic shuffling between disparate phages mediated by such commonly shared modules. These conserved sequences could facilitate genetic exchange by providing partially homologous substrates for recombination between otherwise divergent phage genomes. Such a mechanism would thus expand the pool of phage genes accessible by recombination to all those phages that share common modules. PMID:11553768
Intrinsic and extrinsic approaches for detecting genes in a bacterial genome.

PubMed Central

Borodovsky, M; Rudd, K E; Koonin, E V

1994-01-01

The unannotated regions of the Escherichia coli genome DNA sequence from the EcoSeq6 database, totaling 1,278 'intergenic' sequences of the combined length of 359,279 basepairs, were analyzed using computer-assisted methods with the aim of identifying putative unknown genes. The proposed strategy for finding new genes includes two key elements: i) prediction of expressed open reading frames (ORFs) using the GeneMark method based on Markov chain models for coding and non-coding regions of Escherichia coli DNA, and ii) search for protein sequence similarities using programs based on the BLAST algorithm and programs for motif identification. A total of 354 putative expressed ORFs were predicted by GeneMark. Using the BLASTX and TBLASTN programs, it was shown that 208 ORFs located in the unannotated regions of the E. coli chromosome are significantly similar to other protein sequences. Identification of 182 ORFs as probable genes was supported by GeneMark and BLAST, comprising 51.4% of the GeneMark 'hits' and 87.5% of the BLAST 'hits'. 73 putative new genes, comprising 20.6% of the GeneMark predictions, belong to ancient conserved protein families that include both eubacterial and eukaryotic members. This value is close to the overall proportion of highly conserved sequences among eubacterial proteins, indicating that the majority of the putative expressed ORFs that are predicted by GeneMark, but have no significant BLAST hits, nevertheless are likely to be real genes. The majority of the putative genes identified by BLAST search have been described since the release of the EcoSeq6 database, but about 70 genes have not been detected so far. Among these new identifications are genes encoding proteins with a variety of predicted functions including dehydrogenases, kinases, several other metabolic enzymes, ATPases, rRNA methyltransferases, membrane proteins, and different types of regulatory proteins. PMID:7984428
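The percentages quoted in this abstract can be re-derived from the ORF counts it reports. The short Python sketch below does only that arithmetic. The variable and function names are illustrative and do not come from GeneMark or BLAST.

```python
# Counts quoted in the abstract above.
genemark_orfs = 354   # putative expressed ORFs predicted by GeneMark
blast_orfs = 208      # ORFs with significant BLAST similarity
both = 182            # ORFs supported by both GeneMark and BLAST
ancient = 73          # putative genes in ancient conserved families


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


share_of_genemark = percent(both, genemark_orfs)   # quoted as 51.4%
share_of_blast = percent(both, blast_orfs)         # quoted as 87.5%
share_ancient = percent(ancient, genemark_orfs)    # quoted as 20.6%
print(share_of_genemark, share_of_blast, share_ancient)
```

All three computed values agree with the figures stated in the abstract.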
Familial neurohypophyseal diabetes insipidus associated with a novel mutation in the vasopressin-neurophysin II gene.

PubMed

Fujii, H; Iida, S; Moriwaki, K

2000-03-01

Familial neurohypophyseal diabetes insipidus (FNDI) is an autosomal dominant disorder of renal water conservation due to deficiency of arginine vasopressin as the result of mutations in the arginine vasopressin-neurophysin II (AVP-NPII) gene that encodes the hormone or its carrier protein. Thirty-one different mutations have been reported. In this study, we evaluated the AVP-NPII gene in a family with FNDI and identified a new mutation (1911G→A) in the coding sequence for NPII in affected family members. This mutation substitutes Tyr for 74 Cys in the NPII moiety. NPII is an intracellular carrier protein for AVP during the axonal transport from the hypothalamus to the posterior pituitary and contains 14 conserved cysteine residues forming 7 disulfide bonds. Because the mutation cosegregates with the phenotype, it is possible that this mutation causes neurohypophyseal diabetes insipidus in this family.
Ribosomal protein S14 transcripts are edited in Oenothera mitochondria.

PubMed Central

Schuster, W; Unseld, M; Wissinger, B; Brennicke, A

1990-01-01

The gene encoding ribosomal protein S14 (rps14) in Oenothera mitochondria is located upstream of the cytochrome b gene (cob). Sequence analysis of independently derived cDNA clones covering the entire rps14 coding region shows two nucleotides edited from the genomic DNA to the mRNA derived sequences by C to U modifications. A third editing event occurs four nucleotides upstream of the AUG initiation codon and improves a potential ribosome binding site. A CGG codon specifying arginine in a position conserved in evolution between chloroplasts and E. coli as a UGG tryptophan codon is not edited in any of the cDNAs analysed. An inverted repeat 3' of an unidentified open reading frame is located upstream of the rps14 gene. The inverted repeat sequence is highly conserved at analogous regions in other Oenothera mitochondrial loci. PMID:2326162
An Autonomous BMP2 Regulatory Element in Mesenchymal Cells

PubMed Central

Kruithof, Boudewijn P.T.; Fritz, David T.; Liu, Yijun; Garsetti, Diane E.; Frank, David B.; Pregizer, Steven K.; Gaussin, Vinciane; Mortlock, Douglas P.; Rogers, Melissa B.

2014-01-01

BMP2 is a morphogen that controls mesenchymal cell differentiation and behavior. For example, BMP2 concentration controls the differentiation of mesenchymal precursors into myocytes, adipocytes, chondrocytes, and osteoblasts. Sequences within the 3′untranslated region (UTR) of the Bmp2 mRNA mediate a post-transcriptional block of protein synthesis. Interaction of cell and developmental stage-specific trans-regulatory factors with the 3′UTR is a nimble and versatile mechanism for modulating this potent morphogen in different cell types. We show here, that an ultra-conserved sequence in the 3′UTR functions independently of promoter, coding region, and 3′UTR context in primary and immortalized tissue culture cells and in transgenic mice. Our findings indicate that the ultra-conserved sequence is an autonomously functioning post-transcriptional element that may be used to modulate the level of BMP2 and other proteins while retaining tissue specific regulatory elements. PMID:21268088
A combinatorial code for pattern formation in Drosophila oogenesis.

PubMed

Yakoby, Nir; Bristow, Christopher A; Gong, Danielle; Schafer, Xenia; Lembong, Jessica; Zartman, Jeremiah J; Halfon, Marc S; Schüpbach, Trudi; Shvartsman, Stanislav Y

2008-11-01

Two-dimensional patterning of the follicular epithelium in Drosophila oogenesis is required for the formation of three-dimensional eggshell structures. Our analysis of a large number of published gene expression patterns in the follicle cells suggests that they follow a simple combinatorial code based on six spatial building blocks and the operations of union, difference, intersection, and addition. The building blocks are related to the distribution of inductive signals, provided by the highly conserved epidermal growth factor receptor and bone morphogenetic protein signaling pathways. We demonstrate the validity of the code by testing it against a set of patterns obtained in a large-scale transcriptional profiling experiment. Using the proposed code, we distinguish 36 distinct patterns for 81 genes expressed in the follicular epithelium and characterize their joint dynamics over four stages of oogenesis. The proposed combinatorial framework allows systematic analysis of the diversity and dynamics of two-dimensional transcriptional patterns and guides future studies of gene regulation.
Human somatostatin I: sequence of the cDNA.

PubMed Central

Shen, L P; Pictet, R L; Rutter, W J

1982-01-01

RNA has been isolated from a human pancreatic somatostatinoma and used to prepare a cDNA library. After prescreening, clones containing somatostatin I sequences were identified by hybridization with an anglerfish somatostatin I-cloned cDNA probe. From the nucleotide sequence of two of these clones, we have deduced an essentially full-length mRNA sequence, including the preprosomatostatin coding region, 105 nucleotides from the 5' untranslated region and the complete 150-nucleotide 3' untranslated region. The coding region predicts a 116-amino acid precursor protein (Mr, 12.727) that contains somatostatin-14 and -28 at its COOH terminus. The predicted amino acid sequence of human somatostatin-28 is identical to that of somatostatin-28 isolated from the porcine and ovine species. A comparison of the amino acid sequences of human and anglerfish preprosomatostatin I indicated that the COOH-terminal region encoding somatostatin-14 and the adjacent 6 amino acids are highly conserved, whereas the remainder of the molecule, including the signal peptide region, is more divergent. However, many of the amino acid differences found in the pro region of the human and anglerfish proteins are conservative changes. This suggests that the propeptides have a similar secondary structure, which in turn may imply a biological function for this region of the molecule. PMID:6126875
Fifteen new earthworm mitogenomes shed new light on phylogeny within the Pheretima complex

PubMed Central

Zhang, Liangliang; Sechi, Pierfrancesco; Yuan, Minglong; Jiang, Jibao; Dong, Yan; Qiu, Jiangping

2016-01-01

The Pheretima complex within the Megascolecidae family is a major earthworm group. Recently, the systematic status of the Pheretima complex based on morphology was challenged by molecular studies. In this study, we carry out the first comparative mitogenomic study in oligochaetes. The mitogenomes of 15 earthworm species were sequenced and compared with 9 other available earthworm mitogenomes, with the main aim of exploring their phylogenetic relationships and testing different analytical approaches to phylogeny reconstruction. The general earthworm mitogenomic features proved to be conserved: all genes were encoded on the same strand, all the protein coding loci shared the same initiation codon (ATG), and tRNA genes showed conserved structures. The Drawida japonica mitogenome displayed the highest A + T content, reversed AT/GC-skews and the highest genetic diversity. Genetic distances among protein coding genes displayed their maximum and minimum interspecific values in the ATP8 and CO1 genes, respectively. The 22 tRNAs showed variable substitution patterns between the considered earthworm mitogenomes. The inclusion of rRNAs increased phylogenetic support. Furthermore, we tested different trimming tools for alignment improvement. Our analyses rejected reciprocal monophyly among Amynthas and Metaphire and indicated that the two genera should be systematically classified into one. PMID:26833286
Identification and characterization of novel reptile cathelicidins from elapid snakes.

PubMed

Zhao, Hui; Gan, Tong-Xiang; Liu, Xiao-Dong; Jin, Yang; Lee, Wen-Hui; Shen, Ji-Hong; Zhang, Yun

2008-10-01

Three cDNA sequences coding for elapid cathelicidins were cloned from constructed venom gland cDNA libraries of Naja atra, Bungarus fasciatus and Ophiophagus hannah. The open reading frames of the cloned elapid cathelicidins were all composed of 576 bp and coded for 191 amino acid residue protein precursors. Each of the deduced elapid cathelicidins has a 22 amino acid residue signal peptide, a conserved cathelin domain of 135 amino acid residues and a mature antimicrobial peptide of 34 amino acid residues. Unlike the highly divergent cathelicidins in mammals, the nucleotide and deduced protein sequences of the three cloned elapid cathelicidins were remarkably conserved. All the elapid mature cathelicidins were predicted to be cleaved at Valine157 by elastase. OH-CATH, the deduced mature cathelicidin from king cobra, was chemically synthesized and it showed strong antibacterial activity against various bacteria with minimal inhibitory concentration of 1-20 μg/ml in the presence of 1% NaCl. Meanwhile, the synthetic peptide showed no haemolytic activity toward human red blood cells even at a high dose of 200 μg/ml. Phylogenetic analysis of cathelicidins from vertebrates suggested that elapid and viperid cathelicidins were grouped together in the tree. Snake cathelicidins were evolutionarily closely related to the neutrophilic granule proteins (NGPs) from mouse, rat and rabbit. Snake cathelicidins also showed a close relationship with avian fowlicidins (1-3) and chicken myeloid antimicrobial peptide 27. Elapid cathelicidins might be used as models for the development of novel therapeutic drugs.
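The lengths quoted in this abstract are mutually consistent, which a few lines of arithmetic can confirm. The Python sketch below assumes that the 576 bp open reading frame includes one terminal stop codon; the abstract does not say so explicitly. The variable names are illustrative.

```python
# Figures quoted in the abstract above.
ORF_BP = 576                             # open reading frame length in base pairs
SIGNAL, CATHELIN, MATURE = 22, 135, 34   # domain lengths in residues

# Assumption: the ORF length includes one terminal stop codon.
codons = ORF_BP // 3                     # three nucleotides per codon
precursor_residues = codons - 1          # the stop codon encodes no residue

domain_total = SIGNAL + CATHELIN + MATURE
cleavage_site = SIGNAL + CATHELIN        # last residue before the mature peptide

print(precursor_residues, domain_total, cleavage_site)
```

Under that assumption the 192 codons give a 191-residue precursor, the three domains also sum to 191, and the signal peptide plus cathelin domain end at residue 157, which matches the predicted elastase cleavage at Valine157.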
Online interactive analysis of protein structure ensembles with Bio3D-web.

PubMed

Skjærven, Lars; Jariwala, Shashank; Yao, Xin-Qiu; Grant, Barry J

2016-11-15

Bio3D-web is an online application for analyzing the sequence, structure and conformational heterogeneity of protein families. Major functionality is provided for identifying protein structure sets for analysis, their alignment and refined structure superposition, sequence and structure conservation analysis, mapping and clustering of conformations and the quantitative comparison of their predicted structural dynamics. Bio3D-web is based on the Bio3D and Shiny R packages. All major browsers are supported and full source code is available under a GPL2 license from http://thegrantlab.org/bio3d-web. Contact: bjgrant@umich.edu or lars.skjarven@uib.no. © The Author 2016. Published by Oxford University Press.
Extracellular Vesicle-Associated RNA as a Carrier of Epigenetic Information

PubMed Central

2017-01-01

Post-transcriptional regulation of messenger RNA (mRNA) metabolism and subcellular localization is of the utmost importance both during development and in cell differentiation. Besides carrying genetic information, mRNAs contain cis-acting signals (zip codes), usually present in their 5′- and 3′-untranslated regions (UTRs). By binding to these signals, trans-acting factors, such as RNA-binding proteins (RBPs), and/or non-coding RNAs (ncRNAs), control mRNA localization, translation and stability. RBPs can also form complexes with non-coding RNAs of different sizes. The release of extracellular vesicles (EVs) is a conserved process that allows both normal and cancer cells to horizontally transfer molecules, and hence properties, to neighboring cells. By interacting with proteins that are specifically sorted to EVs, mRNAs as well as ncRNAs can be transferred from cell to cell. In this review, we discuss the mechanisms underlying the sorting to EVs of different classes of molecules, as well as the role of extracellular RNAs and the associated proteins in altering gene expression in the recipient cells. Importantly, if, on the one hand, RBPs play a critical role in transferring RNAs through EVs, RNA itself could, on the other hand, function as a carrier to transfer proteins (i.e., chromatin modifiers, and transcription factors) that, once transferred, can alter the cell’s epigenome. PMID:28937658
A dehydration-inducible gene in the truffle Tuber borchii identifies a novel group of dehydrins

PubMed Central

Abba', Simona; Ghignone, Stefano; Bonfante, Paola

2006-01-01

Background The expressed sequence tag M6G10 was originally isolated from a screening for differentially expressed transcripts during the reproductive stage of the white truffle Tuber borchii. mRNA levels for M6G10 increased dramatically during fruiting body maturation compared to the vegetative mycelial stage. Results Bioinformatics tools, phylogenetic analysis and expression studies were used to support the hypothesis that this sequence, named TbDHN1, is the first dehydrin (DHN)-like coding gene isolated in fungi. Homologs of this gene, all defined as "coding for hypothetical proteins" in public databases, were exclusively found in ascomycetous fungi and in plants. Although complete (or almost complete) fungal genomes and EST collections of some Basidiomycota and Glomeromycota are already available, DHN-like proteins appear to be represented only in Ascomycota. A new and previously uncharacterized conserved signature pattern was identified and proposed to Uniprot database as the main distinguishing feature of this new group of DHNs. Expression studies provide experimental evidence of a transcript induction of TbDHN1 during cellular dehydration. Conclusion Expression pattern and sequence similarities to known plant DHNs indicate that TbDHN1 is the first characterized DHN-like protein in fungi. The high similarity of TbDHN1 with homolog coding sequences implies the existence of a novel fungal/plant group of LEA Class II proteins characterized by a previously undescribed signature pattern. PMID:16512918
Strategies to Improve Efficiency and Specificity of Degenerate Primers in PCR.

PubMed

Campos, Maria Jorge; Quesada, Alberto

2017-01-01

PCR with degenerate primers can be used to identify the coding sequence of an unknown protein or to detect a genetic variant within a gene family. These primers, which are complex mixtures of slightly different oligonucleotide sequences, can be optimized to increase the efficiency and/or specificity of PCR in the amplification of a sequence of interest by the introduction of mismatches with the target sequence and balancing their position toward the primers' 5'- or 3'-ends. In this work, we explain in detail examples of rational design of primers in two different applications, including the use of specific determinants at the 3'-end, to: (1) improve PCR efficiency with coding sequences for members of a protein family by full degeneracy at a core box of conserved genetic information, with reduced degeneracy at the 5'-end, and (2) optimize specificity of allelic discrimination of closely related orthologs by 5'-end degenerate primers.
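To make "complex mixtures of slightly different oligonucleotide sequences" concrete, the Python sketch below expands a degenerate primer written in the standard IUPAC nucleotide codes into the individual oligonucleotides it represents and counts them. It is not the design procedure described by the authors, and the example primer is hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical primer: fixed 5' end, degenerate positions toward the 3' end.
example = "ATGGNYTR"
print(degeneracy(example), expand(example)[:4])
```

Each ambiguous position multiplies the size of the mixture, so the example (N × Y × R) encodes 4 × 2 × 2 = 16 sequences. Restricting degenerate positions to one part of the primer keeps that number manageable.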
Complete mitochondrial genome of the invasive brown alga Sargassum muticum (Sargassaceae, Phaeophyceae).

PubMed

Liu, Feng; Pang, Shaojun

2016-01-01

Sargassum muticum (Yendo) Fensholt is an invasive canopy-forming brown alga, expanding its presence from Northeast Asia to North America and Europe. The complete mitochondrial genome of S. muticum is characterized as a circular molecule of 34,720 bp. The overall AT content of S. muticum mitogenome is 63.41%. This mitogenome contains 65 genes typically found in brown algae, including 3 ribosomal RNA genes, 25 transfer RNA genes, 35 protein-coding genes, and 2 conserved open reading frames (ORFs). The gene order of mitogenome for S. muticum is identical to that for Sargassum horneri, Fucus vesiculosus and Desmarestia viridis. Phylogenetic analyses based on 35 protein-coding genes reveal that S. muticum has a close evolutionary relationship with S. horneri and a distant relationship with Dictyota dichotoma, supporting current taxonomic systems. The present investigation provides new molecular data for studies of S. muticum population diversity as well as comparative genomics in the Phaeophyceae.
Phylogenetic Analysis and Classification of the Fungal bHLH Domain

PubMed Central

Sailsbery, Joshua K.; Atchley, William R.; Dean, Ralph A.

2012-01-01

The basic Helix-Loop-Helix (bHLH) domain is an essential highly conserved DNA-binding domain found in many transcription factors in all eukaryotic organisms. The bHLH domain has been well studied in the Animal and Plant Kingdoms but has yet to be characterized within Fungi. Herein, we obtained and evaluated the phylogenetic relationship of 490 fungal-specific bHLH containing proteins from 55 whole genome projects composed of 49 Ascomycota and 6 Basidiomycota organisms. We identified 12 major groupings within Fungi (F1–F12); identifying conserved motifs and functions specific to each group. Several classification models were built to distinguish the 12 groups and elucidate the most discerning sites in the domain. Performance testing on these models, for correct group classification, resulted in a maximum sensitivity and specificity of 98.5% and 99.8%, respectively. We identified 12 highly discerning sites and incorporated those into a set of rules (simplified model) to classify sequences into the correct group. Conservation of amino acid sites and phylogenetic analyses established that like plant bHLH proteins, fungal bHLH–containing proteins are most closely related to animal Group B. The models used in these analyses were incorporated into a software package, the source code for which is available at www.fungalgenomics.ncsu.edu. PMID:22114358
Comparative and functional characterization of intragenic tandem repeats in 10 Aspergillus genomes.

PubMed

Gibbons, John G; Rokas, Antonis

2009-03-01

Intragenic tandem repeats (ITRs) are consecutive repeats of three or more nucleotides found in coding regions. ITRs are the underlying cause of several human genetic diseases and have been associated with phenotypic variation, including pathogenesis, in several clades of the tree of life. We have examined the evolution and functional role of ITRs in 10 genomes spanning the fungal genus Aspergillus, a clade of relevance to medicine, agriculture, and industry. We identified several hundred ITRs in each of the species examined. ITR content varied extensively between species, with an average 79% of ITRs unique to a given species. For the fraction of conserved ITR regions, sequence comparisons within species and between close relatives revealed that they were highly variable. ITR-containing proteins were evolutionarily less conserved, compositionally distinct, and overrepresented for domains associated with cell-surface localization and function relative to the rest of the proteome. Furthermore, ITRs were preferentially found in proteins involved in transcription, cellular communication, and cell-type differentiation but were underrepresented in proteins involved in metabolism and energy. Importantly, although ITRs were evolutionarily labile, their functional associations appeared to be remarkably conserved across eukaryotes. Fungal ITRs likely participate in a variety of developmental processes and cell-surface-associated functions, suggesting that their contribution to fungal lifestyle and evolution may be more general than previously assumed.
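The definition of an ITR given at the start of this abstract (consecutive repeats of a unit of three or more nucleotides) can be illustrated with a small regular-expression scan. The Python sketch below is not the detection method used in the study. The unit-length range, the minimum copy number and the test sequence are all assumptions chosen for illustration.

```python
import re


def find_tandem_repeats(seq, min_unit=3, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for perfect tandem repeats in seq.

    min_unit, max_unit and min_copies are illustrative thresholds,
    not values taken from the study.
    """
    # A lazily matched unit followed by at least (min_copies - 1) exact copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Hypothetical coding fragment containing four consecutive CAG codons.
fragment = "ATGTTTCAGCAGCAGCAGAAA"
repeats = find_tandem_repeats(fragment)
print(repeats)
```

This scan only reports perfect repeats. Imperfect or interrupted repeats would need an alignment-based method.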
Identification of Group B Streptococcal Sip Protein, Which Elicits Cross-Protective Immunity

PubMed Central

Brodeur, Bernard R.; Boyer, Martine; Charlebois, Isabelle; Hamel, Josée; Couture, France; Rioux, Clément R.; Martin, Denis

2000-01-01

A protein of group B streptococci (GBS), named Sip for surface immunogenic protein, which is distinct from previously described surface proteins, was identified after immunological screening of a genomic library. Immunoblots using a Sip-specific monoclonal antibody indicated that a protein band with an approximate molecular mass of 53 kDa which did not vary in size was present in every GBS strain tested. Representatives of all nine GBS serotypes were included in the panel of strains. Cloning and sequencing of the sip gene revealed an open reading frame of 1,305 nucleotides coding for a polypeptide of 434 amino acid residues, with a calculated pI of 6.84 and molecular mass of 45.5 kDa. Comparison of the nucleotide sequences from six different strains confirmed with 98% identity that the sip gene is highly conserved among GBS isolates. N-terminal amino acid sequencing also indicated the presence of a 25-amino-acid signal peptide which is cleaved in the mature protein. More importantly, immunization with the recombinant Sip protein efficiently protected CD-1 mice against deadly challenges with six GBS strains of serotypes Ia/c, Ib, II/R, III, V, and VI. The data presented in this study suggest that this highly conserved protein induces cross-protective immunity against GBS infections and emphasize its potential as a universal vaccine candidate. PMID:10992461

RNA Editing in Plant Mitochondria

NASA Astrophysics Data System (ADS)

Hiesel, Rudolf; Wissinger, Bernd; Schuster, Wolfgang; Brennicke, Axel

1989-12-01

Comparative sequence analysis of genomic and complementary DNA clones from several mitochondrial genes in the higher plant Oenothera revealed nucleotide sequence divergences between the genomic and the messenger RNA-derived sequences. These sequence alterations could be most easily explained by specific post-transcriptional nucleotide modifications. Most of the nucleotide exchanges in coding regions lead to altered codons in the mRNA that specify amino acids better conserved in evolution than those encoded by the genomic DNA. Several instances show that the genomic arginine codon CGG is edited in the mRNA to the tryptophan codon TGG in amino acid positions that are highly conserved as tryptophan in the homologous proteins of other species. This editing suggests that the standard genetic code is used in plant mitochondria and resolves the frequent coincidence of CGG codons and tryptophan in different plant species. The apparently frequent and non-species-specific equivalency of CGG and TGG codons in particular suggests that RNA editing is a common feature of all higher plant mitochondria.
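The codon change described in this abstract is a single C-to-U substitution at the first codon position. The Python sketch below applies that edit and looks up both codons in a two-entry excerpt of the standard genetic code. The function name is illustrative, and the abstract writes the edited codon as TGG because it was read from cDNA.

```python
# Two-entry excerpt of the standard genetic code (RNA alphabet).
CODON_TABLE = {"CGG": "Arg", "UGG": "Trp"}


def edit_c_to_u(codon, position):
    """Apply a C-to-U edit at a 0-based codon position."""
    if codon[position] != "C":
        raise ValueError("only C residues can undergo C-to-U editing")
    return codon[:position] + "U" + codon[position + 1:]


genomic_codon = "CGG"                          # arginine codon in the genomic DNA
edited_codon = edit_c_to_u(genomic_codon, 0)   # codon found in the edited mRNA

print(genomic_codon, CODON_TABLE[genomic_codon],
      "->", edited_codon, CODON_TABLE[edited_codon])
```

Read with the standard code, the edited codon specifies tryptophan, the residue the abstract reports as conserved at these positions in homologous proteins of other species.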
Disruption of long-distance highly conserved noncoding elements in neurocristopathies.

PubMed

Amiel, Jeanne; Benko, Sabina; Gordon, Christopher T; Lyonnet, Stanislas

2010-12-01

One of the key discoveries of vertebrate genome sequencing projects has been the identification of highly conserved noncoding elements (CNEs). Some characteristics of CNEs include their high frequency in mammalian genomes, their potential regulatory role in gene expression, and their enrichment in gene deserts nearby master developmental genes. The abnormal development of neural crest cells (NCCs) leads to a broad spectrum of congenital malformation(s), termed neurocristopathies, and/or tumor predisposition. Here we review recent findings that disruptions of CNEs, within or at long distance from the coding sequences of key genes involved in NCC development, result in neurocristopathies via the alteration of tissue- or stage-specific long-distance regulation of gene expression. While most studies on human genetic disorders have focused on protein-coding sequences, these examples suggest that investigation of genomic alterations of CNEs will provide a broader understanding of the molecular etiology of both rare and common human congenital malformations. © 2010 New York Academy of Sciences.
Cross-species inference of long non-coding RNAs greatly expands the ruminant transcriptome.

PubMed

Bush, Stephen J; Muriuki, Charity; McCulloch, Mary E B; Farquhar, Iseabail L; Clark, Emily L; Hume, David A

2018-04-24

mRNA-like long non-coding RNAs (lncRNAs) are a significant component of mammalian transcriptomes, although most are expressed only at low levels, with high tissue-specificity and/or at specific developmental stages. Thus, in many cases lncRNA detection by RNA-sequencing (RNA-seq) is compromised by stochastic sampling. To account for this and create a catalogue of ruminant lncRNAs, we compared de novo assembled lncRNAs derived from large RNA-seq datasets in transcriptional atlas projects for sheep and goats with previous lncRNAs assembled in cattle and human. We then combined the novel lncRNAs with the sheep transcriptional atlas to identify co-regulated sets of protein-coding and non-coding loci. Few lncRNAs could be reproducibly assembled from a single dataset, even with deep sequencing of the same tissues from multiple animals. Furthermore, there was little sequence overlap between lncRNAs that were assembled from pooled RNA-seq data. We combined positional conservation (synteny) with cross-species mapping of candidate lncRNAs to identify a consensus set of ruminant lncRNAs and then used the RNA-seq data to demonstrate detectable and reproducible expression in each species. In sheep, 20 to 30% of lncRNAs were located close to protein-coding genes with which they are strongly co-expressed, which is consistent with the evolutionary origin of some ncRNAs in enhancer sequences. Nevertheless, most of the lncRNAs are not co-expressed with neighbouring protein-coding genes. Alongside substantially expanding the ruminant lncRNA repertoire, the outcomes of our analysis demonstrate that stochastic sampling can be partly overcome by combining RNA-seq datasets from related species. This has practical implications for the future discovery of lncRNAs in other species.
Mitochondrial genome evolution in the Saccharomyces sensu stricto complex.

PubMed

Ruan, Jiangxing; Cheng, Jian; Zhang, Tongcun; Jiang, Huifeng

2017-01-01

Exploring the evolutionary patterns of mitochondrial genomes is important for our understanding of the Saccharomyces sensu stricto (SSS) group, which is a model system for genomic evolution and ecological analysis. In this study, we first obtained the complete mitochondrial sequences of two important species, Saccharomyces mikatae and Saccharomyces kudriavzevii. We then compared the mitochondrial genomes in the SSS group with those of close relatives, and found that the non-coding regions evolved rapidly, including dramatic expansion of intergenic regions, fast evolution of introns and almost 20-fold higher rearrangement rates than those of the nuclear genomes. However, the coding regions, and especially the protein-coding genes, are more conserved than those in the nuclear genomes of the SSS group. The different evolutionary patterns of coding and non-coding regions in the mitochondrial and nuclear genomes may be related to the origin of the aerobic fermentation lifestyle in this group. Our analysis thus provides novel insights into the evolution of mitochondrial genomes.
Identification of Bombyx mori bidensovirus VD1-ORF4 reveals a novel protein associated with viral structural component.

PubMed

Li, Guohui; Hu, Zhaoyang; Guo, Xuli; Li, Guangtian; Tang, Qi; Wang, Peng; Chen, Keping; Yao, Qin

2013-06-01

Bombyx mori bidensovirus (BmBDV) VD1-ORF4 (open reading frame 4, ORF4) consists of 3,318 nucleotides and codes for a predicted 1,105-amino acid protein containing a conserved DNA polymerase motif. However, its functions in viral propagation remain unknown. In the current study, the transcription of VD1-ORF4 was examined from 6 to 96 h postinfection (p.i.) by RT-PCR. 5'-RACE revealed the transcription initiation site of BmBDV ORF4 to be 16 nucleotides upstream of the start codon, and 3'-RACE revealed the transcription termination site of VD1-ORF4 to be 7 nucleotides downstream of the termination codon. Three different proteins were detected in extracts of BmBDV-infected silkworm midguts by Western blot using antibodies raised against the deduced VD1-ORF4 amino acid sequence, and a specific protein band of about 53 kDa was further detected in purified virions using the same antibodies. Taken together, BmBDV VD1-ORF4 codes for three or more proteins during the viral life cycle, one of which is a 53 kDa protein confirmed to be a component of the BmBDV virion.
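The relation between the reported ORF length and the predicted protein length can be checked with a short sketch. It assumes that the 3,318 nucleotides include the stop codon, which the abstract does not state explicitly; the function name `protein_length` is hypothetical.

```python
def protein_length(orf_nt: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of `orf_nt` nucleotides.

    Assumption: when `includes_stop` is True, the last codon is a stop
    codon and does not contribute an amino acid.
    """
    if orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_nt // 3
    return codons - 1 if includes_stop else codons


vd1_orf4_aa = protein_length(3318)
print(vd1_orf4_aa)
```

Under that assumption the result agrees with the 1,105 residues given in the abstract.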
A Very Fast and Angular Momentum Conserving Tree Code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Marcello, Dominic C.

There are many methods used to compute the classical gravitational field in astrophysical simulation codes. With the exception of the typically impractical method of direct computation, none ensure conservation of angular momentum to machine precision. Under uniform time-stepping, the Cartesian fast multipole method of Dehnen (also known as the very fast tree code) conserves linear momentum to machine precision. We show that it is possible to modify this method in a way that conserves both angular and linear momenta.
Sequence variations of the bovine prion protein gene (PRNP) in native Korean Hanwoo cattle

PubMed Central

Choi, Sangho

2012-01-01

Bovine spongiform encephalopathy (BSE) is one of the fatal neurodegenerative diseases known as transmissible spongiform encephalopathies (TSEs) caused by infectious prion proteins. Genetic variations correlated with susceptibility or resistance to TSE in humans and sheep have not been reported for bovine strains including those from Holstein, Jersey, and Japanese Black cattle. Here, we investigated bovine prion protein gene (PRNP) variations in Hanwoo cattle [Bos (B.) taurus coreanae], a native breed in Korea. We identified mutations and polymorphisms in the coding region of PRNP, determined their frequency, and evaluated their significance. We identified four synonymous polymorphisms and two non-synonymous mutations in PRNP, but found no novel polymorphisms. The sequence and number of octapeptide repeats were completely conserved, and the haplotype frequency of the coding region was similar to that of other B. taurus strains. When we examined the 23-bp and 12-bp insertion/deletion (indel) polymorphisms in the non-coding region of PRNP, Hanwoo cattle had a lower deletion allele and 23-bp del/12-bp del haplotype frequency than healthy and BSE-affected animals of other strains. Thus, Hanwoo are seemingly less susceptible to BSE than other strains due to the 23-bp and 12-bp indel polymorphisms. PMID:22705734
Analysis of cellulose synthase genes from domesticated apple identifies collinear genes WDR53 and CesA8A: partial co-expression, bicistronic mRNA, and alternative splicing of CESA8A

PubMed Central

Guerriero, Gea; Spadiut, Oliver; Kerschbamer, Christine; Giorno, Filomena; Baric, Sanja; Ezcurra, Inés

2016-01-01

Cellulose synthase (CesA) genes constitute a complex multigene family with six major phylogenetic clades in angiosperms. The recently sequenced genome of domestic apple, Malus×domestica, was mined for CesA genes, by blasting full-length cellulose synthase protein (CESA) sequences annotated in the apple genome against protein databases from the plant models Arabidopsis thaliana and Populus trichocarpa. Thirteen genes belonging to the six angiosperm CesA clades and coding for proteins with conserved residues typical of processive glycosyltransferases from family 2 were detected. Based on their phylogenetic relationship to Arabidopsis CESAs, as well as expression patterns, a nomenclature is proposed to facilitate further studies. Examination of their genomic organization revealed that MdCesA8-A is closely linked and co-oriented with WDR53, a gene coding for a WD40 repeat protein. The WDR53 and CesA8 genes display conserved collinearity in dicots and are partially co-expressed in the apple xylem. Interestingly, the presence of a bicistronic WDR53–CesA8A transcript was detected in phytoplasma-infected phloem tissues of apple. The bicistronic transcript contains a spliced intergenic sequence that is predicted to fold into hairpin structures typical of internal ribosome entry sites, suggesting its potential cap-independent translation. Surprisingly, the CesA8A cistron is alternatively spliced and lacks the zinc-binding domain. The possible roles of WDR53 and the alternatively spliced CESA8 variant during cellulose biosynthesis in M.×domestica are discussed. PMID:23048131
Transcriptome interrogation of human myometrium identifies differentially expressed sense-antisense pairs of protein-coding and long non-coding RNA genes in spontaneous labor at term

PubMed Central

Romero, Roberto; Tarca, Adi; Chaemsaithong, Piya; Miranda, Jezid; Chaiworapongsa, Tinnakorn; Jia, Hui; Hassan, Sonia S.; Kalita, Cynthia A.; Cai, Juan; Yeo, Lami; Lipovich, Leonard

2014-01-01

Objective The mechanisms responsible for normal and abnormal parturition are poorly understood. Myometrial activation leading to regular uterine contractions is a key component of labor. Dysfunctional labor (arrest of dilatation and/or descent) is a leading indication for cesarean delivery. Compelling evidence suggests that most of these disorders are functional in nature, and not the result of cephalopelvic disproportion. The methodology and the datasets afforded by the post-genomic era provide novel opportunities to understand and target gene functions in these disorders. In 2012, the ENCODE Consortium elucidated the extraordinary abundance and functional complexity of long non-coding RNA genes in the human genome. The purpose of the study was to identify differentially expressed long non-coding RNA genes in human myometrium in women in spontaneous labor at term. Materials and Methods Myometrium was obtained from women undergoing cesarean deliveries who were not in labor (n=19) and women in spontaneous labor at term (n=20). RNA was extracted and profiled using an Illumina® microarray platform. The analysis of the protein coding genes from this study has been previously reported. Here, we have used computational approaches to bound the extent of long non-coding RNA representation on this platform, and to identify co-differentially expressed and correlated pairs of long non-coding RNA genes and protein-coding genes sharing the same genomic loci. Results Upon considering more than 18,498 distinct lncRNA genes compiled nonredundantly from public experimental data sources, and interrogating 2,634 that matched Illumina microarray probes, we identified co-differential expression and correlation at two genomic loci that contain coding-lncRNA gene pairs: SOCS2-AK054607 and LMCD1-NR_024065 in women in spontaneous labor at term. This co-differential expression and correlation was validated by qRT-PCR, an independent experimental method. 
Intriguingly, one of the two lncRNA genes differentially expressed in term labor had a key genomic structure element, a splice site that lacked evolutionary conservation beyond primates. Conclusions We provide for the first time evidence for coordinated differential expression and correlation of cis-encoded antisense lncRNAs and protein-coding genes with known, as well as novel roles in pregnancy in the myometrium of women in spontaneous labor at term. PMID:24168098
Identification and Analysis of Jasmonate Pathway Genes in Coffea canephora (Robusta Coffee) by In Silico Approach.

PubMed

Bharathi, Kosaraju; Sreenath, H L

2017-07-01

Coffea canephora is a commonly cultivated coffee species in the world, along with Coffea arabica. Different pests and pathogens affect the production and quality of coffee. Jasmonic acid (JA) is a plant hormone that plays an important role in plant growth, development, and defense mechanisms, particularly against insect pests. The key enzymes involved in the production of JA are lipoxygenase, allene oxide synthase, allene oxide cyclase, and 12-oxo-phytodienoic reductase. There is no report on the genes involved in the JA pathway in coffee plants. We made an attempt to identify and analyze the genes coding for these enzymes in C. canephora. First, protein sequences of jasmonate pathway genes from the model plant Arabidopsis thaliana were identified in the National Center for Biotechnology Information (NCBI) database. These protein sequences were used to search the web-based database Coffee Genome Hub to identify homologous protein sequences in the C. canephora genome using the Basic Local Alignment Search Tool (BLAST). Homologous protein sequences for key genes were identified in the C. canephora genome database. Protein sequences of the top matches were in turn used to search the NCBI database using BLAST to confirm the identity of the selected proteins and to identify closely related genes in other species. The protein sequences from the C. canephora database and the top matches in NCBI were aligned, phylogenetic trees were constructed using MEGA6 software, and the genetic distances of the respective genes were identified. The study identified the four key genes of the JA pathway in C. canephora, confirming the conserved nature of the pathway in coffee. The study is expected to be useful for further exploring the defense mechanisms of coffee plants. Summary: JA is a plant hormone that plays an important role in plant defense against insect pests. Genes coding for the 4 key enzymes involved in the production of JA, viz., LOX, AOS, AOC, and OPR, were identified in C. canephora (robusta coffee) by bioinformatic approaches, confirming the conserved nature of the pathway in coffee. The findings are useful for understanding the defense mechanisms of C. canephora and for coffee breeding in the long run. Abbreviations used: C. canephora: Coffea canephora; C. arabica: Coffea arabica; JA: Jasmonic acid; CGH: Coffee Genome Hub; NCBI: National Centre for Biotechnology Information; BLAST: Basic Local Alignment Search Tool; A. thaliana: Arabidopsis thaliana; LOX: Lipoxygenase; AOS: Allene oxide synthase; AOC: Allene oxide cyclase; OPR: 12-oxo-phytodienoic reductase.
Complete mitochondrial genome of Platevindex sp. (Gastropoda: Pulmonata: Systellommatophora: Onchidiidae).

PubMed

Liu, Chen; Shen, He Ding; Zhou, Na

2016-01-01

The complete mitochondrial genome sequence of Platevindex sp. is described for the first time in this article. The mitogenome (13,908 bp) contains 22 tRNA genes, 2 ribosomal RNA genes, 13 protein-coding genes, and 1 putative control region (CR). The CR is not well characterized due to the lack of discrete conserved sequence blocks. This characteristic is similar to that of the CRs of other invertebrate mitochondrial genomes. The gene composition is the typical bivalvia mitochondrial gene composition.
Protein composition of oil bodies from mature Brassica napus seeds.

PubMed

Jolivet, Pascale; Boulard, Céline; Bellamy, Annick; Larré, Colette; Barre, Marion; Rogniaux, Hélène; d'Andréa, Sabine; Chardot, Thierry; Nesi, Nathalie

2009-06-01

Seed oil bodies (OBs) are intracellular particles storing lipids as food or biofuel reserves in oleaginous plants. Since Brassica napus OBs could be easily contaminated with protein bodies and/or myrosin cells, they must be purified step by step using floatation technique in order to remove non-specifically trapped proteins. An exhaustive description of the protein composition of rapeseed OBs from two double-zero varieties was achieved by a combination of proteomic and genomic tools. Genomic analysis led to the identification of sequences coding for major seed oil body proteins, including 19 oleosins, 5 steroleosins and 9 caleosins. Most of these proteins were also identified through proteomic analysis and displayed a high level of sequence conservation with their Arabidopsis thaliana counterparts. Two rapeseed oleosin orthologs appeared acetylated on their N-terminal alanine residue and both caleosins and steroleosins displayed a low level of phosphorylation.
Analysis of differential selective forces acting on the coat protein (P3) of the plant virus family Luteoviridae.

PubMed

Torres, Marina W; Corrêa, Régis L; Schrago, Carlos G

2005-12-30

The coat protein (CP) of the family Luteoviridae is directly associated with the success of infection. It participates in various steps of the virus life cycle, such as virion assembly, stability, systemic infection, and transmission. Despite its importance, extensive studies on the molecular evolution of this protein are lacking. In the present study, we investigate the action of differential selective forces on the CP coding region using maximum likelihood methods. We found that the protein is subjected to heterogeneous selective pressures and some sites may be evolving near neutrality. Based on the proposed 3-D model of the CP S-domain, we showed that nearly neutral sites are predominantly located in the region of the protein that faces the interior of the capsid, in close contact with the viral RNA, while highly conserved sites are mainly part of beta-strands, in the protein's major framework.
Chimeric mitochondrial minichromosomes of the human body louse, Pediculus humanus: evidence for homologous and non-homologous recombination.

PubMed

Shao, Renfu; Barker, Stephen C

2011-02-15

The mitochondrial (mt) genome of the human body louse, Pediculus humanus, consists of 18 minichromosomes. Each minichromosome is 3 to 4 kb long and has 1 to 3 genes. There is unequivocal evidence for recombination between different mt minichromosomes in P. humanus. It is not known, however, how these minichromosomes recombine. Here, we report the discovery of eight chimeric mt minichromosomes in P. humanus. We classify these chimeric mt minichromosomes into two groups: Group I and Group II. Group I chimeric minichromosomes contain parts of two different protein-coding genes that are from different minichromosomes. The two parts of protein-coding genes in each Group I chimeric minichromosome are joined at a microhomologous nucleotide sequence; microhomologous nucleotide sequences are hallmarks of non-homologous recombination. Group II chimeric minichromosomes contain all of the genes and the non-coding regions of two different minichromosomes. The conserved sequence blocks in the non-coding regions of Group II chimeric minichromosomes resemble the "recombination repeats" in the non-coding regions of the mt genomes of higher plants. These repeats are essential to homologous recombination in higher plants. Our analyses of the nucleotide sequences of chimeric mt minichromosomes indicate both homologous and non-homologous recombination between minichromosomes in the mitochondria of the human body louse. Copyright © 2010 Elsevier B.V. All rights reserved.
microRNA Therapeutics in Cancer - An Emerging Concept.

PubMed

Shah, Maitri Y; Ferrajoli, Alessandra; Sood, Anil K; Lopez-Berestein, Gabriel; Calin, George A

2016-10-01

MicroRNAs (miRNAs) are an evolutionarily conserved class of small, regulatory non-coding RNAs that negatively regulate the expression of protein-coding genes and other non-coding transcripts. miRNAs have been established as master regulators of cellular processes, and they play a vital role in tumor initiation, progression and metastasis. Further, widespread deregulation of microRNAs has been reported in several cancers, with several microRNAs playing oncogenic and tumor suppressive roles. Based on these findings, miRNAs have emerged as promising therapeutic tools for cancer management. In this review, we have focused on the roles of miRNAs in tumorigenesis, the miRNA-based therapeutic strategies currently being evaluated for use in cancer, and the advantages and current challenges to their use in the clinic. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
The complete mitochondrial genome of Pholis nebulosus (Perciformes: Pholidae).

PubMed

Wang, Zhongquan; Qin, Kaili; Liu, Jingxi; Song, Na; Han, Zhiqiang; Gao, Tianxiang

2016-11-01

In this study, the complete mitochondrial genome (mitogenome) sequence of Pholis nebulosus has been determined by long polymerase chain reaction and primer-walking methods. The mitogenome is a circular molecule of 16 524 bp in length, including the typical structure of 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and 2 non-coding regions (L-strand replication origin and control region), the gene contents of which are identical to those observed in most bony fishes. Within the control region, we identified the termination-associated sequence domain (TAS), and the conserved sequence block domain (CSB-F, CSB-E, CSB-D, CSB-C, CSB-B, CSB-A, CSB-1, CSB-2, CSB-3).
Building Standards and Codes for Energy Conservation

ERIC Educational Resources Information Center

Gross, James G.; Pierlert, James H.

1977-01-01

Current activity intended to lead to energy conservation measures in building codes and standards is reviewed by members of the Office of Building Standards and Codes Services of the National Bureau of Standards. For journal availability see HE 508 931. (LBH)
Type IV pili of Acidithiobacillus ferrooxidans can transfer electrons from extracellular electron donors.

PubMed

Li, Yongquan; Li, Hongyu

2014-03-01

Studies on Acidithiobacillus ferrooxidans accepting electrons from Fe(II) have previously focused on cytochrome c. However, we have discovered that, besides cytochrome c, type IV pili (Tfp) can transfer electrons. Here, we report conduction by Tfp of A. ferrooxidans analyzed with a conducting-probe atomic force microscope (AFM). The results indicate that the Tfp of A. ferrooxidans are highly conductive. The genome sequence of A. ferrooxidans ATCC 23270 contains two genes, pilV and pilW, which code for pilin domain proteins with the conserved amino acids characteristic of Tfp. Multiple alignment analysis of the PilV and PilW (pilin) proteins indicated that pilV is the adhesin gene while pilW codes for the major protein element of Tfp. The likely function of Tfp is to complete the circuit between the cell surface and Fe(II) oxides. These results indicate that Tfp of A. ferrooxidans might serve as biological nanowires transferring electrons from the surface of Fe(II) oxides to the cell surface. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Adaptive evolution of the matrix extracellular phosphoglycoprotein in mammals

PubMed Central

2011-01-01

Background Matrix extracellular phosphoglycoprotein (MEPE) belongs to a family of small integrin-binding ligand N-linked glycoproteins (SIBLINGs) that play a key role in skeleton development, particularly in mineralization, phosphate regulation and osteogenesis. MEPE associated disorders cause various physiological effects, such as loss of bone mass, tumors and disruption of renal function (hypophosphatemia). The study of this developmental gene from an evolutionary perspective could provide valuable insights on the adaptive diversification of morphological phenotypes in vertebrates. Results Here we studied the adaptive evolution of the MEPE gene in 26 Eutherian mammals and three birds. The comparative genomic analyses revealed a high degree of evolutionary conservation of some coding and non-coding regions of the MEPE gene across mammals indicating a possible regulatory or functional role likely related with mineralization and/or phosphate regulation. However, the majority of the coding region had a fast evolutionary rate, particularly within the largest exon (1467 bp). Rodentia and Scandentia had distinct substitution rates with an increased accumulation of both synonymous and non-synonymous mutations compared with other mammalian lineages. Characteristics of the gene (e.g. biochemical, evolutionary rate, and intronic conservation) differed greatly among lineages of the eight mammalian orders. We identified 20 sites with significant positive selection signatures (codon and protein level) outside the main regulatory motifs (dentonin and ASARM) suggestive of an adaptive role. Conversely, we find three sites under selection in the signal peptide and one in the ASARM motif that were supported by at least one selection model. The MEPE protein tends to accumulate amino acids promoting disorder and potential phosphorylation targets. 
Conclusion MEPE shows a high number of selection signatures, revealing the crucial role of positive selection in the evolution of this SIBLING member. The selection signatures were found mainly outside the functional motifs, reinforcing the idea that other regions outside the dentonin and the ASARM might be crucial for the function of the protein and future studies should be undertaken to understand its importance. PMID:22103247
Nucleotide sequence of the gene for the Mr 32,000 thylakoid membrane protein from Spinacia oleracea and Nicotiana debneyi predicts a totally conserved primary translation product of Mr 38,950

PubMed Central

Zurawski, Gerard; Bohnert, Hans J.; Whitfeld, Paul R.; Bottomley, Warwick

1982-01-01

The gene for the so-called Mr 32,000 rapidly labeled photosystem II thylakoid membrane protein (here designated psbA) of spinach (Spinacia oleracea) chloroplasts is located on the chloroplast DNA in the large single-copy region immediately adjacent to one of the inverted repeat sequences. In this paper we show that the size of the mRNA for this protein is ≈ 1.25 kilobases and that the direction of transcription is towards the inverted repeat unit. The nucleotide sequence of the gene and its flanking regions is presented. The only large open reading frame in the sequence codes for a protein of Mr 38,950. The nucleotide sequence of psbA from Nicotiana debneyi also has been determined, and comparison of the sequences from the two species shows them to be highly conserved (>95% homology) throughout the entire reading frame. Conservation of the amino acid sequence is absolute, there being no changes in a total of 353 residues. This leads us to conclude that the primary translation product of psbA must be a protein of Mr 38,950. The protein is characterized by the complete absence of lysine residues and is relatively rich in hydrophobic amino acids, which tend to be clustered. Transcription of spinach psbA starts about 86 base pairs before the first ATG codon. Immediately upstream from this point there is a sequence typical of that found in E. coli promoters. An almost identical sequence occurs in the equivalent region of N. debneyi DNA. PMID:16593262
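The figures in this abstract can be cross-checked with simple arithmetic. The sketch below assumes that the open reading frame starts at the first ATG and ends with a single stop codon, and it treats the approximate mRNA size of 1.25 kilobases as exactly 1,250 nucleotides; these are illustrative assumptions, not statements from the paper.

```python
residues = 353          # amino acids in the primary translation product
mr = 38_950             # predicted relative molecular mass
transcript_nt = 1_250   # assumed exact value for the ~1.25 kb mRNA
leader_nt = 86          # transcription start to first ATG (approximate)

# Average mass contributed by each residue.
mean_residue_mass = mr / residues

# Coding region length, including one stop codon (assumption).
orf_nt = (residues + 1) * 3

# Nucleotides left over for the 3' untranslated region under these assumptions.
remaining_nt = transcript_nt - leader_nt - orf_nt

print(round(mean_residue_mass, 1), orf_nt, remaining_nt)
```

Under these assumptions the leader and coding region together account for most of the transcript, which is consistent with the statement that the sequence contains only one large open reading frame.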

Characterization and Evolution of Conserved MicroRNA through Duplication Events in Date Palm (Phoenix dactylifera)

PubMed Central

Yang, Yaodong; Mason, Annaliese S.; Lei, Xintao; Ma, Zilong

2013-01-01

MicroRNAs (miRNAs) are important regulators of gene expression at the post-transcriptional level in a wide range of species. Highly conserved miRNAs regulate ancestral transcription factors common to all plants, and control important basic processes such as cell division and meristem function. We selected 21 conserved miRNA families to analyze the distribution and maintenance of miRNAs. Recently, the first genome sequence in Palmaceae was released: date palm (Phoenix dactylifera). We conducted a systematic miRNA analysis in date palm, computationally identifying and characterizing the distribution and duplication of conserved miRNAs in this species compared to other published plant genomes. A total of 81 miRNAs belonging to 18 miRNA families were identified in date palm. The majority of miRNAs in date palm and seven other well-studied plant species were located in intergenic regions and located 4 to 5 kb away from the nearest protein-coding genes. Sequence comparison showed that 67% of date palm miRNA members were present in duplicated segments, and that 135 pairs of miRNA-containing segments were duplicated in Arabidopsis, tomato, orange, rice, apple, poplar and soybean with a high similarity of non coding sequences between duplicated segments, indicating genomic duplication was a major force for expansion of conserved miRNAs. Duplicated miRNA pairs in date palm showed divergence in pre-miRNA sequence and in number of promoters, implying that these duplicated pairs may have undergone divergent evolution. Comparisons between date palm and the seven other plant species for the gain/loss of miR167 loci in an ancient segment shared between monocots and dicots suggested that these conserved miRNAs were highly influenced by and diverged as a result of genomic duplication events. PMID:23951162
Characterization and evolution of conserved MicroRNA through duplication events in date palm (Phoenix dactylifera).

PubMed

Xiao, Yong; Xia, Wei; Yang, Yaodong; Mason, Annaliese S; Lei, Xintao; Ma, Zilong

2013-01-01

MicroRNAs (miRNAs) are important regulators of gene expression at the post-transcriptional level in a wide range of species. Highly conserved miRNAs regulate ancestral transcription factors common to all plants, and control important basic processes such as cell division and meristem function. We selected 21 conserved miRNA families to analyze the distribution and maintenance of miRNAs. Recently, the first genome sequence in Palmaceae was released: date palm (Phoenix dactylifera). We conducted a systematic miRNA analysis in date palm, computationally identifying and characterizing the distribution and duplication of conserved miRNAs in this species compared to other published plant genomes. A total of 81 miRNAs belonging to 18 miRNA families were identified in date palm. The majority of miRNAs in date palm and seven other well-studied plant species were located in intergenic regions and located 4 to 5 kb away from the nearest protein-coding genes. Sequence comparison showed that 67% of date palm miRNA members were present in duplicated segments, and that 135 pairs of miRNA-containing segments were duplicated in Arabidopsis, tomato, orange, rice, apple, poplar and soybean with a high similarity of non coding sequences between duplicated segments, indicating genomic duplication was a major force for expansion of conserved miRNAs. Duplicated miRNA pairs in date palm showed divergence in pre-miRNA sequence and in number of promoters, implying that these duplicated pairs may have undergone divergent evolution. Comparisons between date palm and the seven other plant species for the gain/loss of miR167 loci in an ancient segment shared between monocots and dicots suggested that these conserved miRNAs were highly influenced by and diverged as a result of genomic duplication events.
Specific DNA binding of the two chicken Deformed family homeodomain proteins, Chox-1.4 and Chox-a.

PubMed Central

Sasaki, H; Yokoyama, E; Kuroiwa, A

1990-01-01

The cDNA clones encoding two chicken Deformed (Dfd) family homeobox containing genes Chox-1.4 and Chox-a were isolated. Comparison of their amino acid sequences with another chicken Dfd family homeodomain protein and with those of mouse homologues revealed that strong homologies are located in the amino terminal regions and around the homeodomains. Although homologies in other regions were relatively low, some short conserved sequences were also identified. E. coli-made full length proteins were purified and used for the production of specific antibodies and for DNA binding studies. The binding profiles of these proteins to the 5'-leader and 5'-upstream sequences of Chox-1.4 and Chox-a coding regions were analyzed by immunoprecipitation and DNase I footprint assays. These two Chox proteins bound to the same sites in the 5'-flanking sequences of their coding regions with various affinities and their binding affinities to each site were nearly the same. The consensus sequences of the high and low affinity binding sites were TAATGA(C/G) and CTAATTTT, respectively. A clustered binding site was identified in the 5'-upstream of the Chox-a gene, suggesting that this clustered binding site works as a cis-regulatory element for auto- and/or cross-regulation of Chox-a gene expression. PMID:1970866
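The two consensus sequences reported above can be expressed as regular expressions for scanning a DNA sequence. This is a minimal sketch: the example sequence is made up for illustration, the function name `find_sites` is hypothetical, and only the given strand is searched (a real scan would usually also check the reverse complement).

```python
import re

# Consensus sites from the abstract; (C/G) becomes the character class [CG].
HIGH_AFFINITY = re.compile(r"TAATGA[CG]")
LOW_AFFINITY = re.compile(r"CTAATTTT")


def find_sites(seq: str) -> list[tuple[int, str, str]]:
    """Return (0-based start, affinity label, matched text), sorted by position."""
    seq = seq.upper()
    hits = []
    for label, pattern in (("high", HIGH_AFFINITY), ("low", LOW_AFFINITY)):
        for match in pattern.finditer(seq):
            hits.append((match.start(), label, match.group()))
    return sorted(hits)


# Hypothetical sequence, not taken from the Chox-1.4 or Chox-a loci.
example = "GGCTAATGACCATTCTAATTTTGGTAATGAGAA"
sites = find_sites(example)
for start, label, text in sites:
    print(start, label, text)
```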
Mechanisms of radiation-induced gene responses

DOE Office of Scientific and Technical Information (OSTI.GOV)

Woloschak, G.E.; Paunesku, T.

1996-10-01

In the process of identifying genes differentially expressed in cells exposed to ultraviolet radiation, we have identified a transcript having a 26-bp region that is highly conserved in a variety of species including Bacillus circulans, yeast, pumpkin, Drosophila, mouse, and man. When the element lies in the 5' region (flanking region or UTR) of a gene, the sequence is predominantly in +/+ orientation with respect to the coding DNA strand, while in the coding region and the 3' region (UTR), the sequence is most frequently in the +/- orientation with respect to the coding DNA strand. In two genes, the element is split into two parts; however, in most cases, it is found only once but with a minimum of 11 consecutive nucleotides precisely depicting the original sequence. The element is found in a large number of different genes with diverse functions (from human ras p21 to B. circulans chitinase). Gel shift assays demonstrated the presence of a protein in HeLa cell extracts that binds to the sense and antisense single-stranded consensus oligomers, as well as to the double-stranded oligonucleotide. When double-stranded oligomer was used, the size shift demonstrated an additional protein-oligomer complex larger than the one bound to either sense or antisense single-stranded consensus oligomers alone. It is speculated either that this element binds to protein(s) important in maintaining DNA in a single-stranded orientation for transcription or, alternatively, that this element is important in the transcription-coupled DNA repair process.
Dose-sensitivity, conserved non-coding sequences, and duplicate gene retention through multiple tetraploidies in the grasses.

PubMed

Schnable, James C; Pedersen, Brent S; Subramaniam, Sabarinath; Freeling, Michael

2011-01-01

Whole genome duplications, or tetraploidies, are an important source of increased gene content. Following whole genome duplication, duplicate copies of many genes are lost from the genome. This loss of genes is biased both in the classes of genes deleted and the subgenome from which they are lost. Many or all classes of genes preferentially retained as duplicate copies are engaged in dose-sensitive protein-protein interactions, such that deletion of any one duplicate upsets the status quo of subunit concentrations, and presumably lowers fitness as a result. Transcription factors are also preferentially retained following every whole genome duplication studied. This has been explained as a consequence of protein-protein interactions, just as for other highly retained classes of genes. We show that the quantity of conserved noncoding sequences (CNSs) associated with genes predicts the likelihood of their retention as duplicate pairs following whole genome duplication. As many CNSs likely represent binding sites for transcriptional regulators, we propose that the likelihood of gene retention following tetraploidy may also be influenced by dose-sensitive protein-DNA interactions between the regulatory regions of CNS-rich genes - nicknamed bigfoot genes - and the proteins that bind to them. Using grass genomes, we show that differential loss of CNSs from one member of a pair following the pre-grass tetraploidy reduces its chance of retention in the subsequent maize lineage tetraploidy.
Rhodopseudomonas palustris CGA010 Proteome Implicates Extracytoplasmic Function Sigma Factor in Stress Response

DOE PAGES

Allen, Michael S.; Hurst, Gregory B.; Lu, Tse-Yuan S.; ...

2015-04-08

Rhodopseudomonas palustris encodes 16 extracytoplasmic function (ECF) σ factors. In this paper, to begin to investigate the regulatory network of one of these ECF σ factors, the whole proteome of R. palustris CGA010 was quantitatively analyzed by tandem mass spectrometry from cultures episomally expressing the ECF σ RPA4225 (ecfT) versus a WT control. Among the proteins with the greatest increase in abundance were catalase KatE, trehalose synthase, a DPS-like protein, and several regulatory proteins. Alignment of the cognate promoter regions driving expression of several upregulated proteins suggested a conserved binding motif in the -35 and -10 regions with the consensus sequence GGAAC-18N-TT. Additionally, the putative anti-σ factor RPA4224, whose gene is contained in the same predicted operon as RPA4225, was identified as interacting directly with the predicted response regulator RPA4223 by mass spectrometry of affinity-isolated protein complexes. Furthermore, another gene (RPA4226) coding for a protein that contains a cytoplasmic histidine kinase domain is located immediately upstream of RPA4225. The genomic organization of orthologs for these four genes is conserved in several other strains of R. palustris as well as in closely related α-Proteobacteria. Finally, taken together, these data suggest that ECF σ RPA4225 and the three additional genes make up a sigma factor mimicry system in R. palustris.
Deep sequencing of Salmonella RNA associated with heterologous Hfq proteins in vivo reveals small RNAs as a major target class and identifies RNA processing phenotypes.

PubMed

Sittka, Alexandra; Sharma, Cynthia M; Rolle, Katarzyna; Vogel, Jörg

2009-01-01

The bacterial Sm-like protein, Hfq, is a key factor for the stability and function of small non-coding RNAs (sRNAs) in Escherichia coli. Homologues of this protein have been predicted in many distantly related organisms, yet their functional conservation as sRNA-binding proteins has not been entirely clear. To address this, we expressed in Salmonella the Hfq proteins of two eubacteria (Neisseria meningitidis, Aquifex aeolicus) and an archaeon (Methanocaldococcus jannaschii), and analyzed the associated RNA by deep sequencing. This in vivo approach identified endogenous Salmonella sRNAs as a major target of the foreign Hfq proteins. New Salmonella sRNA species were also identified, and some of these accumulated specifically in the presence of a foreign Hfq protein. In addition, we observed specific RNA processing defects, e.g., suppression of precursor processing of SraH sRNA by Methanocaldococcus Hfq, or aberrant accumulation of extracytoplasmic target mRNAs of the Salmonella GcvB, MicA or RybB sRNAs. Taken together, our study provides evidence of a conserved inherent sRNA-binding property of Hfq, which may facilitate the lateral transmission of regulatory sRNAs among distantly related species. It also suggests that the expression of heterologous RNA-binding proteins combined with deep sequencing analysis of RNA ligands can be used as a molecular tool to dissect individual steps of RNA metabolism in vivo.
Transcripts of the NADH-dehydrogenase subunit 3 gene are differentially edited in Oenothera mitochondria.

PubMed Central

Schuster, W; Wissinger, B; Unseld, M; Brennicke, A

1990-01-01

A number of cytosines are altered to be recognized as uridines in transcripts of the nad3 locus in mitochondria of the higher plant Oenothera. Such nucleotide modifications can be found at 16 different sites within the nad3 coding region. Most of these alterations in the mRNA sequence change codon identities to specify amino acids better conserved in evolution. Individual cDNA clones differ in their degree of editing at five nucleotide positions, three of which are silent, while two lead to codon alterations specifying different amino acids. None of the cDNA clones analysed is maximally edited at all possible sites, suggesting slow processing or lowered stringency of editing at these nucleotides. Differentially edited transcripts could be editing intermediates or could code for differing polypeptides. Two edited nucleotides in an open reading frame located upstream of nad3 change two amino acids in the deduced polypeptide. Part of the well-conserved ribosomal protein gene rps12, which is also encoded downstream of nad3 in other plants, is lost in Oenothera mitochondria by recombination events. The functional rps12 protein must be imported from the cytoplasm since the deleted sequences of this gene are not found in the Oenothera mitochondrial genome. The pseudogene sequence is not edited at any nucleotide position. PMID:1688531
Posttranscriptional regulation of the immediate-early gene EGR1 by light in the mouse retina.

PubMed

Simon, Perikles; Schott, Klaus; Williams, Robert W; Schaeffel, Frank

2004-12-01

Synaptic plasticity is modulated by differential regulation of transcription factors such as EGR1 which binds to DNA via a zinc finger binding domain. Inactivation of EGR1 has implicated this gene as a key regulator of memory formation and learning. However, it remains puzzling how synaptic input can lead to an up-regulation of the EGR-1 protein within only a few minutes. Here, we show by immunohistochemical staining that the EGR-1 protein is localized in synapses throughout the mouse retina. We demonstrate for the first time that two variants of Egr-1 mRNA are produced in the retina by alternative polyadenylation, with the longer version having an additional 293 base pairs at the end of the 3'UTR. Remarkably, the use of the alternative polyadenylation site is controlled by light. The additional 3'UTR sequence of the longer variant displays an even higher level of phylogenetic conservation than the coding region of this highly conserved gene. Additionally, it harbours a cytoplasmic polyadenylation element which is known to respond to NMDA receptor activation. The longer version of the Egr-1 mRNA could therefore rapidly respond to excitatory stimuli such as light or glutamate release whereas the short variant, which is predominantly expressed and contains the full coding sequence, lacks the regulatory elements for cytoplasmic polyadenylation in its 3'UTR.
The identification and functional annotation of RNA structures conserved in vertebrates

PubMed Central

Seemann, Stefan E.; Mirza, Aashiq H.; Hansen, Claus; Bang-Berthelsen, Claus H.; Garde, Christian; Christensen-Dalsgaard, Mikkel; Torarinsson, Elfar; Yao, Zizhen; Workman, Christopher T.; Pociot, Flemming; Nielsen, Henrik; Tommerup, Niels; Ruzzo, Walter L.; Gorodkin, Jan

2017-01-01

Structured elements of RNA molecules are essential in, e.g., RNA stabilization, localization, and protein interaction, and their conservation across species suggests a common functional role. We computationally screened vertebrate genomes for conserved RNA structures (CRSs), leveraging structure-based, rather than sequence-based, alignments. After careful correction for sequence identity and GC content, we predict ∼516,000 human genomic regions containing CRSs. We find that a substantial fraction of human–mouse CRS regions (1) colocalize consistently with binding sites of the same RNA binding proteins (RBPs) or (2) are transcribed in corresponding tissues. Additionally, a CaptureSeq experiment revealed expression of many of our CRS regions in human fetal brain, including 662 novel ones. For selected human and mouse candidate pairs, qRT-PCR and in vitro RNA structure probing supported both shared expression and shared structure despite low abundance and low sequence identity. About 30,000 CRS regions are located near coding or long noncoding RNA genes or within enhancers. Structured (CRS overlapping) enhancer RNAs and extended 3′ ends have significantly increased expression levels over their nonstructured counterparts. Our findings of transcribed uncharacterized regulatory regions that contain CRSs support their RNA-mediated functionality. PMID:28487280
The increasing diversity of functions attributed to the SAFB family of RNA-/DNA-binding proteins.

PubMed

Norman, Michael; Rivers, Caroline; Lee, Youn-Bok; Idris, Jalilah; Uney, James

2016-12-01

RNA-binding proteins play a central role in cellular metabolism by orchestrating the complex interactions of coding, structural and regulatory RNA species. The SAFB (scaffold attachment factor B) proteins (SAFB1, SAFB2 and SAFB-like transcriptional modulator, SLTM), which are highly conserved evolutionarily, were first identified on the basis of their ability to bind scaffold attachment region DNA elements, but attention has subsequently shifted to their RNA-binding and protein-protein interactions. Initial studies identified the involvement of these proteins in the cellular stress response and other aspects of gene regulation. More recently, the multifunctional capabilities of SAFB proteins have shown that they play crucial roles in DNA repair, processing of mRNA and regulatory RNA, as well as in interaction with chromatin-modifying complexes. With the advent of new techniques for identifying RNA-binding sites, enumeration of individual RNA targets has now begun. This review aims to summarise what is currently known about the functions of SAFB proteins. © 2016 The Author(s).
Uncovering the functional constraints underlying the genomic organization of the odorant-binding protein genes.

PubMed

Librado, Pablo; Rozas, Julio

2013-01-01

Animal olfactory systems have a critical role for the survival and reproduction of individuals. In insects, the odorant-binding proteins (OBPs) are encoded by a moderately sized gene family, and mediate the first steps of the olfactory processing. Most OBPs are organized in clusters of a few paralogs, which are conserved over time. Currently, the biological mechanism explaining the close physical proximity among OBPs is not yet established. Here, we conducted a comprehensive study aiming to gain insights into the mechanisms underlying the OBP genomic organization. We found that the OBP clusters are embedded within large conserved arrangements. These organizations also include other non-OBP genes, which often encode proteins integral to plasma membrane. Moreover, the conservation degree of such large clusters is related to the following: 1) the promoter architecture of the confined genes, 2) a characteristic transcriptional environment, and 3) the chromatin conformation of the chromosomal region. Our results suggest that chromatin domains may restrict the location of OBP genes to regions having the appropriate transcriptional environment, leading to the OBP cluster structure. However, the appropriate transcriptional environment for OBP and the other neighbor genes is not dominated by reduced levels of expression noise. Indeed, the stochastic fluctuations in the OBP transcript abundance may have a critical role in the combinatorial nature of the olfactory coding process.
Identification of MicroRNAs in the Coral Stylophora pistillata

PubMed Central

Liew, Yi Jin; Aranda, Manuel; Carr, Adrian; Baumgarten, Sebastian; Zoccola, Didier; Tambutté, Sylvie; Allemand, Denis; Micklem, Gos; Voolstra, Christian R.

2014-01-01

Coral reefs are major contributors to marine biodiversity. However, they are in rapid decline due to global environmental changes such as rising sea surface temperatures, ocean acidification, and pollution. Genomic and transcriptomic analyses have broadened our understanding of coral biology, but a study of the microRNA (miRNA) repertoire of corals is missing. miRNAs constitute a class of small non-coding RNAs of ∼22 nt in size that play crucial roles in development, metabolism, and stress response in plants and animals alike. In this study, we examined the coral Stylophora pistillata for the presence of miRNAs and the corresponding core protein machinery required for their processing and function. Based on small RNA sequencing, we present evidence for 31 bona fide microRNAs, 5 of which (miR-100, miR-2022, miR-2023, miR-2030, and miR-2036) are conserved in other metazoans. Homologues of Argonaute, Piwi, Dicer, Drosha, Pasha, and HEN1 were identified in the transcriptome of S. pistillata based on strong sequence conservation with known RNAi proteins, with additional support derived from phylogenetic trees. Examination of putative miRNA gene targets indicates potential roles in development, metabolism, immunity, and biomineralisation for several of the microRNAs. Here, we present first evidence of a functional RNAi machinery and five conserved miRNAs in S. pistillata, implying that miRNAs play a role in organismal biology of scleractinian corals. Analysis of predicted miRNA target genes in S. pistillata suggests potential roles of miRNAs in symbiosis and coral calcification. Given the importance of miRNAs in regulating gene expression in other metazoans, further expression analyses of small non-coding RNAs in transcriptional studies of corals should be informative about miRNA-affected processes and pathways. PMID:24658574
The Pekin duck programmed death-ligand 1: cDNA cloning, genomic structure, molecular characterization and mRNA expression analysis.

PubMed

Yao, Q; Fischer, K P; Tyrrell, D L; Gutfreund, K S

2015-04-01

Programmed death ligand-1 (PD-L1) plays an important role in the attenuation of adaptive immune responses in higher vertebrates. Here, we describe the identification of the Pekin duck PD-L1 orthologue (duPD-L1) and its gene structure. The duPD-L1 cDNA encodes a 311-amino acid protein that has an amino acid identity of 78% and 42% with chicken and human PD-L1, respectively. Mapping of the duPD-L1 cDNA with duck genomic sequences revealed an exonic structure of its coding sequence similar to those of other vertebrates but lacked a noncoding exon 1. Homology modelling of the duPD-L1 extracellular domain was compatible with the tandem IgV-like and IgC-like IgSF domain structure of human PD-L1 (PDB ID: 3BIS). Residues known to be important for receptor binding of human PD-L1 were mostly conserved in duPD-L1 within the N-terminus and the G sheet, and partially conserved within the F sheet but not within sheets C and C'. DuPD-L1 mRNA was constitutively expressed in all tissues examined with highest expression levels in lung and spleen and very low levels of expression in muscle, kidney and brain. Mitogen stimulation of duck peripheral blood mononuclear cells transiently increased duPD-L1 mRNA expression. Our observations demonstrate evolutionary conservation of the exonic structure of its coding sequence, the extracellular domain structure and residues implicated in receptor binding, but the role of the longer cytoplasmic tail in avian PD-L1 proteins remains to be determined. © 2014 John Wiley & Sons Ltd.
Comparison of the complete mitochondrial genome of the stonefly Sweltsa longistyla (Plecoptera: Chloroperlidae) with mitogenomes of three other stoneflies.

PubMed

Chen, Zhi-Teng; Du, Yu-Zhou

2015-03-01

The complete mitochondrial genome of the stonefly, Sweltsa longistyla Wu (Plecoptera: Chloroperlidae), was sequenced in this study. The mitogenome of S. longistyla is 16,151 bp and contains 37 genes including 13 protein-coding genes (PCGs), 22 tRNA genes, two rRNA genes, and a large non-coding region. S. longistyla, Pteronarcys princeps Banks, Kamimuria wangi Du and Cryptoperla stilifera Sivec belong to the Plecoptera, and the gene order and orientation of their mitogenomes were similar. The overall AT content for the four stoneflies was below 72%, and the AT content of tRNA genes was above 69%. The four genomes were compact and contained only 65-127 bp of non-coding intergenic DNAs. Overlapping nucleotides existed in all four genomes and ranged from 24 bp (P. princeps) to 178 bp (K. wangi). There was a 7-bp motif (ATGATAA) of overlapping DNA and an 8-bp motif (AAGCCTTA) conserved in three stonefly species (P. princeps, K. wangi and C. stilifera). The control regions of the four stoneflies contained a stem-loop structure. Four conserved sequence blocks (CSBs) were present in the A+T-rich regions of all four stoneflies. Copyright © 2014 Elsevier B.V. All rights reserved.
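For readers unfamiliar with the AT content statistic reported above, the following is a minimal sketch of how it is commonly computed from a nucleotide string. The function name `at_content` is hypothetical, and the choice to ignore ambiguous bases such as N is an assumption, not something stated in the abstract.

```python
def at_content(seq):
    """Percentage of A and T among unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are excluded from the denominator;
    this is an assumption of the sketch, not the authors' stated method.
    """
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (counts["A"] + counts["T"]) / total
```

Applied to the two short motifs quoted in the abstract, `at_content("AAGCCTTA")` gives 62.5 and `at_content("ATGATAA")` gives about 85.7.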
Crystal structures of MW1337R and lin2004: Representatives of a novel protein family that adopt a four-helical bundle fold

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kozbial, Piotr; Xu, Qingping; Chiu, Hsiu-Ju

2009-08-28

To extend the structural coverage of proteins with unknown functions, we targeted a novel protein family (Pfam accession number PF08807, DUF1798) for which we proposed and determined the structures of two representative members. The MW1337R gene of Staphylococcus aureus subsp. aureus Rosenbach (Wood 46) encodes a protein with a molecular weight of 13.8 kDa (residues 1-116) and a calculated isoelectric point of 5.15. The lin2004 gene of the nonspore-forming bacterium Listeria innocua Clip11262 encodes a protein with a molecular weight of 14.6 kDa (residues 1-121) and a calculated isoelectric point of 5.45. MW1337R and lin2004, as well as their homologs, which, so far, have been found only in Bacillus, Staphylococcus, Listeria, and related genera (Geobacillus, Exiguobacterium, and Oceanobacillus), have unknown functions and are annotated as hypothetical proteins. The genomic contexts of MW1337R and lin2004 are similar and conserved in related species. In prokaryotic genomes, most often, functionally interacting proteins are coded by genes, which are colocated in conserved operons. Proteins from the same operon as MW1337R and lin2004 either have unknown functions (i.e., belong to DUF1273, Pfam accession number PF06908) or are similar to ypsB from Bacillus subtilis. The function of ypsB is unclear, although it has a strong similarity to the N-terminal region of DivIVA, which was characterized as a bifunctional protein with distinct roles during vegetative growth and sporulation. In addition, members of the DUF1273 family display distant sequence similarity with the DprA/Smf protein, which acts downstream of the DNA uptake machinery, possibly in conjunction with RecA. The RecA activities in Bacillus subtilis are modulated by RecU Holliday-junction resolvase. In all analyzed cases, the gene coding for RecU is in the vicinity of MW1337R, lin2004, or their orthologs, but on a different operon located in the complementary DNA strand.
Here, we report the crystal structures of MW1337R and lin2004, which were determined using the semiautomated, high-throughput pipeline of the Joint Center for Structural Genomics (JCSG), part of the National Institute of General Medical Sciences Protein Structure Initiative.
Seed storage protein gene promoters contain conserved DNA motifs in Brassicaceae, Fabaceae and Poaceae

PubMed Central

Fauteux, François; Strömvik, Martina V

2009-01-01

Background: Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs. Results: We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP) gene promoters from three plant families, namely Brassicaceae (mustards), Fabaceae (legumes) and Poaceae (grasses) using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L.) Heynh.), soybean (Glycine max (L.) Merr.) and rice (Oryza sativa L.) respectively. We have identified three conserved motifs (two RY-like and one ACGT-like) in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins. Conclusion: Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination of conserved motifs.
The majority of discovered motifs match experimentally characterized cis-regulatory elements. These results provide a good starting point for further experimental analysis of plant seed-specific promoters and our methodology can be used to unravel more transcriptional regulatory mechanisms in plants and other eukaryotes. PMID:19843335
ASXL gain-of-function truncation mutants: defective and dysregulated forms of a natural ribosomal frameshifting product?

PubMed

Dinan, Adam M; Atkins, John F; Firth, Andrew E

2017-10-16

Programmed ribosomal frameshifting (PRF) is a gene expression mechanism which enables the translation of two N-terminally coincident, C-terminally distinct protein products from a single mRNA. Many viruses utilize PRF to control or regulate gene expression, but very few phylogenetically conserved examples are known in vertebrate genes. Additional sex combs-like (ASXL) genes 1 and 2 encode important epigenetic and transcriptional regulatory proteins that control the expression of homeotic genes during key developmental stages. Here we describe an ~150-codon overlapping ORF (termed TF) in ASXL1 and ASXL2 that, with few exceptions, is conserved throughout vertebrates. Conservation of the TF ORF, strong suppression of synonymous site variation in the overlap region, and the completely conserved presence of an EH[N/S]Y motif (a known binding site for Host Cell Factor-1, HCF-1, an epigenetic regulatory factor), all indicate that TF is a protein-coding sequence. A highly conserved UCC_UUU_CGU sequence (identical to the known site of +1 ribosomal frameshifting for influenza virus PA-X expression) occurs at the 5' end of the region of enhanced synonymous site conservation in ASXL1. Similarly, a highly conserved RG_GUC_UCU sequence (identical to a known site of -2 ribosomal frameshifting for arterivirus nsp2TF expression) occurs at the 5' end of the region of enhanced synonymous site conservation in ASXL2. Due to a lack of appropriate splice forms, or initiation sites, the most plausible mechanism for translation of the ASXL1 and 2 TF regions is ribosomal frameshifting, resulting in a transframe fusion of the N-terminal half of ASXL1 or 2 to the TF product, termed ASXL-TF. Truncation or frameshift mutants of ASXL are linked to myeloid malignancies and genetic diseases, such as Bohring-Opitz syndrome, likely at least in part as a result of gain-of-function or dominant-negative effects. 
Our hypothesis now indicates that these disease-associated mutant forms represent overexpressed defective versions of ASXL-TF. This article was reviewed by Laurence Hurst and Eugene Koonin.
Antigenic Diversity of the Plasmodium vivax Circumsporozoite Protein in Parasite Isolates of Western Colombia

PubMed Central

Hernández-Martínez, Miguel Ángel; Escalante, Ananías A.; Arévalo-Herrera, Myriam; Herrera, Sócrates

2011-01-01

Circumsporozoite (CS) protein is a malaria antigen involved in sporozoite invasion of hepatocytes, and thus considered to have good vaccine potential. We evaluated the polymorphism of the Plasmodium vivax CS gene in 24 parasite isolates collected from malaria-endemic areas of Colombia. We sequenced 27 alleles, most of which (25/27) corresponded to the VK247 genotype and the remainder to the VK210 type. All VK247 alleles presented a mutation (Gly → Asn) at position 28 in the N-terminal region, whereas the C-terminal region presented three insertions: ANKKAGDAG, which is common to all VK247 isolates; GAGGQAAGGNAANKKAGDAG, present in 12 alleles; and GGNAGGNA, present in 5 alleles. Both repeat regions were polymorphic in gene sequence and size. Sequences coding for B-, T-CD4+, and T-CD8+ cell epitopes were found to be conserved. This study confirms the high polymorphism of the repeat domain and the highly conserved nature of the flanking regions. PMID:21292878

The actin multigene family and livestock speciation using the polymerase chain reaction.

PubMed

Fairbrother, K S; Hopwood, A J; Lockley, A K; Bardsley, R G

1998-01-01

Actins constitute a family of highly-conserved multifunctional intracellular proteins, best known as myofibrillar components in striated muscle fibres. Most vertebrate genomes contain numerous actin genes with high sequence homology in protein coding regions but considerable variability in intron number and sizes. This genetic diversity can be utilised for livestock speciation purposes. The high sequence conservation has enabled a single pair of oligonucleotides to be used to prime the polymerase chain reaction (PCR) with DNA extracted from all animals so far studied. Multiple amplification products were obtained which on gel electrophoresis constituted characteristic species-specific 'fingerprints'. The patterns were reproducible, did not vary between individuals of the same breed or between different breeds within a species, and could be generated even from heat-processed muscle held at 120 degrees C for one hour. Given the capacity of PCR to amplify relatively short sequences in highly-degraded DNA, this approach may be suitable for authentication of processed meat products.
APPRIS: annotation of principal and alternative splice isoforms

PubMed Central

Rodriguez, Jose Manuel; Maietta, Paolo; Ezkurdia, Iakes; Pietrelli, Alessandro; Wesselink, Jan-Jaap; Lopez, Gonzalo; Valencia, Alfonso; Tress, Michael L.

2013-01-01

Here, we present APPRIS (http://appris.bioinfo.cnio.es), a database that houses annotations of human splice isoforms. APPRIS has been designed to provide value to manual annotations of the human genome by adding reliable protein structural and functional data and information from cross-species conservation. The visual representation of the annotations provided by APPRIS for each gene allows annotators and researchers alike to easily identify functional changes brought about by splicing events. In addition to collecting, integrating and analyzing reliable predictions of the effect of splicing events, APPRIS also selects a single reference sequence for each gene, here termed the principal isoform, based on the annotations of structure, function and conservation for each transcript. APPRIS identifies a principal isoform for 85% of the protein-coding genes in the GENCODE 7 release for ENSEMBL. Analysis of the APPRIS data shows that at least 70% of the alternative (non-principal) variants would lose important functional or structural information relative to the principal isoform. PMID:23161672
Genome sequences of a mouse-avirulent and a mouse-virulent strain of Ross River virus.

PubMed

Faragher, S G; Meek, A D; Rice, C M; Dalgarno, L

1988-04-01

The nucleotide sequence of the genomic RNA of a mouse-avirulent strain of Ross River virus, RRV NB5092 (isolated in 1969), has been determined and the corresponding sequence for the prototype mouse-virulent strain, RRV T48 (isolated in 1959), has been completed. The RRV NB5092 genome is approximately 11,674 nucleotides in length, compared with 11,853 nucleotides for RRV T48. RRV NB5092 and RRV T48 have the same genome organization. For both viruses an untranslated region of 80 nucleotides at the 5' end of the genome is followed by a 7440-nucleotide open reading frame which is interrupted after 5586 nucleotides by a single opal termination codon. By homology with other alphaviruses, the 5586-nucleotide open reading frame encodes the nonstructural proteins nsP1, nsP2, and nsP3; a fourth nonstructural protein, nsP4, is produced by read-through of the opal codon. The RRV nonstructural proteins show strong homology with the corresponding proteins of Sindbis virus and Semliki Forest virus in terms of size, net charge, and hydropathy characteristics. However, homology is not uniform between or within the proteins; nsP1, nsP2, and nsP4 contain extended domains which are highly conserved between alphaviruses, while the C-terminal region of nsP3 shows little conservation in sequence or length between alphaviruses. An untranslated "junction" region of 44 nucleotides (for RRV NB5092) or 47 nucleotides (for RRV T48) separates the nonstructural and structural protein coding regions. The structural proteins (capsid-E3-E2-6K-E1) are translated from an open reading frame of 3762 nucleotides which is followed by a 3'-untranslated region of approximately 348 nucleotides (for RRV NB5092) or 524 nucleotides (for RRV T48). Excluding deletions and insertions, the genomes of RRV NB5092 and RRV T48 differ at 284 nucleotides, representing a sequence divergence of 2.38%. 
Sequence deletions or insertions were found only in the noncoding regions and include a 173-nucleotide deletion in the 3'-untranslated region of RRV NB5092, compared with RRV T48. In the coding regions, most of the nucleotide differences are silent; there are 36 amino acid differences in the nonstructural proteins and 12 in the structural proteins. The distribution of amino acid differences between the two RRV strains correlates with the location of domains which are poorly conserved in sequence between alphaviruses. The possible role of amino acid differences in envelope glycoproteins E1 and E2 in determining the different antigenic and biological properties of RRV NB5092 and RRV T48 is discussed.
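The region lengths quoted in this abstract can be checked against the reported genome sizes. The sketch below is only a consistency check. It assumes the five regions are contiguous and non-overlapping and that the opal codon is counted inside the 7440-nucleotide open reading frame. The dictionary and function names are illustrative and do not come from the paper.

```python
# Region lengths in nucleotides, copied from the abstract.
# Assumption: regions are contiguous, so their sum is the genome length.
SEGMENTS = {
    "NB5092": {
        "5_utr": 80,
        "nonstructural_orf": 7440,
        "junction": 44,
        "structural_orf": 3762,
        "3_utr": 348,
    },
    "T48": {
        "5_utr": 80,
        "nonstructural_orf": 7440,
        "junction": 47,
        "structural_orf": 3762,
        "3_utr": 524,
    },
}


def genome_length(strain: str) -> int:
    """Sum the region lengths listed for one strain."""
    return sum(SEGMENTS[strain].values())


if __name__ == "__main__":
    for strain in SEGMENTS:
        print(strain, genome_length(strain))
```

Under these assumptions the sums reproduce the stated totals of 11,674 (NB5092) and 11,853 (T48) nucleotides, and the 179-nucleotide difference falls entirely in the junction and 3'-untranslated regions.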
Comparative Mitogenomics of the Assassin Bug Genus Peirates (Hemiptera: Reduviidae: Peiratinae) Reveal Conserved Mitochondrial Genome Organization of P. atromaculatus, P. fulvescens and P. turpis

PubMed Central

Zhao, Guangyu; Li, Hu; Zhao, Ping; Cai, Wanzhi

2015-01-01

In this study, we sequenced four new mitochondrial genomes and presented comparative mitogenomic analyses of five species in the genus Peirates (Hemiptera: Reduviidae). Mitochondrial genomes of these five assassin bugs had a typical set of 37 genes and retained the ancestral gene arrangement of insects. The A+T content, AT- and GC-skews were similar to the common base composition biases of insect mtDNA. Genome size ranged from 15,702 bp to 16,314 bp, and most of the size variation was due to the length and copy number of the repeat unit in the putative control region. All of the control region sequences included large tandem repeats present in two or more copies. Our results revealed similarity in the mitochondrial genomes of P. atromaculatus, P. fulvescens and P. turpis, as well as highly conserved genomic-level characteristics of these three species, e.g., the same start and stop codons of protein-coding genes, conserved secondary structure of tRNAs, identical location and length of non-coding and overlapping regions, and conservation of structural elements and the tandem repeat unit in the control region. Phylogenetic analyses also supported a close relationship between P. atromaculatus, P. fulvescens and P. turpis, which might be recently diverged species. The present study indicates that the mitochondrial genome has important implications for phylogenetics, population genetics and speciation in the genus Peirates. PMID:25689825
Transimulation - protein biosynthesis web service.

PubMed

Siwiak, Marlena; Zielenkiewicz, Piotr

2013-01-01

Although translation is the key step during gene expression, it remains poorly characterized at the level of individual genes. For this reason, we developed Transimulation - a web service measuring translational activity of genes in three model organisms: Escherichia coli, Saccharomyces cerevisiae and Homo sapiens. The calculations are based on our previous computational model of translation and experimental data sets. Transimulation quantifies mean translation initiation and elongation time (expressed in SI units), and the number of proteins produced per transcript. It also approximates the number of ribosomes that typically occupy a transcript during translation, and simulates their propagation. The simulation of ribosomes' movement is interactive and allows modifying the coding sequence on the fly. It also enables uploading any coding sequence and simulating its translation in one of three model organisms. In such a case, ribosomes propagate according to mean codon elongation times of the host organism, which may prove useful for heterologous expression. Transimulation was used to examine evolutionary conservation of translational parameters of orthologous genes. Transimulation may be accessed at http://nexus.ibb.waw.pl/Transimulation (requires Java version 1.7 or higher). Its manual and source code, distributed under the GPL-2.0 license, are freely available at the website.
RNAcode: Robust discrimination of coding and noncoding regions in comparative sequence data

PubMed Central

Washietl, Stefan; Findeiß, Sven; Müller, Stephan A.; Kalkhof, Stefan; von Bergen, Martin; Hofacker, Ivo L.; Stadler, Peter F.; Goldman, Nick

2011-01-01

With the availability of genome-wide transcription data and massive comparative sequencing, the discrimination of coding from noncoding RNAs and the assessment of coding potential in evolutionarily conserved regions arose as a core analysis task. Here we present RNAcode, a program to detect coding regions in multiple sequence alignments that is optimized for emerging applications not covered by current protein gene-finding software. Our algorithm combines information from nucleotide substitution and gap patterns in a unified framework and also deals with real-life issues such as alignment and sequencing errors. It uses an explicit statistical model with no machine learning component and can therefore be applied “out of the box,” without any training, to data from all domains of life. We describe the RNAcode method and apply it in combination with mass spectrometry experiments to predict and confirm seven novel short peptides in Escherichia coli and to analyze the coding potential of RNAs previously annotated as “noncoding.” RNAcode is open source software and available for all major platforms at http://wash.github.com/rnacode. PMID:21357752
RNAcode: robust discrimination of coding and noncoding regions in comparative sequence data.

PubMed

Washietl, Stefan; Findeiss, Sven; Müller, Stephan A; Kalkhof, Stefan; von Bergen, Martin; Hofacker, Ivo L; Stadler, Peter F; Goldman, Nick

2011-04-01

With the availability of genome-wide transcription data and massive comparative sequencing, the discrimination of coding from noncoding RNAs and the assessment of coding potential in evolutionarily conserved regions arose as a core analysis task. Here we present RNAcode, a program to detect coding regions in multiple sequence alignments that is optimized for emerging applications not covered by current protein gene-finding software. Our algorithm combines information from nucleotide substitution and gap patterns in a unified framework and also deals with real-life issues such as alignment and sequencing errors. It uses an explicit statistical model with no machine learning component and can therefore be applied "out of the box," without any training, to data from all domains of life. We describe the RNAcode method and apply it in combination with mass spectrometry experiments to predict and confirm seven novel short peptides in Escherichia coli and to analyze the coding potential of RNAs previously annotated as "noncoding." RNAcode is open source software and available for all major platforms at http://wash.github.com/rnacode.
RNA structural constraints in the evolution of the influenza A virus genome NP segment

PubMed Central

Gultyaev, Alexander P; Tsyganov-Bodounov, Anton; Spronken, Monique IJ; van der Kooij, Sander; Fouchier, Ron AM; Olsthoorn, René CL

2014-01-01

Conserved RNA secondary structures were predicted in the nucleoprotein (NP) segment of the influenza A virus genome using comparative sequence and structure analysis. A number of structural elements exhibiting nucleotide covariations were identified over the whole segment length, including protein-coding regions. Calculations of mutual information values at the paired nucleotide positions demonstrate that these structures impose considerable constraints on the virus genome evolution. Functional importance of a pseudoknot structure, predicted in the NP packaging signal region, was confirmed by plaque assays of the mutant viruses with disrupted structure and those with restored folding using compensatory substitutions. Possible functions of the conserved RNA folding patterns in the influenza A virus genome are discussed. PMID:25180940
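The abstract above mentions mutual information at paired nucleotide positions but does not give the formula used. As a minimal sketch, assuming the standard Shannon definition applied to two columns of a multiple sequence alignment, the quantity can be computed as follows. The function name and the toy columns are illustrative and are not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(col_i: str, col_j: str) -> float:
    """Shannon mutual information (in bits) between two alignment columns.

    Each string holds one character per aligned sequence, in the same order.
    """
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


if __name__ == "__main__":
    # Toy example: G-C pairs swapped for C-G pairs in half the sequences.
    print(mutual_information("GGCC", "CCGG"))
    # Invariant columns carry no covariation signal.
    print(mutual_information("GGGG", "CCCC"))
```

Perfectly covarying columns score high (1 bit in the two-state toy case), while invariant columns score zero, which is why covariation rather than plain conservation is taken as evidence for a base pair.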
Mitochondrial genome of the tomato clownfish Amphiprion frenatus (Pomacentridae, Amphiprioninae).

PubMed

Ye, Le; Hu, Jing; Wu, Kaichang; Wang, Yu; Li, Jianlong

2016-01-01

The complete mitochondrial (mt) genome of the tomato clownfish Amphiprion frenatus was obtained in this study. The circular mtDNA molecule was 16,774 bp in size and the overall nucleotide composition of the H-strand was 29.72% A, 25.81% T, 15.38% G and 29.09% C, with an A + T bias. The complete mitogenome encoded 13 protein-coding genes, 2 rRNAs, 22 tRNAs and a control region (D-loop), with the gene arrangement and translation direction essentially identical to those of other typical vertebrate mitogenomes. The D-loop included a termination associated sequence (TAS), a central conserved domain (CCD) and a conserved sequence block (CSB), and was composed of 6 complete contiguous tandem repeat units and an imperfect tandem repeat unit.
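The reported base composition can be checked for internal consistency, and the A + T bias made explicit, with a few lines of arithmetic. In the sketch below the percentages are copied from the abstract. The skew values are not reported there and are computed from the conventional definitions (A-T)/(A+T) and (G-C)/(G+C) purely as an illustration. All names are hypothetical.

```python
# H-strand base composition in percent, as quoted in the abstract.
COMPOSITION = {"A": 29.72, "T": 25.81, "G": 15.38, "C": 29.09}


def at_content(comp: dict) -> float:
    """Combined A + T percentage."""
    return comp["A"] + comp["T"]


def skew(x: float, y: float) -> float:
    """Conventional strand skew, (x - y) / (x + y)."""
    return (x - y) / (x + y)


total = sum(COMPOSITION.values())
at_skew = skew(COMPOSITION["A"], COMPOSITION["T"])
gc_skew = skew(COMPOSITION["G"], COMPOSITION["C"])

if __name__ == "__main__":
    print("total %:", round(total, 2))
    print("A+T %:", round(at_content(COMPOSITION), 2))
    print("AT skew:", round(at_skew, 4))
    print("GC skew:", round(gc_skew, 4))
```

The four percentages sum to 100 and give an A + T content of 55.53%, consistent with the stated A + T bias.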
CSTminer: a web tool for the identification of coding and noncoding conserved sequence tags through cross-species genome comparison

PubMed Central

Castrignanò, Tiziana; Canali, Alessandro; Grillo, Giorgio; Liuni, Sabino; Mignone, Flavio; Pesole, Graziano

2004-01-01

The identification and characterization of genome tracts that are highly conserved across species during evolution may contribute significantly to the functional annotation of whole-genome sequences. Indeed, such sequences are likely to correspond to known or unknown coding exons or regulatory motifs. Here, we present a web server implementing a previously developed algorithm that, by comparing user-submitted genome sequences, is able to identify statistically significant conserved blocks and assess their coding or noncoding nature through the measure of a coding potential score. The web tool, available at http://www.caspur.it/CSTminer/, is dynamically interconnected with the Ensembl genome resources and produces a graphical output showing a map of detected conserved sequences and annotated gene features. PMID:15215464
Human mRNA polyadenylate binding protein: evolutionary conservation of a nucleic acid binding motif.

PubMed Central

Grange, T; de Sa, C M; Oddos, J; Pictet, R

1987-01-01

We have isolated a full-length complementary DNA (cDNA) coding for the human poly(A) binding protein. The cDNA-derived 73 kd basic translation product has the same Mr, isoelectric point and peptidic map as the poly(A) binding protein. DNA sequence analysis reveals a 70,244 dalton protein. The N terminal part, highly homologous to the yeast poly(A) binding protein, is sufficient for poly(A) binding activity. This domain consists of a four-fold repeated unit of approximately 80 amino acids present in other nucleic acid binding proteins. In the C terminal part there is, as in the yeast protein, a sequence of approximately 150 amino acids, rich in proline, alanine and glutamine which together account for 48% of the residues. A 2.9 kb mRNA corresponding to this cDNA has been detected in several vertebrate cell types and in Drosophila melanogaster at every developmental stage including oogenesis. PMID:2885805
Complete Mitochondrial Genome of Eruca sativa Mill. (Garden Rocket)

PubMed Central

Yang, Qing; Chang, Shengxin; Chen, Jianmei; Hu, Maolong; Guan, Rongzhan

2014-01-01

Eruca sativa (Cruciferae family) is an ancient crop of great economic and agronomic importance. Here, the complete mitochondrial genome of Eruca sativa was sequenced and annotated. The circular molecule is 247 696 bp long, with a G+C content of 45.07%, containing 33 protein-coding genes, three rRNA genes, and 18 tRNA genes. The Eruca sativa mitochondrial genome may be divided into six master circles and four subgenomic molecules via three pairwise large repeats, resulting in a more dynamic structure of the Eruca sativa mtDNA compared with other cruciferous mitotypes. Comparison with the Brassica napus mtDNA revealed that most of the genes with known function are conserved between these two mitotypes except for the ccmFN2 and rrn18 genes, and 27 point mutations were scattered in the 14 protein-coding genes. Evolutionary relationship analysis suggested that Eruca sativa is more closely related to the Brassica species and to Raphanus sativus than to Arabidopsis thaliana. PMID:25157569
Complete mitochondrial genome of the brown alga Sargassum fusiforme (Sargassaceae, Phaeophyceae): genome architecture and taxonomic consideration.

PubMed

Liu, Feng; Pang, Shaojun; Luo, Minbo

2016-01-01

Sargassum fusiforme (Harvey) Setchell (=Hizikia fusiformis (Harvey) Okamura) is one of the most important economic seaweeds for mariculture in China. In this study, we present the complete mitochondrial genome of S. fusiforme. The genome is 34,696 bp in length with circular organization, encoding the standard set of three ribosomal RNA genes (rRNA), 25 transfer RNA genes (tRNA), 35 protein-coding genes, and two conserved open reading frames (ORFs). Its total AT content is 62.47%, lower than that of other brown algae except Pylaiella littoralis. The mitogenome carries 1571 bp of intergenic region, constituting 4.53% of the genome, and 13 pairs of overlapping genes with overlap sizes from 1 to 90 bp. The phylogenetic analyses based on 35 protein-coding genes reveal that S. fusiforme has a closer evolutionary relationship with Sargassum muticum than with Sargassum horneri, indicating that Hizikia is not a distinct evolutionary entity and should be reduced to synonymy with Sargassum.
Complete mitochondrial genome of yellow meal worm (Tenebrio molitor)

PubMed Central

LIU, Li-Na; WANG, Cheng-Ye

2014-01-01

The yellow meal worm (Tenebrio molitor L.) is an important resource insect typically used as an animal feed additive. It is also widely used for biological research. The complete mitochondrial genome of T. molitor was determined for the first time by long PCR and conserved primer walking approaches. The results showed that the entire mitogenome of T. molitor was 15 785 bp long, with 72.35% A+T content [deposited in GenBank with accession number KF418153]. The gene order and orientation were the same as the most common type suggested as ancestral for insects. Two protein-coding genes used atypical start codons (CTA in ND2 and AAT in COX1), and the remaining 11 protein-coding genes started with a typical insect initiation codon ATN. All tRNAs showed standard clover-leaf structure, except for tRNASer(AGN), which lacked a dihydrouridine (DHU) arm. The newly added T. molitor mitogenome could provide information for future studies on yellow meal worm. PMID:25465087
Complete mitochondrial genome of yellow meal worm (Tenebrio molitor).

PubMed

Liu, Li-Na; Wang, Cheng-Ye

2014-11-18

The yellow meal worm (Tenebrio molitor L.) is an important resource insect typically used as an animal feed additive. It is also widely used for biological research. The complete mitochondrial genome of T. molitor was determined for the first time by long PCR and conserved primer walking approaches. The results showed that the entire mitogenome of T. molitor was 15 785 bp long, with 72.35% A+T content [deposited in GenBank with accession number KF418153]. The gene order and orientation were the same as the most common type suggested as ancestral for insects. Two protein-coding genes used atypical start codons (CTA in ND2 and AAT in COX1), and the remaining 11 protein-coding genes started with a typical insect initiation codon ATN. All tRNAs showed standard clover-leaf structure, except for tRNA(Ser) (AGN), which lacked a dihydrouridine (DHU) arm. The newly added T. molitor mitogenome could provide information for future studies on yellow meal worm.
Comprehensive analysis of single molecule sequencing-derived complete genome and whole transcriptome of Hyposidra talaca nuclear polyhedrosis virus.

PubMed

Nguyen, Thong T; Suryamohan, Kushal; Kuriakose, Boney; Janakiraman, Vasantharajan; Reichelt, Mike; Chaudhuri, Subhra; Guillory, Joseph; Divakaran, Neethu; Rabins, P E; Goel, Ridhi; Deka, Bhabesh; Sarkar, Suman; Ekka, Preety; Tsai, Yu-Chih; Vargas, Derek; Santhosh, Sam; Mohan, Sangeetha; Chin, Chen-Shan; Korlach, Jonas; Thomas, George; Babu, Azariah; Seshagiri, Somasekar

2018-06-12

We sequenced the Hyposidra talaca NPV (HytaNPV) double-stranded circular DNA genome using PacBio single molecule sequencing technology. We found that the HytaNPV genome is 139,089 bp long with a GC content of 39.6%. It encodes 141 open reading frames (ORFs) including the 37 baculovirus core genes, 25 genes conserved among lepidopteran baculoviruses, 72 genes known in baculoviruses, and 7 genes unique to the HytaNPV genome. It is a group II alphabaculovirus that codes for the F protein and lacks the gp64 gene found in group I alphabaculoviruses. Using RNA-seq, we confirmed the expression of the ORFs identified in the HytaNPV genome. Phylogenetic analysis showed HytaNPV to be closest to BusuNPV, SujuNPV and EcobNPV, which infect other tea pests, Buzura suppressaria, Sucra jujuba, and Ectropis obliqua, respectively. We identified repeat elements and a conserved non-coding baculovirus element in the genome. Analysis of the putative promoter sequences identified motifs consistent with the temporal expression of the genes observed in the RNA-seq data.
Characteristics and phylogenetic analysis of the complete mitochondrial genome of Cheilodactylus quadricornis (Perciformes, Cheilodactylidae).

PubMed

Wang, Aishuai; Sun, Yuena; Wu, Changwen

2016-11-01

The complete mitochondrial genome of Cheilodactylus quadricornis was determined for the first time in the present study. The mitochondrial genome of C. quadricornis is 16 521 nucleotides long, comprising 13 protein-coding genes, 2 ribosomal RNA genes, 22 tRNA genes and 2 main non-coding regions (the control region and the origin of light-strand replication). The overall base composition was T, 26.3%; C, 29.6%; A, 27.8% and G, 16.3%. The gene arrangement, base composition, and tRNA structures of the complete mitochondrial genome of C. quadricornis are similar to those of other teleosts. Only two central conserved sequence blocks (CSB-2 and CSB-3) were identified in the control region. In addition, the conserved motif 5'-GCCGG-3' was identified in the origin of light-strand replication of C. quadricornis. The complete mitochondrial genome of C. quadricornis was used to construct a phylogenetic tree, which shows that C. quadricornis and C. variegatus clustered in a clade and formed a sister relationship. These mitogenome sequence data should play an important role in population genetics and phylogenetic analysis of the Cheilodactylidae.
Unusual Intron Conservation near Tissue-Regulated Exons Found by Splicing Microarrays

PubMed Central

Sugnet, Charles W; Srinivasan, Karpagam; Clark, Tyson A; O'Brien, Georgeann; Cline, Melissa S; Wang, Hui; Williams, Alan; Kulp, David; Blume, John E; Haussler, David; Ares, Manuel

2006-01-01

Alternative splicing contributes to both gene regulation and protein diversity. To discover broad relationships between regulation of alternative splicing and sequence conservation, we applied a systems approach, using oligonucleotide microarrays designed to capture splicing information across the mouse genome. In a set of 22 adult tissues, we observe differential expression of RNA containing at least two alternative splice junctions for about 40% of the 6,216 alternative events we could detect. Statistical comparisons identify 171 cassette exons whose inclusion or skipping is different in brain relative to other tissues and another 28 exons whose splicing is different in muscle. A subset of these exons is associated with unusual blocks of intron sequence whose conservation in vertebrates rivals that of protein-coding exons. By focusing on sets of exons with similar regulatory patterns, we have identified new sequence motifs implicated in brain and muscle splicing regulation. Of note is a motif that is strikingly similar to the branchpoint consensus but is located downstream of the 5′ splice site of exons included in muscle. Analysis of three paralogous membrane-associated guanylate kinase genes reveals that each contains a paralogous tissue-regulated exon with a similar tissue inclusion pattern. While the intron sequences flanking these exons remain highly conserved among mammalian orthologs, the paralogous flanking intron sequences have diverged considerably, suggesting unusually complex evolution of the regulation of alternative splicing in multigene families. PMID:16424921
COME: a robust coding potential calculation tool for lncRNA identification and characterization based on multiple features.

PubMed

Hu, Long; Xu, Zhiyu; Hu, Boqin; Lu, Zhi John

2017-01-09

Recent genomic studies suggest that novel long non-coding RNAs (lncRNAs) are specifically expressed and far outnumber annotated lncRNA sequences. To identify and characterize novel lncRNAs in RNA sequencing data from new samples, we have developed COME, a coding potential calculation tool based on multiple features. It integrates multiple sequence-derived and experiment-based features using a decompose-compose method, which makes it more accurate and robust than other well-known tools. We also showed that COME was able to substantially improve the consistency of prediction results from other coding potential calculators. Moreover, COME annotates and characterizes each predicted lncRNA transcript with multiple lines of supporting evidence, which are not provided by other tools. Remarkably, we found that one subgroup of lncRNAs classified by such supporting features (i.e. conserved local RNA secondary structure) was highly enriched in a well-validated database (lncRNAdb). We further found that the conserved structural domains on lncRNAs had a better chance than other RNA regions of interacting with RNA binding proteins, based on recent eCLIP-seq data in human, indicating their potential regulatory roles. Overall, we present COME as an accurate, robust and multiple-feature supported method for the identification and characterization of novel lncRNAs. The software implementation is available at https://github.com/lulab/COME. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
How the Sequence of a Gene Specifies Structural Symmetry in Proteins

PubMed Central

Shen, Xiaojuan; Huang, Tongcheng; Wang, Guanyu; Li, Guanglin

2015-01-01

Internal symmetry is commonly observed in the majority of fundamental protein folds. Meanwhile, sufficient evidence suggests that nascent polypeptide chains of proteins have the potential to start the co-translational folding process and this process allows mRNA to contain additional information on protein structure. In this paper, we study the relationship between gene sequences and protein structures from the viewpoint of symmetry to explore how gene sequences code for structural symmetry in proteins. We found that, for a set of two-fold symmetric proteins from left-handed beta-helix fold, intragenic symmetry always exists in their corresponding gene sequences. Meanwhile, codon usage bias and local mRNA structure might be involved in modulating translation speed for the formation of structural symmetry: a major decrease of local codon usage bias in the middle of the codon sequence can be identified as a common feature; and major or consecutive decreases in local mRNA folding energy near the boundaries of the symmetric substructures can also be observed. The results suggest that gene duplication and fusion may be an evolutionarily conserved process for this protein fold. In addition, the usage of rare codons and the formation of higher order of secondary structure near the boundaries of symmetric substructures might have coevolved as conserved mechanisms to slow down translation elongation and to facilitate effective folding of symmetric substructures. These findings provide valuable insights into our understanding of the mechanisms of translation and its evolution, as well as the design of proteins via symmetric modules. PMID:26641668

Comparative genomics of 9 novel Paenibacillus larvae bacteriophages

PubMed Central

Stamereilers, Casey; LeBlanc, Lucy; Yost, Diane; Amy, Penny S.; Tsourkas, Philippos K.

2016-01-01

American Foulbrood Disease, caused by the bacterium Paenibacillus larvae, is one of the most destructive diseases of the honeybee, Apis mellifera. Our group recently published the sequences of 9 new phages with the ability to infect and lyse P. larvae. Here, we characterize the genomes of these P. larvae phages, compare them to each other and to other sequenced P. larvae phages, and putatively identify protein function. The phage genomes are 38–45 kb in size and contain 68–86 genes, most of which appear to be unique to P. larvae phages. We classify P. larvae phages into 2 main clusters and one singleton based on nucleotide sequence identity. Three of the new phages show sequence similarity to other sequenced P. larvae phages, while the remaining 6 do not. We identified functions for roughly half of the P. larvae phage proteins, including structural, assembly, host lysis, DNA replication/metabolism, regulatory, and host-related functions. Structural and assembly proteins are highly conserved among our phages and are located at the start of the genome. DNA replication/metabolism, regulatory, and host-related proteins are located in the middle and end of the genome, and are not conserved, with many of these genes found in some of our phages but not others. All nine phages code for a conserved N-acetylmuramoyl-L-alanine amidase. Comparative analysis showed the phages use the “cohesive ends with 3′ overhang” DNA packaging strategy. This work is the first in-depth study of P. larvae phage genomics, and serves as a marker for future work in this area. PMID:27738559
Missense mutations in SURF1 associated with deficient cytochrome c oxidase assembly in Leigh syndrome patients.

PubMed

Poyau, A; Buchet, K; Bouzidi, M F; Zabot, M T; Echenne, B; Yao, J; Shoubridge, E A; Godinot, C

2000-02-01

We have studied the fibroblasts of three patients suffering from Leigh syndrome associated with cytochrome c oxidase deficiency (LS-COX-). Their mitochondrial DNA was functional and all nuclear COX subunits had a normal sequence. The expression of transcripts encoding mitochondrial and nuclear COX subunits was normal or slightly increased. Similarly, the OXA1 transcript coding for a protein involved in COX assembly was increased. However, several COX-protein subunits were severely depressed, indicating deficient COX assembly. Surf1, a factor involved in COX biogenesis, was recently reported as mutated in LS-COX- patients, all mutations predicting a truncated protein. Sequence analysis of the SURF1 gene in our three patients revealed seven heterozygous mutations, six of which were new: an insertion, a nonsense mutation, a splicing mutation of intron 7 in addition to three missense mutations. The mutation G385A (Gly124-->Glu) changes a Gly that is strictly conserved in Surf1 homologs of 12 species. The substitution G618C (Asp202-->His), changing an Asp that is conserved only in mammals, appears to be a polymorphism. The mutation T751C changes Ile246 to Thr, a position at which a hydrophobic amino acid is conserved in all eukaryotic and some bacterial species. Replacing Ile246 by Thr disrupts a predicted beta sheet structure present in all higher eukaryotes. COX activity could be restored in fibroblasts of the three patients by complementation with a retroviral vector containing normal SURF1 cDNA. These mutations identify domains essential to Surf1 protein structure and/or function.
A Conserved Acidic Motif in the N-Terminal Domain of Nitrate Reductase Is Necessary for the Inactivation of the Enzyme in the Dark by Phosphorylation and 14-3-3 Binding

PubMed Central

Pigaglio, Emmanuelle; Durand, Nathalie; Meyer, Christian

1999-01-01

It has previously been shown that the N-terminal domain of tobacco (Nicotiana tabacum) nitrate reductase (NR) is involved in the inactivation of the enzyme by phosphorylation, which occurs in the dark (L. Nussaume, M. Vincentz, C. Meyer, J.P. Boutin, and M. Caboche [1995] Plant Cell 7: 611–621). The activity of a mutant NR protein lacking this N-terminal domain was no longer regulated by light-dark transitions. In this study smaller deletions were performed in the N-terminal domain of tobacco NR that removed protein motifs conserved among higher plant NRs. The resulting truncated NR-coding sequences were then fused to the cauliflower mosaic virus 35S RNA promoter and introduced in NR-deficient mutants of the closely related species Nicotiana plumbaginifolia. We found that the deletion of a conserved stretch of acidic residues led to an active NR protein that was more thermosensitive than the wild-type enzyme, but it was relatively insensitive to the inactivation by phosphorylation in the dark. Therefore, the removal of this acidic stretch seems to have the same effects on NR activation state as the deletion of the N-terminal domain. A hypothetical explanation for these observations is that a specific factor that impedes inactivation remains bound to the truncated enzyme. A synthetic peptide derived from this acidic protein motif was also found to be a good substrate for casein kinase II. PMID:9880364
Translational initiation in Leishmania tarentolae and Phytomonas serpens (Kinetoplastida) is strongly influenced by pre-ATG triplet and its 5' sequence context.

PubMed

Lukes, Julius; Paris, Zdenek; Regmi, Sandesh; Breitling, Reinhard; Mureev, Sergey; Kushnir, Susanna; Pyatkov, Konstantin; Jirků, Milan; Alexandrov, Kirill A

2006-08-01

To investigate the influence of sequence context of translation initiation codon on translation efficiency in Kinetoplastida, we constructed a library of expression plasmids randomized in the three nucleotides prefacing ATG of a reporter gene encoding enhanced green fluorescent protein (EGFP). All 64 possible combinations of pre-ATG triplets were individually stably integrated into the rDNA locus of Leishmania tarentolae and the resulting cell lines were assessed for EGFP expression. The expression levels were quantified directly by measuring the fluorescence of EGFP protein in living cells and confirmed by Western blotting. We observed a strong influence of the pre-ATG triplet on the level of protein expression over a 20-fold range. To understand the degree of evolutionary conservation of the observed effect, we transformed Phytomonas serpens, a trypanosomatid parasite of plants, with a subset of the constructs. The pattern of translational efficiency mediated by individual pre-ATG triplets in this species was similar to that observed in L. tarentolae. However, the pattern of translational efficiency of two other proteins (red fluorescent protein and tetracycline repressor) containing selected pre-ATG triplets did not correlate with either EGFP or each other. Thus, we conclude that a conserved mechanism of translation initiation site selection exists in kinetoplastids that is strongly influenced not only by the pre-ATG sequences but also by the coding region of the gene.
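The library design described above covers every combination of the three nucleotides immediately upstream of the start codon. As a small sketch of that enumeration, the code below lists the 4^3 = 64 triplets and the corresponding six-nucleotide start contexts. It illustrates the combinatorics only, not how the plasmids were actually constructed, and the function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def pre_atg_triplets() -> list:
    """All possible nucleotide triplets, in lexicographic order."""
    return ["".join(p) for p in product(BASES, repeat=3)]


def start_contexts() -> list:
    """Each triplet followed by the ATG start codon (hypothetical helper)."""
    return [triplet + "ATG" for triplet in pre_atg_triplets()]


if __name__ == "__main__":
    contexts = start_contexts()
    print(len(contexts))              # number of library variants
    print(contexts[0], contexts[-1])  # first and last context
```

Each of the 64 contexts corresponds to one cell line in the abstract's design, which is what makes a full ranking of pre-ATG triplets by expression level possible.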
UniDrug-target: a computational tool to identify unique drug targets in pathogenic bacteria.

PubMed

Chanumolu, Sree Krishna; Rout, Chittaranjan; Chauhan, Rajinder S

2012-01-01

Targeting conserved bacterial proteins with antibacterial medications has resulted both in the development of resistant strains and in changes to human health through the destruction of beneficial microbes, which eventually become breeding grounds for the evolution of resistance. Despite the availability of more than 800 genome sequences, 430 pathways, 4743 enzymes, 9257 metabolic reactions and three-dimensional (3D) protein structures in bacteria, no pathogen-specific computational drug target identification tool has been developed. A web server, UniDrug-Target, which combines bacterial biological information and computational methods to stringently identify pathogen-specific proteins as drug targets, has been designed. Besides predicting the essentiality, chokepoint property, etc. of pathogen-specific proteins, three new algorithms were developed and implemented using protein sequences, domains, structures, and metabolic reactions for construction of partial metabolic networks (PMNs), determination of conservation in critical residues, and variation analysis of residues forming similar cavities in protein sequences. First, PMNs are constructed to determine the extent of disturbance in metabolite production caused by targeting a protein as a drug target. Second, conservation of a pathogen-specific protein's critical residues involved in cavity formation and biological function is determined at the domain level with low-matching sequences. Last, variation analysis of residues forming similar cavities in protein sequences from pathogenic versus non-pathogenic bacteria and humans is performed. The server is capable of predicting drug targets for any sequenced pathogenic bacterium having FASTA sequences and annotated information. The utility of the UniDrug-Target server was demonstrated for Mycobacterium tuberculosis (H37Rv). UniDrug-Target identified 265 mycobacterial pathogen-specific proteins, including 17 essential proteins that can be potential drug targets.
UniDrug-Target is expected to accelerate the identification of pathogen-specific drug targets, which should increase their success and durability, as drugs developed against them are less likely to provoke resistance or to have an adverse impact on the environment. The server is freely available at http://117.211.115.67/UDT/main.html. The standalone application (source codes) is available at http://www.bioinformatics.org/ftp/pub/bioinfojuit/UDT.rar.
Mitogenome Sequencing in the Genus Camelus Reveals Evidence for Purifying Selection and Long-term Divergence between Wild and Domestic Bactrian Camels.

PubMed

Mohandesan, Elmira; Fitak, Robert R; Corander, Jukka; Yadamsuren, Adiya; Chuluunbat, Battsetseg; Abdelhadi, Omer; Raziq, Abdul; Nagy, Peter; Stalder, Gabrielle; Walzer, Chris; Faye, Bernard; Burger, Pamela A

2017-08-30

The genus Camelus is an interesting model to study adaptive evolution in the mitochondrial genome, as the three extant Old World camel species inhabit hot and low-altitude as well as cold and high-altitude deserts. We sequenced 24 camel mitogenomes and combined them with three previously published sequences to study the role of natural selection under different environmental pressure, and to advance our understanding of the evolutionary history of the genus Camelus. We confirmed the heterogeneity of divergence across different components of the electron transport system. Lineage-specific analysis of mitochondrial protein evolution revealed a significant effect of purifying selection in the concatenated protein-coding genes in domestic Bactrian camels. The estimated dN/dS < 1 in the concatenated protein-coding genes suggested purifying selection as driving force for shaping mitogenome diversity in camels. Additional analyses of the functional divergence in amino acid changes between species-specific lineages indicated fixed substitutions in various genes, with radical effects on the physicochemical properties of the protein products. The evolutionary time estimates revealed a divergence between domestic and wild Bactrian camels around 1.1 [0.58-1.8] million years ago (mya). This has major implications for the conservation and management of the critically endangered wild species, Camelus ferus.
Sequence similarity is more relevant than species specificity in probabilistic backtranslation.

PubMed

Ferro, Alfredo; Giugno, Rosalba; Pigola, Giuseppe; Pulvirenti, Alfredo; Di Pietro, Cinzia; Purrello, Michele; Ragusa, Marco

2007-02-21

Backtranslation is the process of decoding a sequence of amino acids into the corresponding codons. All synthetic gene design systems include a backtranslation module. The degeneracy of the genetic code makes backtranslation potentially ambiguous since most amino acids are encoded by multiple codons. The common approach to overcome this difficulty is based on imitation of codon usage within the target species. This paper describes EasyBack, a new parameter-free, fully-automated software for backtranslation using Hidden Markov Models. EasyBack is not based on imitation of codon usage within the target species, but instead uses a sequence-similarity criterion. The model is trained with a set of proteins with known cDNA coding sequences, constructed from the input protein by querying the NCBI databases with BLAST. Unlike existing software, the proposed method allows the quality of prediction to be estimated. When tested on a group of proteins that show different degrees of sequence conservation, EasyBack outperforms other published methods in terms of precision. The prediction quality of a protein backtranslation method is markedly increased by replacing the criterion of most used codon in the same species with a Hidden Markov Model trained with a set of most similar sequences from all species. Moreover, the proposed method allows the quality of prediction to be estimated probabilistically.
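For context, the "most used codon" criterion that the abstract contrasts with EasyBack can be sketched as follows. This is a minimal illustration of that baseline only, not of EasyBack's Hidden Markov Model; the codon counts in `CODON_COUNTS` are made-up toy numbers and the function name is hypothetical.

```python
# Toy codon counts per amino acid (hypothetical values, for illustration only;
# real usage tables would come from the target species' coding sequences).
CODON_COUNTS = {
    "M": {"ATG": 100},
    "K": {"AAA": 30, "AAG": 70},
    "L": {"CTG": 50, "CTC": 20, "CTT": 10, "TTG": 10, "TTA": 5, "CTA": 5},
    "*": {"TAA": 60, "TGA": 30, "TAG": 10},
}

def backtranslate_most_used(protein, counts=CODON_COUNTS):
    """Replace each residue with its most frequent codon in `counts`."""
    codons = []
    for residue in protein:
        if residue not in counts:
            raise KeyError(f"no codon data for residue {residue!r}")
        usage = counts[residue]
        codons.append(max(usage, key=usage.get))
    return "".join(codons)

print(backtranslate_most_used("MKL*"))  # ATGAAGCTGTAA
```

Because every residue always maps to the same codon, this baseline ignores sequence context, which is the limitation the similarity-trained model is meant to address.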
Variation in conserved non-coding sequences on chromosome 5q and susceptibility to asthma and atopy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Donfack, Joseph; Schneider, Daniel H.; Tan, Zheng

2005-09-10

Background: Evolutionarily conserved sequences likely have biological function. Methods: To determine whether variation in conserved sequences in non-coding DNA contributes to risk for human disease, we studied six conserved non-coding elements in the Th2 cytokine cluster on human chromosome 5q31 in a large Hutterite pedigree and in samples of outbred European American and African American asthma cases and controls. Results: Among six conserved non-coding elements (>100 bp, >70 percent identity; human-mouse comparison), we identified one single nucleotide polymorphism (SNP) in each of two conserved elements and six SNPs in the flanking regions of three conserved elements. We genotyped our samples for four of these SNPs and an additional three SNPs each in the IL13 and IL4 genes. While there was only modest evidence for association with single SNPs in the Hutterite and European American samples (P<0.05), there were highly significant associations in European Americans between asthma and haplotypes comprised of SNPs in the IL4 gene (P<0.001), including a SNP in a conserved non-coding element. Furthermore, variation in the IL13 gene was strongly associated with total IgE (P = 0.00022) and allergic sensitization to mold allergens (P = 0.00076) in the Hutterites, and more modestly associated with sensitization to molds in the European Americans and African Americans (P<0.01). Conclusion: These results indicate that there is overall little variation in the conserved non-coding elements on 5q31, but variation in IL4 and IL13, including possibly one SNP in a conserved element, influence asthma and atopic phenotypes in diverse populations.
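The selection criterion quoted in the Results (>100 bp, >70 percent identity in a human-mouse comparison) can be expressed as a simple filter over a pairwise alignment. The sketch below is an assumption-laden illustration: the function names are hypothetical, gapped columns are counted as mismatches (identity definitions vary between tools), and the sequences are synthetic rather than taken from 5q31.

```python
def percent_identity(aln_a, aln_b):
    """Percent of alignment columns with identical, non-gap characters."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if not aln_a:
        return 0.0
    matches = sum(a == b and a != "-" for a, b in zip(aln_a, aln_b))
    return 100.0 * matches / len(aln_a)

def is_conserved_element(aln_a, aln_b, min_len=100, min_identity=70.0):
    """Apply the >100 bp and >70 percent identity thresholds from the abstract."""
    return len(aln_a) > min_len and percent_identity(aln_a, aln_b) > min_identity

# Synthetic 120-column alignment: 100 identical columns, then 20 diverged ones.
human = "ACGT" * 30
mouse = "ACGT" * 25 + "TTTT" * 5
print(percent_identity(human, mouse))      # 87.5
print(is_conserved_element(human, mouse))  # True
```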
ProfileGrids: a sequence alignment visualization paradigm that avoids the limitations of Sequence Logos.

PubMed

Roca, Alberto I

2014-01-01

The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org.
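The data behind a ProfileGrid, as described above, is a matrix of residue frequencies at each aligned position. A minimal sketch of computing such a matrix is shown below; it is not the JProfileGrid implementation, it omits the color coding entirely, and the function name and toy alignment are assumptions made for illustration.

```python
from collections import Counter

def residue_frequencies(alignment):
    """Return, for each alignment column, a dict of residue -> frequency."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n = len(alignment)
    grid = []
    for column in zip(*alignment):
        counts = Counter(column)
        grid.append({residue: count / n for residue, count in counts.items()})
    return grid

toy_alignment = ["MKV", "MRV", "MKI", "MKV"]  # hypothetical 4-sequence alignment
grid = residue_frequencies(toy_alignment)
print(grid[0])  # {'M': 1.0}
print(grid[1])  # {'K': 0.75, 'R': 0.25}
```

Unlike a consensus sequence, the per-column dictionaries keep minority residues visible, which is the property the abstract highlights for protein alignments.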
SERPINA2 Is a Novel Gene with a Divergent Function from SERPINA1

PubMed Central

Martins, Manuella; Figueiredo, Joana; Silva, Diana Isabel; Castro, Patrícia; Morales-Hojas, Ramiro; Simões-Correia, Joana; Seixas, Susana

2013-01-01

Serine protease inhibitors (SERPINs) are a superfamily of highly conserved proteins that play a key role in controlling the activity of proteases in diverse biological processes. The SERPIN cluster located at the 14q32.1 region includes the gene coding for SERPINA1, and a highly homologous sequence, SERPINA2, which was originally thought to be a pseudogene. We have previously shown that SERPINA2 is expressed in different tissues, namely leukocytes and testes, suggesting that it is a functional SERPIN. To investigate the function of SERPINA2, we used HeLa cells stably transduced with the different variants of SERPINA2 and SERPINA1 (M1, S and Z) and leukocytes as the in vivo model. We identified SERPINA2 as a 52 kDa intracellular glycoprotein, which is localized at the endoplasmic reticulum (ER), independently of the variant analyzed. SERPINA2 is not significantly regulated by the proteasome, suggesting that ER localization is not due to misfolding. Specific features of SERPINA2 include the absence of insoluble aggregates and the insignificant response to cell stress, suggesting that it is a non-polymerogenic protein with an activity divergent from that of SERPINA1. Using phylogenetic analysis, we propose an origin of SERPINA2 in the crown of primates, and we unveiled the overall conservation of SERPINA2 and A1. Nonetheless, a few SERPINA2 residues seem to have evolved faster, contributing to the emergence of a new advantageous function, possibly as a chymotrypsin-like SERPIN. Herein, we present evidence that SERPINA2 is an active gene, coding for an ER-resident protein, which may act as substrate or adjuvant of ER-chaperones. PMID:23826168
Structural Isosteres of Phosphate Groups in the Protein Data Bank.

PubMed

Zhang, Yuezhou; Borrel, Alexandre; Ghemtio, Leo; Regad, Leslie; Boije Af Gennäs, Gustav; Camproux, Anne-Claude; Yli-Kauhaluoma, Jari; Xhaard, Henri

2017-03-27

We developed a computational workflow to mine the Protein Data Bank for isosteric replacements that exist in different binding site environments but have not necessarily been identified and exploited in compound design. Taking phosphate groups as examples, the workflow was used to construct 157 data sets, each composed of a reference protein complexed with AMP, ADP, ATP, or pyrophosphate as well other ligands. Phosphate binding sites appear to have a high hydration content and large size, resulting in U-shaped bioactive conformations recurrently found across unrelated protein families. A total of 16 413 replacements were extracted, filtered for a significant structural overlap on phosphate groups, and sorted according to their SMILES codes. In addition to the classical isosteres of phosphate, such as carboxylate, sulfone, or sulfonamide, unexpected replacements that do not conserve charge or polarity, such as aryl, aliphatic, or positively charged groups, were found.
Identification and substrate prediction of new Fragaria x ananassa aquaporins and expression in different tissues and during strawberry fruit development.

PubMed

Merlaen, Britt; De Keyser, Ellen; Van Labeke, Marie-Christine

2018-01-01

The newly identified aquaporin coding sequences presented here pave the way for further insights into the plant-water relations in the commercial strawberry (Fragaria x ananassa). Aquaporins are water channel proteins that allow water to cross (intra)cellular membranes. In Fragaria x ananassa, few of them have been identified hitherto, hampering the exploration of the water transport regulation at cellular level. Here, we present new aquaporin coding sequences belonging to different subclasses: plasma membrane intrinsic proteins subtype 1 and subtype 2 (PIP1 and PIP2) and tonoplast intrinsic proteins (TIP). The classification is based on phylogenetic analysis and is confirmed by the presence of conserved residues. Substrate-specific signature sequences (SSSSs) and specificity-determining positions (SDPs) predict the substrate specificity of each new aquaporin. Expression profiling in leaves, petioles and developing fruits reveals distinct patterns, even within the same (sub)class. Expression profiles range from leaf-specific expression over constitutive expression to fruit-specific expression. Both upregulation and downregulation during fruit ripening occur. Substrate specificity and expression profiles suggest that functional specialization exists among aquaporins belonging to a different but also to the same (sub)class.
Comparison of two computer codes for crack growth analysis: NASCRAC Versus NASA/FLAGRO

NASA Technical Reports Server (NTRS)

Stallworth, R.; Meyers, C. A.; Stinson, H. C.

1989-01-01

Results are presented from the comparison study of two computer codes for crack growth analysis - NASCRAC and NASA/FLAGRO. The two computer codes gave compatible conservative results when the part through crack analysis solutions were analyzed versus experimental test data. Results showed good correlation between the codes for the through crack at a lug solution. For the through crack at a lug solution, NASA/FLAGRO gave the most conservative results.
Complete chloroplast genome sequence of MD-2 pineapple and its comparative analysis among nine other plants from the subclass Commelinidae.

PubMed

Redwan, R M; Saidin, A; Kumar, S V

2015-08-12

Pineapple (Ananas comosus var. comosus) is known as the king of fruits for its crown and is the third most important tropical fruit after banana and citrus. The plant, which is indigenous to South America, is the most important species in the Bromeliaceae family and is largely traded for fresh fruit consumption. Here, we report the complete chloroplast sequence of the MD-2 pineapple, which was sequenced using PacBio sequencing technology. In this study, the high error rate of the long PacBio sequence reads of A. comosus total genomic DNA was reduced by leveraging highly accurate but short Illumina reads for error correction via the latest error correction module from Novocraft. Error-corrected long PacBio reads were assembled using a single tool to produce a contig representing the pineapple chloroplast genome. The genome, 159,636 bp in length, features the conserved quadripartite structure of chloroplast genomes, containing a large single copy region (LSC) of 87,482 bp, a small single copy region (SSC) of 18,622 bp and two inverted repeat regions (IRA and IRB) of 26,766 bp each. Overall, the genome contained 117 unique coding regions, and 30 were repeated in the IR region, with gene content, structure and arrangement similar to those of its sister taxon, Typha latifolia. A total of 35 repeat structures were detected in both the coding and non-coding regions, the majority being tandem repeats. In addition, 205 SSRs were detected in the genome, with six protein-coding genes containing more than two SSRs. Comparison of chloroplast genomes from the subclass Commelinidae revealed a conserved protein-coding gene, albeit located in a highly divergent region. Analysis of selection pressure on protein-coding genes using the Ka/Ks ratio showed significant positive selection exerted on the rps7 gene of the pineapple chloroplast (P less than 0.05).
Phylogenetic analysis confirmed the recent taxonomical relationships among the members of the commelinids, supporting the monophyletic relationship between Arecales and Dasypogonaceae and between Zingiberales and Poales, the latter of which includes A. comosus. The complete sequence of the pineapple chloroplast provides insights into the divergence of genic chloroplast sequences among members of the subclass Commelinidae. The complete pineapple chloroplast genome will serve as a reference for in-depth taxonomical studies in the Bromeliaceae family when more species in the family are sequenced in the future. The genetic sequence information will also make feasible other molecular applications of the pineapple chloroplast for plant genetic improvement.
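The region sizes reported in this abstract can be checked against the stated genome length, since a quadripartite chloroplast genome consists of one LSC, one SSC and two inverted repeat copies. The short sketch below is only an arithmetic consistency check on the published figures; the variable names are chosen for this example.

```python
# Region sizes in bp, as reported in the abstract.
LSC = 87_482   # large single copy region
SSC = 18_622   # small single copy region
IR = 26_766    # each of the two inverted repeats (IRA and IRB)

total = LSC + SSC + 2 * IR
print(total)  # 159636
```

The sum agrees with the reported genome length of 159,636 bp.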
Infection of capilloviruses requires subgenomic RNAs whose transcription is controlled by promoter-like sequences conserved among flexiviruses.

PubMed

Komatsu, Ken; Hirata, Hisae; Fukagawa, Takako; Yamaji, Yasuyuki; Okano, Yukari; Ishikawa, Kazuya; Adachi, Tatsushi; Maejima, Kensaku; Hashimoto, Masayoshi; Namba, Shigetou

2012-07-01

The first open-reading frame (ORF) of apple stem grooving virus (ASGV), of the genus Capillovirus, encodes an apparently chimeric polyprotein containing conserved regions for replicase (Rep) and coat protein (CP). However, our previous study revealed that ASGV mutants with distinct and discontinuous Rep- and CP-coding regions successfully infect plants, indicating that CP expressed via a subgenomic RNA (sgRNA) is sufficient for viability of the virus. Here we identified a transcription start site of the CP sgRNA and revealed that CP translated from the sgRNA is essential for ASGV infection. We mapped the transcription start sites of both the CP and the movement protein (MP) sgRNAs of ASGV and found a hexanucleotide motif, UUAGGU, conserved upstream from both sgRNA transcription start sites. Mutational analysis of the putative CP initiation codon and of the UUAGGU sequence upstream from the transcription start site of CP sgRNA demonstrated their importance for ASGV accumulation. Our results also demonstrated that potato virus T (PVT), an unassigned species closely related to ASGV, produces two sgRNAs putatively deployed for the CP and MP expression and that the same hexanucleotide motif as found in ASGV is located upstream from the transcription start sites of both sgRNAs. This motif, which constituted putative core elements of the sgRNA promoter, is broadly conserved among viruses in the families Alphaflexiviridae and Betaflexiviridae, suggesting that the gene expression strategy of the viruses in both families has been conserved throughout evolution. Copyright © 2012 Elsevier B.V. All rights reserved.
Massively Convergent Evolution for Ribosomal Protein Gene Content in Plastid and Mitochondrial Genomes

PubMed Central

Maier, Uwe-G; Zauner, Stefan; Woehle, Christian; Bolte, Kathrin; Hempel, Franziska; Allen, John F.; Martin, William F.

2013-01-01

Plastid and mitochondrial genomes have undergone parallel evolution to encode the same functional set of genes. These encode conserved protein components of the electron transport chain in their respective bioenergetic membranes and genes for the ribosomes that express them. This highly convergent aspect of organelle genome evolution is partly explained by the redox regulation hypothesis, which predicts a separate plastid or mitochondrial location for genes encoding bioenergetic membrane proteins of either photosynthesis or respiration. Here we show that convergence in organelle genome evolution is far stronger than previously recognized, because the same set of genes for ribosomal proteins is independently retained by both plastid and mitochondrial genomes. A hitherto unrecognized selective pressure retains genes for the same ribosomal proteins in both organelles. On the Escherichia coli ribosome assembly map, the retained proteins are implicated in 30S and 50S ribosomal subunit assembly and initial rRNA binding. We suggest that ribosomal assembly imposes functional constraints that govern the retention of ribosomal protein coding genes in organelles. These constraints are subordinate to redox regulation for electron transport chain components, which anchor the ribosome to the organelle genome in the first place. As organelle genomes undergo reduction, the rRNAs also become smaller. Below size thresholds of approximately 1,300 nucleotides (16S rRNA) and 2,100 nucleotides (26S rRNA), all ribosomal protein coding genes are lost from organelles, while electron transport chain components remain organelle encoded as long as the organelles use redox chemistry to generate a proton motive force. PMID:24259312
Conserved syntenic clusters of protein coding genes are missing in birds.

PubMed

Lovell, Peter V; Wirthlin, Morgan; Wilhelm, Larry; Minx, Patrick; Lazar, Nathan H; Carbone, Lucia; Warren, Wesley C; Mello, Claudio V

2014-01-01

Birds are one of the most highly successful and diverse groups of vertebrates, having evolved a number of distinct characteristics, including feathers and wings, a sturdy lightweight skeleton and unique respiratory and urinary/excretion systems. However, the genetic basis of these traits is poorly understood. Using comparative genomics based on extensive searches of 60 avian genomes, we have found that birds lack approximately 274 protein coding genes that are present in the genomes of most vertebrate lineages and are for the most part organized in conserved syntenic clusters in non-avian sauropsids and in humans. These genes are located in regions associated with chromosomal rearrangements, and are largely present in crocodiles, suggesting that their loss occurred subsequent to the split of dinosaurs/birds from crocodilians. Many of these genes are associated with lethality in rodents, human genetic disorders, or biological functions targeting various tissues. Functional enrichment analysis combined with orthogroup analysis and paralog searches revealed enrichments that were shared by non-avian species, present only in birds, or shared between all species. Together these results provide a clearer definition of the genetic background of extant birds, extend the findings of previous studies on missing avian genes, and provide clues about molecular events that shaped avian evolution. They also have implications for fields that largely benefit from avian studies, including development, immune system, oncogenesis, and brain function and cognition. With regards to the missing genes, birds can be considered ‘natural knockouts’ that may become invaluable model organisms for several human diseases.
ICAM-1-related long non-coding RNA: promoter analysis and expression in human retinal endothelial cells.

PubMed

Lumsden, Amanda L; Ma, Yuefang; Ashander, Liam M; Stempel, Andrew J; Keating, Damien J; Smith, Justine R; Appukuttan, Binoy

2018-05-09

Regulation of intercellular adhesion molecule (ICAM)-1 in retinal endothelial cells is a promising druggable target for retinal vascular diseases. The ICAM-1-related (ICR) long non-coding RNA stabilizes ICAM-1 transcript, increasing protein expression. However, studies of ICR involvement in disease have been limited as the promoter is uncharacterized. To address this issue, we undertook a comprehensive in silico analysis of the human ICR gene promoter region. We used genomic evolutionary rate profiling to identify a 115 base pair (bp) sequence within 500 bp upstream of the transcription start site of the annotated human ICR gene that was conserved across 25 eutherian genomes. A second constrained sequence upstream of the orthologous mouse gene (68 bp; conserved across 27 Eutherian genomes including human) was also discovered. Searching these elements identified 33 matrices predictive of binding sites for transcription factors known to be responsive to a broad range of pathological stimuli, including hypoxia, and metabolic and inflammatory proteins. Five phenotype-associated single nucleotide polymorphisms (SNPs) in the immediate vicinity of these elements included four SNPs (i.e. rs2569693, rs281439, rs281440 and rs11575074) predicted to impact binding motifs of transcription factors, and thus the expression of ICR and ICAM-1 genes, with potential to influence disease susceptibility. We verified that human retinal endothelial cells expressed ICR, and observed induction of expression by tumor necrosis factor-α.
Wheat CBF gene family: identification of polymorphisms in the CBF coding sequence.

PubMed

Mohseni, Sara; Che, Hua; Djillali, Zakia; Dumont, Estelle; Nankeu, Joseph; Danyluk, Jean

2012-12-01

Expression of cold-regulated genes needed for protection against freezing stress is mediated, in part, by the CBF transcription factor family. Previous studies with temperate cereals suggested that the CBF gene family in wheat was large, and that CBF genes were at the base of an important low temperature tolerance trait. Therefore, the goal of our study was to identify the CBF repertoire in the freezing-tolerant hexaploid wheat cultivar Norstar, and then to examine if the coding region of CBF genes in two spring cultivars contain polymorphisms that could affect the protein sequence and structure. Our analyses reveal that hexaploid wheat contains a complex CBF family consisting of at least 65 CBF genes of which 60 are known to be expressed in the cultivar Norstar. They represent 27 paralogous genes with 1-3 homeologous copies for the A, B, and D genomes. The cultivar Norstar contains two pseudogenes and at least 24 additional proteins having sequences and (or) structures that deviate from the consensus in the conserved AP2 DNA-binding and (or) C-terminal activation-domains. This suggests that in cultivars such as Norstar, low temperature tolerance may be increased through breeding of additional optimal alleles. The examination of the CBF repertoire present in the two spring cultivars, Chinese Spring and Manitou, reveals that they have additional polymorphisms affecting conserved positions in these domains. Understanding the effects of these polymorphisms will provide additional information for the selection of optimum CBF alleles in Triticeae breeding programs.
Probing the Boundaries of Orthology: The Unanticipated Rapid Evolution of Drosophila centrosomin

PubMed Central

Eisman, Robert C.; Kaufman, Thomas C.

2013-01-01

The rapid evolution of essential developmental genes and their protein products is both intriguing and problematic. The rapid evolution of gene products with simple protein folds and a lack of well-characterized functional domains typically result in a low discovery rate of orthologous genes. Additionally, in the absence of orthologs it is difficult to study the processes and mechanisms underlying rapid evolution. In this study, we have investigated the rapid evolution of centrosomin (cnn), an essential gene encoding centrosomal protein isoforms required during syncytial development in Drosophila melanogaster. Until recently the rapid divergence of cnn made identification of orthologs difficult and questionable because Cnn violates many of the assumptions underlying models for protein evolution. To overcome these limitations, we have identified a group of insect orthologs and present conserved features likely to be required for the functions attributed to cnn in D. melanogaster. We also show that the rapid divergence of Cnn isoforms is apparently due to frequent coding sequence indels and an accelerated rate of intronic additions and eliminations. These changes appear to be buffered by multi-exon and multi-reading frame maximum potential ORFs, simple protein folds, and the splicing machinery. These buffering features also occur in other genes in Drosophila and may help prevent potentially deleterious mutations due to indels in genes with large coding exons and exon-dense regions separated by small introns. This work promises to be useful for future investigations of cnn and potentially other rapidly evolving genes and proteins. PMID:23749319

Long-Range Control of Gene Expression: Emerging Mechanisms and Disruption in Disease

PubMed Central

Kleinjan, Dirk A.; van Heyningen, Veronica

2005-01-01

Transcriptional control is a major mechanism for regulating gene expression. The complex machinery required to effect this control is still emerging from functional and evolutionary analysis of genomic architecture. In addition to the promoter, many other regulatory elements are required for spatiotemporally and quantitatively correct gene expression. Enhancer and repressor elements may reside in introns or up- and downstream of the transcription unit. For some genes with highly complex expression patterns—often those that function as key developmental control genes—the cis-regulatory domain can extend long distances outside the transcription unit. Some of the earliest hints of this came from disease-associated chromosomal breaks positioned well outside the relevant gene. With the availability of wide-ranging genome sequence comparisons, strong conservation of many noncoding regions became obvious. Functional studies have shown many of these conserved sites to be transcriptional regulatory elements that sometimes reside inside unrelated neighboring genes. Such sequence-conserved elements generally harbor sites for tissue-specific DNA-binding proteins. Developmentally variable chromatin conformation can control protein access to these sites and can regulate transcription. Disruption of these finely tuned mechanisms can cause disease. Some regulatory element mutations will be associated with phenotypes distinct from any identified for coding-region mutations. PMID:15549674
The identification and functional annotation of RNA structures conserved in vertebrates.

PubMed

Seemann, Stefan E; Mirza, Aashiq H; Hansen, Claus; Bang-Berthelsen, Claus H; Garde, Christian; Christensen-Dalsgaard, Mikkel; Torarinsson, Elfar; Yao, Zizhen; Workman, Christopher T; Pociot, Flemming; Nielsen, Henrik; Tommerup, Niels; Ruzzo, Walter L; Gorodkin, Jan

2017-08-01

Structured elements of RNA molecules are essential in, e.g., RNA stabilization, localization, and protein interaction, and their conservation across species suggests a common functional role. We computationally screened vertebrate genomes for conserved RNA structures (CRSs), leveraging structure-based, rather than sequence-based, alignments. After careful correction for sequence identity and GC content, we predict ∼516,000 human genomic regions containing CRSs. We find that a substantial fraction of human-mouse CRS regions (1) colocalize consistently with binding sites of the same RNA binding proteins (RBPs) or (2) are transcribed in corresponding tissues. Additionally, a CaptureSeq experiment revealed expression of many of our CRS regions in human fetal brain, including 662 novel ones. For selected human and mouse candidate pairs, qRT-PCR and in vitro RNA structure probing supported both shared expression and shared structure despite low abundance and low sequence identity. About 30,000 CRS regions are located near coding or long noncoding RNA genes or within enhancers. Structured (CRS overlapping) enhancer RNAs and extended 3' ends have significantly increased expression levels over their nonstructured counterparts. Our findings of transcribed uncharacterized regulatory regions that contain CRSs support their RNA-mediated functionality. © 2017 Seemann et al.; Published by Cold Spring Harbor Laboratory Press.
The wheat cytochrome oxidase subunit II gene has an intron insert and three radical amino acid changes relative to maize

PubMed Central

Bonen, Linda; Boer, Poppo H.; Gray, Michael W.

1984-01-01

We have determined the sequence of the wheat mitochondrial gene for cytochrome oxidase subunit II (COII) and find that its derived protein sequence differs from that of maize at only three amino acid positions. Unexpectedly, all three replacements are non-conservative ones. The wheat COII gene has a highly conserved intron at the same position as in maize, but the wheat intron is 1.5 times longer because of an insert relative to its maize counterpart. Hybridization analysis of mitochondrial DNA from rye, pea, broad bean and cucumber indicates strong sequence conservation of COII coding sequences among all these higher plants. However, only rye and maize mitochondrial DNA show homology with wheat COII intron sequences and rye alone with intron-insert sequences. We find that a sequence identical to the region of the 5' exon corresponding to the transmembrane domain of the COII protein is present at a second genomic location in wheat mitochondria. These variations in COII gene structure and size, as well as the presence of repeated COII sequences, illustrate, at the DNA sequence level, factors that contribute to higher plant mitochondrial DNA diversity and complexity. PMID:16453565
Core histone genes of Giardia intestinalis: genomic organization, promoter structure, and expression

PubMed Central

Yee, Janet; Tang, Anita; Lau, Wei-Ling; Ritter, Heather; Delport, Dewald; Page, Melissa; Adam, Rodney D; Müller, Miklós; Wu, Gang

2007-01-01

Background Giardia intestinalis is a protist found in freshwaters worldwide, and is the most common cause of parasitic diarrhea in humans. The phylogenetic position of this parasite is still much debated. Histones are small, highly conserved proteins that associate tightly with DNA to form chromatin within the nucleus. There are two classes of core histone genes in higher eukaryotes: DNA replication-independent histones and DNA replication-dependent ones. Results We identified two copies each of the core histone H2a, H2b and H3 genes, and three copies of the H4 gene, at separate locations on chromosomes 3, 4 and 5 within the genome of Giardia intestinalis, but no gene encoding a H1 linker histone could be recognized. The copies of each gene share extensive DNA sequence identities throughout their coding and 5' noncoding regions, which suggests these copies have arisen from relatively recent gene duplications or gene conversions. The transcription start sites are at triplet A sequences 1–27 nucleotides upstream of the translation start codon for each gene. We determined that a 50 bp region upstream from the start of the histone H4 coding region is the minimal promoter, and a highly conserved 15 bp sequence called the histone motif (him) is essential for its activity. The Giardia core histone genes are constitutively expressed at approximately equivalent levels and their mRNAs are polyadenylated. Competition gel-shift experiments suggest that a factor within the protein complex that binds him may also be a part of the protein complexes that bind other promoter elements described previously in Giardia. Conclusion In contrast to other eukaryotes, the Giardia genome has only a single class of core histone genes that encode replication-independent histones. Our inability to locate a gene encoding the linker histone H1 leads us to speculate that the H1 protein may not be required for the compaction of Giardia's small and gene-rich genome. PMID:17425802
Alternative splicing for members of human mosaic domain superfamilies. I. The CH and LIM domains containing group of proteins.

PubMed

Friedberg, Felix

2009-05-01

In this paper we examine (restricted to Homo sapiens) the products resulting from gene duplication and the subsequent alternative splicing for the members of a multidomain group of proteins that possess the evolutionarily conserved calponin homology (CH) domain, i.e. an "actin binding domain", as a singlet and that, in addition, contain the conserved cysteine-rich, double Zn finger-possessing LIM domain, also as a singlet. Seven genes, resulting from gene duplications, were identified that code for seven group members for which pre-mRNAs appear to have undergone multiple alternative splicing: Mical 1, 2 and 3 are located on chromosomes 6q21, 11p15 and 22q11, respectively. The LMO7 gene is present on chromosome 13q22 and the LIMCH1 gene on chromosome 4p13. Micall1 is mapped to chromosome 22q13 and Micall2 to chromosome 7p22. Translated GenBank ESTs suggest the existence of multiple products alternatively spliced from the pre-mRNAs encoded by these genes. Characteristic indicators of such splicing among the proteins derived from one gene must include extensive common regions that are 100% identical. In some instances only one exon might be partly or completely eliminated. Sometimes alternative splicing is also associated with an increased frequency of creation of an exon or part of an exon from an intron. Not only coding regions for the body of the protein but also those for its N- or C-terminal ends could be affected by the splicing. If created forms merely begin at different starting points but remain identical in sequence thereafter, their existence as products of alternative splicing must be questioned. In the splicings described in this paper, multiple isoforms rather than a single isoform appear as products during gene expression.
The Fragmented Mitochondrial Ribosomal RNAs of Plasmodium falciparum

PubMed Central

Feagin, Jean E.; Harrell, Maria Isabel; Lee, Jung C.; Coe, Kevin J.; Sands, Bryan H.; Cannone, Jamie J.; Tami, Germaine; Schnare, Murray N.; Gutell, Robin R.

2012-01-01

Background The mitochondrial genome in the human malaria parasite Plasmodium falciparum is most unusual. Over half the genome is composed of the genes for three classic mitochondrial proteins: cytochrome oxidase subunits I and III and apocytochrome b. The remainder encodes numerous small RNAs, ranging in size from 23 to 190 nt. Previous analysis revealed that some of these transcripts have significant sequence identity with highly conserved regions of large and small subunit rRNAs, and can form the expected secondary structures. However, these rRNA fragments are not encoded in linear order; instead, they are intermixed with one another and the protein coding genes, and are coded on both strands of the genome. This unorthodox arrangement hindered the identification of transcripts corresponding to other regions of rRNA that are highly conserved and/or are known to participate directly in protein synthesis. Principal Findings The identification of 14 additional small mitochondrial transcripts from P. falciparum and the assignment of 27 small RNAs (12 SSU RNAs totaling 804 nt, 15 LSU RNAs totaling 1233 nt) to specific regions of rRNA are supported by multiple lines of evidence. The regions now represented are highly similar to those of the small but contiguous mitochondrial rRNAs of Caenorhabditis elegans. The P. falciparum rRNA fragments cluster on the interfaces of the two ribosomal subunits in the three-dimensional structure of the ribosome. Significance All of the rRNA fragments are now presumed to have been identified with experimental methods, and nearly all of these have been mapped onto the SSU and LSU rRNAs. Conversely, all regions of the rRNAs that are known to be directly associated with protein synthesis have been identified in the P. falciparum mitochondrial genome and RNA transcripts. The fragmentation of the rRNA in the P. falciparum mitochondrion is the most extreme example of any rRNA fragmentation discovered. PMID:22761677
A class of circadian long non-coding RNAs mark enhancers modulating long-range circadian gene regulation

PubMed Central

Fan, Zenghua; Zhao, Meng; Joshi, Parth D.; Li, Ping; Zhang, Yan; Guo, Weimin; Xu, Yichi; Wang, Haifang; Zhao, Zhihu

2017-01-01

Circadian rhythm exerts its influence on animal physiology and behavior by regulating gene expression at various levels. Here we systematically explored circadian long non-coding RNAs (lncRNAs) in mouse liver and examined their circadian regulation. We found that a significant proportion of circadian lncRNAs are expressed at enhancer regions, mostly bound by two key circadian transcription factors, BMAL1 and REV-ERBα. These circadian lncRNAs showed circadian phases similar to those of their nearby genes. The extent of their nuclear localization is higher than that of protein-coding genes but lower than that of enhancer RNAs. The association between enhancers and circadian lncRNAs is also observed in tissues other than liver. Comparative analysis between mouse and rat circadian liver transcriptomes showed that circadian transcription at lncRNA loci tends to be conserved despite low sequence conservation of lncRNAs. One such circadian lncRNA, termed lnc-Crot, led us to identify a super-enhancer region interacting with a cluster of genes involved in circadian regulation of metabolism through long-range interactions. Further experiments showed that the lnc-Crot locus has enhancer function independent of lnc-Crot's transcription. Our results suggest that the enhancer-associated circadian lncRNAs mark the genomic loci modulating long-range circadian gene regulation and shed new light on the evolutionary origin of lncRNAs. PMID:28335007
18 CFR 410.1 - Basin regulations-Water Code and Administrative Manual-Part III Water Quality Regulations.

Code of Federal Regulations, 2010 CFR

2010-04-01

... 18 Conservation of Power and Water Resources 2 2010-04-01 2010-04-01 false Basin regulations-Water Code and Administrative Manual-Part III Water Quality Regulations. 410.1 Section 410.1 Conservation of Power and Water Resources DELAWARE RIVER BASIN COMMISSION ADMINISTRATIVE MANUAL BASIN REGULATIONS; WATER CODE AND ADMINISTRATIVE MANUAL-PART III...
18 CFR 410.1 - Basin regulations-Water Code and Administrative Manual-Part III Water Quality Regulations.

Code of Federal Regulations, 2014 CFR

2014-04-01

... 18 Conservation of Power and Water Resources 2 2014-04-01 2014-04-01 false Basin regulations-Water Code and Administrative Manual-Part III Water Quality Regulations. 410.1 Section 410.1 Conservation of Power and Water Resources DELAWARE RIVER BASIN COMMISSION ADMINISTRATIVE MANUAL BASIN REGULATIONS; WATER CODE AND ADMINISTRATIVE MANUAL-PART III...
18 CFR 410.1 - Basin regulations-Water Code and Administrative Manual-Part III Water Quality Regulations.

Code of Federal Regulations, 2013 CFR

2013-04-01

... 18 Conservation of Power and Water Resources 2 2013-04-01 2012-04-01 true Basin regulations-Water Code and Administrative Manual-Part III Water Quality Regulations. 410.1 Section 410.1 Conservation of Power and Water Resources DELAWARE RIVER BASIN COMMISSION ADMINISTRATIVE MANUAL BASIN REGULATIONS; WATER CODE AND ADMINISTRATIVE MANUAL-PART III...
18 CFR 410.1 - Basin regulations-Water Code and Administrative Manual-Part III Water Quality Regulations.

Code of Federal Regulations, 2012 CFR

2012-04-01

... 18 Conservation of Power and Water Resources 2 2012-04-01 2012-04-01 false Basin regulations-Water Code and Administrative Manual-Part III Water Quality Regulations. 410.1 Section 410.1 Conservation of Power and Water Resources DELAWARE RIVER BASIN COMMISSION ADMINISTRATIVE MANUAL BASIN REGULATIONS; WATER CODE AND ADMINISTRATIVE MANUAL-PART III...
Cloning and characterization of a basic phospholipase A2 homologue from Micrurus corallinus (coral snake) venom gland.

PubMed

de Oliveira, Ursula Castro; Assui, Alessandra; da Silva, Alvaro Rossan de Brandão Prieto; de Oliveira, Jane Silveira; Ho, Paulo Lee

2003-09-01

During the cloning of abundant cDNAs expressed in the Micrurus corallinus coral snake venom gland, several putative toxins, including a phospholipase A2 homologue cDNA (clone V2), were identified. The V2 cDNA clone codes for a potential coral snake toxin with a signal peptide of 27 amino acid residues plus a predicted mature protein with 119 amino acid residues. The deduced protein is highly similar to known phospholipases A2, with seven deduced S-S bridges at the same conserved positions. This protein was expressed in Escherichia coli as a His-tagged protein that allowed the rapid purification of the recombinant protein. This protein was used to generate antibodies, which recognized the recombinant protein in Western blot. This antiserum was used to screen a large number of venoms, showing a ubiquitous distribution of immunorelated proteins in all elapidic venoms but not in the viperidic Bothrops jararaca venom. This is the first description of a complete primary structure of a phospholipase A2 homologue deduced by cDNA cloning from a coral snake.
MultitaskProtDB: a database of multitasking proteins.

PubMed

Hernández, Sergio; Ferragut, Gabriela; Amela, Isaac; Perez-Pons, JosepAntoni; Piñol, Jaume; Mozo-Villarias, Angel; Cedano, Juan; Querol, Enrique

2014-01-01

We have compiled MultitaskProtDB, available online at http://wallace.uab.es/multitask, to provide a repository where the many multitasking proteins found in the literature can be stored. Multitasking or moonlighting is the capability of some proteins to execute two or more biological functions. Usually, multitasking proteins are experimentally revealed by serendipity. This ability of proteins to perform multitasking functions helps us to understand one of the ways used by cells to perform many complex functions with a limited number of genes. Even so, the study of this phenomenon is complex because, among other things, there is no database of moonlighting proteins. The existence of such a tool facilitates the collection and dissemination of these important data. This work reports the database, MultitaskProtDB, which is designed as a user-friendly web page containing >288 multitasking proteins with their NCBI and UniProt accession numbers, canonical and additional biological functions, monomeric/oligomeric states, PDB codes when available and bibliographic references. This database also serves to gain insight into some characteristics of multitasking proteins such as frequencies of the different pairs of functions, phylogenetic conservation and so forth.
Isolation, expression, and chromosomal localization of the human mitochondrial capsule selenoprotein gene (MCSP)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aho, Hanne; Schwemmer, M.; Tessmann, D.

1996-03-01

The mitochondrial capsule selenoprotein (MCS) (HGMW-approved symbol MCSP) is one of three proteins that are important for the maintenance and stabilization of the crescent structure of the sperm mitochondria. We describe here the isolation of a cDNA, the exon-intron organization, the expression, and the chromosomal localization of the human MCS gene. Nucleotide sequence analysis of the human and mouse MCS cDNAs reveals that the 5'- and 3'-untranslated sequences are more conserved (71%) than the coding sequences (59%). The open reading frame encodes a 116-amino-acid protein and lacks the UGA codons, which have been reported to encode the selenocysteines in the N-terminus of the deduced mouse protein. The deduced human protein shows a low degree of amino acid sequence identity to the mouse protein (39%). The most striking homology lies in the dicysteine motifs. Northern and Southern zooblot analyses reveal that the MCS gene in human, baboon, and bovine is more conserved than its counterparts in mouse and rat. The single intron in the human MCS gene is approximately 6 kb and interrupts the 5'-untranslated region at a position equivalent to that in the mouse and rat genes. Northern blot and in situ hybridization experiments demonstrate that the expression of the human MCS gene is restricted to haploid spermatids. The human gene was assigned to q21 of chromosome 1. 30 refs., 9 figs.
The Genome of the Western Clawed Frog Xenopus tropicalis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hellsten, Uffe; Harland, Richard M.; Gilchrist, Michael J.

2009-10-01

The western clawed frog Xenopus tropicalis is an important model for vertebrate development that combines experimental advantages of the African clawed frog Xenopus laevis with more tractable genetics. Here we present a draft genome sequence assembly of X. tropicalis. This genome encodes over 20,000 protein-coding genes, including orthologs of at least 1,700 human disease genes. Over a million expressed sequence tags validated the annotation. More than one-third of the genome consists of transposable elements, with unusually prevalent DNA transposons. Like other tetrapods, the genome contains gene deserts enriched for conserved non-coding elements. The genome exhibits remarkable shared synteny with human and chicken over major parts of large chromosomes, broken by lineage-specific chromosome fusions and fissions, mainly in the mammalian lineage.
A primary microcephaly protein complex forms a ring around parental centrioles.

PubMed

Sir, Joo-Hee; Barr, Alexis R; Nicholas, Adeline K; Carvalho, Ofelia P; Khurshid, Maryam; Sossick, Alex; Reichelt, Stefanie; D'Santos, Clive; Woods, C Geoffrey; Gergely, Fanni

2011-10-09

Autosomal recessive primary microcephaly (MCPH) is characterized by a substantial reduction in prenatal human brain growth without alteration of the cerebral architecture and is caused by biallelic mutations in genes coding for a subset of centrosomal proteins. Although at least three of these proteins have been implicated in centrosome duplication, the nature of the centrosome dysfunction that underlies the neurodevelopmental defect in MCPH is unclear. Here we report a homozygous MCPH-causing mutation in human CEP63. CEP63 forms a complex with another MCPH protein, CEP152, a conserved centrosome duplication factor. Together, these two proteins are essential for maintaining normal centrosome numbers in cells. Using super-resolution microscopy, we found that CEP63 and CEP152 co-localize in a discrete ring around the proximal end of the parental centriole, a pattern specifically disrupted in CEP63-deficient cells derived from patients with MCPH. This work suggests that the CEP152-CEP63 ring-like structure ensures normal neurodevelopment and that its impairment particularly affects human cerebral cortex growth.
Gene 2 of the sigma rhabdovirus genome encodes the P protein, and gene 3 encodes a protein related to the reverse transcriptase of retroelements.

PubMed

Landès-Devauchelle, C; Bras, F; Dezélée, S; Teninges, D

1995-11-10

The nucleotide sequences of genes 2 and 3 of the Drosophila rhabdovirus sigma were determined from cDNAs to the viral genome and to poly(A)+ mRNAs. Gene 2 comprises 1032 nucleotides and contains a long ORF encoding a polypeptide of molecular weight 35,208 that is present in infected cells and in virions and that migrates in SDS-PAGE as a doublet of M(r) about 60 kDa. The distribution of acidic charges as well as the electrophoretic properties of the protein are characteristic of the rhabdovirus P proteins. Gene 3 comprises 923 nucleotides and contains a long ORF capable of encoding a polypeptide of 298 amino acids of MW 33,790. The putative protein (PP3) is similar in size to a minor component of the virions. Computer analysis shows that the sequence of PP3 contains three motifs related to the conserved motifs of reverse transcriptases.
A High-Resolution Gene Map of the Chloroplast Genome of the Red Alga Porphyra purpurea.

PubMed Central

Reith, M; Munholland, J

1993-01-01

Extensive DNA sequencing of the chloroplast genome of the red alga Porphyra purpurea has resulted in the detection of more than 125 genes. Fifty-eight (approximately 46%) of these genes are not found on the chloroplast genomes of land plants. These include genes encoding 17 photosynthetic proteins, three tRNAs, and nine ribosomal proteins. In addition, nine genes encoding proteins related to biosynthetic functions, six genes encoding proteins involved in gene expression, and at least five genes encoding miscellaneous proteins are among those not known to be located on land plant chloroplast genomes. The increased coding capacity of the P. purpurea chloroplast genome, along with other characteristics such as the absence of introns and the conservation of ancestral operons, demonstrate the primitive nature of the P. purpurea chloroplast genome. In addition, evidence for a monophyletic origin of chloroplasts is suggested by the identification of two groups of genes that are clustered in chloroplast genomes but not in cyanobacteria. PMID:12271072
Association of Amine-Receptor DNA Sequence Variants with Associative Learning in the Honeybee.

PubMed

Lagisz, Malgorzata; Mercer, Alison R; de Mouzon, Charlotte; Santos, Luana L S; Nakagawa, Shinichi

2016-03-01

Octopamine- and dopamine-based neuromodulatory systems play a critical role in learning and learning-related behaviour in insects. To further our understanding of these systems and resulting phenotypes, we quantified DNA sequence variations at six loci coding for octopamine and dopamine receptors and their association with aversive and appetitive learning traits in a population of honeybees. We identified 79 polymorphic sequence markers (mostly SNPs and a few insertions/deletions) located within or close to six candidate genes. Intriguingly, we found that levels of sequence variation in the protein-coding regions studied were low, indicating that sequence variation in the coding regions of receptor genes critical to learning and memory is strongly selected against. Non-coding and upstream regions of the same genes, however, were less conserved and sequence variations in these regions were weakly associated with between-individual differences in learning-related traits. While these associations do not directly imply a specific molecular mechanism, they suggest that the cross-talk between dopamine and octopamine signalling pathways may influence olfactory learning and memory in the honeybee.
The primitive code and repeats of base oligomers as the primordial protein-encoding sequence.

PubMed Central

Ohno, S; Epplen, J T

1983-01-01

Even if the prebiotic self-replication of nucleic acids and the subsequent emergence of primitive, enzyme-independent tRNAs are accepted as plausible, the origin of life by spontaneous generation still appears improbable. This is because the just-emerged primitive translational machinery had to cope with base sequences that were not preselected for their coding potentials. Particularly if the primitive mitochondria-like code with four chain-terminating base triplets preceded the universal code, the translation of long, randomly generated, base sequences at this critical stage would have merely resulted in the production of short oligopeptides instead of long polypeptide chains. We present the base sequence of a mouse transcript containing tetranucleotide repeats conserved during evolution. Even if translated in accordance with the primitive mitochondria-like code, this transcript in its three reading frames can yield 245-, 246-, and 251-residue-long tetrapeptidic periodical polypeptides that are already acquiring longer periodicities. We contend that the first set of base sequences translated at the beginning of life were such oligonucleotide repeats. By quickly acquiring longer periodicities, their products must have soon gained characteristic secondary structures--alpha-helical or beta-sheet or both. PMID:6574491
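The reading-frame argument in this abstract can be illustrated with a short Python sketch. It is not taken from the paper: the vertebrate mitochondrial code (stops UAA, UAG, AGA, AGG) is assumed here as a stand-in for the "primitive mitochondria-like code with four chain-terminating base triplets", and the repeat unit `GCTC` is a hypothetical example rather than the mouse sequence reported. Because 4 and 3 share no common factor, a tetranucleotide repeat cycles through the same four codons in every frame, so all three frames stay open whenever none of those codons is a stop.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in the conventional TCAG ordering.
STANDARD = dict(zip(
    ("".join(c) for c in product(BASES, repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))
# Assumption: vertebrate mitochondrial code used as the "mitochondria-like"
# code with four stop triplets (TAA, TAG, AGA, AGG).
MITO = {**STANDARD, "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}


def translate(dna, frame, table):
    """Translate from `frame` until the first stop codon or the sequence end."""
    peptide = []
    for i in range(frame, len(dna) - 2, 3):
        aa = table[dna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


def mean_sense_codons_before_stop(table):
    """Expected run of sense codons before a stop, assuming uniformly random codons."""
    p_stop = sum(aa == "*" for aa in table.values()) / len(table)
    return (1 - p_stop) / p_stop


unit = "GCTC"        # hypothetical repeat unit, chosen so no frame contains a stop
dna = unit * 30      # 120 nt
for frame in range(3):
    print(frame, translate(dna, frame, MITO))
print(mean_sense_codons_before_stop(MITO))
```

Under the uniform-codon assumption, a random sequence is expected to yield only about 15 residues before a stop, whereas the repeat is read through to its end in each frame with a four-residue periodicity. Not every repeat unit behaves this way: a unit such as `GATA` contains stop codons in this code and closes almost immediately.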

A +1 ribosomal frameshifting motif prevalent among plant amalgaviruses.

PubMed

Nibert, Max L; Pyle, Jesse D; Firth, Andrew E

2016-11-01

Sequence accessions attributable to novel plant amalgaviruses have been found in the Transcriptome Shotgun Assembly database. Sixteen accessions, derived from 12 different plant species, appear to encompass the complete protein-coding regions of the proposed amalgaviruses, which would substantially expand the size of genus Amalgavirus from 4 current species. Other findings include evidence for UUU_CGN as a +1 ribosomal frameshifting motif prevalent among plant amalgaviruses; for a variant version of this motif found thus far in only two amalgaviruses from solanaceous plants; for a region of α-helical coiled coil propensity conserved in a central region of the ORF1 translation product of plant amalgaviruses; and for conserved sequences in a C-terminal region of the ORF2 translation product (RNA-dependent RNA polymerase) of plant amalgaviruses, seemingly beyond the region of conserved polymerase motifs. These results additionally illustrate the value of mining the TSA database and others for novel viral sequences for comparative analyses. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
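A motif such as UUU_CGN is codon-aligned: the underscore separates two consecutive codons in the original (zero) reading frame. The sketch below, which is an illustration and not the authors' pipeline, scans an RNA or DNA string for in-frame occurrences. The function name `find_frameshift_motifs` and the toy sequences are hypothetical.

```python
import re

# Zero-width lookahead so overlapping candidates (e.g. UUUCGUUUCGA) are all seen.
MOTIF = re.compile(r"(?=UUUCG[ACGU])")


def find_frameshift_motifs(seq, frame=0):
    """Return 0-based start positions of UUU_CGN that are aligned to `frame`.

    `frame` is the offset (0, 1 or 2) of the first codon of the ORF being read.
    DNA input is accepted and converted to RNA letters.
    """
    rna = seq.upper().replace("T", "U")
    return [m.start() for m in MOTIF.finditer(rna)
            if (m.start() - frame) % 3 == 0]


# Toy ORF: AUG GCA UUU CGA GGC UAA (motif starts at the third codon).
print(find_frameshift_motifs("AUGGCAUUUCGAGGCUAA"))
# Same hexamer, but out of frame relative to the AUG, so it is not reported.
print(find_frameshift_motifs("AUGGUUUCGAGG"))
```

Restricting hits to the ORF1 frame matters because the bare hexamer UUUCGN also occurs by chance in the other two frames, where it would not correspond to the UUU and CGN codons named in the motif.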
Multi-Omics Driven Assembly and Annotation of the Sandalwood (Santalum album) Genome.

PubMed

Mahesh, Hirehally Basavarajegowda; Subba, Pratigya; Advani, Jayshree; Shirke, Meghana Deepak; Loganathan, Ramya Malarini; Chandana, Shankara Lingu; Shilpa, Siddappa; Chatterjee, Oishi; Pinto, Sneha Maria; Prasad, Thottethodi Subrahmanya Keshava; Gowda, Malali

2018-04-01

Indian sandalwood (Santalum album) is an important tropical evergreen tree known for its fragrant heartwood-derived essential oil and its valuable carving wood. Here, we applied an integrated genomic, transcriptomic, and proteomic approach to assemble and annotate the Indian sandalwood genome. Our genome sequencing resulted in the establishment of a draft map of the smallest genome for any woody tree species to date (221 Mb). The genome annotation predicted 38,119 protein-coding genes and 27.42% repetitive DNA elements. In-depth proteome analysis revealed the identities of 72,325 unique peptides, which confirmed 10,076 of the predicted genes. The addition of transcriptomic and proteogenomic approaches resulted in the identification of 53 novel proteins and 34 gene-correction events that were missed by genomic approaches. Proteogenomic analysis also helped in reassigning 1,348 potential noncoding RNAs as bona fide protein-coding messenger RNAs. Gene expression patterns at the RNA and protein levels indicated that peptide sequencing was useful in capturing proteins encoded by nuclear and organellar genomes alike. Mass spectrometry-based proteomic evidence provided an unbiased approach toward the identification of proteins encoded by organellar genomes. Such proteins are often missed in transcriptome data sets due to the enrichment of only messenger RNAs that contain poly(A) tails. Overall, the use of integrated omic approaches enhanced the quality of the assembly and annotation of this nonmodel plant genome. The availability of genomic, transcriptomic, and proteomic data will enhance genomics-assisted breeding, germplasm characterization, and conservation of sandalwood trees. © 2018 American Society of Plant Biologists. All Rights Reserved.
Conservation of Transcription Start Sites within Genes across a Bacterial Genus

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shao, Wenjun; Price, Morgan N.; Deutschbauer, Adam M.

Transcription start sites (TSSs) lying inside annotated genes, on the same or opposite strand, have been observed in diverse bacteria, but the function of these unexpected transcripts is unclear. Here, we use the metal-reducing bacterium Shewanella oneidensis MR-1 and its relatives to study the evolutionary conservation of unexpected TSSs. Using high-resolution tiling microarrays and 5'-end RNA sequencing, we identified 2,531 TSSs in S. oneidensis MR-1, of which 18% were located inside coding sequences (CDSs). Comparative transcriptome analysis with seven additional Shewanella species revealed that the majority (76%) of the TSSs within the upstream regions of annotated genes (gTSSs) were conserved. Thirty percent of the TSSs that were inside genes and on the sense strand (iTSSs) were also conserved. Sequence analysis around these iTSSs showed conserved promoter motifs, suggesting that many iTSSs are under purifying selection. Furthermore, conserved iTSSs are enriched for regulatory motifs, suggesting that they are regulated, and they tend to eliminate polar effects, which confirms that they are functional. In contrast, the transcription of antisense TSSs located inside CDSs (aTSSs) was significantly less likely to be conserved (22%). However, aTSSs whose transcription was conserved often have conserved promoter motifs and drive the expression of nearby genes. Overall, our findings demonstrate that some internal TSSs are conserved and drive protein expression despite their unusual locations, but the majority are not conserved and may reflect noisy initiation of transcription rather than a biological function.
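The three TSS categories named in this abstract (gTSS, iTSS, aTSS) can be expressed as a small classification rule. The Python sketch below is illustrative only: the `Gene` record, the function name `classify_tss`, and the 300 bp upstream window are assumptions, since the abstract does not state how far upstream a gTSS may lie.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    start: int   # leftmost genomic coordinate, inclusive
    end: int     # rightmost genomic coordinate, inclusive
    strand: str  # "+" or "-"


def classify_tss(pos, strand, gene, upstream_window=300):
    """Classify a TSS relative to one gene.

    gTSS: same strand, within `upstream_window` bp upstream of the gene
    iTSS: inside the gene, sense strand
    aTSS: inside the gene, antisense strand
    The window size is a hypothetical default, not a value from the abstract.
    """
    if gene.start <= pos <= gene.end:
        return "iTSS" if strand == gene.strand else "aTSS"
    if strand == gene.strand:
        # "Upstream" is to the left for + genes and to the right for - genes.
        if gene.strand == "+" and gene.start - upstream_window <= pos < gene.start:
            return "gTSS"
        if gene.strand == "-" and gene.end < pos <= gene.end + upstream_window:
            return "gTSS"
    return "other"


plus_gene = Gene(1000, 2000, "+")
for pos, strand in [(900, "+"), (1500, "+"), (1500, "-"), (500, "+")]:
    print(pos, strand, classify_tss(pos, strand, plus_gene))
```

Handling the minus strand explicitly is the main design point: the upstream region of a minus-strand gene lies at higher coordinates than the gene itself.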
RNA editing makes mistakes in plant mitochondria: editing loses sense in transcripts of a rps19 pseudogene and in creating stop codons in coxI and rps3 mRNAs of Oenothera.

PubMed Central

Schuster, W; Brennicke, A

1991-01-01

An intact gene for the ribosomal protein S19 (rps19) is absent from Oenothera mitochondria. The conserved rps19 reading frame found in the mitochondrial genome is interrupted by a termination codon. This rps19 pseudogene is cotranscribed with the downstream rps3 gene and is edited on both sides of the translational stop. Editing, however, changes the amino acid sequence at positions that were well conserved before editing. Other strange editings create translational stops in open reading frames coding for functional proteins. In coxI and rps3 mRNAs CGA codons are edited to UGA stop codons only five and three codons, respectively, downstream to the initiation codon. These aberrant editings in essential open reading frames and in the rps19 pseudogene appear to have been shifted to these positions from other editing sites. These observations suggest a requirement for a continuous evolutionary constraint on the editing specificities in plant mitochondria. PMID:1762921
Localization of TFIIB binding regions using serial analysis of chromatin occupancy

PubMed Central

Yochum, Gregory S; Rajaraman, Veena; Cleland, Ryan; McWeeney, Shannon

2007-01-01

Background: RNA Polymerase II (RNAP II) is recruited to core promoters by the pre-initiation complex (PIC) of general transcription factors. Within the PIC, transcription factor for RNA polymerase IIB (TFIIB) determines the start site of transcription. TFIIB binding has not been localized, genome-wide, in metazoans. Serial analysis of chromatin occupancy (SACO) is an unbiased methodology used to empirically identify transcription factor binding regions. In this report, we use TFIIB and SACO to localize TFIIB binding regions across the rat genome. Results: A sample of the TFIIB SACO library was sequenced and 12,968 TFIIB genomic signature tags (GSTs) were assigned to the rat genome. GSTs are 20–22 base pair fragments that are derived from TFIIB bound chromatin. TFIIB localized to both non-protein coding and protein-coding loci. For 21% of the 1783 protein-coding genes in this sample of the SACO library, TFIIB binding mapped near the characterized 5' promoter that is upstream of the transcription start site (TSS). However, internal TFIIB binding positions were identified in 57% of the 1783 protein-coding genes. Internal positions are defined as those within an inclusive region greater than 2.5 kb downstream from the 5' TSS and 2.5 kb upstream from the transcription stop. We demonstrate that both TFIIB and TFIID (an additional component of PICs) bound to internal regions using chromatin immunoprecipitation (ChIP). The 5' cap of transcripts associated with internal TFIIB binding positions were identified using a cap-trapping assay. The 5' TSSs for internal transcripts were confirmed by primer extension. Additionally, an analysis of the functional annotation of mouse 3 (FANTOM3) databases indicates that internally initiated transcripts identified by TFIIB SACO in rat are conserved in mouse. 
Conclusion: Our finding that TFIIB binding is not restricted to the 5' upstream region indicates that the propensity for PIC to contribute to transcript diversity is far greater than previously appreciated. PMID:17997859
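The definition of an "internal" binding position in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function name `is_internal`, the coordinate convention (integer base-pair positions on one chromosome) and the choice to treat both boundaries as inclusive are assumptions made for illustration only.

```python
def is_internal(position, tss, stop, margin=2500):
    """Return True if `position` is an 'internal' site of a gene.

    Assumption (hypothetical reading of the abstract): a site is internal
    when it lies at least `margin` bp downstream of the TSS and at least
    `margin` bp upstream of the transcription stop, boundaries included.
    """
    # Sorting makes the check strand-independent: on the minus strand the
    # TSS has the larger coordinate, but the margins are symmetric.
    low, high = sorted((tss, stop))
    return low + margin <= position <= high - margin


# Hypothetical coordinates, plus-strand gene from 1,000 to 20,000.
print(is_internal(5000, 1000, 20000))   # inside the gene body
print(is_internal(2000, 1000, 20000))   # too close to the TSS
```

Genes shorter than twice the margin can never contain an internal position under this reading, which follows directly from the two-sided 2.5 kb exclusion.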
Phylogenomic analysis of the Chilean clade of Liolaemus lizards (Squamata: Liolaemidae) based on sequence capture data.

PubMed

Panzera, Alejandra; Leaché, Adam D; D'Elía, Guillermo; Victoriano, Pedro F

2017-01-01

The genus Liolaemus is one of the most ecologically diverse and species-rich genera of lizards worldwide. It currently includes more than 250 recognized species, which have been subject to many ecological and evolutionary studies. Nevertheless, Liolaemus lizards have a complex taxonomic history, mainly due to the incongruence between morphological and genetic data, incomplete taxon sampling, incomplete lineage sorting and hybridization. In addition, as many species have restricted and remote distributions, this has hampered their examination and inclusion in molecular systematic studies. The aims of this study are to infer a robust phylogeny for a subsample of lizards representing the Chilean clade (subgenus Liolaemus sensu stricto), and to test the monophyly of several of the major species groups. We use a phylogenomic approach, targeting 541 ultra-conserved elements (UCEs) and 44 protein-coding genes for 16 taxa. We conduct a comparison of phylogenetic analyses using maximum-likelihood and several species tree inference methods. The UCEs provide stronger support for phylogenetic relationships compared to the protein-coding genes; however, the UCEs outnumber the protein-coding genes by 10-fold. On average, the protein-coding genes contain over twice the number of informative sites. Based on our phylogenomic analyses, all the groups sampled are polyphyletic. Liolaemus tenuis tenuis is difficult to place in the phylogeny, because only a few loci (nine) were recovered for this species. Topologies or support values did not change dramatically upon exclusion of L. t. tenuis from analyses, suggesting that missing data did not have a significant impact on phylogenetic inference in this data set. The phylogenomic analyses provide strong support for sister group relationships between L. fuscus, L. monticola, L. nigroviridis and L. nitidus, and L. platei and L. velosoi.
Despite our limited taxon sampling, we have provided a reliable starting hypothesis for the relationships among many major groups of the Chilean clade of Liolaemus that will help future work aimed at resolving the Liolaemus phylogeny.
Comparative genomic analysis of four representative plant growth-promoting rhizobacteria in Pseudomonas.

PubMed

Shen, Xuemei; Hu, Hongbo; Peng, Huasong; Wang, Wei; Zhang, Xuehong

2013-04-22

Some Pseudomonas strains function as predominant plant growth-promoting rhizobacteria (PGPR). Within this group, Pseudomonas chlororaphis and Pseudomonas fluorescens are non-pathogenic biocontrol agents, and some Pseudomonas aeruginosa and Pseudomonas stutzeri strains are PGPR. P. chlororaphis GP72 is a plant growth-promoting rhizobacterium with a fully sequenced genome. We conducted a genomic analysis comparing GP72 with three other pseudomonad PGPR: P. fluorescens Pf-5, P. aeruginosa M18, and the nitrogen-fixing strain P. stutzeri A1501. Our aim was to identify the similarities and differences among these strains using a comparative genomic approach to clarify the mechanisms of plant growth-promoting activity. The genome sizes of GP72, Pf-5, M18, and A1501 ranged from 4.6 to 7.1 M, and the number of protein-coding genes varied among the four species. Clusters of Orthologous Groups (COGs) analysis assigned functions to predicted proteins. The COGs distributions were similar among the four species. However, the percentage of genes encoding transposases and their inactivated derivatives (COG L) was 1.33% of the total genes with COGs classifications in A1501, 0.21% in GP72, 0.02% in Pf-5, and 0.11% in M18. A phylogenetic analysis indicated that GP72 and Pf-5 were the most closely related strains, consistent with the genome alignment results. Comparisons of predicted coding sequences (CDSs) between GP72 and Pf-5 revealed 3544 conserved genes. There were fewer conserved genes when GP72 CDSs were compared with those of A1501 and M18. Comparisons among the four Pseudomonas species revealed 603 conserved genes in GP72, illustrating common plant growth-promoting traits shared among these PGPR. Conserved genes were related to catabolism, transport of plant-derived compounds, stress resistance, and rhizosphere colonization. Some strain-specific CDSs were related to different kinds of biocontrol activities or plant growth promotion. 
The GP72 genome contained the cus operon (related to heavy metal resistance) and a gene cluster involved in type IV pilus biosynthesis, which confers adhesion ability. Comparative genomic analysis of four representative PGPR revealed some conserved regions, indicating common characteristics (metabolism of plant-derived compounds, heavy metal resistance, and rhizosphere colonization) among these pseudomonad PGPR. Genomic regions specific to each strain provide clues to its lifestyle, ecological adaptation, and physiological role in the rhizosphere.
Long-PCR based next generation sequencing of the whole mitochondrial genome of the peacock skate Pavoraja nitida (Elasmobranchii: Arhynchobatidae).

PubMed

Yang, Lei; Naylor, Gavin J P

2016-01-01

We determined the complete mitochondrial genome sequence (16,760 bp) of the peacock skate Pavoraja nitida using a long-PCR based next generation sequencing method. It has 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and 1 control region in the typical vertebrate arrangement. Primers, protocols, and procedures used to obtain this mitogenome are provided. We anticipate that this approach will facilitate rapid collection of mitogenome sequences for studies on phylogenetic relationships, population genetics, and conservation of cartilaginous fishes.
The crystal structure of the catalytic domain of the ser/thr kinase PknA from M. tuberculosis shows an Src-like autoinhibited conformation.

PubMed

Wagner, Tristan; Alexandre, Matthieu; Duran, Rosario; Barilone, Nathalie; Wehenkel, Annemarie; Alzari, Pedro M; Bellinzoni, Marco

2015-05-01

Signal transduction mediated by Ser/Thr phosphorylation in Mycobacterium tuberculosis has been intensively studied in recent years, as its genome harbors eleven genes coding for eukaryotic-like Ser/Thr kinases. Here we describe the crystal structure and the autophosphorylation sites of the catalytic domain of PknA, one of two protein kinases essential for the pathogen's survival. The structure of the ligand-free kinase domain shows an auto-inhibited conformation similar to that observed in human Tyr kinases of the Src-family. These results reinforce the high conservation of structural hallmarks and regulation mechanisms between prokaryotic and eukaryotic protein kinases. © 2015 Wiley Periodicals, Inc.
Fate of mRNA following disaggregation of brain polysomes after administration of (+)-lysergic acid diethylamide in vivo.

PubMed

Mahony, J B; Brown, I R

1979-11-22

Intravenous injection of (+)-lysergic acid diethylamide into young rabbits induced a transient brain-specific disaggregation of polysomes to monosomes. Investigation of the fate of mRNA revealed that brain poly(A+)mRNA was conserved. In particular, mRNA coding for brain-specific S100 protein was not degraded, nor was it released into free ribonucleoprotein particles. Following the (+)-lysergic acid diethylamide-induced disaggregation of polysomes, mRNA shifted from polysomes and accumulated on monosomes. Formation of a blocked monosome complex, which contained intact mRNA and 40-S plus 60-S ribosomal subunits but lacked nascent peptide chains, suggested that (+)-lysergic acid diethylamide inhibited brain protein synthesis at a specific stage of late initiation or early elongation.
Isolation and sequence of partial cDNA clones of human L1: homology of human and rodent L1 in the cytoplasmic region.

PubMed

Harper, J R; Prince, J T; Healy, P A; Stuart, J K; Nauman, S J; Stallcup, W B

1991-03-01

We have isolated cDNA clones coding for the human homologue of the neuronal cell adhesion molecule L1. The nucleotide sequence of the cDNA clones and the deduced primary amino acid sequence of the carboxy terminal portion of the human L1 are homologous to the corresponding sequences of mouse L1 and rat NILE glycoprotein, with an especially high sequence identity in the cytoplasmic regions of the proteins. There is also protein sequence homology with the cytoplasmic region of the Drosophila cell adhesion molecule, neuroglian. The conservation of the cytoplasmic domain argues for an important functional role for this portion of the molecule.
Nucleotide sequence determination of guinea-pig casein B mRNA reveals homology with bovine and rat alpha s1 caseins and conservation of the non-coding regions of the mRNA.

PubMed Central

Hall, L; Laird, J E; Craig, R K

1984-01-01

Nucleotide sequence analysis of cloned guinea-pig casein B cDNA sequences has identified two casein B variants related to the bovine and rat alpha s1 caseins. Amino acid homology was largely confined to the known bovine or predicted rat phosphorylation sites and within the 'signal' precursor sequence. Comparison of the deduced nucleotide sequence of the guinea-pig and rat alpha s1 casein mRNA species showed greater sequence conservation in the non-coding than in the coding regions, suggesting a functional and possibly regulatory role for the non-coding regions of casein mRNA. The results provide insight into the evolution of the casein genes, and raise questions as to the role of conserved nucleotide sequences within the non-coding regions of mRNA species. PMID:6548375
Assessment of allelic diversity in intron-containing Mal d 1 genes and their association to apple allergenicity

PubMed Central

Gao, Zhongshan; Weg, Eric W van de; Matos, Catarina I; Arens, Paul; Bolhaar, Suzanne THP; Knulst, Andre C; Li, Yinghui; Hoffmann-Sommergruber, Karin; Gilissen, Luud JWJ

2008-01-01

Background Mal d 1 is a major apple allergen causing food allergic symptoms of the oral allergy syndrome (OAS) in birch-pollen sensitised patients. The Mal d 1 gene family is known to have at least 7 intron-containing and 11 intronless members that have been mapped in clusters on three linkage groups. In this study, the allelic diversity of the seven intron-containing Mal d 1 genes was assessed among a set of apple cultivars by sequencing or indirectly through pedigree genotyping. Protein variant constitutions were subsequently compared with Skin Prick Test (SPT) responses to study the association of deduced protein variants with allergenicity in a set of 14 cultivars. Results From the seven intron-containing Mal d 1 genes investigated, Mal d 1.01 and Mal d 1.02 were highly conserved, as nine out of ten cultivars coded for the same protein variant, while only one cultivar coded for a second variant. Mal d 1.04, Mal d 1.05 and Mal d 1.06 A, B and C were more variable, coding for three to six different protein variants. Comparison of Mal d 1 allelic composition between the high-allergenic cultivar Golden Delicious and the low-allergenic cultivars Santana and Priscilla, which are linked in pedigree, showed an association between the protein variants coded by the Mal d 1.04 and -1.06A genes (both located on linkage group 16) with allergenicity. This association was confirmed in 10 other cultivars. In addition, Mal d 1.06A allele dosage effects were associated with the degree of allergenicity based on prick to prick testing. Conversely, no associations were observed for the protein variants coded by the Mal d 1.01 (on linkage group 13), -1.02, -1.06B, -1.06C genes (all on linkage group 16), nor by the Mal d 1.05 gene (on linkage group 6). Conclusion Protein variant compositions of Mal d 1.04 and -1.06A and, in case of Mal d 1.06A, allele doses are associated with the differences in allergenicity among fourteen apple cultivars.
This information indicates the involvement of qualitative as well as quantitative factors in allergenicity and warrants further research in the relative importance of quantitative and qualitative aspects of Mal d 1 gene expression on allergenicity. Results from this study have implications for medical diagnostics, immunotherapy, clinical research and breeding schemes for new hypo-allergenic cultivars. PMID:19014530
Identification and characterization of a novel family of mammalian ependymin-related proteins (MERPs) in hematopoietic, nonhematopoietic, and malignant tissues.

PubMed

Apostolopoulos, J; Sparrow, R L; McLeod, J L; Collier, F M; Darcy, P K; Slater, H R; Ngu, C; Gregorio-King, C C; Kirkland, M A

2001-10-01

Evidence is presented for a family of mammalian homologs of ependymin, which we have termed the mammalian ependymin-related proteins (MERPs). Ependymins are secreted glycoproteins that form the major component of the cerebrospinal fluid in many teleost fish. We have cloned the entire coding region of human MERP-1 and mapped the gene to chromosome 7p14.1 by fluorescence in situ hybridization. In addition, three human MERP pseudogenes were identified on chromosomes 8, 16, and X. We have also cloned the mouse MERP-1 homolog and an additional family member, mouse MERP-2. Then, using bioinformatics, the mouse MERP-2 gene was localized to chromosome 13, and we identified the monkey MERP-1 homolog and frog ependymin-related protein (ERP). Despite relatively low amino acid sequence conservation between piscine ependymins, toad ERP, and MERPs, several amino acids (including four key cysteine residues) are strictly conserved, and the hydropathy profiles are remarkably alike, suggesting the possibilities of similar protein conformation and function. As with fish ependymins, frog ERP and MERPs contain a signal peptide typical of secreted proteins. The MERPs were found to be expressed at high levels in several hematopoietic cell lines and in nonhematopoietic tissues such as brain, heart, and skeletal muscle, as well as several malignant tissues and malignant cell lines. These findings suggest that MERPs have several potential roles in a range of cells and tissues.
The PRC2-binding long non-coding RNAs in human and mouse genomes are associated with predictive sequence features

NASA Astrophysics Data System (ADS)

Tu, Shiqi; Yuan, Guo-Cheng; Shao, Zhen

2017-01-01

Recently, long non-coding RNAs (lncRNAs) have emerged as an important class of molecules involved in many cellular processes. One of their primary functions is to shape epigenetic landscape through interactions with chromatin modifying proteins. However, mechanisms contributing to the specificity of such interactions remain poorly understood. Here we took the human and mouse lncRNAs that were experimentally determined to have physical interactions with Polycomb repressive complex 2 (PRC2), and systematically investigated the sequence features of these lncRNAs by developing a new computational pipeline for sequence composition analysis, in which each sequence is considered as a series of transitions between adjacent nucleotides. Through that, PRC2-binding lncRNAs were found to be associated with a set of distinctive and evolutionarily conserved sequence features, which can be utilized to distinguish them from the others with considerable accuracy. We further identified fragments of PRC2-binding lncRNAs that are enriched with these sequence features, and found they show strong PRC2-binding signals and are more highly conserved across species than the other parts, implying their functional importance.
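To make the idea of treating a sequence "as a series of transitions between adjacent nucleotides" concrete, the sketch below counts the 16 possible adjacent-nucleotide pairs in an RNA sequence and normalizes them to frequencies. It is an illustrative assumption about what such a feature vector might look like, not the pipeline described in the abstract; the name `transition_frequencies` and the handling of ambiguous bases (skipped) are hypothetical choices.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"


def transition_frequencies(seq):
    """Return the relative frequency of each adjacent-nucleotide pair.

    DNA input is accepted by mapping T to U. Pairs containing any
    character outside ACGU (e.g. N) are skipped rather than counted.
    """
    seq = seq.upper().replace("T", "U")
    pairs = [a + b for a, b in zip(seq, seq[1:])
             if a in ALPHABET and b in ALPHABET]
    counts = Counter(pairs)
    total = len(pairs)
    # Always emit all 16 features so vectors from different sequences align.
    return {a + b: (counts[a + b] / total if total else 0.0)
            for a, b in product(ALPHABET, repeat=2)}


features = transition_frequencies("ACGUACGU")
print(features["AC"], features["UA"])
```

A fixed-length vector of this kind could then be passed to any standard classifier; which features and which classifier the authors actually used is not stated in the abstract.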
Analysis of Antisense Expression by Whole Genome Tiling Microarrays and siRNAs Suggests Mis-Annotation of Arabidopsis Orphan Protein-Coding Genes

PubMed Central

Richardson, Casey R.; Luo, Qing-Jun; Gontcharova, Viktoria; Jiang, Ying-Wen; Samanta, Manoj; Youn, Eunseog; Rock, Christopher D.

2010-01-01

Background MicroRNAs (miRNAs) and trans-acting small-interfering RNAs (tasi-RNAs) are small (20–22 nt long) RNAs (smRNAs) generated from hairpin secondary structures or antisense transcripts, respectively, that regulate gene expression by Watson-Crick pairing to a target mRNA and altering expression by mechanisms related to RNA interference. The high sequence homology of plant miRNAs to their targets has been the mainstay of miRNA prediction algorithms, which are limited in their predictive power for other kingdoms because miRNA complementarity is less conserved yet transitive processes (production of antisense smRNAs) are active in eukaryotes. We hypothesize that antisense transcription and associated smRNAs are biomarkers which can be computationally modeled for gene discovery. Principal Findings We explored rice (Oryza sativa) sense and antisense gene expression in publicly available whole genome tiling array transcriptome data and sequenced smRNA libraries (as well as C. elegans) and found evidence of transitivity of MIRNA genes similar to that found in Arabidopsis. Statistical analysis of antisense transcript abundances, presence of antisense ESTs, and association with smRNAs suggests several hundred Arabidopsis ‘orphan’ hypothetical genes are non-coding RNAs. Consistent with this hypothesis, we found novel Arabidopsis homologues of some MIRNA genes on the antisense strand of previously annotated protein-coding genes. A Support Vector Machine (SVM) was applied using thermodynamic energy of binding plus novel expression features of sense/antisense transcription topology and siRNA abundances to build a prediction model of miRNA targets. The SVM when trained on targets could predict the “ancient” (deeply conserved) class of validated Arabidopsis MIRNA genes with an accuracy of 84%, and 76% for “new” rapidly-evolving MIRNA genes. 
Conclusions Antisense and smRNA expression features and computational methods may identify novel MIRNA genes and other non-coding RNAs in plants and potentially other kingdoms, which can provide insight into antisense transcription, miRNA evolution, and post-transcriptional gene regulation. PMID:20520764
Crystal structure at 2.8 Å of Huntingtin-interacting protein 1 (HIP1) coiled-coil domain reveals a charged surface suitable for HIP1 protein interactor (HIPPI).

PubMed

Niu, Qian; Ybe, Joel A

2008-02-01

Huntington's disease is a genetic neurological disorder that is triggered by the dissociation of the huntingtin protein (htt) from its obligate interaction partner Huntingtin-interacting protein 1 (HIP1). The release of the huntingtin protein permits HIP1 protein interactor (HIPPI) to bind to its recognition site on HIP1 to form a HIPPI/HIP1 complex that recruits procaspase-8 to begin the process of apoptosis. The interaction module between HIPPI and HIP1 was predicted to resemble a death-effector domain. Our 2.8-Å crystal structure of the HIP1 371-481 subfragment that includes F432 and K474, which are important for HIPPI binding, is not a death-effector domain but is a partially opened coiled coil. The HIP1 371-481 model reveals a basic surface that we hypothesize to be suitable for binding HIPPI. There is an opened region next to the putative HIPPI site that is highly negatively charged. The acidic residues in this region are highly conserved in HIP1 and a related protein, HIP1R, from different organisms but are not conserved in the yeast homologue of HIP1, sla2p. We have modeled approximately 85% of the coiled-coil domain by joining our new HIP1 371-481 structure to the HIP1 482-586 model (Protein Data Bank code: 2NO2). Finally, the middle of this coiled-coil domain may be intrinsically flexible and suggests a new interaction model where HIPPI binds to a U-shaped HIP1 molecule.
Comparative Genome Analysis of “Candidatus Phytoplasma australiense” (Subgroup tuf-Australia I; rp-A) and “Ca. Phytoplasma asteris” Strains OY-M and AY-WB

PubMed Central

Tran-Nguyen, L. T. T.; Kube, M.; Schneider, B.; Reinhardt, R.; Gibb, K. S.

2008-01-01

The chromosome sequence of “Candidatus Phytoplasma australiense” (subgroup tuf-Australia I; rp-A), associated with dieback in papaya, Australian grapevine yellows in grapevine, and several other important plant diseases, was determined. The circular chromosome is represented by 879,324 nucleotides, a GC content of 27%, and 839 protein-coding genes. Five hundred two of these protein-coding genes were functionally assigned, while 337 genes were hypothetical proteins with unknown function. Potential mobile units (PMUs) containing clusters of DNA repeats comprised 12.1% of the genome. These PMUs encoded genes involved in DNA replication, repair, and recombination; nucleotide transport and metabolism; translation; and ribosomal structure. Elements with similarities to phage integrases found in these mobile units were difficult to classify, as they were similar to both insertion sequences and bacteriophages. Comparative analysis of “Ca. Phytoplasma australiense” with “Ca. Phytoplasma asteris” strains OY-M and AY-WB showed that the gene order was more conserved between the closely related “Ca. Phytoplasma asteris” strains than to “Ca. Phytoplasma australiense.” Differences observed between “Ca. Phytoplasma australiense” and “Ca. Phytoplasma asteris” strains included the chromosome size (18,693 bp larger than OY-M), a larger number of genes with assigned function, and hypothetical proteins with unknown function. PMID:18359806
The C terminus of Ku80 activates the DNA-dependent protein kinase catalytic subunit.

PubMed

Singleton, B K; Torres-Arzayus, M I; Rottinghaus, S T; Taccioli, G E; Jeggo, P A

1999-05-01

Ku is a heterodimeric protein with double-stranded DNA end-binding activity that operates in the process of nonhomologous end joining. Ku is thought to target the DNA-dependent protein kinase (DNA-PK) complex to the DNA and, when DNA bound, can interact with and activate the DNA-PK catalytic subunit (DNA-PKcs). We have carried out a 3' deletion analysis of Ku80, the larger subunit of Ku, and shown that the C-terminal 178 amino acid residues are dispensable for DNA end-binding activity but are required for efficient interaction of Ku with DNA-PKcs. Cells expressing Ku80 proteins that lack the terminal 178 residues have low DNA-PK activity, are radiation sensitive, and can recombine the signal junctions but not the coding junctions during V(D)J recombination. These cells have therefore acquired the phenotype of mouse SCID cells despite expressing DNA-PKcs protein, suggesting that an interaction between DNA-PKcs and Ku, involving the C-terminal region of Ku80, is required for DNA double-strand break rejoining and coding but not signal joint formation. To gain further insight into important domains in Ku80, we report a point mutational change in Ku80 in the defective xrs-2 cell line. This residue is conserved among species and lies outside of the previously reported Ku70-Ku80 interaction domain. The mutational change nonetheless abrogates the Ku70-Ku80 interaction and DNA end-binding activity.
Single-cell transcriptome analysis of fish immune cells provides insight into the evolution of vertebrate immune cell types.

PubMed

Carmona, Santiago J; Teichmann, Sarah A; Ferreira, Lauren; Macaulay, Iain C; Stubbington, Michael J T; Cvejic, Ana; Gfeller, David

2017-03-01

The immune system of vertebrate species consists of many different cell types that have distinct functional roles and are subject to different evolutionary pressures. Here, we first analyzed conservation of genes specific for all major immune cell types in human and mouse. Our results revealed higher gene turnover and faster evolution of trans-membrane proteins in NK cells compared with other immune cell types, and especially T cells, but similar conservation of nuclear and cytoplasmic protein coding genes. To validate these findings in a distant vertebrate species, we used single-cell RNA sequencing of lck:GFP cells in zebrafish and obtained the first transcriptome of specific immune cell types in a nonmammalian species. Unsupervised clustering and single-cell TCR locus reconstruction identified three cell populations, T cells, a novel type of NK-like cells, and a smaller population of myeloid-like cells. Differential expression analysis uncovered new immune-cell-specific genes, including novel immunoglobulin-like receptors, and neofunctionalization of recently duplicated paralogs. Evolutionary analyses confirmed the higher gene turnover of trans-membrane proteins in NK cells compared with T cells in fish species, suggesting that this is a general property of immune cell types across all vertebrates. © 2017 Carmona et al.; Published by Cold Spring Harbor Laboratory Press.

Single-cell transcriptome analysis of fish immune cells provides insight into the evolution of vertebrate immune cell types

PubMed Central

Ferreira, Lauren; Macaulay, Iain C.; Stubbington, Michael J.T.

2017-01-01

The immune system of vertebrate species consists of many different cell types that have distinct functional roles and are subject to different evolutionary pressures. Here, we first analyzed conservation of genes specific for all major immune cell types in human and mouse. Our results revealed higher gene turnover and faster evolution of trans-membrane proteins in NK cells compared with other immune cell types, and especially T cells, but similar conservation of nuclear and cytoplasmic protein coding genes. To validate these findings in a distant vertebrate species, we used single-cell RNA sequencing of lck:GFP cells in zebrafish and obtained the first transcriptome of specific immune cell types in a nonmammalian species. Unsupervised clustering and single-cell TCR locus reconstruction identified three cell populations, T cells, a novel type of NK-like cells, and a smaller population of myeloid-like cells. Differential expression analysis uncovered new immune-cell–specific genes, including novel immunoglobulin-like receptors, and neofunctionalization of recently duplicated paralogs. Evolutionary analyses confirmed the higher gene turnover of trans-membrane proteins in NK cells compared with T cells in fish species, suggesting that this is a general property of immune cell types across all vertebrates. PMID:28087841
Chromosomal localization and cDNA cloning of the human DBP and TEF genes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Khatib, Z.A.; Inaba, T.; Valentine, M.

1994-09-15

The authors have isolated cDNA and genomic clones and determined the human chromosome positions of two genes encoding transcription factors expressed in the liver and the pituitary gland: albumin D-site-binding protein (DBP) and thyrotroph embryonic factor (TEF). Both proteins have been identified as members of the PAR (proline and acidic amino acid-rich) subfamily of bZIP transcription factors in the rat, but human homologues have not been characterized. Using a fluorescence in situ hybridization technique, the DBP locus was assigned to chromosome 19q13, and TEF to chromosome 22q13. Each assignment was confirmed by means of human chromosome segregation in somatic cell hybrids. Coding sequences of DBP and TEF, extending beyond the bZIP domain to the PAR region, were highly conserved in both human-human and interspecies comparisons. Conservation of the exon-intron boundaries of each bZIP domain-encoding exon suggested derivation from a common ancestral gene. DBP and TEF mRNAs were expressed in all tissues and cell lines examined, including brain, lung, liver, spleen, and kidney. Knowledge of the human chromosome locations of these PAR proteins will facilitate studies to assess their involvement in carcinogenesis and other fundamental biological processes. 37 refs., 5 figs., 1 tab.
Alternative splicing and promoter use in TFII-I genes.

PubMed

Makeyev, Aleksandr V; Bayarsaihan, Dashzeveg

2009-03-15

TFII-I proteins are ubiquitously expressed transcriptional factors involved in both basal transcription and signal transduction activation or repression. TFII-I proteins are detected as early as the two-cell stage, exhibit distinct and dynamic expression patterns in developing embryos, and mark regional variation in the adult mouse brain. Analysis of atypical small and rare chromosomal deletions at 7q11.23 points to TFII-I genes (GTF2I and GTF2IRD1) as the prime candidates responsible for craniofacial and cognitive abnormalities in the Williams-Beuren syndrome. TFII-I genes are often subjected to alternative splicing, which generates isoforms that show different activities and play distinct biological roles. The coding regions of TFII-I genes are composed of more than 30 exons and are well conserved among vertebrates. However, their 5' untranslated regions are not as well conserved and are all poorly characterized. In the present work, we analyzed promoter regions of TFII-I genes and described their additional exons, as well as tested tissue specificity of both previously reported and novel alternatively spliced isoforms. Our comprehensive analysis leads to further elucidation of the functional heterogeneity of TFII-I proteins, provides hints on search for regulatory pathways governing their expression, and opens up possibilities for examining the effect of different haplotypes on their promoter functions.
Crystal structure at 2.8Å of Huntingtin-interacting protein 1 (HIP1) coiled-coil domain reveals a charged surface suitable for HIP-protein interactor (HIPPI)

PubMed Central

Niu, Qian; Ybe, Joel A.

2008-01-01

Summary Huntington’s disease is a genetic neurological disorder that is triggered by the dissociation of the huntingtin protein (htt) from its obligate interaction partner Huntingtin-interacting protein 1 (HIP1). The release of htt permits HIP-protein interactor (HIPPI) to bind to its recognition site on HIP1 to form a HIPPI/HIP1 complex that recruits Procaspase-8 to begin the process of apoptosis. The interaction module between HIPPI and HIP1 was predicted to resemble a death-effector domain (DED). Our 2.8 Å crystal structure of the HIP1 371-481 sub-fragment that includes F432 and K474 important for HIPPI binding is not a DED, but is a partially opened coiled-coil. The HIP1 371-481 model reveals a basic surface we hypothesize is suitable for binding HIPPI. There is an opened region next to the putative HIPPI site that is highly negatively charged. The acidic residues in this region are highly conserved in HIP1 and a related protein, HIP1R from different organisms, but are not conserved in the yeast homolog of HIP1, sla2p. We have modeled ∼85% of the coiled-coil domain by joining our new HIP1 371-481 structure to the HIP1 482-586 model (PDB code: 2NO2). Finally, the middle of this coiled-coil domain may be intrinsically flexible and suggests a new interaction model where HIPPI binds to a “U” shaped HIP1 molecule. PMID:18155047
Arthropod phylogenetics in light of three novel millipede (Myriapoda: Diplopoda) mitochondrial genomes with comments on the appropriateness of mitochondrial genome sequence data for inferring deep level relationships.

PubMed

Brewer, Michael S; Swafford, Lynn; Spruill, Chad L; Bond, Jason E

2013-01-01

Arthropods are the most diverse group of eukaryotic organisms, but their phylogenetic relationships are poorly understood. Herein, we describe three mitochondrial genomes representing orders of millipedes for which complete genomes had not been characterized. Newly sequenced genomes are combined with existing data to characterize the protein coding regions of myriapods and to attempt to reconstruct the evolutionary relationships within the Myriapoda and Arthropoda. The newly sequenced genomes are similar to previously characterized millipede sequences in terms of synteny and length. Unique translocations occurred within the newly sequenced taxa, including one half of the Appalachioria falcifera genome, which is inverted with respect to other millipede genomes. Across myriapods, amino acid conservation levels are highly dependent on the gene region. Additionally, individual loci varied in the level of amino acid conservation. Overall, most gene regions showed low levels of conservation at many sites. Attempts to reconstruct the evolutionary relationships suffered from questionable relationships and low support values. Analyses of phylogenetic informativeness show the lack of signal deep in the trees (i.e., genes evolve too quickly). As a result, the myriapod tree resembles previously published results but lacks convincing support, and, within the arthropod tree, well established groups were recovered as polyphyletic. The novel genome sequences described herein provide useful genomic information concerning millipede groups that had not been investigated. Taken together with existing sequences, the variety of compositions and evolution of myriapod mitochondrial genomes are shown to be more complex than previously thought. Unfortunately, the use of mitochondrial protein-coding regions in deep arthropod phylogenetics appears problematic, a result consistent with previously published studies. Lack of phylogenetic signal renders the resulting tree topologies as suspect. 
As such, these data are likely inappropriate for investigating such ancient relationships.
The Murine Norovirus Core Subgenomic RNA Promoter Consists of a Stable Stem-Loop That Can Direct Accurate Initiation of RNA Synthesis

PubMed Central

Yunus, Muhammad Amir; Lin, Xiaoyan; Bailey, Dalan; Karakasiliotis, Ioannis; Chaudhry, Yasmin; Vashist, Surender; Zhang, Guo; Thorne, Lucy; Kao, C. Cheng

2014-01-01

ABSTRACT All members of the Caliciviridae family of viruses produce a subgenomic RNA during infection. The subgenomic RNA typically encodes only the major and minor capsid proteins, but in murine norovirus (MNV), the subgenomic RNA also encodes the VF1 protein, which functions to suppress host innate immune responses. To date, the mechanism of norovirus subgenomic RNA synthesis has not been characterized. We have previously described the presence of an evolutionarily conserved RNA stem-loop structure on the negative-sense RNA, the complementary sequence of which codes for the viral RNA-dependent RNA polymerase (NS7). The conserved stem-loop is positioned 6 nucleotides 3′ of the start site of the subgenomic RNA in all caliciviruses. We demonstrate that the conserved stem-loop is essential for MNV viability. Mutant MNV RNAs with substitutions in the stem-loop replicated poorly until they accumulated mutations that revert to restore the stem-loop sequence and/or structure. The stem-loop sequence functions in a noncoding context, as it was possible to restore the replication of an MNV mutant by introducing an additional copy of the stem-loop between the NS7- and VP1-coding regions. Finally, in vitro biochemical data suggest that the stem-loop sequence is sufficient for the initiation of viral RNA synthesis by the recombinant MNV RNA-dependent RNA polymerase, confirming that the stem-loop forms the core of the norovirus subgenomic promoter. IMPORTANCE Noroviruses are a significant cause of viral gastroenteritis, and it is important to understand the mechanism of norovirus RNA synthesis. Here we describe the identification of an RNA stem-loop structure that functions as the core of the norovirus subgenomic RNA promoter in cells and in vitro. This work provides new insights into the molecular mechanisms of norovirus RNA synthesis and the sequences that determine the recognition of viral RNA by the RNA-dependent RNA polymerase. PMID:25392209
The murine norovirus core subgenomic RNA promoter consists of a stable stem-loop that can direct accurate initiation of RNA synthesis.

PubMed

Yunus, Muhammad Amir; Lin, Xiaoyan; Bailey, Dalan; Karakasiliotis, Ioannis; Chaudhry, Yasmin; Vashist, Surender; Zhang, Guo; Thorne, Lucy; Kao, C Cheng; Goodfellow, Ian

2015-01-15

All members of the Caliciviridae family of viruses produce a subgenomic RNA during infection. The subgenomic RNA typically encodes only the major and minor capsid proteins, but in murine norovirus (MNV), the subgenomic RNA also encodes the VF1 protein, which functions to suppress host innate immune responses. To date, the mechanism of norovirus subgenomic RNA synthesis has not been characterized. We have previously described the presence of an evolutionarily conserved RNA stem-loop structure on the negative-sense RNA, the complementary sequence of which codes for the viral RNA-dependent RNA polymerase (NS7). The conserved stem-loop is positioned 6 nucleotides 3' of the start site of the subgenomic RNA in all caliciviruses. We demonstrate that the conserved stem-loop is essential for MNV viability. Mutant MNV RNAs with substitutions in the stem-loop replicated poorly until they accumulated mutations that revert to restore the stem-loop sequence and/or structure. The stem-loop sequence functions in a noncoding context, as it was possible to restore the replication of an MNV mutant by introducing an additional copy of the stem-loop between the NS7- and VP1-coding regions. Finally, in vitro biochemical data suggest that the stem-loop sequence is sufficient for the initiation of viral RNA synthesis by the recombinant MNV RNA-dependent RNA polymerase, confirming that the stem-loop forms the core of the norovirus subgenomic promoter. Noroviruses are a significant cause of viral gastroenteritis, and it is important to understand the mechanism of norovirus RNA synthesis. Here we describe the identification of an RNA stem-loop structure that functions as the core of the norovirus subgenomic RNA promoter in cells and in vitro. This work provides new insights into the molecular mechanisms of norovirus RNA synthesis and the sequences that determine the recognition of viral RNA by the RNA-dependent RNA polymerase. Copyright © 2015, American Society for Microbiology. 
The complete mitochondrial genome of eastern lowland gorilla, Gorilla beringei graueri, and comparative mitochondrial genomics of Gorilla species.

PubMed

Hu, Xiao-di; Gao, Li-zhi

2016-01-01

In this study, we determined the complete mitochondrial (mt) genome of the eastern lowland gorilla, Gorilla beringei graueri, for the first time. The total genome was 16,416 bp in length. It contained a total of 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes and 1 control region (D-loop region). The base composition was A (30.88%), G (13.10%), C (30.89%) and T (25.13%), indicating that the percentage of A+T (56.01%) was higher than G+C (43.99%). Comparisons with the other publicly available Gorilla mitogenomes showed conservation of gene order and base composition but considerable nucleotide diversity. This complete mitochondrial genome sequence will provide valuable genetic information for further studies on conservation genetics of the eastern lowland gorilla.
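The base-composition figures in this abstract can be checked, and recomputed for any sequence, with a few lines of Python. This is a minimal sketch: the function names `base_composition` and `at_gc_content` are illustrative, not taken from the study, and the mitogenome sequence itself is not reproduced here, so only the reported percentages are summed.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G and T in a DNA sequence.

    Characters other than A, C, G, T (for example N) are ignored.
    """
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def at_gc_content(seq):
    """Return (A+T percentage, G+C percentage) for a DNA sequence."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"], comp["G"] + comp["C"]


# Percentages reported in the abstract (not recomputed from the sequence).
reported = {"A": 30.88, "G": 13.10, "C": 30.89, "T": 25.13}
at_reported = round(reported["A"] + reported["T"], 2)  # 56.01
gc_reported = round(reported["G"] + reported["C"], 2)  # 43.99
```

The two sums agree with the A+T and G+C values stated in the abstract.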
Conserved expression of transposon-derived non-coding transcripts in primate stem cells.

PubMed

Ramsay, LeeAnn; Marchetto, Maria C; Caron, Maxime; Chen, Shu-Huang; Busche, Stephan; Kwan, Tony; Pastinen, Tomi; Gage, Fred H; Bourque, Guillaume

2017-02-28

A significant portion of expressed non-coding RNAs in human cells is derived from transposable elements (TEs). Moreover, it has been shown that various long non-coding RNAs (lncRNAs), which come from the human endogenous retrovirus subfamily H (HERVH), are not only expressed but required for pluripotency in human embryonic stem cells (hESCs). To identify additional TE-derived functional non-coding transcripts, we generated RNA-seq data from induced pluripotent stem cells (iPSCs) of four primate species (human, chimpanzee, gorilla, and rhesus) and searched for transcripts whose expression was conserved. We observed that about 30% of TE instances expressed in human iPSCs had orthologous TE instances that were also expressed in chimpanzee and gorilla. Notably, our analysis revealed a number of repeat families with highly conserved expression profiles including HERVH but also MER53, which is known to be the source of a placental-specific family of microRNAs (miRNAs). We also identified a number of repeat families from all classes of TEs, including MLT1-type and Tigger families, that contributed a significant amount of sequence to primate lncRNAs whose expression was conserved. Together, these results describe TE families and TE-derived lncRNAs whose conserved expression patterns can be used to identify what are likely functional TE-derived non-coding transcripts in primate iPSCs.
Computational RNomics of Drosophilids

PubMed Central

Rose, Dominic; Hackermüller, Jörg; Washietl, Stefan; Reiche, Kristin; Hertel, Jana; Findeiß, Sven; Stadler, Peter F; Prohaska, Sonja J

2007-01-01

Background Recent experimental and computational studies have provided overwhelming evidence for a plethora of diverse transcripts that are unrelated to protein-coding genes. One subclass consists of those RNAs that require distinctive secondary structure motifs to exert their biological function and hence exhibit distinctive patterns of sequence conservation characteristic for positive selection on RNA secondary structure. The deep-sequencing of 12 drosophilid species coordinated by the NHGRI provides an ideal data set for comparative computational approaches to determine those genomic loci that code for evolutionarily conserved RNA motifs. This class of loci includes the majority of the known small ncRNAs as well as structured RNA motifs in mRNAs. We report here on a genome-wide survey using RNAz. Results We obtain 16 000 high quality predictions among which we recover the majority of the known ncRNAs. Taking a pessimistically estimated false discovery rate of 40% into account, this implies that at least some ten thousand loci in the Drosophila genome show the hallmarks of stabilizing selection acting on RNA structure, and hence are most likely functional at the RNA level. A subset of RNAz predictions overlapping with TRF1 and BRF binding sites [Isogai et al., EMBO J. 26: 79–89 (2007)], which are plausible candidates of Pol III transcripts, have been studied in more detail. Among these sequences we identify several "clusters" of ncRNA candidates with striking structural similarities. Conclusion The statistical evaluation of the RNAz predictions in comparison with a similar analysis of vertebrate genomes [Washietl et al., Nat. Biotech. 23: 1383–1390 (2005)] shows that qualitatively similar fractions of structured RNAs are found in introns, UTRs, and intergenic regions. The intergenic RNA structures, however, are concentrated much more closely around known protein-coding loci, suggesting that flies have a significantly smaller complement of independent structured ncRNAs compared to mammals. PMID:17996037
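The step from 16 000 predictions to "some ten thousand loci" is a one-line calculation: with a false discovery rate of 40%, about 60% of predictions are expected to be true. The sketch below reproduces that arithmetic; the function name `expected_true_predictions` is illustrative, and the calculation assumes the false discovery rate applies uniformly to all predictions.

```python
def expected_true_predictions(n_predictions, false_discovery_rate):
    """Expected number of true predictions given a false discovery rate."""
    if not 0.0 <= false_discovery_rate <= 1.0:
        raise ValueError("false_discovery_rate must be between 0 and 1")
    return round(n_predictions * (1.0 - false_discovery_rate))


# Figures from the abstract: 16 000 predictions, pessimistic FDR of 40%.
n_true = expected_true_predictions(16000, 0.40)  # 9600, i.e. roughly ten thousand
```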
FIST: a sensory domain for diverse signal transduction pathways in prokaryotes and ubiquitin signaling in eukaryotes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Borziak, Kirill; Jouline, Igor B

2007-01-01

Motivation: Sensory domains that are conserved among Bacteria, Archaea and Eucarya are important detectors of common signals detected by living cells. Due to their high sequence divergence, sensory domains are difficult to identify. We systematically look for novel sensory domains using sensitive profile-based searches initiated with regions of signal transduction proteins where no known domains can be identified by current domain models. Results: Using profile searches followed by multiple sequence alignment, structure prediction, and domain architecture analysis, we have identified a novel sensory domain termed FIST, which is present in signal transduction proteins from Bacteria, Archaea and Eucarya. Remote similarity to a known ligand-binding fold and chromosomal proximity of FIST-encoding genes to those coding for proteins involved in amino acid metabolism and transport suggest that FIST domains bind small ligands, such as amino acids.
The PL6-Family Plasmids of Haloquadratum Are Virus-Related.

PubMed

Dyall-Smith, Mike; Pfeiffer, Friedhelm

2018-01-01

Plasmids PL6A and PL6B are both carried by the C23T strain of the square archaeon Haloquadratum walsbyi, and are closely related (76% nucleotide identity), circular, about 6 kb in size, and display the same gene synteny. They are unrelated to other known plasmids and all of the predicted proteins are cryptic in function. Here we describe two additional PL6-related plasmids, pBAJ9-6 and pLT53-7, each carried by distinct isolates of Haloquadratum walsbyi that were recovered from hypersaline waters in Australia. A third PL6-like plasmid, pLTMV-6, was assembled from metavirome data from Lake Tyrell, a salt-lake in Victoria, Australia. Comparison of all five plasmids revealed a distinct plasmid family with strong conservation of gene content and synteny, an average size of 6.2 kb (range 5.8-7.0 kb) and pairwise similarities between 61-79%. One protein (F3) was closely similar to a protein carried by betapleolipoviruses while another (R6) was similar to a predicted AAA-ATPase of His1 halovirus (His1V_gp16). Plasmid pLT53-7 carried a gene for a FkbM family methyltransferase that was not present in any of the other plasmids. Comparative analysis of all PL6-like plasmids provided better resolution of conserved sequences and coding regions, confirmed the strong link to haloviruses, and showed that their sequences are highly conserved among examples from Haloquadratum isolates and metagenomic data that collectively cover geographically distant locations, indicating that these genetic elements are widespread.
Identifying functionally informative evolutionary sequence profiles.

PubMed

Gil, Nelson; Fiser, Andras

2018-04-15

Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases. Availability and implementation: A freely accessible program, including source code, implementing SAMMI is available at https://github.com/nelsongil92/SAMMI.git. Contact: andras.fiser@einstein.yu.edu. Supplementary data are available at Bioinformatics online.
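To make the quantity behind this approach concrete, the sketch below computes the mutual information between two alignment columns and averages it over all column pairs of an MSA. It is an illustration only, not the SAMMI implementation: the abstract does not say how SAMMI aggregates or normalizes mutual information, so the mean over column pairs, the treatment of gaps as an ordinary symbol, and the names `mutual_information` and `mean_pairwise_mi` are all assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two aligned columns.

    Each column is a sequence of residue characters, one per aligned
    sequence. Gap characters are treated as an ordinary symbol (assumption).
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of rows")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


def mean_pairwise_mi(msa):
    """Average mutual information over all pairs of columns in an MSA.

    `msa` is a list of equal-length aligned sequences (strings).
    """
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all sequences must have the same aligned length")
    columns = [[seq[i] for seq in msa] for i in range(length)]
    pairs = list(combinations(range(length), 2))
    if not pairs:
        return 0.0
    total = sum(mutual_information(columns[i], columns[j]) for i, j in pairs)
    return total / len(pairs)
```

Under the hypothesis stated in the abstract, candidate alignments could be ranked by a score of this kind and the highest-scoring one retained.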
ProfileGrids: a sequence alignment visualization paradigm that avoids the limitations of Sequence Logos

PubMed Central

2014-01-01

Background The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. Results The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. Conclusions The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org. PMID:25237393
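The data structure underlying a ProfileGrid, as described above, is a matrix of residue frequencies at each alignment position. A minimal sketch of that matrix follows. It is not the JProfileGrid code: the function name `residue_frequency_matrix` and the choice to count gaps as a symbol are assumptions made for illustration, and the color coding is left out.

```python
from collections import Counter


def residue_frequency_matrix(alignment):
    """Return, for each alignment column, a dict mapping residue -> frequency.

    `alignment` is a list of equal-length aligned protein sequences.
    Gap characters are counted like any other symbol (assumption).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    n = len(alignment)
    grid = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        grid.append({residue: c / n for residue, c in counts.items()})
    return grid


# Toy alignment of four sequences and three columns.
grid = residue_frequency_matrix(["MKV", "MRV", "MKL", "M-V"])
# grid[0] is {"M": 1.0}: the first position is fully conserved.
```

Each cell of such a grid could then be mapped to a color by its frequency, which is the representation the abstract describes.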
Characterization of the Aspergillus nidulans aspnd1 gene demonstrates that the ASPND1 antigen, which it encodes, and several Aspergillus fumigatus immunodominant antigens belong to the same family.

PubMed Central

Calera, J A; Ovejero, M C; López-Medrano, R; Segurado, M; Puente, P; Leal, F

1997-01-01

For the first time, an immunodominant Aspergillus nidulans antigen (ASPND1) consistently reactive with serum samples from aspergilloma patients has been purified and characterized, and its coding gene (aspnd1) has been cloned and sequenced. ASPND1 is a glycoprotein with four N-glycosidically-bound sugar chains (around 2.1 kDa each) which are not necessary for reactivity with immune human sera. The polypeptide part is synthesized as a 277-amino-acid precursor of 30.6 kDa that after cleavage of a putative signal peptide of 16 amino acids, affords a mature protein of 261 amino acids with a molecular mass of 29 kDa and a pI of 4.24 (as deduced from the sequence). The ASPND1 protein is 53.1% identical to the AspfII allergen from Aspergillus fumigatus and 48% identical to an unpublished Candida albicans antigen. All of the cysteine residues and most of the glycosylation sites are perfectly conserved in the three proteins, suggesting a similar but yet unknown function. Analysis of the primary structure of the ASPND1 coding gene (aspnd1) has allowed the establishment of a clear relationship between several previously reported A. fumigatus and A. nidulans immunodominant antigens. PMID:9119471
Measles virus minigenomes encoding two autofluorescent proteins reveal cell-to-cell variation in reporter expression dependent on viral sequences between the transcription units.

PubMed

Rennick, Linda J; Duprex, W Paul; Rima, Bert K

2007-10-01

Transcription from morbillivirus genomes commences at a single promoter in the 3' non-coding terminus, with the six genes being transcribed sequentially. The 3' and 5' untranslated regions (UTRs) of the genes (mRNA sense), together with the intergenic trinucleotide spacer, comprise the non-coding sequences (NCS) of the virus and contain the conserved gene end and gene start signals, respectively. Bicistronic minigenomes containing transcription units (TUs) encoding autofluorescent reporter proteins separated by measles virus (MV) NCS were used to give a direct estimation of gene expression in single, living cells by assessing the relative amounts of each fluorescent protein in each cell. Initially, five minigenomes containing each of the MV NCS were generated. Assays were developed to determine the amount of each fluorescent protein in cells at both cell population and single-cell levels. This revealed significant variations in gene expression between cells expressing the same NCS-containing minigenome. The minigenome containing the M/F NCS produced significantly lower amounts of fluorescent protein from the second TU (TU2), compared with the other minigenomes. A minigenome with a truncated F 5' UTR had increased expression from TU2. This UTR is 524 nt longer than the other MV 5' UTRs. Insertions into the 5' UTR of the enhanced green fluorescent protein gene in the minigenome containing the N/P NCS showed that specific sequences, rather than just the additional length of F 5' UTR, govern this decreased expression from TU2.
A comprehensive catalog of human KRAB-associated zinc finger genes: Insights into the evolutionary history of a large family of transcriptional repressors

PubMed Central

Huntley, Stuart; Baggott, Daniel M.; Hamilton, Aaron T.; Tran-Gyamfi, Mary; Yang, Shan; Kim, Joomyeong; Gordon, Laurie; Branscomb, Elbert; Stubbs, Lisa

2006-01-01

Krüppel-type zinc finger (ZNF) motifs are prevalent components of transcription factor proteins in all eukaryotes. KRAB-ZNF proteins, in which a potent repressor domain is attached to a tandem array of DNA-binding zinc-finger motifs, are specific to tetrapod vertebrates and represent the largest class of ZNF proteins in mammals. To define the full repertoire of human KRAB-ZNF proteins, we searched the genome sequence for key motifs and then constructed and manually curated gene models incorporating those sequences. The resulting gene catalog contains 423 KRAB-ZNF protein-coding loci, yielding alternative transcripts that altogether predict at least 742 structurally distinct proteins. Active rounds of segmental duplication, involving single genes or larger regions and including both tandem and distributed duplication events, have driven the expansion of this mammalian gene family. Comparisons between the human genes and ZNF loci mined from the draft mouse, dog, and chimpanzee genomes not only identified 103 KRAB-ZNF genes that are conserved in mammals but also highlighted a substantial level of lineage-specific change; at least 136 KRAB-ZNF coding genes are primate specific, including many recent duplicates. KRAB-ZNF genes are widely expressed and clustered genes are typically not coregulated, indicating that paralogs have evolved to fill roles in many different biological processes. To facilitate further study, we have developed a Web-based public resource with access to gene models, sequences, and other data, including visualization tools to provide genomic context and interaction with other public data sets. PMID:16606702
Hydration in drug design. 3. Conserved water molecules at the ligand-binding sites of homologous proteins

NASA Astrophysics Data System (ADS)

Poornima, C. S.; Dean, P. M.

1995-12-01

Water molecules are known to play an important rôle in mediating protein-ligand interactions. If water molecules are conserved at the ligand-binding sites of homologous proteins, such a finding may suggest the structural importance of water molecules in ligand binding. Structurally conserved water molecules change the conventional definition of 'binding sites' by changing the shape and complementarity of these sites. Such conserved water molecules can be important for site-directed ligand/drug design. Therefore, five different sets of homologous protein/protein-ligand complexes have been examined to identify the conserved water molecules at the ligand-binding sites. Our analysis reveals that there are as many as 16 conserved water molecules at the FAD binding site of glutathione reductase between the crystal structures obtained from human and E. coli. In the remaining four sets of high-resolution crystal structures, 2-4 water molecules have been found to be conserved at the ligand-binding sites. The majority of these conserved water molecules are either bound in deep grooves at the protein-ligand interface or completely buried in cavities between the protein and the ligand. All these water molecules, conserved between the protein/protein-ligand complexes from different species, have identical or similar apolar and polar interactions in a given set. The site residues interacting with the conserved water molecules at the ligand-binding sites have been found to be highly conserved among proteins from different species; they are more conserved compared to the other site residues interacting with the ligand. These water molecules, in general, make multiple polar contacts with protein-site residues.
78 FR 51139 - Notice of Proposed Changes to the National Handbook of Conservation Practices for the Natural...

Federal Register 2010, 2011, 2012, 2013, 2014

2013-08-20

... (Code 324), Field Border (Code 386), Filter Strip (Code 393), Land Smoothing (Code 466), Livestock... the implementation requirement document to the specifications and plans. Filter Strip (Code 393)--The...
Human homolog of the mouse sperm receptor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chamberlin, M.E.; Dean, J.

1990-08-01

The human zona pellucida, composed of three glycoproteins (ZP1, ZP2, and ZP3), forms an extracellular matrix that surrounds ovulated eggs and mediates species-specific fertilization. The genes that code for at least two of the zona proteins (ZP2 and ZP3) cross-hybridize with other mammalian DNA. The recently characterized mouse sperm receptor gene (Zp-3) was used to isolate its human homolog. The human homolog spans approximately 18.3 kilobase pairs (kbp) (compared to 8.6 kbp for the mouse gene) and contains eight exons, the sizes of which are strictly conserved between the two species. Four short (8-15 bp) sequences within the first 250 bp of the 5' flanking region in the human Zp-3 homolog are also present upstream of mouse Zp-3. These elements may modulate oocyte-specific gene expression. By using the polymerase chain reaction, a full-length cDNA of human ZP3 was isolated from human ovarian poly(A)+ RNA and used to deduce the structure of human ZP3 mRNA. Certain features of the human and mouse ZP3 transcripts are conserved. Both have unusually short 5' and 3' untranslated regions, both contain a single open reading frame that is 74% identical, and both code for 424 amino acid polypeptides that are 67% the same. The similarity between the two proteins may define domains that are important in maintaining the structural integrity of the zona pellucida, while the differences may play a role in mediating the species-specific events of mammalian fertilization.

Small proteins in cyanobacteria provide a paradigm for the functional analysis of the bacterial micro-proteome.

PubMed

Baumgartner, Desiree; Kopf, Matthias; Klähn, Stephan; Steglich, Claudia; Hess, Wolfgang R

2016-11-28

Despite their versatile functions in multimeric protein complexes, in the modification of enzymatic activities, intercellular communication or regulatory processes, proteins shorter than 80 amino acids (μ-proteins) are a systematically underestimated class of gene products in bacteria. Photosynthetic cyanobacteria provide a paradigm for small protein functions due to extensive work on the photosynthetic apparatus that led to the functional characterization of 19 small proteins of less than 50 amino acids. In analogy, previously unstudied small ORFs with similar degrees of conservation might encode small proteins of high relevance also in other functional contexts. Here we used comparative transcriptomic information available for two model cyanobacteria, Synechocystis sp. PCC 6803 and Synechocystis sp. PCC 6714 for the prediction of small ORFs. We found 293 transcriptional units containing candidate small ORFs ≤80 codons in Synechocystis sp. PCC 6803, also including the known mRNAs encoding small proteins of the photosynthetic apparatus. From these transcriptional units, 146 are shared between the two strains, 42 are shared with the higher plant Arabidopsis thaliana and 25 with E. coli. To verify the existence of the respective μ-proteins in vivo, we selected five genes as examples to which a FLAG tag sequence was added and re-introduced them into Synechocystis sp. PCC 6803. These were the previously annotated gene ssr1169, two newly defined genes norf1 and norf4, as well as nsiR6 (nitrogen stress-induced RNA 6) and hliR1 (high light-inducible RNA 1), which originally were considered non-coding. Upon activation of expression via the Cu2+-responsive petE promoter or from the native promoters, all five proteins were detected in Western blot experiments. The distribution and conservation of these five genes as well as their regulation of expression and the physico-chemical properties of the encoded proteins underline the likely great bandwidth of small protein functions in bacteria and make them attractive candidates for functional studies.
18 CFR Table 1 to Part 301 - Functionalization and Escalation Codes

Code of Federal Regulations, 2010 CFR

2010-04-01

... 18 Conservation of Power and Water Resources 1 2010-04-01 2010-04-01 false Functionalization and Escalation Codes 1 Table 1 to Part 301 Conservation of Power and Water Resources FEDERAL ENERGY REGULATORY COMMISSION, DEPARTMENT OF ENERGY REGULATIONS FOR FEDERAL POWER MARKETING ADMINISTRATIONS AVERAGE SYSTEM COST...
18 CFR Table 1 to Part 301 - Functionalization and Escalation Codes

Code of Federal Regulations, 2012 CFR

2012-04-01

... 18 Conservation of Power and Water Resources 1 2012-04-01 2012-04-01 false Functionalization and Escalation Codes 1 Table 1 to Part 301 Conservation of Power and Water Resources FEDERAL ENERGY REGULATORY COMMISSION, DEPARTMENT OF ENERGY REGULATIONS FOR FEDERAL POWER MARKETING ADMINISTRATIONS AVERAGE SYSTEM COST...
18 CFR Table 1 to Part 301 - Functionalization and Escalation Codes

Code of Federal Regulations, 2013 CFR

2013-04-01

... 18 Conservation of Power and Water Resources 1 2013-04-01 2013-04-01 false Functionalization and Escalation Codes 1 Table 1 to Part 301 Conservation of Power and Water Resources FEDERAL ENERGY REGULATORY COMMISSION, DEPARTMENT OF ENERGY REGULATIONS FOR FEDERAL POWER MARKETING ADMINISTRATIONS AVERAGE SYSTEM COST...
18 CFR Table 1 to Part 301 - Functionalization and Escalation Codes

Code of Federal Regulations, 2014 CFR

2014-04-01

... 18 Conservation of Power and Water Resources 1 2014-04-01 2014-04-01 false Functionalization and Escalation Codes 1 Table 1 to Part 301 Conservation of Power and Water Resources FEDERAL ENERGY REGULATORY COMMISSION, DEPARTMENT OF ENERGY REGULATIONS FOR FEDERAL POWER MARKETING ADMINISTRATIONS AVERAGE SYSTEM COST...
18 CFR Table 1 to Part 301 - Functionalization and Escalation Codes

Code of Federal Regulations, 2011 CFR

2011-04-01

... 18 Conservation of Power and Water Resources 1 2011-04-01 2011-04-01 false Functionalization and Escalation Codes 1 Table 1 to Part 301 Conservation of Power and Water Resources FEDERAL ENERGY REGULATORY COMMISSION, DEPARTMENT OF ENERGY REGULATIONS FOR FEDERAL POWER MARKETING ADMINISTRATIONS AVERAGE SYSTEM COST...
Characterization and Comparative Profiling of MiRNA Transcriptomes in Bighead Carp and Silver Carp

PubMed Central

Chi, Wei; Tong, Chaobo; Gan, Xiaoni; He, Shunping

2011-01-01

MicroRNAs (miRNAs) are small non-coding RNA molecules that are processed from large ‘hairpin’ precursors and function as post-transcriptional regulators of target genes. Although many individual miRNAs have recently been extensively studied, there has been very little research on miRNA transcriptomes in teleost fishes. By using high throughput sequencing technology, we have identified 167 and 166 conserved miRNAs (belonging to 108 families) in bighead carp (Hypophthalmichthys nobilis) and silver carp (Hypophthalmichthys molitrix), respectively. We compared the expression patterns of conserved miRNAs by means of hierarchical clustering analysis and log2 ratio. Results indicated that there is not a strong correlation between sequence conservation and expression conservation; most of these miRNAs have similar expression patterns. However, high expression differences were also identified for several individual miRNAs. Several miRNA* sequences were also found in our dataset and some of them may have regulatory functions. Two computational strategies were used to identify novel miRNAs from un-annotated data in the two carps. The first strategy, based on the zebrafish genome, identified 8 and 22 novel miRNAs in bighead carp and silver carp, respectively. We postulate that these miRNAs should also exist in the zebrafish, but the methodologies used have not allowed for their detection. In the second strategy we obtained several carp-specific miRNAs, 31 in bighead carp and 32 in silver carp, which showed low expression. Gain and loss of family members were observed in several miRNA families, which suggests that duplication of animal miRNA genes may occur through evolutionary processes which are similar to the protein-coding genes. PMID:21858165
Robust prediction of consensus secondary structures using averaged base pairing probability matrices.

PubMed

Kiryu, Hisanori; Kin, Taishin; Asai, Kiyoshi

2007-02-15

Recent transcriptomic studies have revealed the existence of a considerable number of non-protein-coding RNA transcripts in higher eukaryotic cells. To investigate the functional roles of these transcripts, it is of great interest to find conserved secondary structures from multiple alignments on a genomic scale. Since multiple alignments are often created using alignment programs that neglect the special conservation patterns of RNA secondary structures for computational efficiency, alignment failures can cause potential risks of overlooking conserved stem structures. We investigated the dependence of the accuracy of secondary structure prediction on the quality of alignments. We compared three algorithms that maximize the expected accuracy of secondary structures as well as other frequently used algorithms. We found that one of our algorithms, called McCaskill-MEA, was more robust against alignment failures than others. The McCaskill-MEA method first computes the base pairing probability matrices for all the sequences in the alignment and then obtains the base pairing probability matrix of the alignment by averaging over these matrices. The consensus secondary structure is predicted from this matrix such that the expected accuracy of the prediction is maximized. We show that the McCaskill-MEA method performs better than other methods, particularly when the alignment quality is low and when the alignment consists of many sequences. Our model has a parameter that controls the sensitivity and specificity of predictions. We discussed the uses of that parameter for multi-step screening procedures to search for conserved secondary structures and for assigning confidence values to the predicted base pairs. The C++ source code that implements the McCaskill-MEA algorithm and the test dataset used in this paper are available at http://www.ncrna.org/papers/McCaskillMEA/. Supplementary data are available at Bioinformatics online.
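The two steps described in this abstract, averaging per-sequence base pairing probability matrices and then predicting a consensus structure from the averaged matrix, can be sketched as follows. This is a simplified illustration, not the authors' C++ implementation: the per-sequence matrices are assumed to be already mapped onto alignment columns, the pair score `p[i][j] - threshold` is a stand-in for the paper's expected-accuracy objective, and the names `average_bpp`, `mea_structure`, `threshold` and `min_loop` are hypothetical. The `threshold` parameter plays a role loosely analogous to the sensitivity/specificity parameter the abstract mentions.

```python
def average_bpp(matrices):
    """Average a list of n x n base pairing probability matrices.

    Each matrix is a list of lists with p[i][j] for i < j; all matrices
    are assumed to share alignment coordinates (assumption).
    """
    n = len(matrices[0])
    avg = [[0.0] * n for _ in range(n)]
    for m in matrices:
        for i in range(n):
            for j in range(n):
                avg[i][j] += m[i][j] / len(matrices)
    return avg


def mea_structure(p, threshold=0.5, min_loop=3):
    """Nested structure maximizing the sum of (p[i][j] - threshold) over pairs.

    Nussinov-style dynamic programming; pairs closer than `min_loop`
    unpaired positions are not allowed. Returns a sorted list of (i, j).
    """
    n = len(p)
    best = [[0.0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = max(best[i + 1][j], best[i][j - 1])  # i or j unpaired
            score = max(score, best[i + 1][j - 1] + p[i][j] - threshold)
            for k in range(i + 1, j):  # split into two independent parts
                score = max(score, best[i][k] + best[k + 1][j])
            best[i][j] = score

    # Trace back through the table to recover the chosen pairs.
    pairs = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
        elif best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
        elif best[i][j] == best[i + 1][j - 1] + p[i][j] - threshold:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if best[i][j] == best[i][k] + best[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return sorted(pairs)
```

Raising `threshold` keeps only pairs that are strongly supported across the alignment, while lowering it admits more pairs at the cost of specificity.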
Reverse genetics of Mononegavirales: How they work, new vaccines, and new cancer therapeutics

PubMed Central

Pfaller, Christian K.; Cattaneo, Roberto; Schnell, Matthias J.

2015-01-01

The order Mononegavirales includes five families: Bornaviridae, Filoviridae, Nyamaviridae, Paramyxoviridae, and Rhabdoviridae. The genome of these viruses is one molecule of negative-sense single-stranded RNA coding for five to ten genes in a conserved order. The RNA is not infectious until packaged by the nucleocapsid protein and transcribed by the polymerase and co-factors. Reverse genetics approaches have answered fundamental questions about the biology of Mononegavirales. The lack of icosahedral symmetry and the modular organization of the genome of these viruses have facilitated engineering of viruses expressing fluorescent proteins, and these fluorescent proteins have provided important insights about the molecular and cellular basis of tissue tropism and pathogenesis. Studies have assessed the relevance for virulence of different receptors and the interactions with cellular proteins governing the innate immune responses. Research has also analyzed the mechanisms of attenuation. Based on these findings, ongoing clinical trials are exploring new live attenuated vaccines and the use of viruses re-engineered as cancer therapeutics. PMID:25702088
Evolutionary dynamics of Newcastle disease virus

USGS Publications Warehouse

Miller, P.J.; Kim, L.M.; Ip, Hon S.; Afonso, C.L.

2009-01-01

A comprehensive dataset of NDV genome sequences was evaluated using bioinformatics to characterize the evolutionary forces affecting NDV genomes. Despite evidence of recombination in most genes, only one event in the fusion gene of genotype V viruses produced evolutionarily viable progenies. The codon-associated rate of change for the six NDV proteins revealed that the highest rate of change occurred at the fusion protein. All proteins were under strong purifying (negative) selection; the fusion protein displayed the highest number of amino acids under positive selection. Regardless of the phylogenetic grouping or the level of virulence, the cleavage site motif was highly conserved implying that mutations at this site that result in changes of virulence may not be favored. The coding sequence of the fusion gene and the genomes of viruses from wild birds displayed higher yearly rates of change in virulent viruses than in viruses of low virulence, suggesting that an increase in virulence may accelerate the rate of NDV evolution. © 2009 Elsevier Inc.
NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins

PubMed Central

Pruitt, Kim D.; Tatusova, Tatiana; Maglott, Donna R.

2005-01-01

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff. PMID:15608248
Characterization of the complete genome segments from BmCPV-SZ, a novel Bombyx mori cypovirus 1 isolate.

PubMed

Cao, Guangli; Meng, Xiangkun; Xue, Renyu; Zhu, Yuexiong; Zhang, Xiaorong; Pan, Zhonghua; Zheng, Xiaojian; Gong, Chengliang

2012-07-01

A novel Bombyx mori cypovirus 1 was isolated from infected silkworm larvae and tentatively designated Bombyx mori cypovirus 1 isolate Suzhou (BmCPV-SZ). The complete nucleotide sequences of genomic segments S1-S10 from BmCPV-SZ were determined. All segments possessed a single open reading frame; however, bioinformatic evidence suggested a short overlapping coding sequence in S1. Each BmCPV-SZ segment possessed the conserved terminal sequences AGUAA and GUUAGCC at the 5' and 3' ends, respectively. The conserved A/G at the -3 position in relation to the AUG codon could be found in the BmCPV-SZ genome, and it was postulated that this conserved A/G may be the most important nucleotide for efficient translation initiation in cypoviruses (CPVs). Examination of the putative amino acid sequences encoded by BmCPV-SZ revealed some characteristic motifs. Homology searches showed that viral structural proteins VP1, VP3, and VP4 had localized homologies with proteins of Rice ragged stunt virus, a member of the genus Oryzavirus within the family Reoviridae. A phylogenetic tree based on RNA-dependent RNA polymerase sequences demonstrated that CPV is more closely related to Rice ragged stunt virus and Aedes pseudoscutellaris reovirus than to other members of Reoviridae, suggesting that they may have originated from common ancestors.
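The two sequence features reported above (conserved terminal motifs and a purine at position -3 relative to the AUG) are easy to check programmatically. The Python sketch below is illustrative only: `check_segment` is a hypothetical helper, the toy segment is made up and is not a real BmCPV-SZ sequence, and the first AUG in the sequence is assumed to be the start codon.

```python
def check_segment(seq, five_prime="AGUAA", three_prime="GUUAGCC"):
    """Report terminal motifs and the base at -3 relative to the first AUG."""
    rna = seq.upper().replace("T", "U")   # accept DNA or RNA letters
    start = rna.find("AUG")               # assumption: first AUG is the start codon
    # With the A of AUG numbered +1 and no position 0, -3 lies three bases upstream.
    minus3 = rna[start - 3] if start >= 3 else None
    return {
        "five_prime_ok": rna.startswith(five_prime),
        "three_prime_ok": rna.endswith(three_prime),
        "minus3_base": minus3,
        "minus3_is_purine": minus3 in ("A", "G"),
    }


# Made-up toy segment: 5' motif, short leader, tiny ORF, spacer, 3' motif.
toy_segment = "AGUAA" + "UCGACC" + "AUGGCUAAAUAA" + "CCGG" + "GUUAGCC"
report = check_segment(toy_segment)
print(report)
```

Applied to real segment sequences, the same check would need the annotated start codon position rather than the first AUG, since upstream AUGs can occur in 5' leaders.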
Comparative genomics reveals conservation of filaggrin and loss of caspase-14 in dolphins.

PubMed

Strasser, Bettina; Mlitz, Veronika; Fischer, Heinz; Tschachler, Erwin; Eckhart, Leopold

2015-05-01

The expression of filaggrin and its stepwise proteolytic degradation are critical events in the terminal differentiation of epidermal keratinocytes and in the formation of the skin barrier to the environment. Here, we investigated whether the evolutionary transition from a terrestrial to a fully aquatic lifestyle of cetaceans, that is dolphins and whales, has been associated with changes in genes encoding filaggrin and proteins involved in the processing of filaggrin. We used comparative genomics, PCRs and re-sequencing of gene segments to screen for the presence and integrity of genes coding for filaggrin and proteases implicated in the maturation of (pro)filaggrin. Filaggrin has been conserved in dolphins (bottlenose dolphin, orca and baiji) but has been lost in whales (sperm whale and minke whale). All other S100 fused-type genes have been lost in cetaceans. Among filaggrin-processing proteases, aspartic peptidase retroviral-like 1 (ASPRV1), also known as saspase, has been conserved, whereas caspase-14 has been lost in all cetaceans investigated. In conclusion, our results suggest that filaggrin is dispensable for the acquisition of fully aquatic lifestyles of whales, whereas it appears to confer an evolutionary advantage to dolphins. The discordant evolution of filaggrin, saspase and caspase-14 in cetaceans indicates that the biological roles of these proteins are not strictly interdependent. © 2015 The Authors. Experimental Dermatology Published by John Wiley & Sons Ltd.
Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification.

PubMed

Sinclair, Robert M; Ravantti, Janne J; Bamford, Dennis H

2017-04-15

Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allows them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. Copyright © 2017 Sinclair et al.
Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification

PubMed Central

Sinclair, Robert M.; Ravantti, Janne J.

2017-01-01

ABSTRACT Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allows them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. PMID:28122979
The Hypothesis that the Genetic Code Originated in Coupled Synthesis of Proteins and the Evolutionary Predecessors of Nucleic Acids in Primitive Cells

PubMed Central

Francis, Brian R.

2015-01-01

Although analysis of the genetic code has allowed explanations for its evolution to be proposed, little evidence exists in biochemistry and molecular biology to offer an explanation for the origin of the genetic code. In particular, two features of biology make the origin of the genetic code difficult to understand. First, nucleic acids are highly complicated polymers requiring numerous enzymes for biosynthesis. Secondly, proteins have a simple backbone with a set of 20 different amino acid side chains synthesized by a highly complicated ribosomal process in which mRNA sequences are read in triplets. Apparently, both nucleic acid and protein syntheses have extensive evolutionary histories. Supporting these processes is a complex metabolism and at the hub of metabolism are the carboxylic acid cycles. This paper advances the hypothesis that the earliest predecessor of the nucleic acids was a β-linked polyester made from malic acid, a highly conserved metabolite in the carboxylic acid cycles. In the β-linked polyester, the side chains are carboxylic acid groups capable of forming interstrand double hydrogen bonds. Evolution of the nucleic acids involved changes to the backbone and side chain of poly(β-d-malic acid). Conversion of the side chain carboxylic acid into a carboxamide or a longer side chain bearing a carboxamide group, allowed information polymers to form amide pairs between polyester chains. Aminoacylation of the hydroxyl groups of malic acid and its derivatives with simple amino acids such as glycine and alanine allowed coupling of polyester synthesis and protein synthesis. Use of polypeptides containing glycine and l-alanine for activation of two different monomers with either glycine or l-alanine allowed simple coded autocatalytic synthesis of polyesters and polypeptides and established the first genetic code. A primitive cell capable of supporting electron transport, thioester synthesis, reduction reactions, and synthesis of polyesters and polypeptides is proposed. The cell consists of an iron-sulfide particle enclosed by tholin, a heterogeneous organic material that is produced by Miller-Urey type experiments that simulate conditions on the early Earth. As the synthesis of nucleic acids evolved from β-linked polyesters, the singlet coding system for replication evolved into a four nucleotide/four amino acid process (AMP = aspartic acid, GMP = glycine, UMP = valine, CMP = alanine) and then into the triplet ribosomal process that permitted multiple copies of protein to be synthesized independent of replication. This hypothesis reconciles the “genetics first” and “metabolism first” approaches to the origin of life and explains why there are four bases in the genetic alphabet. PMID:25679748
Presence of tannins in sorghum grains is conditioned by different natural alleles of Tannin1

PubMed Central

Wu, Yuye; Li, Xianran; Xiang, Wenwen; Zhu, Chengsong; Lin, Zhongwei; Wu, Yun; Li, Jiarui; Pandravada, Satchidanand; Ridder, Dustan D.; Bai, Guihua; Wang, Ming L.; Trick, Harold N.; Bean, Scott R.; Tuinstra, Mitchell R.; Tesso, Tesfaye T.; Yu, Jianming

2012-01-01

Sorghum, an ancient old-world cereal grass, is the dietary staple of over 500 million people in more than 30 countries in the tropics and semitropics. Its C4 photosynthesis, drought resistance, wide adaptation, and high nutritional value hold the promise to alleviate hunger in Africa. Not present in other major cereals, such as rice, wheat, and maize, condensed tannins (proanthocyanidins) in the pigmented testa of some sorghum cultivars have been implicated in reducing protein digestibility but recently have been shown to promote human health because of their high antioxidant capacity and ability to fight obesity through reduced digestion. Combining quantitative trait locus mapping, meta-quantitative trait locus fine-mapping, and association mapping, we showed that the nucleotide polymorphisms in the Tan1 gene, coding a WD40 protein, control the tannin biosynthesis in sorghum. A 1-bp G deletion in the coding region, causing a frame shift and a premature stop codon, led to a nonfunctional allele, tan1-a. Likewise, a different 10-bp insertion resulted in a second nonfunctional allele, tan1-b. Transforming the sorghum Tan1 ORF into a nontannin Arabidopsis mutant restored the tannin phenotype. In addition, reduction in nucleotide diversity from wild sorghum accessions to landraces and cultivars was found at the region that codes the highly conserved WD40 repeat domains and the C-terminal region of the protein. Genetic research in crops, coupled with nutritional and medical research, could open the possibility of producing different levels and combinations of phenolic compounds to promote human health. PMID:22699509
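To make the mechanism behind the tan1-a allele concrete, the short Python sketch below shows how deleting a single G can shift the reading frame and expose a premature stop codon. The sequence is a made-up toy ORF, not the Tan1 gene, and `translate` and `delete_base` are hypothetical helper names; only the standard genetic code table is taken as given.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate from the first base until a stop codon or the end of the sequence."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[pos:pos + 3]]
        if aa == "*":          # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


def delete_base(dna, index):
    """Return the sequence with the base at `index` removed."""
    return dna[:index] + dna[index + 1:]


# Toy ORF (hypothetical): ATG GCA GGT GAC CTG AAA CTG GAA TGG TAA
wild_type = "ATGGCAGGTGACCTGAAACTGGAATGGTAA"
mutant = delete_base(wild_type, 6)     # remove the first G of the GGT codon

print(translate(wild_type))   # full-length toy peptide
print(translate(mutant))      # truncated after the frameshift
```

In the toy mutant, the shifted frame reads GTG ACC TGA, so translation stops after four residues instead of nine. The same logic applies to the 10-bp insertion in tan1-b, since any indel whose length is not a multiple of three shifts the frame.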
MultitaskProtDB: a database of multitasking proteins

PubMed Central

Hernández, Sergio; Ferragut, Gabriela; Amela, Isaac; Perez-Pons, JosepAntoni; Piñol, Jaume; Mozo-Villarias, Angel; Cedano, Juan; Querol, Enrique

2014-01-01

We have compiled MultitaskProtDB, available online at http://wallace.uab.es/multitask, to provide a repository where the many multitasking proteins found in the literature can be stored. Multitasking or moonlighting is the capability of some proteins to execute two or more biological functions. Usually, multitasking proteins are experimentally revealed by serendipity. This ability of proteins to perform multitasking functions helps us to understand one of the ways used by cells to perform many complex functions with a limited number of genes. Even so, the study of this phenomenon is complex because, among other things, there is no database of moonlighting proteins. The existence of such a tool facilitates the collection and dissemination of these important data. This work reports the database, MultitaskProtDB, which is designed as a user-friendly web page containing >288 multitasking proteins with their NCBI and UniProt accession numbers, canonical and additional biological functions, monomeric/oligomeric states, PDB codes when available and bibliographic references. This database also serves to gain insight into some characteristics of multitasking proteins such as frequencies of the different pairs of functions, phylogenetic conservation and so forth. PMID:24253302
A DEK Domain-Containing Protein Modulates Chromatin Structure and Function in Arabidopsis

PubMed Central

Waidmann, Sascha; Kusenda, Branislav; Mayerhofer, Juliane; Mechtler, Karl; Jonak, Claudia

2014-01-01

Chromatin is a major determinant in the regulation of virtually all DNA-dependent processes. Chromatin architectural proteins interact with nucleosomes to modulate chromatin accessibility and higher-order chromatin structure. The evolutionarily conserved DEK domain-containing protein is implicated in important chromatin-related processes in animals, but little is known about its DNA targets and protein interaction partners. In plants, the role of DEK has remained elusive. In this work, we identified DEK3 as a chromatin-associated protein in Arabidopsis thaliana. DEK3 specifically binds histones H3 and H4. Purification of other proteins associated with nuclear DEK3 also established DNA topoisomerase 1α and proteins of the cohesin complex as in vivo interaction partners. Genome-wide mapping of DEK3 binding sites by chromatin immunoprecipitation followed by deep sequencing revealed enrichment of DEK3 at protein-coding genes throughout the genome. Using DEK3 knockout and overexpressor lines, we show that DEK3 affects nucleosome occupancy and chromatin accessibility and modulates the expression of DEK3 target genes. Furthermore, functional levels of DEK3 are crucial for stress tolerance. Overall, data indicate that DEK3 contributes to modulation of Arabidopsis chromatin structure and function. PMID:25387881
18 CFR 410.1 - Basin regulations-Water Code and Administrative Manual-Part III Water Quality Regulations.

Code of Federal Regulations, 2011 CFR

2011-04-01

... 18 Conservation of Power and Water Resources 2 2011-04-01 2011-04-01 false Basin regulations-Water Code and Administrative Manual-Part III Water Quality Regulations. 410.1 Section 410.1 Conservation of Power and Water Resources DELAWARE RIVER BASIN COMMISSION ADMINISTRATIVE MANUAL BASIN REGULATIONS; WATER...

Recommendations on Implementing the Energy Conservation Building Code in Rajasthan, India

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yu, Sha; Makela, Eric J.; Evans, Meredydd

India launched the Energy Conservation Building Code (ECBC) in 2007 and Indian Bureau of Energy Efficiency (BEE) recently indicated that it would move to mandatory implementation in the 12th Five-Year Plan. The State of Rajasthan adopted ECBC with minor modifications; the new regulation is known as the Energy Conservation Building Directives – Rajasthan 2011 (ECBD-R). It became mandatory in Rajasthan on September 28, 2011. This report provides recommendations on an ECBD-R enforcement roadmap for the State of Rajasthan.
EvoluCode: Evolutionary Barcodes as a Unifying Framework for Multilevel Evolutionary Data.

PubMed

Linard, Benjamin; Nguyen, Ngoc Hoan; Prosdocimi, Francisco; Poch, Olivier; Thompson, Julie D

2012-01-01

Evolutionary systems biology aims to uncover the general trends and principles governing the evolution of biological networks. An essential part of this process is the reconstruction and analysis of the evolutionary histories of these complex, dynamic networks. Unfortunately, the methodologies for representing and exploiting such complex evolutionary histories in large scale studies are currently limited. Here, we propose a new formalism, called EvoluCode (Evolutionary barCode), which allows the integration of different evolutionary parameters (eg, sequence conservation, orthology, synteny …) in a unifying format and facilitates the multilevel analysis and visualization of complex evolutionary histories at the genome scale. The advantages of the approach are demonstrated by constructing barcodes representing the evolution of the complete human proteome. Two large-scale studies are then described: (i) the mapping and visualization of the barcodes on the human chromosomes and (ii) automatic clustering of the barcodes to highlight protein subsets sharing similar evolutionary histories and their functional analysis. The methodologies developed here open the way to the efficient application of other data mining and knowledge extraction techniques in evolutionary systems biology studies. A database containing all EvoluCode data is available at: http://lbgi.igbmc.fr/barcodes.
The ribonucleoprotein Csr network.

PubMed

Seyll, Ethel; Van Melderen, Laurence

2013-11-08

Ribonucleoprotein complexes are essential regulatory components in bacteria. In this review, we focus on the carbon storage regulator (Csr) network, which is well conserved in the bacterial world. This regulatory network is composed of the CsrA master regulator, its targets and regulators. CsrA binds to mRNA targets and regulates translation either negatively or positively. Binding to small non-coding RNAs controls activity of this protein. Expression of these regulators is tightly regulated at the level of transcription and stability by various global regulators (RNases, two-component systems, alarmone). We discuss the implications of these complex regulations in bacterial adaptation.
Editing of the grapevine mitochondrial cytochrome b mRNA and molecular modeling of the protein.

PubMed

Islas-Osuna, María A; Silva-Moreno, Begonia; Caceres-Carrizosa, Nidia; García-Robles, Jesús M; Sotelo-Mundo, Rogerio R; Yepiz-Plascencia, Gloria M

2006-05-01

Cytochrome b (COB), the central catalytic subunit of ubiquinol cytochrome c reductase, is a component of the transmembrane electron transfer chain that generates proton motive force. Some plant COB mRNAs are processed by RNA editing, which changes the gene coding sequence. This report presents the sequences of the grapevine (Vitis vinifera L.) mitochondrial gene for apocytochrome b (cob), the edited mRNA and the deduced protein. Grapevine COB is 393 amino acids long and is 98% identical to homologs in rapeseed, Arabidopsis thaliana and Oenothera sp. Twenty-one C-U editing sites were identified in the grapevine cob mRNA, resulting in 20 amino acid changes. These changes increase the overall hydrophobicity of the protein and result in a more conserved protein. Molecular modeling of grapevine COB shows that residues changed by RNA editing fit the secondary structure characteristic of an integral membrane protein. This is the first complete mitochondrial gene reported for grapevine. Novel RNA editing sites were identified in grapevine cob, which have not been previously reported for other plants.
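As a small illustration of how C-to-U editing can change the encoded protein and raise its hydrophobicity, the Python sketch below edits a made-up mRNA fragment, translates both versions, and compares their mean Kyte-Doolittle hydropathy. The sequence and edit positions are hypothetical and are not taken from the grapevine cob mRNA; the helper names are likewise assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in UCAG order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Kyte-Doolittle hydropathy index (higher means more hydrophobic).
KD = {"I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
      "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
      "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
      "R": -4.5}


def edit_c_to_u(mrna, sites):
    """Convert C to U at the given 0-based positions, rejecting non-C sites."""
    bases = list(mrna)
    for site in sites:
        if bases[site] != "C":
            raise ValueError(f"position {site} is {bases[site]}, not C")
        bases[site] = "U"
    return "".join(bases)


def translate(mrna):
    """Translate from the first base until a stop codon or the end of the sequence."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def mean_hydropathy(protein):
    """Average Kyte-Doolittle value over all residues."""
    return sum(KD[aa] for aa in protein) / len(protein)


# Toy fragment (hypothetical): AUG CCA UCA CGG GCU
unedited = "AUGCCAUCACGGGCU"
edited = edit_c_to_u(unedited, [4, 7, 9])   # CCA->CUA, UCA->UUA, CGG->UGG

before, after = translate(unedited), translate(edited)
print(before, round(mean_hydropathy(before), 2))
print(after, round(mean_hydropathy(after), 2))
```

The three toy edits replace proline, serine and arginine with leucine, leucine and tryptophan, which is the kind of shift toward hydrophobic residues that the abstract describes for an integral membrane protein.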
Spliced X-box Binding Protein 1 Couples the Unfolded Protein Response to Hexosamine Biosynthetic Pathway

PubMed Central

Wang, Zhao V.; Deng, Yingfeng; Gao, Ningguo; Pedrozo, Zully; Li, Dan L.; Morales, Cyndi R.; Criollo, Alfredo; Luo, Xiang; Tan, Wei; Jiang, Nan; Lehrman, Mark A.; Rothermel, Beverly A.; Lee, Ann-Hwee; Lavandero, Sergio; Mammen, Pradeep P.A.; Ferdous, Anwarul; Gillette, Thomas G.; Scherer, Philipp E.; Hill, Joseph A.

2014-01-01

SUMMARY The hexosamine biosynthetic pathway (HBP) generates UDP-GlcNAc (uridine diphosphate N-acetylglucosamine) for glycan synthesis and O-linked GlcNAc (O-GlcNAc) protein modifications. Despite the established role of the HBP in metabolism and multiple diseases, regulation of the HBP remains largely undefined. Here, we show that spliced X-box binding protein 1 (Xbp1s), the most conserved signal transducer of the unfolded protein response (UPR), is a direct transcriptional activator of the HBP. We demonstrate that the UPR triggers HBP activation via Xbp1s-dependent transcription of genes coding for key, rate-limiting enzymes. We further establish that this previously unrecognized UPR-HBP axis is triggered in a variety of stress conditions. Finally, we demonstrate a physiologic role for the UPR-HBP axis, by showing that acute stimulation of Xbp1s in heart by ischemia/reperfusion confers robust cardioprotection in part through induction of the HBP. Collectively, these studies reveal that Xbp1s couples the UPR to the HBP to protect cells under stress. PMID:24630721
Spliced X-box binding protein 1 couples the unfolded protein response to hexosamine biosynthetic pathway.

PubMed

Wang, Zhao V; Deng, Yingfeng; Gao, Ningguo; Pedrozo, Zully; Li, Dan L; Morales, Cyndi R; Criollo, Alfredo; Luo, Xiang; Tan, Wei; Jiang, Nan; Lehrman, Mark A; Rothermel, Beverly A; Lee, Ann-Hwee; Lavandero, Sergio; Mammen, Pradeep P A; Ferdous, Anwarul; Gillette, Thomas G; Scherer, Philipp E; Hill, Joseph A

2014-03-13

The hexosamine biosynthetic pathway (HBP) generates uridine diphosphate N-acetylglucosamine (UDP-GlcNAc) for glycan synthesis and O-linked GlcNAc (O-GlcNAc) protein modifications. Despite the established role of the HBP in metabolism and multiple diseases, regulation of the HBP remains largely undefined. Here, we show that spliced X-box binding protein 1 (Xbp1s), the most conserved signal transducer of the unfolded protein response (UPR), is a direct transcriptional activator of the HBP. We demonstrate that the UPR triggers HBP activation via Xbp1s-dependent transcription of genes coding for key, rate-limiting enzymes. We further establish that this previously unrecognized UPR-HBP axis is triggered in a variety of stress conditions. Finally, we demonstrate a physiologic role for the UPR-HBP axis by showing that acute stimulation of Xbp1s in heart by ischemia/reperfusion confers robust cardioprotection in part through induction of the HBP. Collectively, these studies reveal that Xbp1s couples the UPR to the HBP to protect cells under stress. Copyright © 2014 Elsevier Inc. All rights reserved.
Variations in the non-coding transcriptome as a driver of inter-strain divergence and physiological adaptation in bacteria.

PubMed

Kopf, Matthias; Klähn, Stephan; Scholz, Ingeborg; Hess, Wolfgang R; Voß, Björn

2015-04-22

In all studied organisms, a substantial portion of the transcriptome consists of non-coding RNAs that frequently execute regulatory functions. Here, we have compared the primary transcriptomes of the cyanobacteria Synechocystis sp. PCC 6714 and PCC 6803 under 10 different conditions. These strains share 2854 protein-coding genes and a 16S rRNA identity of 99.4%, indicating their close relatedness. Conserved major transcriptional start sites (TSSs) give rise to non-coding transcripts within the sigB gene, from the 5'UTRs of cmpA and isiA, and 168 loci in antisense orientation. Distinct differences include single nucleotide polymorphisms rendering promoters inactive in one of the strains, e.g., for cmpR and for the asRNA PsbA2R. Based on the genome-wide mapped location, regulation and classification of TSSs, non-coding transcripts were identified as the most dynamic component of the transcriptome. We identified a class of mRNAs that originate by read-through from an sRNA that accumulates as a discrete and abundant transcript while also serving as the 5'UTR. Such an sRNA/mRNA structure, which we name 'actuaton', represents another way for bacteria to remodel their transcriptional network. Our findings support the hypothesis that variations in the non-coding transcriptome constitute a major evolutionary element of inter-strain divergence and capability for physiological adaptation.
75 FR 4525 - Notice of Proposed Changes to the National Handbook of Conservation Practices for the Natural...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-01-28

... National Handbook of Conservation Practices for the Natural Resources Conservation Service AGENCY: Natural... National Handbook of Conservation Practices for public review and comment. SUMMARY: Notice is hereby given... Handbook of Conservation Practices. These standards include: Air Filtration and Scrubbing (Code 371...
Huntingtin gene evolution in Chordata and its peculiar features in the ascidian Ciona genus

PubMed Central

Gissi, Carmela; Pesole, Graziano; Cattaneo, Elena; Tartari, Marzia

2006-01-01

Background To gain insight into the evolutionary features of the huntingtin (htt) gene in Chordata, we have sequenced and characterized the full-length htt mRNA in the ascidian Ciona intestinalis, a basal chordate emerging as a new invertebrate model organism. Moreover, taking advantage of the availability of genomic and EST sequences, the htt gene structure of a number of chordate species, including the congeneric ascidian Ciona savignyi, and the vertebrates Xenopus and Gallus was reconstructed. Results The C. intestinalis htt transcript exhibits some peculiar features, such as spliced leader trans-splicing in the 98 nt-long 5' untranslated region (UTR), an alternative splicing in the coding region, eight alternative polyadenylation sites, and no similarity of either the 5' or the 3' UTR to its homolog in the congeneric C. savignyi. The predicted protein is 2946 amino acids long, shorter than its vertebrate homologs, and lacks the polyQ and the polyP stretches found in the N-terminal regions of mammalian homologs. The exon-intron organization of the htt gene is almost identical among vertebrates, and significantly conserved between Ciona and vertebrates, allowing us to hypothesize an ancestral chordate gene consisting of at least 40 coding exons. Conclusion During chordate diversification, events of gain/loss, sliding, phase changes, and expansion of introns occurred in both vertebrate and ascidian lineages predominantly in the 5'-half of the htt gene, where there is also evidence of lineage-specific evolutionary dynamics in vertebrates. On the contrary, the 3'-half of the gene is highly conserved in all chordates at the level of both gene structure and protein sequence. Between the two Ciona species, a fast evolutionary rate and/or an early divergence time is suggested by the absence of significant similarity between UTRs, protein divergence comparable to that observed between mammals and fishes, and different distribution of repetitive elements. PMID:17092333
Tests in mice of a dengue vaccine candidate made of chimeric Junin virus-like particles and conserved dengue virus envelope sequences.

PubMed

Mareze, Vania Aparecida; Borio, Cristina Silvia; Bilen, Marcos F; Fleith, Renata; Mirazo, Santiago; Mansur, Daniel Santos; Arbiza, Juan; Lozano, Mario Enrique; Bruña-Romero, Oscar

2016-01-01

Two new vaccine candidates against dengue virus (DENV) infection were generated by fusing the coding sequences of the self-budding Z protein from Junin virus (Z-JUNV) to those of two cryptic peptides (Z/DENV-P1 and Z/DENV-P2) conserved on the envelope protein of all serotypes of DENV. The capacity of these chimeras to generate virus-like particles (VLPs) and to induce virus-neutralizing antibodies in mice was determined. First, recombinant proteins that displayed reactivity with a Z-JUNV-specific serum by immunofluorescence were detected in HEK-293 cells transfected with each of the two plasmids and VLP formation was also observed by transmission electron microscopy. Next, we determined the presence of antibodies against the envelope peptides of DENV in the sera of immunized C57BL/6 mice. Results showed that those animals that received Z/DENV-P2 DNA coding sequences followed by a boost with DENV-P2 synthetic peptides developed significant specific antibody titers (≥6.400). Finally, DENV plaque-reduction neutralization tests (PRNT) were performed. Although no significant protective effect was observed when using sera of Z/DENV-P1-immunized animals, antibodies raised against vaccine candidate Z/DENV-P2 (diluted 1:320) were able to reduce by over 50% the number of viral plaques generated by infectious DENV particles. This reduction was comparable to that of the 4G2 DENV-specific monoclonal cross-reactive (all serotypes) neutralizing antibody. We conclude that Z-JUNV-VLP is a valid carrier to induce antibody-mediated immune responses in mice and that Z/DENV-P2 is not only immunogenic but also protective in vitro against infection of cells with DENV, deserving further studies. In contrast, DENV's fusion peptide-derived chimera Z/DENV-P1 did not display similar protective properties.
Atomic structure of the Y complex of the nuclear pore

DOE PAGES

Kelley, Kotaro; Knockenhauer, Kevin E.; Kabachinski, Greg; ...

2015-03-30

The nuclear pore complex (NPC) is the principal gateway for transport into and out of the nucleus. Selectivity is achieved through the hydrogel-like core of the NPC. The structural integrity of the NPC depends on ~15 architectural proteins, which are organized in distinct subcomplexes to form the >40-MDa ring-like structure. In this paper, we present the 4.1-Å crystal structure of a heterotetrameric core element ('hub') of the Y complex, the essential NPC building block, from Myceliophthora thermophila. Using the hub structure together with known Y-complex fragments, we built the entire ~0.5-MDa Y complex. Our data reveal that the conserved core of the Y complex has six rather than seven members. Finally, evolutionarily distant Y-complex assemblies share a conserved core that is very similar in shape and dimension, thus suggesting that there are closely related architectural codes for constructing the NPC in all eukaryotes.
The Non-Coding RNA Ncr0700/PmgR1 is Required for Photomixotrophic Growth and the Regulation of Glycogen Accumulation in the Cyanobacterium Synechocystis sp. PCC 6803.

PubMed

de Porcellinis, Alice J; Klähn, Stephan; Rosgaard, Lisa; Kirsch, Rebekka; Gutekunst, Kirstin; Georg, Jens; Hess, Wolfgang R; Sakuragi, Yumiko

2016-10-01

Carbohydrate metabolism is a tightly regulated process in photosynthetic organisms. In the cyanobacterium Synechocystis sp. PCC 6803, the photomixotrophic growth protein A (PmgA) is involved in the regulation of glucose and storage carbohydrate (i.e. glycogen) metabolism, while its biochemical activity and possible factors acting downstream of PmgA are unknown. Here, a genome-wide microarray analysis of a ΔpmgA strain identified the expression of 36 protein-coding genes and 42 non-coding transcripts as significantly altered. From these, the non-coding RNA Ncr0700 was identified as the transcript most strongly reduced in abundance. Ncr0700 is widely conserved among cyanobacteria. In Synechocystis its expression is inversely correlated with light intensity. Similarly to a ΔpmgA mutant, a Δncr0700 deletion strain showed an approximately 2-fold increase in glycogen content under photoautotrophic conditions and wild-type-like growth. Moreover, its growth was arrested by 38 h after a shift to photomixotrophic conditions. Ectopic expression of Ncr0700 in Δncr0700 and ΔpmgA restored the glycogen content and photomixotrophic growth to wild-type levels. These results indicate that Ncr0700 is required for photomixotrophic growth and the regulation of glycogen accumulation, and acts downstream of PmgA. Hence Ncr0700 is renamed here as PmgR1 for photomixotrophic growth RNA 1. © The Author 2016. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.
77 FR 74456 - Notice of Proposed Changes to the National Handbook of Conservation Practices for the Natural...

Federal Register 2010, 2011, 2012, 2013, 2014

2012-12-14

...), Row Arrangement (Code 557), Sprinkler System (Code 442), Tree/Shrub Site Preparation (Code 490), Waste.... Tree/Shrub Site Preparation (Code 490)--Only minor changes were made to the standard including...
Structure, synthesis, and molecular cloning of dermaseptins B, a family of skin peptide antibiotics.

PubMed

Charpentier, S; Amiche, M; Mester, J; Vouille, V; Le Caer, J P; Nicolas, P; Delfour, A

1998-06-12

Analysis of antimicrobial activities that are present in the skin secretions of the South American frog Phyllomedusa bicolor revealed six polycationic (lysine-rich) and amphipathic alpha-helical peptides, 24-33 residues long, termed dermaseptins B1 to B6, respectively. Prepro-dermaseptins B all contain an almost identical signal peptide, which is followed by a conserved acidic propiece, a processing signal Lys-Arg, and a dermaseptin progenitor sequence. The 22-residue signal peptide plus the first 3 residues of the acidic propiece are encoded by conserved nucleotides encompassed by the first coding exon of the dermaseptin genes. The 25-residue amino-terminal region of prepro-dermaseptins B shares 50% identity with the corresponding region of precursors for D-amino acid containing opioid peptides or for antimicrobial peptides originating from the skin of distantly related frog species. The remarkable similarity found between prepro-proteins that encode end products with strikingly different sequences, conformations, biological activities and modes of action suggests that the corresponding genes have evolved through dissemination of a conserved "secretory cassette" exon.
A comprehensive analysis of three Asiatic black bear mitochondrial genomes (subspecies ussuricus, formosanus and mupinensis), with emphasis on the complete mtDNA sequence of Ursus thibetanus ussuricus (Ursidae).

PubMed

Hwang, Dae-Sik; Ki, Jang-Seu; Jeong, Dong-Hyuk; Kim, Bo-Hyun; Lee, Bae-Keun; Han, Sang-Hoon; Lee, Jae-Seong

2008-08-01

In the present paper, we describe the mitochondrial genome sequence of the Asiatic black bear (Ursus thibetanus ussuricus) with particular emphasis on the control region (CR), and compare it with other mitochondrial genomes to examine molecular relationships among the bears. The mitochondrial genome sequence of U. thibetanus ussuricus was 16,700 bp in size with mostly conserved structures (e.g. 13 protein-coding genes, two rRNA genes, 22 tRNA genes). The CR consisted of several typical conserved domains such as F, E, D, and C boxes, and a conserved sequence block. Nucleotide sequences and the repeated motifs in the CR were different among the bear species, and their copy numbers were also variable according to populations, even within F1 generations of U. thibetanus ussuricus. Comparative analyses showed that the CR D1 region was highly informative for discrimination within the bear family. These findings suggest that nucleotide sequences of both repeated motifs and CR D1 in the bear family are good markers for species discrimination.
Cloning & sequence identification of Hsp27 gene and expression analysis of the protein on thermal stress in Lucilia cuprina.

PubMed

Singh, Manish K; Tiwari, Pramod K

2016-08-01

Hsp27, a highly conserved small molecular weight heat shock protein, is widely known to be developmentally regulated and heat inducible. It has also been implicated in thermotolerance. This study is a sequel to our earlier studies to understand the molecular organization of heat shock genes/proteins and their role in development and thermal adaptation in a sheep pest, Lucilia cuprina (blowfly), which exhibits unusually high adaptability to a variety of environmental stresses, including heat and chemicals. In this report our aim was to understand the evolutionary relationship of Lucilia hsp27 gene/protein with those of other species and its role in thermal adaptation. We sequenced and characterized the Lchsp27 gene (coding region) and analyzed its expression in various larval and adult tissues under normal as well as heat shock conditions. The nucleotide sequence analysis of the 678-bp-long coding region of Lchsp27 exhibited closest evolutionary proximity with Drosophila (90.09%), which belongs to the same order, Diptera. Heat shock caused significant enhancement in the expression of Lchsp27 gene in all the larval and adult tissues examined, although in a tissue-specific manner. Significantly, in Malpighian tubules, while the heat-induced level of hsp27 transcript (mRNA) appeared increased as compared to control, the protein level remained unaltered and nuclear localized. We infer that Lchsp27 may have a significant role in the maintenance of cellular homeostasis, particularly during summer months, when the fly remains exposed to high heat in its natural habitat. © 2015 Institute of Zoology, Chinese Academy of Sciences.
Characterization of a Novel Polerovirus Infecting Maize in China

PubMed Central

Chen, Sha; Jiang, Guangzhuang; Wu, Jianxiang; Liu, Yong; Qian, Yajuan; Zhou, Xueping

2016-01-01

A novel virus, tentatively named Maize Yellow Mosaic Virus (MaYMV), was identified from the field-grown maize plants showing yellow mosaic symptoms on the leaves collected from the Yunnan Province of China by the deep sequencing of small RNAs. The complete 5642 nucleotide (nt)-long genome of the MaYMV shared the highest nucleotide sequence identity (73%) to Maize Yellow Dwarf Virus-RMV. Sequence comparisons and phylogenetic analyses suggested that MaYMV represents a new member of the genus Polerovirus in the family Luteoviridae. Furthermore, the P0 protein encoded by MaYMV was demonstrated to inhibit both local and systemic RNA silencing by co-infiltration assays using transgenic Nicotiana benthamiana line 16c carrying the GFP reporter gene, which further supported the identification of a new polerovirus. The biologically-active cDNA clone of MaYMV was generated by inserting the full-length cDNA of MaYMV into the binary vector pCB301. RT-PCR and Northern blot analyses showed that this clone was systemically infectious upon agro-inoculation into N. benthamiana. Subsequently, 13 different isolates of MaYMV from field-grown maize plants in different geographical locations of Yunnan and Guizhou provinces of China were sequenced. Analyses of their molecular variation indicate that the 3′ half of P3–P5 read-through protein coding region was the most variable, whereas the coat protein- (CP-) and movement protein- (MP-)coding regions were the most conserved. PMID:27136578
Characterization of a Novel Polerovirus Infecting Maize in China.

PubMed

Chen, Sha; Jiang, Guangzhuang; Wu, Jianxiang; Liu, Yong; Qian, Yajuan; Zhou, Xueping

2016-04-28

A novel virus, tentatively named Maize Yellow Mosaic Virus (MaYMV), was identified from the field-grown maize plants showing yellow mosaic symptoms on the leaves collected from the Yunnan Province of China by the deep sequencing of small RNAs. The complete 5642 nucleotide (nt)-long genome of the MaYMV shared the highest nucleotide sequence identity (73%) to Maize Yellow Dwarf Virus-RMV. Sequence comparisons and phylogenetic analyses suggested that MaYMV represents a new member of the genus Polerovirus in the family Luteoviridae. Furthermore, the P0 protein encoded by MaYMV was demonstrated to inhibit both local and systemic RNA silencing by co-infiltration assays using transgenic Nicotiana benthamiana line 16c carrying the GFP reporter gene, which further supported the identification of a new polerovirus. The biologically-active cDNA clone of MaYMV was generated by inserting the full-length cDNA of MaYMV into the binary vector pCB301. RT-PCR and Northern blot analyses showed that this clone was systemically infectious upon agro-inoculation into N. benthamiana. Subsequently, 13 different isolates of MaYMV from field-grown maize plants in different geographical locations of Yunnan and Guizhou provinces of China were sequenced. Analyses of their molecular variation indicate that the 3' half of P3-P5 read-through protein coding region was the most variable, whereas the coat protein- (CP-) and movement protein- (MP-)coding regions were the most conserved.
[Interconnection between architecture of protein globule and disposition of conformational conservative oligopeptides in proteins from one protein family].

PubMed

Batianovskiĭ, A V; Filatov, I V; Namiot, V A; Esipova, N G; Volotovskiĭ, I D

2012-01-01

It was shown that selective interactions between helical segments of macromolecules can occur in globular proteins between segments characterized by the same periodicities of charge distribution, i.e. between conformationally conservative oligopeptides. It was found that in the macromolecules of alpha-helical proteins, conformationally conservative oligopeptides are located at a distance characteristic of direct interactions. For representatives of many structural families of alpha-type proteins, a specific disposition of conformationally conservative segments is observed. This disposition is inherent to a particular structural family. The disposition of conformationally conservative segments is not related to homology of the amino acid sequence but reflects peculiarities of the native 3D architectures of protein globules.
Myotonia-related mutations in the distal C-terminus of ClC-1 and ClC-0 chloride channels affect the structure of a poly-proline helix

PubMed Central

Macías, María J.; Teijido, Oscar; Zifarelli, Giovanni; Martin, Pau; Ramirez-Espain, Ximena; Zorzano, Antonio; Palacín, Manuel; Pusch, Michael; Estévez, Raúl

2006-01-01

Myotonia is a state of hyperexcitability of skeletal-muscle fibres. Mutations in the ClC-1 Cl− channel cause recessive and dominant forms of this disease. Mutations have been described throughout the protein-coding region, including three sequence variations (A885P, R894X and P932L) in a distal C-terminal stretch of residues [CTD (C-terminal domain) region] that are not conserved between CLC proteins. We show that surface expression of these mutants is reduced in Xenopus oocytes compared with wild-type ClC-1. Functional, biochemical and NMR spectroscopy studies revealed that the CTD region encompasses a segment conserved in most voltage-dependent CLC channels that folds with a secondary structure containing a short type II poly-proline helix. We found that the myotonia-causing mutation A885P disturbs this structure by extending the poly-proline helix. We hypothesize that this structural modification results in the observed alteration of the common gate that acts on both pores of the channel. We provide the first experimental investigation of structural changes resulting from myotonia-causing mutations. PMID:17107341

Role of Nrf2/HO-1 system in development, oxidative stress response and diseases: an evolutionarily conserved mechanism.

PubMed

Loboda, Agnieszka; Damulewicz, Milena; Pyza, Elzbieta; Jozkowicz, Alicja; Dulak, Jozef

2016-09-01

The multifunctional regulator nuclear factor erythroid 2-related factor (Nrf2) is considered not only a cytoprotective factor regulating the expression of genes coding for anti-oxidant, anti-inflammatory and detoxifying proteins, but also a powerful modulator of species longevity. The vertebrate Nrf2 belongs to the Cap 'n' Collar (Cnc) bZIP family of transcription factors and shares a high homology with SKN-1 from Caenorhabditis elegans or CncC found in Drosophila melanogaster. The major characteristics of Nrf2 are to some extent mimicked by Nrf2-dependent genes and their proteins including heme oxygenase-1 (HO-1), which besides removing toxic heme, produces biliverdin, iron ions and carbon monoxide. HO-1 and its products exert beneficial effects through the protection against oxidative injury, regulation of apoptosis, modulation of inflammation as well as contribution to angiogenesis. On the other hand, disturbances in the proper HO-1 level are associated with the pathogenesis of some age-dependent disorders, including neurodegeneration, cancer or macular degeneration. This review summarizes our knowledge about Nrf2 and HO-1 across different phyla, suggesting their conserved role as stress-protective and anti-aging factors.
Impaired mitotic progression and preimplantation lethality in mice lacking OMCG1, a new evolutionarily conserved nuclear protein.

PubMed

Artus, Jérôme; Vandormael-Pournin, Sandrine; Frödin, Morten; Nacerddine, Karim; Babinet, Charles; Cohen-Tannoudji, Michel

2005-07-01

While highly conserved through evolution, the cell cycle has been extensively modified to adapt to new developmental programs. Recently, analyses of mouse mutants revealed that several important cell cycle regulators are either dispensable for development or have a tissue- or cell-type-specific function, indicating that many aspects of cell cycle regulation during mammalian embryo development remain to be elucidated. Here, we report on the characterization of a new gene, Omcg1, which codes for a nuclear zinc finger protein. Embryos lacking Omcg1 die by the end of preimplantation development. In vitro cultured Omcg1-null blastocysts exhibit a dramatic reduction in the total cell number, a high mitotic index, and the presence of abnormal mitotic figures. Importantly, we found that Omcg1 disruption results in the lengthening of M phase rather than in a mitotic block. We show that the mitotic delay in Omcg1-/- embryos is associated with neither a dysfunction of the spindle checkpoint nor abnormal global histone modifications. Taken together, these results suggest that Omcg1 is an important regulator of the cell cycle in the preimplantation embryo.
18 CFR 284.403 - Code of conduct for persons holding blanket marketing certificates.

Code of Federal Regulations, 2010 CFR

2010-04-01

... 18 Conservation of Power and Water Resources 1 2010-04-01 2010-04-01 false Code of conduct for persons holding blanket marketing certificates. 284.403 Section 284.403 Conservation of Power and Water... information upon which it billed the prices it charged for the natural gas sold pursuant to its market based...
18 CFR 284.288 - Code of conduct for unbundled sales service.

Code of Federal Regulations, 2010 CFR

2010-04-01

... 18 Conservation of Power and Water Resources 1 2010-04-01 2010-04-01 false Code of conduct for unbundled sales service. 284.288 Section 284.288 Conservation of Power and Water Resources FEDERAL ENERGY... information upon which it billed the prices it charged for natural gas it sold pursuant to its market based...
Identification of Conserved Water Sites in Protein Structures for Drug Design.

PubMed

Jukič, Marko; Konc, Janez; Gobec, Stanislav; Janežič, Dušanka

2017-12-26

Identification of conserved waters in protein structures is a challenging task with applications in molecular docking and protein stability prediction. As an alternative to computationally demanding simulations of proteins in water, experimental cocrystallized waters in the Protein Data Bank (PDB) in combination with a local structure alignment algorithm can be used for reliable prediction of conserved water sites. We developed the ProBiS H2O approach based on the previously developed ProBiS algorithm, which enables identification of conserved water sites in proteins using experimental protein structures from the PDB or a set of custom protein structures available to the user. With a protein structure, a binding site, or an individual water molecule as a query, ProBiS H2O collects similar proteins from the PDB and performs local or binding site-specific superimpositions of the query structure with similar proteins using the ProBiS algorithm. It collects the experimental water molecules from the similar proteins and transposes them to the query protein. Transposed waters are clustered by their mutual proximity, which enables identification of discrete sites in the query protein with high water conservation. ProBiS H2O is a robust and fast new approach that uses existing experimental structural data to identify conserved water sites on the interfaces of protein complexes, for example protein-small molecule interfaces, and elsewhere on the protein structures. It has been successfully validated in several reported proteins in which conserved water molecules were found to play an important role in ligand binding with applications in drug design.
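The clustering step described in this abstract (transposed waters grouped by mutual proximity, with large groups marking conserved sites) can be illustrated with a minimal sketch. This is not the ProBiS H2O implementation: the single-linkage rule, the function name `cluster_waters`, the 1.0 Å cutoff, and the toy coordinates are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def cluster_waters(coords, cutoff=1.0):
    """Group water positions by single-linkage clustering.

    Two waters join the same cluster when their distance is <= cutoff
    (hypothetical value, in angstroms). Returns lists of indices,
    largest cluster first.
    """
    parent = list(range(len(coords)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    groups = defaultdict(list)
    for i in range(len(coords)):
        groups[find(i)].append(i)
    return sorted(groups.values(), key=len, reverse=True)


# Toy oxygen positions after superimposition onto the query protein (made up).
waters = [
    (0.0, 0.0, 0.0), (0.3, 0.1, 0.0), (0.2, -0.2, 0.1),  # one dense site
    (5.0, 5.0, 5.0), (5.4, 5.1, 4.9),                    # a second site
    (12.0, 0.0, 3.0),                                    # isolated water
]
clusters = cluster_waters(waters, cutoff=1.0)
for members in clusters:
    print(len(members), members)
```

In this sketch the cluster size stands in for the degree of water conservation at a site; a real workflow would also need to weight clusters by how many distinct structures contributed to them.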
Hundreds of conserved non-coding genomic regions are independently lost in mammals

PubMed Central

Hiller, Michael; Schaar, Bruce T.; Bejerano, Gill

2012-01-01

Conserved non-protein-coding DNA elements (CNEs) often encode cis-regulatory elements and are rarely lost during evolution. However, CNE losses that do occur can be associated with phenotypic changes, exemplified by pelvic spine loss in sticklebacks. Using a computational strategy to detect complete loss of CNEs in mammalian genomes while strictly controlling for artifacts, we find >600 CNEs that are independently lost in at least two mammalian lineages, including a spinal cord enhancer near GDF11. We observed several genomic regions where multiple independent CNE loss events happened; the most extreme is the DIAPH2 locus. We show that CNE losses often involve deletions and that CNE loss frequencies are non-uniform. Similar to less pleiotropic enhancers, we find that independently lost CNEs are shorter, slightly less constrained and evolutionarily younger than CNEs without detected losses. This suggests that independently lost CNEs are less pleiotropic and that pleiotropic constraints contribute to non-uniform CNE loss frequencies. We also detected 35 CNEs that are independently lost in the human lineage and in other mammals. Our study uncovers an interesting aspect of the evolution of functional DNA in mammalian genomes. Experiments are necessary to test if these independently lost CNEs are associated with parallel phenotype changes in mammals. PMID:23042682
Novel Structure and Unexpected RNA-Binding Ability of the C-Terminal Domain of Herpes Simplex Virus 1 Tegument Protein UL21

DOE Office of Scientific and Technical Information (OSTI.GOV)

Metrick, Claire M.; Heldwein, Ekaterina E.; Sandri-Goldin, R. M.

Proteins forming the tegument layers of herpesviral virions mediate many essential processes in the viral replication cycle, yet few have been characterized in detail. UL21 is one such multifunctional tegument protein and is conserved among alphaherpesviruses. While UL21 has been implicated in many processes in viral replication, ranging from nuclear egress to virion morphogenesis to cell-cell spread, its precise roles remain unclear. Here we report the 2.7-Å crystal structure of the C-terminal domain of herpes simplex virus 1 (HSV-1) UL21 (UL21C), which has a unique α-helical fold resembling a dragonfly. Analysis of evolutionary conservation patterns and surface electrostatics pinpointed four regions of potential functional importance on the surface of UL21C to be pursued by mutagenesis. In combination with the previously determined structure of the N-terminal domain of UL21, the structure of UL21C provides a 3-dimensional framework for targeted exploration of the multiple roles of UL21 in the replication and pathogenesis of alphaherpesviruses. Additionally, we describe an unanticipated ability of UL21 to bind RNA, which may hint at a yet unexplored function. IMPORTANCE Due to the limited genomic coding capacity of viruses, viral proteins are often multifunctional, which makes them attractive antiviral targets. Such multifunctionality, however, complicates their study, which often involves constructing and characterizing null mutant viruses. Systematic exploration of these multifunctional proteins requires detailed road maps in the form of 3-dimensional structures. In this work, we determined the crystal structure of the C-terminal domain of UL21, a multifunctional tegument protein that is conserved among alphaherpesviruses. Structural analysis pinpointed surface areas of potential functional importance that provide a starting point for mutagenesis. In addition, the unexpected RNA-binding ability of UL21 may expand its functional repertoire. The structure of UL21C and the observation of its RNA-binding ability are the latest additions to the navigational chart that can guide the exploration of the multiple functions of UL21.
Emergence and Evolution

PubMed Central

Bullwinkle, Tammy J.

2013-01-01

The aminoacyl-tRNA synthetases (aaRSs) are essential components of the protein synthesis machinery responsible for defining the genetic code by pairing the correct amino acids to their cognate tRNAs. The aaRSs are an ancient enzyme family believed to have origins that may predate the last common ancestor and as such they provide insights into the evolution and development of the extant genetic code. Although the aaRSs have long been viewed as a highly conserved group of enzymes, findings within the last couple of decades have started to demonstrate how diverse and versatile these enzymes really are. Beyond their central role in translation, aaRSs and their numerous homologs have evolved a wide array of alternative functions both inside and outside translation. Current understanding of the emergence of the aaRSs, and their subsequent evolution into a functionally diverse enzyme family, are discussed in this chapter. PMID:23478877
The conservation pattern of short linear motifs is highly correlated with the function of interacting protein domains.

PubMed

Ren, Siyuan; Yang, Guang; He, Youyu; Wang, Yiguo; Li, Yixue; Chen, Zhengjun

2008-10-01

Many well-represented domains recognize primary sequences usually less than 10 amino acids in length, called Short Linear Motifs (SLiMs). Accurate prediction of SLiMs has been difficult because they are short (often < 10 amino acids) and highly degenerate. In this study, we combined scoring matrixes derived from peptide library and conservation analysis to identify protein classes enriched in functional SLiMs recognized by SH2, SH3, PDZ and S/T kinase domains. Our combined approach revealed that SLiMs are highly conserved in proteins from functional classes that are known to interact with a specific domain, but that they are not conserved in most other protein groups. We found that SLiMs recognized by SH2 domains were highly conserved in receptor kinases/phosphatases, adaptor molecules, and tyrosine kinases/phosphatases, that SLiMs recognized by SH3 domains were highly conserved in cytoskeletal and cytoskeletal-associated proteins, that SLiMs recognized by PDZ domains were highly conserved in membrane proteins such as channels and receptors, and that SLiMs recognized by S/T kinase domains were highly conserved in adaptor molecules, S/T kinases/phosphatases, and proteins involved in transcription or cell cycle control. We studied Tyr-SLiMs recognized by SH2 domains in more detail, and found that SH2-recognized Tyr-SLiMs on the cytoplasmic side of membrane proteins are more highly conserved than those on the extra-cellular side. Also, we found that SH2-recognized Tyr-SLiMs that are associated with SH3 motifs and a tyrosine kinase phosphorylation motif are more highly conserved. The interactome of protein domains is reflected by the evolutionary conservation of SLiMs recognized by these domains. By combining scoring matrixes derived from peptide libraries with conservation analysis, we would be able to find those protein groups that are more likely to interact with specific domains.
The conservation pattern of short linear motifs is highly correlated with the function of interacting protein domains

PubMed Central

Ren, Siyuan; Yang, Guang; He, Youyu; Wang, Yiguo; Li, Yixue; Chen, Zhengjun

2008-01-01

Background Many well-represented domains recognize primary sequences usually less than 10 amino acids in length, called Short Linear Motifs (SLiMs). Accurate prediction of SLiMs has been difficult because they are short (often < 10 amino acids) and highly degenerate. In this study, we combined scoring matrixes derived from peptide library and conservation analysis to identify protein classes enriched in functional SLiMs recognized by SH2, SH3, PDZ and S/T kinase domains. Results Our combined approach revealed that SLiMs are highly conserved in proteins from functional classes that are known to interact with a specific domain, but that they are not conserved in most other protein groups. We found that SLiMs recognized by SH2 domains were highly conserved in receptor kinases/phosphatases, adaptor molecules, and tyrosine kinases/phosphatases, that SLiMs recognized by SH3 domains were highly conserved in cytoskeletal and cytoskeletal-associated proteins, that SLiMs recognized by PDZ domains were highly conserved in membrane proteins such as channels and receptors, and that SLiMs recognized by S/T kinase domains were highly conserved in adaptor molecules, S/T kinases/phosphatases, and proteins involved in transcription or cell cycle control. We studied Tyr-SLiMs recognized by SH2 domains in more detail, and found that SH2-recognized Tyr-SLiMs on the cytoplasmic side of membrane proteins are more highly conserved than those on the extra-cellular side. Also, we found that SH2-recognized Tyr-SLiMs that are associated with SH3 motifs and a tyrosine kinase phosphorylation motif are more highly conserved. Conclusion The interactome of protein domains is reflected by the evolutionary conservation of SLiMs recognized by these domains. By combining scoring matrixes derived from peptide libraries with conservation analysis, we would be able to find those protein groups that are more likely to interact with specific domains. PMID:18828911
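The combination of a peptide-library scoring matrix with conservation analysis can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the matrix values, the threshold, the function names `pssm_score` and `motif_conservation`, and the ortholog peptides are all hypothetical.

```python
def pssm_score(peptide, pssm):
    """Sum the per-position scores of a peptide.

    pssm is a list of dicts, one per motif position, mapping residue -> score.
    Residues missing from a position's dict contribute 0.
    """
    return sum(pssm[i].get(aa, 0.0) for i, aa in enumerate(peptide))


def motif_conservation(ortholog_peptides, pssm, threshold):
    """Fraction of aligned ortholog peptides that still score as a motif match."""
    hits = sum(1 for p in ortholog_peptides if pssm_score(p, pssm) >= threshold)
    return hits / len(ortholog_peptides)


# Hypothetical 4-position matrix for a tyrosine-based motif (values made up).
toy_pssm = [
    {"Y": 2.0},
    {"E": 1.0, "D": 0.5},
    {"E": 0.5},
    {"I": 1.5, "L": 1.0, "V": 1.0},
]

# The same aligned site taken from four hypothetical orthologs.
orthologs = ["YEEI", "YDEL", "FEEI", "YEEV"]

scores = [pssm_score(p, toy_pssm) for p in orthologs]
conservation = motif_conservation(orthologs, toy_pssm, threshold=4.0)
print(scores, conservation)
```

Here a site counts as conserved in an ortholog only if the peptide still passes the matrix threshold, so a substitution that destroys the key tyrosine lowers the conservation fraction even though most residues are unchanged.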
Analysis of Complete Nucleotide Sequences of 12 Gossypium Chloroplast Genomes: Origin and Evolution of Allotetraploids

PubMed Central

Xu, Qin; Xiong, Guanjun; Li, Pengbo; He, Fei; Huang, Yi; Wang, Kunbo; Li, Zhaohu; Hua, Jinping

2012-01-01

Background Cotton (Gossypium spp.) is a model system for the analysis of polyploidization. Although ascertaining the donor species of allotetraploid cotton has been intensively studied, sequence comparison of Gossypium chloroplast genomes is still of interest to understand the mechanisms underlying the evolution of Gossypium allotetraploids, while it is generally accepted that the parents were A- and D-genome containing species. Here we performed a comparative analysis of 13 Gossypium chloroplast genomes, twelve of which are presented here for the first time. Methodology/Principal Findings The size of 12 chloroplast genomes under study varied from 159,959 bp to 160,433 bp. The chromosomes were highly similar having >98% sequence identity. They encoded the same set of 112 unique genes which occurred in a uniform order with only slightly different boundary junctions. Divergence due to indels as well as substitutions was examined separately for genome, coding and noncoding sequences. The genome divergence was estimated as 0.374% to 0.583% between allotetraploid species and A-genome, and 0.159% to 0.454% within allotetraploids. Forty protein-coding genes were completely identical at the protein level, and 20 intergenic sequences were completely conserved. The 9 allotetraploids shared 5 insertions and 9 deletions in whole genome, and 7-bp substitutions in protein-coding genes. The phylogenetic tree confirmed a close relationship between allotetraploids and the ancestor of A-genome, and the allotetraploids were divided into four separate groups. Progenitor allotetraploid cotton originated 0.43–0.68 million years ago (MYA). Conclusion Despite a high degree of conservation between the Gossypium chloroplast genomes, sequence variations among species could still be detected. Gossypium chloroplast genomes show a preference for 5-bp indels, and 1–3-bp indels are mainly attributed to SSR polymorphisms. This study supports that the common ancestor of diploid A-genome species in Gossypium is the maternal source of extant allotetraploid species and allotetraploids have a monophyletic origin. G. hirsutum AD1 lineages have experienced more sequence variations than other allotetraploids in intergenic regions. The available complete nucleotide sequences of 12 Gossypium chloroplast genomes should facilitate studies to uncover the molecular mechanisms of compartmental co-evolution and speciation of Gossypium allotetraploids. PMID:22876273
ScaffoldSeq: Software for characterization of directed evolution populations.

PubMed

Woldring, Daniel R; Holec, Patrick V; Hackel, Benjamin J

2016-07-01

ScaffoldSeq is software designed for the numerous applications, including directed evolution analysis, in which a user generates a population of DNA sequences encoding partially diverse proteins with related functions and would like to characterize the single-site and pairwise amino acid frequencies across the population. A common scenario for enzyme maturation, antibody screening, and alternative scaffold engineering involves naïve and evolved populations that contain diversified regions, varying in both sequence and length, within a conserved framework. Analyzing the diversified regions of such populations is facilitated by high-throughput sequencing platforms; however, length variability within these regions (e.g., antibody CDRs) encumbers the alignment process. To overcome this challenge, the ScaffoldSeq algorithm takes advantage of conserved framework sequences to quickly identify diverse regions. Beyond this, unintended biases in sequence frequency are generated throughout the experimental workflow required to evolve and isolate clones of interest prior to DNA sequencing. ScaffoldSeq software uniquely handles this issue by providing tools to quantify and remove background sequences, cluster similar protein families, and dampen the impact of dominant clones. The software produces graphical and tabular summaries for each region of interest, allowing users to evaluate diversity in a site-specific manner as well as identify epistatic pairwise interactions. The code and detailed information are freely available at http://research.cems.umn.edu/hackel. Proteins 2016; 84:869-874. © 2016 Wiley Periodicals, Inc.
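The single-site and pairwise amino acid frequencies mentioned in this abstract can be illustrated with a minimal Python sketch. This is not ScaffoldSeq's own code: it assumes the diversified region has already been extracted and that every sequence has the same length, and the function names `site_frequencies` and `pair_frequencies` are hypothetical.

```python
from collections import Counter


def site_frequencies(seqs):
    """Per-position amino acid frequencies for equal-length sequences.

    Assumption: the diversified region was already extracted, so all
    sequences share one length (no alignment step is performed here).
    """
    if not seqs:
        raise ValueError("at least one sequence is required")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    n = len(seqs)
    return [
        {aa: count / n for aa, count in Counter(s[i] for s in seqs).items()}
        for i in range(length)
    ]


def pair_frequencies(seqs, i, j):
    """Joint frequency of the residues observed at positions i and j."""
    n = len(seqs)
    return {pair: count / n
            for pair, count in Counter((s[i], s[j]) for s in seqs).items()}


# Toy population of four 3-residue sequences (illustrative data only).
population = ["ACD", "ACE", "GCD", "ACD"]
print(site_frequencies(population)[0])
print(pair_frequencies(population, 0, 2))
```

Comparing a joint frequency with the product of the two single-site frequencies is one simple way to flag candidate pairwise couplings; whether ScaffoldSeq does this is not stated in the abstract.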
Alternative splicing and promoter use in TFII-I genes

PubMed Central

Makeyev, Aleksandr V.; Bayarsaihan, Dashzeveg

2008-01-01

TFII-I proteins are ubiquitously expressed transcriptional factors involved in both basal transcription and signal transduction activation or repression. TFII-I proteins are detected as early as the two-cell stage and exhibit distinct and dynamic expression patterns in developing embryos as well as mark regional variation in the adult mouse brain. Analysis of atypical small and rare chromosomal deletions at 7q11.23 points to TFII-I genes (GTF2I and GTF2IRD1) as the prime candidates responsible for craniofacial and cognitive abnormalities in the Williams-Beuren syndrome. TFII-I genes are often subjected to alternative splicing, which generates isoforms that show different activities and play distinct biological roles. The coding regions of TFII-I genes are composed of more than 30 exons and are well conserved among vertebrates. However, their 5′ untranslated regions are not as well conserved and are poorly characterized. In the present work, we analyzed promoter regions of TFII-I genes and described their additional exons, as well as tested tissue specificity of both previously reported and novel alternatively spliced isoforms. Our comprehensive analysis leads to further elucidation of the functional heterogeneity of TFII-I proteins, provides hints on search for regulatory pathways governing their expression, and opens up possibilities for examining the effect of different haplotypes on their promoter functions. PMID:19111598
Dose–Sensitivity, Conserved Non-Coding Sequences, and Duplicate Gene Retention Through Multiple Tetraploidies in the Grasses

PubMed Central

Schnable, James C.; Pedersen, Brent S.; Subramaniam, Sabarinath; Freeling, Michael

2011-01-01

Whole genome duplications, or tetraploidies, are an important source of increased gene content. Following whole genome duplication, duplicate copies of many genes are lost from the genome. This loss of genes is biased both in the classes of genes deleted and the subgenome from which they are lost. Many or all classes of genes preferentially retained as duplicate copies are engaged in dose-sensitive protein–protein interactions, such that deletion of any one duplicate upsets the status quo of subunit concentrations, and presumably lowers fitness as a result. Transcription factors are also preferentially retained following every whole genome duplication studied. This has been explained as a consequence of protein–protein interactions, just as for other highly retained classes of genes. We show that the quantity of conserved noncoding sequences (CNSs) associated with genes predicts the likelihood of their retention as duplicate pairs following whole genome duplication. As many CNSs likely represent binding sites for transcriptional regulators, we propose that the likelihood of gene retention following tetraploidy may also be influenced by dose–sensitive protein–DNA interactions between the regulatory regions of CNS-rich genes – nicknamed bigfoot genes – and the proteins that bind to them. Using grass genomes, we show that differential loss of CNSs from one member of a pair following the pre-grass tetraploidy reduces its chance of retention in the subsequent maize lineage tetraploidy. PMID:22645525
Mutational Analysis of Drosophila Basigin Function in the Visual System

PubMed Central

Munro, Michelle; Akkam, Yazan; Curtin, Kathryn D.

2009-01-01

Drosophila basigin is a cell-surface glycoprotein of the Ig superfamily and a member of a protein family that includes mammalian EMMPRIN/CD147/basigin, neuroplastin, and embigin. Our previous work on Drosophila basigin has shown that it is required for normal photoreceptor cell structure and normal neuron-glia interaction in the fly visual system. Specifically, the photoreceptor neurons of mosaic animals that are mutant in the eye for basigin show altered cell structure with nuclei, mitochondria and rER misplaced and variable axon diameter compared to wild-type. In addition, glia cells in the optic lamina that contact photoreceptor axons are misplaced and show altered structure. All these defects are rescued by expression of either transgenic fly basigin or transgenic mouse basigin in the photoreceptors demonstrating that mouse basigin can functionally replace fly basigin. To determine what regions of the basigin protein are required for each of these functions, we have created mutant basigin transgenes coding for proteins that are altered in conserved residues, introduced these into the fly genome, and tested them for their ability to rescue both photoreceptor cell structure defects and neuron-glia interaction defects of basigin. The results suggest that the highly conserved transmembrane domain and the extracellular domains are crucial for basigin function in the visual system while the short intracellular tail may not play a role in these functions. PMID:19782733
Mitochondrial Genome Analysis of Wild Rice (Oryza minuta) and Its Comparison with Other Related Species.

PubMed

Asaf, Sajjad; Khan, Abdul Latif; Khan, Abdur Rahim; Waqas, Muhammad; Kang, Sang-Mo; Khan, Muhammad Aaqil; Shahzad, Raheem; Seo, Chang-Woo; Shin, Jae-Ho; Lee, In-Jung

2016-01-01

Oryza minuta (Poaceae family) is a tetraploid wild relative of cultivated rice with a BBCC genome. O. minuta has the potential to resist against various pathogenic diseases such as bacterial blight (BB), white backed planthopper (WBPH) and brown plant hopper (BPH). Here, we sequenced and annotated the complete mitochondrial genome of O. minuta. The mtDNA genome is 515,022 bp, containing 60 protein coding genes, 31 tRNA genes and two rRNA genes. The mitochondrial genome organization and the gene content at the nucleotide level are highly similar (89%) to that of O. rufipogon. Comparison with other related species revealed that most of the genes with known function are conserved among the Poaceae members. Similarly, O. minuta mt genome shared 24 protein-coding genes, 15 tRNA genes and 1 ribosomal RNA gene with other rice species (indica and japonica). The evolutionary relationship and phylogenetic analysis revealed that O. minuta is more closely related to O. rufipogon than to any other related species. Such studies are essential to understand the evolutionary divergence among species and analyze common gene pools to combat risks in the current scenario of a changing environment.
Complete Mitochondrial Genome of the Red Fox (Vulpes vulpes) and Phylogenetic Analysis with Other Canid Species.

PubMed

Zhong, Hua-Ming; Zhang, Hong-Hai; Sha, Wei-Lai; Zhang, Cheng-De; Chen, Yu-Cai

2010-04-01

The whole mitochondrial genome sequence of the red fox (Vulpes vulpes) was determined. It had a total length of 16 723 bp. As in most mammalian mitochondrial genomes, it contained 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes and one control region. The base composition was 31.3% A, 26.1% C, 14.8% G and 27.8% T, respectively. The codon usage of red fox, arctic fox, gray wolf, domestic dog and coyote followed the same pattern except for an unusual ATT start codon, which initiates the NADH dehydrogenase subunit 3 gene in the red fox. A long tandem repeat rich in AC was found between conserved sequence blocks 1 and 2 in the control region. In order to confirm the phylogenetic relationships of red fox to other canids, phylogenetic trees were reconstructed by neighbor-joining and maximum parsimony methods using 12 concatenated heavy-strand protein-coding genes. The result indicated that arctic fox was the sister group of red fox and they both belong to the red fox-like clade in family Canidae, while gray wolf, domestic dog and coyote belong to the wolf-like clade. The result was in accordance with existing phylogenetic results.
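A base composition such as the one reported in this abstract can be tallied from a sequence string in a few lines. The following Python sketch is illustrative only: the function names `base_composition` and `gc_content` are hypothetical, the toy sequence is not the red fox mitogenome, and ambiguous bases (anything other than A, C, G, T) are assumed to be ignored.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G and T in a DNA sequence.

    Assumption: characters other than A/C/G/T (e.g. N) are skipped.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def gc_content(seq):
    """Percentage of G plus C, derived from the base composition."""
    composition = base_composition(seq)
    return composition["G"] + composition["C"]


# Toy sequence (illustrative data only).
print(base_composition("AACGT"))
print(gc_content("AACGT"))
```

For reference, the percentages given in the abstract sum to 100.0% and correspond to a G+C content of 40.9% (26.1% C plus 14.8% G).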
lncRScan-SVM: A Tool for Predicting Long Non-Coding RNAs Using Support Vector Machine.

PubMed

Sun, Lei; Liu, Hui; Zhang, Lin; Meng, Jia

2015-01-01

Functional long non-coding RNAs (lncRNAs) have been bringing novel insight into biological study, however it is still not trivial to accurately distinguish the lncRNA transcripts (LNCTs) from the protein coding ones (PCTs). As various information and data about lncRNAs are preserved by previous studies, it is appealing to develop novel methods to identify the lncRNAs more accurately. Our method lncRScan-SVM aims at classifying PCTs and LNCTs using support vector machine (SVM). The gold-standard datasets for lncRScan-SVM model training, lncRNA prediction and method comparison were constructed according to the GENCODE gene annotations of human and mouse respectively. By integrating features derived from gene structure, transcript sequence, potential codon sequence and conservation, lncRScan-SVM outperforms other approaches, which is evaluated by several criteria such as sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC) and area under curve (AUC). In addition, several known human lncRNA datasets were assessed using lncRScan-SVM. LncRScan-SVM is an efficient tool for predicting the lncRNAs, and it is quite useful for current lncRNA study.
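The evaluation criteria named in this abstract (sensitivity, specificity, accuracy and MCC) all follow from a binary confusion matrix. The sketch below uses the standard textbook definitions; it is not taken from lncRScan-SVM, the function name `binary_metrics` is hypothetical, and the counts in the example are made up for illustration.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: positives (e.g. lncRNA transcripts) called correctly/incorrectly.
    tn/fp: negatives (e.g. protein-coding transcripts) called correctly/incorrectly.
    Ratios with an empty denominator are reported as 0.0 by convention here.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Made-up counts, for illustration only.
print(binary_metrics(tp=90, fp=10, tn=80, fn=20))
```

MCC is often preferred over accuracy when the two classes are unbalanced, because it only approaches 1 when both classes are predicted well. AUC is omitted here because it requires ranked scores rather than a single confusion matrix.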
Non-coding stem-bulge RNAs are required for cell proliferation and embryonic development in C. elegans

PubMed Central

Kowalski, Madzia P.; Baylis, Howard A.; Krude, Torsten

2015-01-01

Stem bulge RNAs (sbRNAs) are a family of small non-coding stem-loop RNAs present in Caenorhabditis elegans and other nematodes, the function of which is unknown. Here, we report the first functional characterisation of nematode sbRNAs. We demonstrate that sbRNAs from a range of nematode species are able to reconstitute the initiation of chromosomal DNA replication in the presence of replication proteins in vitro, and that conserved nucleotide sequence motifs are essential for this function. By functionally inactivating sbRNAs with antisense morpholino oligonucleotides, we show that sbRNAs are required for S phase progression, early embryonic development and the viability of C. elegans in vivo. Thus, we demonstrate a new and essential role for sbRNAs during the early development of C. elegans. sbRNAs show limited nucleotide sequence similarity to vertebrate Y RNAs, which are also essential for the initiation of DNA replication. Our results therefore establish that the essential function of small non-coding stem-loop RNAs during DNA replication extends beyond vertebrates. PMID:25908866
Cloning and expression of a cDNA coding for catalase from zebrafish (Danio rerio).

PubMed

Ken, C F; Lin, C T; Wu, J L; Shaw, J F

2000-06-01

A full-length complementary DNA (cDNA) clone encoding a catalase was amplified by the rapid amplification of cDNA ends-polymerase chain reaction (RACE-PCR) technique from zebrafish (Danio rerio) mRNA. Nucleotide sequence analysis of this cDNA clone revealed that it comprised a complete open reading frame coding for 526 amino acid residues and that it had a molecular mass of 59 654 Da. The deduced amino acid sequence showed high similarity with the sequences of catalase from swine (86.9%), mouse (85.8%), rat (85%), human (83.7%), fruit fly (75.6%), nematode (71.1%), and yeast (58.6%). The amino acid residues for secondary structures are apparently conserved as they are present in other mammal species. Furthermore, the coding region of zebrafish catalase was introduced into an expression vector, pET-20b(+), and transformed into Escherichia coli expression host BL21(DE3)pLysS. A 60-kDa active catalase protein was expressed and detected by Coomassie blue staining as well as activity staining on polyacrylamide gel following electrophoresis.

Interleukin-1 homologues IL-1F7b and IL-18 contain functional mRNA instability elements within the coding region responsive to lipopolysaccharide

PubMed Central

2004-01-01

IL-1F7b, a novel homologue of the IL-1 (interleukin 1) family, was discovered by computational cloning. We demonstrated that IL-1F7b shares critical amino acid residues with IL-18 and binds to the IL-18-binding protein enhancing its ability to inhibit IL-18-induced interferon-γ. We also showed that low levels of IL-1F7b are constitutively present intracellularly in human blood monocytes. In this study, we demonstrate that similar to IL-18, both mRNA and intracellular protein expression of IL-1F7b are up-regulated by LPS (lipopolysaccharide) in human monocytes. In stable transfectants of murine RAW264.7 macrophage cells, there was no IL-1F7b protein expression despite a highly active CMV promoter. We found that IL-1F7b-specific mRNA was rapidly degraded in transfected cells, via a 3′-UTR (untranslated region)-independent control of IL-1F7b transcript stability. After LPS stimulation, there was a rapid transient increase in IL-1F7b-specific mRNA and concomitant protein levels. Using sequence alignment, we found a conserved ten-nucleotide homology box within the open reading frame of IL-1F7b, which flanks the coding region instability elements of some selective genes. In-frame deletion of downstream exon 5 from the full-length IL-1F7b cDNA markedly increased the levels of IL-1F7b mRNA. A similar coding region element is located in IL-18. When transfected into RAW264.7 macrophages, IL-18 mRNA was also unstable unless treated with LPS. These results indicate that both IL-1F7b and IL-18 mRNA contain functional instability determinants within their coding region, which influence mRNA decay as a novel mechanism to regulate the expression of IL-1 family members. PMID:15046617
Highly Conserved Keratin-Associated Protein 7-1 Gene in Yak, Taurine and Zebu Cattle.

PubMed

Arlud, S; He, N; Sari, E M; Ma, Z-J; Zhang, H; An, T-W; Han, J-L

2017-01-01

Keratin-associated proteins (KRTAPs) play a critical role in cross-linking the keratin intermediate filaments to build a hair shaft. The genetic polymorphisms of the bovine KRTAP7-1 gene were investigated for the first time in this study. The complete coding sequence of the KRTAP7-1 gene in 108 domestic yak, taurine and zebu cattle from China and Indonesia was successfully amplified using polymerase chain reaction and then directly sequenced. Only two single-nucleotide polymorphisms (one nonsynonymous at c.7C/G and another synonymous at c.21C/T) and three haplotypes (BOVIN-KRTAP7-1*A, B and C) were identified in the complete coding sequence of the bovine KRTAP7-1 gene among all animals. There was no polymorphism across three Chinese indigenous yak breeds and one Indonesian zebu cattle population, all sharing the BOVIN-KRTAP7-1*A haplotype. The four taurine cattle populations also had BOVIN-KRTAP7-1*A as the most common haplotype with a frequency of 0.80. The frequency of the novel haplotype BOVIN-KRTAP7-1*B was only 0.07, present in one heterozygous animal in each of the four taurine cattle populations, while BOVIN-KRTAP7-1*C was only found in a Simmental and a local Chinese Yellow cattle population with frequencies of 0.17 and 0.36, respectively. The monomorphic yak KRTAP7-1 gene in particular, and highly conserved bovine, sheep and goat KRTAP7-1 genes in general, demonstrated its unique intrinsic structural property (e.g., > 21% high glycine content) and primary functional importance in supporting the mechanical strength and shape of hair.
Unusual conservation of mitochondrial gene order in Crassostrea oysters: evidence for recent speciation in Asia

PubMed Central

2010-01-01

Background: Oysters are morphologically plastic and hence difficult subjects for taxonomic and evolutionary studies. It has long been suspected, based on the extraordinary species diversity observed, that Asia Pacific is the epicenter of oyster speciation. To understand the species diversity and its evolutionary history, we collected five Crassostrea species from Asia and sequenced their complete mitochondrial (mt) genomes in addition to two newly released Asian oysters (C. iredalei and Saccostrea mordax) for a comprehensive analysis. Results: The six Asian Crassostrea mt genomes ranged from 18,226 to 22,446 bp in size, and all coded for 39 genes (12 proteins, 2 rRNAs and 25 tRNAs) on the same strand. Their genomes contained a split of the rrnL gene and duplication of trnM, trnK and trnQ genes. They shared the same gene order that differed from an Atlantic sister species by as many as nine tRNA changes (6 transpositions and 3 duplications) and even differed significantly from S. mordax in protein-coding genes. Phylogenetic analysis indicates that the six Asian Crassostrea species emerged between 3 and 43 Myr ago, while the Atlantic species evolved 83 Myr ago. Conclusions: The complete conservation of gene order in the six Asian Crassostrea species over 43 Myr is highly unusual given the remarkable rate of rearrangements in their sister species and other bivalves. It provides strong evidence for the recent speciation of the six Crassostrea species in Asia. It further indicates that changes in mt gene order may not be strictly a function of time but subject to other constraints that are presently not well understood. PMID:21189147
Trichodesmium genome maintains abundant, widespread noncoding DNA in situ, despite oligotrophic lifestyle

DOE PAGES

Walworth, Nathan G.; Pfreundt, Ulrike; Nelson, William C.; ...

2015-04-07

Understanding the evolution of the free-living, cyanobacterial, diazotroph Trichodesmium is of great importance due to its critical role in oceanic biogeochemistry and primary production. Unlike the other >150 available genomes of free-living cyanobacteria, only 63.8% of the Trichodesmium erythraeum (strain IMS101) genome is predicted to encode protein, which is 20-25% less than the average for other cyanobacteria and non-pathogenic, free-living bacteria. We use distinctive isolates and metagenomic data to show that low coding density observed in IMS101 is a common feature of the Trichodesmium genus both in culture and in situ. Transcriptome analysis indicates that 86% of the non-coding space is expressed, although the function of these transcripts is unclear. The density of noncoding, possible regulatory elements predicted in Trichodesmium, when normalized per intergenic kilobase, was comparable and two fold higher than that found in the gene dense genomes of the sympatric cyanobacterial genera Synechococcus and Prochlorococcus, respectively. Conserved Trichodesmium ncRNA secondary structures were predicted between most culture and metagenomic sequences lending support to the structural conservation. Conservation of these intergenic regions in spatiotemporally separated Trichodesmium populations suggests possible genus-wide selection for their maintenance. These large intergenic spacers may have developed during intervals of strong genetic drift caused by periodic blooms of a subset of genotypes, which may have reduced effective population size. Our data suggest that transposition of selfish DNA, low effective population size, and high fidelity replication allowed the unusual ‘inflation’ of noncoding sequence observed in Trichodesmium despite its oligotrophic lifestyle.
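Coding density, the quantity at the center of this abstract, is the fraction of genome positions covered by at least one protein-coding interval. The following Python sketch shows one way to compute it; it is not the authors' pipeline, the function name `coding_density` is hypothetical, and intervals are assumed to be 0-based and half-open (start inclusive, end exclusive).

```python
def coding_density(genome_length, coding_intervals):
    """Fraction of the genome covered by coding intervals.

    Assumption: intervals are 0-based, half-open (start, end) pairs.
    Overlapping intervals are merged so shared bases are counted once.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    current_start = current_end = None
    for start, end in sorted(coding_intervals):
        if current_end is None or start > current_end:
            # Close the previous merged block before opening a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching interval: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


# Toy genome of 100 bp with three annotated genes (illustrative data only).
print(coding_density(100, [(0, 30), (20, 50), (70, 80)]))
```

Merging overlapping intervals matters in compact genomes, where neighboring genes can share bases and a naive sum of gene lengths would overstate the coding fraction.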
cncRNAs: Bi-functional RNAs with protein coding and non-coding functions

PubMed Central

Kumari, Pooja; Sampath, Karuna

2015-01-01

For many decades, the major function of mRNA was thought to be to provide protein-coding information embedded in the genome. The advent of high-throughput sequencing has led to the discovery of pervasive transcription of eukaryotic genomes and opened the world of RNA-mediated gene regulation. Many regulatory RNAs have been found to be incapable of protein coding and are hence termed non-coding RNAs (ncRNAs). However, studies in recent years have shown that several previously annotated non-coding RNAs have the potential to encode proteins, and conversely, some coding RNAs have regulatory functions independent of the protein they encode. Such bi-functional RNAs, with both protein coding and non-coding functions, which we term ‘cncRNAs’, have emerged as new players in cellular systems. Here, we describe the functions of some cncRNAs identified from bacteria to humans. Because the functions of many RNAs across genomes remain unclear, we propose that RNAs be classified as coding, non-coding or both only after careful analysis of their functions. PMID:26498036
A comparison of complete mitochondrial genomes of silver carp Hypophthalmichthys molitrix and bighead carp Hypophthalmichthys nobilis: Implications for their taxonomic relationship and phylogeny

USGS Publications Warehouse

Li, S.-F.; Xu, J.-W.; Yang, Q.-L.; Wang, C.H.; Chen, Q.; Chapman, D.C.; Lu, G.

2009-01-01

Based upon morphological characters, silver carp Hypophthalmichthys molitrix and bighead carp Hypophthalmichthys nobilis (or Aristichthys nobilis) have been classified into either the same genus or two distinct genera. Consequently, the taxonomic relationship of the two species at the generic level remains equivocal. This issue is addressed by sequencing complete mitochondrial genomes of H. molitrix and H. nobilis, comparing their mitogenome organization, structure and sequence similarity, and conducting a comprehensive phylogenetic analysis of cyprinid species. As with other cyprinid fishes, the mitogenomes of the two species were structurally conserved, containing 37 genes including 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA (tRNA) genes and a putative control region (D-loop). Sequence similarity between the two mitogenomes varied in different genes or regions, being highest in the tRNA genes (98.8%), lowest in the control region (89.4%) and intermediate in the protein-coding genes (94.2%). Analyses of the sequence comparison and phylogeny using concatenated protein sequences support the view that the two species belong to the genus Hypophthalmichthys. Further studies using nuclear markers and involving more closely related species, and the systematic combination of traditional biology and molecular biology are needed in order to confirm this conclusion. © 2009 The Fisheries Society of the British Isles.
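Per-region similarity figures like those quoted in this abstract are typically derived from pairwise alignments. The Python sketch below computes percent identity over aligned columns; it is only an illustration, since the abstract does not state how similarity was calculated. The function name `percent_identity` is hypothetical, and columns containing a gap in either sequence are assumed to be excluded from the comparison.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences.

    Assumption: columns with a gap in either sequence are skipped, so the
    denominator is the number of gap-free aligned columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy alignment (illustrative data only): 4 matches in 5 gap-free columns.
print(percent_identity("ACGT-A", "ACCTGA"))
```

Other conventions, such as counting gaps as mismatches, give lower values, so identity figures are only comparable when the same convention is used throughout.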
Comprehensive analysis of coding-lncRNA gene co-expression network uncovers conserved functional lncRNAs in zebrafish.

PubMed

Chen, Wen; Zhang, Xuan; Li, Jing; Huang, Shulan; Xiang, Shuanglin; Hu, Xiang; Liu, Changning

2018-05-09

Zebrafish is a fully developed model system for studying developmental processes and human disease. Recent deep-sequencing studies have discovered a large number of long non-coding RNAs (lncRNAs) in zebrafish. However, only a few of them have been functionally characterized. Therefore, how to take advantage of the mature zebrafish system to investigate lncRNA function and conservation in depth is an intriguing question. We systematically collected and analyzed a series of zebrafish RNA-seq data, then combined them with resources from known databases and the literature. As a result, we obtained by far the most complete dataset of zebrafish lncRNAs, containing 13,604 lncRNA genes (21,128 transcripts) in total. Based on that, a co-expression network of zebrafish coding and lncRNA genes was constructed and analyzed, and used to predict the Gene Ontology (GO) and KEGG annotations of lncRNAs. Meanwhile, we made a conservation analysis of zebrafish lncRNAs, identifying 1828 conserved zebrafish lncRNA genes (1890 transcripts) that have putative mammalian orthologs. We also found that zebrafish lncRNAs play important roles in regulating the development and function of the nervous system; these conserved lncRNAs show significant sequence and functional conservation with their mammalian counterparts. By integrative data analysis and construction of a coding-lncRNA gene co-expression network, we obtained the most comprehensive dataset of zebrafish lncRNAs to date, as well as systematic annotations and comprehensive analyses of their function and conservation. Our study provides a reliable zebrafish-based platform to explore lncRNA function and mechanism in depth, as well as the lncRNA commonality between zebrafish and human.
DASS: efficient discovery and p-value calculation of substructures in unordered data.

PubMed

Hollunder, Jens; Friedel, Maik; Beyer, Andreas; Workman, Christopher T; Wilhelm, Thomas

2007-01-01

Pattern identification in biological sequence data is one of the main objectives of bioinformatics research. However, few methods are available for detecting patterns (substructures) in unordered datasets. Data mining algorithms mainly developed outside the realm of bioinformatics have been adapted for that purpose, but typically do not determine the statistical significance of the identified patterns. Moreover, these algorithms do not exploit the often modular structure of biological data. We present the algorithm DASS (Discovery of All Significant Substructures) that first identifies all substructures in unordered data (DASS(Sub)) in a manner that is especially efficient for modular data. In addition, DASS calculates the statistical significance of the identified substructures, for sets with at most one element of each type (DASS(P(set))), or for sets with multiple occurrence of elements (DASS(P(mset))). The power and versatility of DASS is demonstrated by four examples: combinations of protein domains in multi-domain proteins, combinations of proteins in protein complexes (protein subcomplexes), combinations of transcription factor target sites in promoter regions and evolutionarily conserved protein interaction subnetworks. The program code and additional data are available at http://www.fli-leibniz.de/tsb/DASS
Hierarchical Partitioning of Metazoan Protein Conservation Profiles Provides New Functional Insights

PubMed Central

Witztum, Jonathan; Persi, Erez; Horn, David; Pasmanik-Chor, Metsada; Chor, Benny

2014-01-01

The availability of many complete, annotated proteomes enables the systematic study of the relationships between protein conservation and functionality. We explore this question based solely on the presence or absence of protein homologues (a.k.a. conservation profiles). We study 18 metazoans, from two distinct points of view: the human's and the fly's. Using the GOrilla gene ontology (GO) analysis tool, we explore functional enrichment of the “universal proteins”, those with homologues in all 17 other species, and of the “non-universal proteins”. A large number of GO terms are strongly enriched in both human and fly universal proteins. Most of these functions are known to be essential. A smaller number of GO terms, exhibiting markedly different properties, are enriched in both human and fly non-universal proteins. We further explore the non-universal proteins, whose conservation profiles are consistent with the “tree of life” (TOL consistent), as well as the TOL inconsistent proteins. Finally, we applied Quantum Clustering to the conservation profiles of the TOL consistent proteins. Each cluster is strongly associated with one or a small number of specific monophyletic clades in the tree of life. The proteins in many of these clusters exhibit strong functional enrichment associated with the “life style” of the related clades. Most previous approaches for studying function and conservation are “bottom up”, studying protein families one by one, and separately assessing the conservation of each. By way of contrast, our approach is “top down”. We globally partition the set of all proteins hierarchically, as described above, and then identify protein families enriched within different subdivisions. While supporting previous findings, our approach also provides a tool for discovering novel relations between protein conservation profiles, functionality, and evolutionary history as represented by the tree of life. PMID:24594619
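A conservation profile, as used in this abstract, is a presence/absence vector recording whether a protein has a homologue in each of the other species. The following Python sketch separates "universal" from "non-universal" proteins on toy data. It is not the authors' code: the function name `split_by_universality`, the protein labels and the three-species profiles are all hypothetical (the study itself uses 17 other species).

```python
def split_by_universality(profiles):
    """Partition proteins by whether a homologue exists in every species.

    profiles maps a protein name to a tuple of 0/1 flags, one per other
    species (1 = homologue present, 0 = absent).
    """
    universal, non_universal = [], []
    for protein, profile in profiles.items():
        if all(profile):
            universal.append(protein)
        else:
            non_universal.append(protein)
    return universal, non_universal


# Toy profiles over three hypothetical species (illustrative data only).
toy_profiles = {
    "protein_1": (1, 1, 1),
    "protein_2": (1, 0, 1),
    "protein_3": (0, 0, 0),
}
universal, non_universal = split_by_universality(toy_profiles)
print(universal)
print(non_universal)
```

The non-universal profiles are the ones that would go on to the tree-of-life consistency check and clustering steps described in the abstract; those steps are not sketched here.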
Transposon Mutagenesis of the Zika Virus Genome Highlights Regions Essential for RNA Replication and Restricted for Immune Evasion.

PubMed

Fulton, Benjamin O; Sachs, David; Schwarz, Megan C; Palese, Peter; Evans, Matthew J

2017-08-01

The molecular constraints affecting Zika virus (ZIKV) evolution are not well understood. To investigate ZIKV genetic flexibility, we used transposon mutagenesis to add 15-nucleotide insertions throughout the ZIKV MR766 genome and subsequently deep sequenced the viable mutants. Few ZIKV insertion mutants replicated, which likely reflects a high degree of functional constraints on the genome. The NS1 gene exhibited distinct mutational tolerances at different stages of the screen. This result may define regions of the NS1 protein that are required for the different stages of the viral life cycle. The ZIKV structural genes showed the highest degree of insertional tolerance. Although the envelope (E) protein exhibited particular flexibility, the highly conserved envelope domain II (EDII) fusion loop of the E protein was intolerant of transposon insertions. The fusion loop is also a target of pan-flavivirus antibodies that are generated against other flaviviruses and neutralize a broad range of dengue virus and ZIKV isolates. The genetic restrictions identified within the epitopes in the EDII fusion loop likely explain the sequence and antigenic conservation of these regions in ZIKV and among multiple flaviviruses. Thus, our results provide insights into the genetic restrictions on ZIKV that may affect the evolution of this virus. IMPORTANCE Zika virus recently emerged as a significant human pathogen. Determining the genetic constraints on Zika virus is important for understanding the factors affecting viral evolution. We used a genome-wide transposon mutagenesis screen to identify where mutations were tolerated in replicating viruses. We found that the genetic regions involved in RNA replication were mostly intolerant of mutations. The genes coding for structural proteins were more permissive to mutations. Despite the flexibility observed in these regions, we found that epitopes bound by broadly reactive antibodies were genetically constrained. This finding may explain the genetic conservation of these epitopes among flaviviruses. Copyright © 2017 American Society for Microbiology.
Mutations that Cause Human Disease: A Computational/Experimental Approach

DOE Office of Scientific and Technical Information (OSTI.GOV)

Beernink, P; Barsky, D; Pesavento, B

International genome sequencing projects have produced billions of nucleotides (letters) of DNA sequence data, including the complete genome sequences of 74 organisms. These genome sequences have created many new scientific opportunities, including the ability to identify sequence variations among individuals within a species. These genetic differences, which are known as single nucleotide polymorphisms (SNPs), are particularly important in understanding the genetic basis for disease susceptibility. Since the report of the complete human genome sequence, over two million human SNPs have been identified, including a large-scale comparison of an entire chromosome from twenty individuals. Of the protein coding SNPs (cSNPs), approximately half lead to a single amino acid change in the encoded protein (non-synonymous coding SNPs). Most of these changes are functionally silent, while the remainder negatively impact the protein and sometimes cause human disease. To date, over 550 SNPs have been found to cause single locus (monogenic) diseases and many others have been associated with polygenic diseases. SNPs have been linked to specific human diseases, including late-onset Parkinson disease, autism, rheumatoid arthritis and cancer. The ability to predict accurately the effects of these SNPs on protein function would represent a major advance toward understanding these diseases. To date, several attempts have been made toward predicting the effects of such mutations. The most successful of these is a computational approach called "Sorting Intolerant From Tolerant" (SIFT). This method uses sequence conservation among many similar proteins to predict which residues in a protein are functionally important. However, this method suffers from several limitations. First, a query sequence must have a sufficient number of relatives to infer sequence conservation. Second, this method does not make use of or provide any information on protein structure, which can be used to understand how an amino acid change affects the protein. The experimental methods that provide the most detailed structural information on proteins are X-ray crystallography and NMR spectroscopy. However, these methods are labor intensive and currently cannot be carried out on a genomic scale. Nonetheless, Structural Genomics projects are being pursued by more than a dozen groups and consortia worldwide and as a result the number of experimentally determined structures is rising exponentially. Based on the expectation that protein structures will continue to be determined at an ever-increasing rate, reliable structure prediction schemes will become increasingly valuable, leading to information on protein function and disease for many different proteins. Given known genetic variability and experimentally determined protein structures, can we accurately predict the effects of single amino acid substitutions? An objective assessment of this question would involve comparing predicted and experimentally determined structures, which thus far has not been rigorously performed. The completed research leveraged existing expertise at LLNL in computational and structural biology, as well as significant computing resources, to address this question.
Complete sequence of two tick-borne flaviviruses isolated from Siberia and the UK: analysis and significance of the 5' and 3'-UTRs.

PubMed

Gritsun, T S; Venugopal, K; Zanotto, P M; Mikhailov, M V; Sall, A A; Holmes, E C; Polkinghorne, I; Frolova, T V; Pogodina, V V; Lashkevich, V A; Gould, E A

1997-05-01

The complete nucleotide sequences of two tick-transmitted flaviviruses, Vasilchenko (Vs) from Siberia and louping ill (LI) from the UK, have been determined. The genomes were 10928 and 10871 nucleotides (nt) in length, respectively. The coding strategy and functional protein sequence motifs of tick-borne flaviviruses are presented in both Vs and LI viruses. The phylogenies based on maximum likelihood, maximum parsimony and distance analysis of the polyproteins identified Vs virus as a member of the tick-borne encephalitis virus subgroup within the tick-borne serocomplex, genus Flavivirus, family Flaviviridae. Comparative alignment of the 3'-untranslated regions revealed deletions of different lengths at essentially the same position downstream of the stop codon for all tick-borne viruses. Two direct 27-nucleotide repeats at the 3'-end were found only for Vs and LI viruses. Immediately following the deletions, a region of 332-334 nt with relatively conserved primary structure (67-94% identity) was observed at the 3'-non-coding end of the virus genome. Pairwise comparisons of the nucleotide sequence data revealed similar levels of variation between the coding region and the 5' and 3'-termini of the genome, implying an equivalent strong selective control for translated and untranslated regions. Indeed, the predicted folding of the 5' and 3'-untranslated regions revealed patterns of stem and loop structures conserved for all tick-borne flaviviruses, suggesting a purifying selection for preservation of essential RNA secondary structures which could be involved in translational control and replication. The possible implications of these findings are discussed.
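As a rough illustration of how a pairwise identity figure such as the 67-94% range quoted above can be obtained from two already-aligned sequences, here is a minimal Python sketch. The function name `percent_identity` and the convention of skipping gapped columns are assumptions made for this example; the abstract does not state which identity convention or software the authors used.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity between two aligned sequences of equal length.

    Assumption for this sketch: columns where either sequence has a
    gap ('-') are excluded from both the match count and the denominator.
    Other conventions (e.g. dividing by alignment length) also exist.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0  # nothing to compare
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real viral sequence).
identity = percent_identity("ACGT-A", "ACGAAA")
print(f"{identity:.1f}% identity")
```

The choice of denominator changes the reported value, so identity figures are only comparable when the same convention is used throughout.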
Characteristics and significance of intergenic polyadenylated RNA transcription in Arabidopsis.

PubMed

Moghe, Gaurav D; Lehti-Shiu, Melissa D; Seddon, Alex E; Yin, Shan; Chen, Yani; Juntawong, Piyada; Brandizzi, Federica; Bailey-Serres, Julia; Shiu, Shin-Han

2013-01-01

The Arabidopsis (Arabidopsis thaliana) genome is the most well-annotated plant genome. However, transcriptome sequencing in Arabidopsis continues to suggest the presence of polyadenylated (polyA) transcripts originating from presumed intergenic regions. It is not clear whether these transcripts represent novel noncoding or protein-coding genes. To understand the nature of intergenic polyA transcription, we first assessed its abundance using multiple messenger RNA sequencing data sets. We found 6,545 intergenic transcribed fragments (ITFs) occupying 3.6% of Arabidopsis intergenic space. In contrast to transcribed fragments that map to protein-coding and RNA genes, most ITFs are significantly shorter, are expressed at significantly lower levels, and tend to be more data set specific. A surprisingly large number of ITFs (32.1%) may be protein coding based on evidence of translation. However, our results indicate that these "translated" ITFs tend to be close to and are likely associated with known genes. To investigate if ITFs are under selection and are functional, we assessed ITF conservation through cross-species as well as within-species comparisons. Our analysis reveals that 237 ITFs, including 49 with translation evidence, are under strong selective constraint and relatively distant from annotated features. These ITFs are likely parts of novel genes. However, the selective pressure imposed on most ITFs is similar to that of randomly selected, untranscribed intergenic sequences. Our findings indicate that despite the prevalence of ITFs, apart from the possibility of genomic contamination, many may be background or noisy transcripts derived from "junk" DNA, whose production may be inherent to the process of transcription and which, on rare occasions, may act as catalysts for the creation of novel genes.
Serum amyloid A1: Structure, function and gene polymorphism

PubMed Central

Sun, Lei; Ye, Richard D.

2017-01-01

Inducible expression of serum amyloid A (SAA) is a hallmark of the acute-phase response, which is a conserved reaction of vertebrates to environmental challenges such as tissue injury, infection and surgery. Human SAA1 is encoded by one of the four SAA genes and is the best-characterized SAA protein. Initially known as a major precursor of amyloid A (AA), SAA1 has been found to play an important role in lipid metabolism and contributes to bacterial clearance, the regulation of inflammation and tumor pathogenesis. SAA1 has five polymorphic coding alleles (SAA1.1 – SAA1.5) that encode distinct proteins with minor amino acid substitutions. Single nucleotide polymorphisms (SNPs) have been identified in both the coding and non-coding regions of human SAA1. Despite high levels of sequence homology among these variants, SAA1 polymorphisms have been reported as risk factors of cardiovascular diseases and several types of cancer. A recently solved crystal structure of SAA1.1 reveals a hexameric bundle with each of the SAA1 subunits assuming a 4-helix structure stabilized by the C-terminal tail. Analysis of the native SAA1.1 structure has led to the identification of a competing site for high-density lipoprotein (HDL) and heparin, thus providing the structural basis for a role of heparin and heparan sulfate in the conversion of SAA1 to AA. In this brief review, we compare human SAA1 with other forms of human and mouse SAAs, and discuss how structural and genetic studies of SAA1 have advanced our understanding of the physiological functions of the SAA proteins. PMID:26945629
Conservation of hot regions in protein-protein interaction in evolution.

PubMed

Hu, Jing; Li, Jiarui; Chen, Nansheng; Zhang, Xiaolong

2016-11-01

The hot regions of protein-protein interactions are the active areas formed by the residues most important to the protein binding process. As research on protein interactions has developed, many predicted hot regions can be discovered efficiently by intelligent computing methods, but verifying every prediction by biological experiment is hardly feasible because of the time cost and complexity of the experiments. Building on research into the conservation of hot spot residues, this study proposes a method to verify the authenticity of hot regions predicted by a machine learning algorithm that combines proteins' biological features with sequence conservation. The method uses multiple sequence alignment, a substitution matrix module and sequence similarity to create a conservation scoring algorithm, and then uses a threshold module to verify the conservation tendency of hot regions in evolution. This work provides an effective method for verifying predicted hot regions in protein-protein interactions, and a useful way to investigate the functional activities of protein hot regions in depth. Copyright © 2016. Published by Elsevier Inc.
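To make the idea of scoring conservation from a multiple sequence alignment and then applying a threshold more concrete, the following Python sketch scores each alignment column with a normalized Shannon entropy. This is a generic, commonly used measure and not the authors' algorithm, which the abstract does not specify; the function names, the 20-letter normalization and the 0.75 threshold are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Return one conservation score per alignment column, in [0, 1].

    Score = 1 - H / log2(20), where H is the Shannon entropy of the
    residue frequencies in the column (gaps ignored). A fully conserved
    column scores 1.0. This is an illustrative stand-in, not the scoring
    scheme of the study above.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    max_entropy = math.log2(20)  # 20 standard amino acids
    scores = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            scores.append(0.0)  # all-gap column carries no signal
            continue
        counts = Counter(residues)
        total = len(residues)
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


def conserved_positions(alignment, threshold=0.75):
    """Indices of columns whose score meets a (hypothetical) threshold."""
    return [i for i, s in enumerate(column_conservation(alignment))
            if s >= threshold]


# Toy alignment (hypothetical sequences).
toy = ["ACD", "ACE", "AGD", "AGE"]
print(column_conservation(toy))
print(conserved_positions(toy))
```

A substitution-matrix-based score, as mentioned in the abstract, would additionally credit substitutions between chemically similar residues, which plain entropy does not.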
Conservation of CD44 exon v3 functional elements in mammals

PubMed Central

Vela, Elena; Hilari, Josep M; Delclaux, María; Fernández-Bellon, Hugo; Isamat, Marcos

2008-01-01

Background: The human CD44 gene contains 10 variable exons (v1 to v10) that can be alternatively spliced to generate hundreds of different CD44 protein isoforms. Human CD44 variable exon v3 inclusion in the final mRNA depends on a multisite bipartite splicing enhancer located within the exon itself, which we have recently described, and provides the protein domain responsible for growth factor binding to CD44. Findings: We have analyzed the sequence of CD44v3 in 95 mammalian species to report high conservation levels for both its splicing regulatory elements (the 3' splice site and the exonic splicing enhancer), and the functional glycosaminoglycan binding site coded by v3. We also report the functional expression of CD44v3 isoforms in peripheral blood cells of different mammalian taxa with both consensus and variant v3 sequences. Conclusion: CD44v3 mammalian sequences maintain all functional splicing regulatory elements as well as the GAG binding site with the same relative positions and sequence identity previously described during alternative splicing of human CD44. The sequence within the GAG attachment site, which in turn contains the Y motif of the exonic splicing enhancer, is more conserved relative to the rest of the exon. Amplification of CD44v3 sequence from mammalian species, but not from birds, fish or reptiles, may lead to the classification of CD44v3 as an exclusively mammalian gene trait. PMID:18710510
Positive selection in the SLC11A1 gene in the family Equidae.

PubMed

Bayerova, Zuzana; Janova, Eva; Matiasovic, Jan; Orlando, Ludovic; Horin, Petr

2016-05-01

Immunity-related genes are a suitable model for studying effects of selection at the genomic level. Some of them are highly conserved due to functional constraints and purifying selection, while others are variable and change quickly to cope with the variation of pathogens. The SLC11A1 gene encodes a transporter protein mediating antimicrobial activity of macrophages. Little is known about the patterns of selection shaping this gene during evolution. Although it is a typical evolutionarily conserved gene, functionally important polymorphisms associated with various diseases were identified in humans and other species. We analyzed the genomic organization, genetic variation, and evolution of the SLC11A1 gene in the family Equidae to identify patterns of selection within this important gene. Nucleotide SLC11A1 sequences were shown to be highly conserved in ten equid species, with more than 97% sequence identity across the family. Single nucleotide polymorphisms (SNPs) were found in the coding and noncoding regions of the gene. Seven codon sites were identified to be under strong purifying selection. Codons located in three regions, including the glycosylated extracellular loop, were shown to be under diversifying selection. A 3-bp indel resulting in a deletion of amino acid 321 in the predicted protein was observed in all horses, while the codon has been maintained in all other equid species. This codon, which lies within an N-glycosylation site, was found to be under positive selection. Interspecific variation in the presence of predicted N-glycosylation sites was observed.
Regulation of Six1 expression by evolutionarily conserved enhancers in tetrapods.

PubMed

Sato, Shigeru; Ikeda, Keiko; Shioi, Go; Nakao, Kazuki; Yajima, Hiroshi; Kawakami, Kiyoshi

2012-08-01

The Six1 homeobox gene plays critical roles in vertebrate organogenesis. Mice deficient for Six1 show severe defects in organs such as skeletal muscle, kidney, thymus, sensory organs and ganglia derived from cranial placodes, and mutations in human SIX1 cause branchio-oto-renal syndrome, an autosomal dominant developmental disorder characterized by hearing loss and branchial defects. The present study was designed to identify enhancers responsible for the dynamic expression pattern of Six1 during mouse embryogenesis. The results showed distinct enhancer activities of seven conserved non-coding sequences (CNSs) retained in tetrapod Six1 loci. The activities were detected in all cranial placodes (excluding the lens placode), dorsal root ganglia, somites, nephrogenic cord, notochord and cranial mesoderm. The major Six1-expression domains during development were covered by the sum of activities of these enhancers, together with the previously identified enhancer for the pre-placodal region and foregut endoderm. Thus, the eight CNSs identified in our series of studies represent major evolutionarily conserved enhancers responsible for the expression of Six1 in tetrapods. The results also confirmed that chick electroporation is a robust means to decipher regulatory information stored in vertebrate genomes. Mutational analysis of the most conserved placode-specific enhancer, Six1-21, indicated that the enhancer integrates a variety of inputs from Sox, Pax, Fox, Six, Wnt/Lef1 and basic helix-loop-helix proteins. Positive autoregulation of Six1 is achieved through the regulation of Six protein-binding sites. The identified Six1 enhancers provide valuable tools to understand the mechanism of Six1 regulation and to manipulate gene expression in the developing embryo, particularly in the sensory organs. Copyright © 2012 Elsevier Inc. All rights reserved.
High-Throughput Sequencing of Arabidopsis microRNAs: Evidence for Frequent Birth and Death of MIRNA Genes

PubMed Central

Fahlgren, Noah; Howell, Miya D.; Kasschau, Kristin D.; Chapman, Elisabeth J.; Sullivan, Christopher M.; Cumbie, Jason S.; Givan, Scott A.; Law, Theresa F.; Grant, Sarah R.; Dangl, Jeffery L.; Carrington, James C.

2007-01-01

In plants, microRNAs (miRNAs) comprise one of two classes of small RNAs that function primarily as negative regulators at the posttranscriptional level. Several MIRNA genes in the plant kingdom are ancient, with conservation extending between angiosperms and the mosses, whereas many others are more recently evolved. Here, we use deep sequencing and computational methods to identify, profile and analyze non-conserved MIRNA genes in Arabidopsis thaliana. 48 non-conserved MIRNA families, nearly all of which were represented by single genes, were identified. Sequence similarity analyses of miRNA precursor foldback arms revealed evidence for recent evolutionary origin of 16 MIRNA loci through inverted duplication events from protein-coding gene sequences. Interestingly, these recently evolved MIRNA genes have taken distinct paths. Whereas some non-conserved miRNAs interact with and regulate target transcripts from gene families that donated parental sequences, others have drifted to the point of non-interaction with parental gene family transcripts. Some young MIRNA loci clearly originated from one gene family but form miRNAs that target transcripts in another family. We suggest that MIRNA genes are undergoing relatively frequent birth and death, with only a subset being stabilized by integration into regulatory networks. PMID:17299599
Initial verification and validation of RAZORBACK - A research reactor transient analysis code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Talley, Darren G.

2015-09-01

This report describes the work and results of the initial verification and validation (V&V) of the beta release of the Razorback code. Razorback is a computer code designed to simulate the operation of a research reactor (such as the Annular Core Research Reactor (ACRR)) by a coupled numerical solution of the point reactor kinetics equations, the energy conservation equation for fuel element heat transfer, and the mass, momentum, and energy conservation equations for the water cooling of the fuel elements. This initial V&V effort was intended to confirm that the code work to date shows good agreement between simulation and actual ACRR operations, indicating that the subsequent V&V effort for the official release of the code will be successful.

On the relationship between residue structural environment and sequence conservation in proteins.

PubMed

Liu, Jen-Wei; Lin, Jau-Ji; Cheng, Chih-Wen; Lin, Yu-Feng; Hwang, Jenn-Kang; Huang, Tsun-Tsao

2017-09-01

Residues that are crucial to protein function or structure are usually evolutionarily conserved. To identify the important residues in a protein, sequence conservation is estimated, and current methods rely upon the unbiased collection of homologous sequences. Surprisingly, our previous studies have shown that sequence conservation is closely correlated with the weighted contact number (WCN), a measure of packing density for a residue's structural environment, calculated based only on the Cα positions of a protein structure. Moreover, studies have shown that sequence conservation is correlated with environment-related structural properties calculated based on different protein substructures, such as a protein's all atoms, backbone atoms, side-chain atoms, or side-chain centroid. To determine whether the Cα atomic positions are adequate to show the relationship between residue environment and sequence conservation, here we compared Cα atoms with other substructures in their contributions to the sequence conservation. Our results show that Cα positions are substantially equivalent to the other substructures in calculations of various measures of residue environment. As a result, the overlapping contributions between Cα atoms and the other substructures are high, yielding similar structure-conservation relationships. Taking the WCN as an example, the average overlapping contribution to sequence conservation is 87% between Cα and all-atom substructures. These results indicate that the Cα atoms of a protein structure alone could reflect sequence conservation at the residue level. © 2017 Wiley Periodicals, Inc.
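For readers unfamiliar with the weighted contact number, the sketch below computes it from Cα coordinates using the inverse-square-distance form in which it is commonly defined, WCN_i = Σ_{j≠i} 1/r_ij², where r_ij is the distance between the Cα atoms of residues i and j. The abstract itself does not spell out the formula, so treat this definition, the function name and the toy coordinates as assumptions of this example rather than the authors' exact procedure.

```python
def weighted_contact_numbers(ca_coords):
    """Weighted contact number for each residue from C-alpha coordinates.

    ca_coords: list of (x, y, z) tuples, one per residue, in angstroms.
    Assumed definition: WCN_i = sum over j != i of 1 / r_ij**2.
    Larger values indicate a more densely packed environment.
    """
    wcn = []
    for i, a in enumerate(ca_coords):
        total = 0.0
        for j, b in enumerate(ca_coords):
            if i == j:
                continue
            # Squared distance directly; no square root is needed.
            r_squared = sum((p - q) ** 2 for p, q in zip(a, b))
            total += 1.0 / r_squared
        wcn.append(total)
    return wcn


# Three hypothetical C-alpha atoms spaced 1 unit apart on a line.
toy_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(weighted_contact_numbers(toy_coords))
```

In the toy case the middle atom has the highest value because it has two close neighbors, which mirrors the buried-versus-exposed contrast the measure is meant to capture. The double loop is quadratic in the number of residues; a vectorized version would be preferable for large structures.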
Decoding the phosphorylation code in Hedgehog signal transduction

PubMed Central

Chen, Yongbin; Jiang, Jin

2013-01-01

Hedgehog (Hh) signaling plays pivotal roles in embryonic development and adult tissue homeostasis, and its deregulation leads to numerous human disorders including cancer. Binding of Hh to Patched (Ptc), a twelve-transmembrane protein, alleviates its inhibition of Smoothened (Smo), a seven-transmembrane protein related to G-protein-coupled receptors (GPCRs), leading to Smo phosphorylation and activation. Smo acts through intracellular signaling complexes to convert the latent transcription factor Cubitus interruptus (Ci)/Gli from a truncated repressor to a full-length activator, leading to derepression/activation of Hh target genes. Increasing evidence suggests that phosphorylation participates in almost every step in the signal relay from Smo to Ci/Gli, and that differential phosphorylation of several key pathway components may be crucial for translating the Hh morphogen gradient into graded pathway activities. In this review, we focus on the multifaceted roles that phosphorylation plays in Hh signal transduction, and discuss the conservation and difference between Drosophila and mammalian Hh signaling mechanisms. PMID:23337587
ExoLocator--an online view into genetic makeup of vertebrate proteins.

PubMed

Khoo, Aik Aun; Ogrizek-Tomas, Mario; Bulovic, Ana; Korpar, Matija; Gürler, Ece; Slijepcevic, Ivan; Šikic, Mile; Mihalek, Ivana

2014-01-01

ExoLocator (http://exolocator.eopsf.org) collects in a single place information needed for comparative analysis of protein-coding exons from vertebrate species. The main source of data--the genomic sequences, and the existing exon and homology annotation--is the ENSEMBL database of completed vertebrate genomes. To these, ExoLocator adds the search for ostensibly missing exons in orthologous protein pairs across species, using an extensive computational pipeline to narrow down the search region for the candidate exons and find a suitable template in the other species, as well as state-of-the-art implementations of pairwise alignment algorithms. The resulting complements of exons are organized in a way currently unique to ExoLocator: multiple sequence alignments, both on the nucleotide and on the peptide levels, clearly indicating the exon boundaries. The alignments can be inspected in the web-embedded viewer, downloaded or used on the spot to produce an estimate of conservation within orthologous sets, or functional divergence across paralogues.
Genome-wide analysis of the DNA-binding with one zinc finger (Dof) transcription factor family in bananas.

PubMed

Dong, Chen; Hu, Huigang; Xie, Jianghui

2016-12-01

DNA-binding with one finger (Dof) domain proteins are a multigene family of plant-specific transcription factors involved in numerous aspects of plant growth and development. In this study, we report a genome-wide search for Musa acuminata Dof (MaDof) genes and their expression profiles at different developmental stages and in response to various abiotic stresses. In addition, a complete overview of the Dof gene family in bananas is presented, including the gene structures, chromosomal locations, cis-regulatory elements, conserved protein domains, and phylogenetic inferences. Based on the genome-wide analysis, we identified 74 full-length protein-coding MaDof genes unevenly distributed on 11 chromosomes. Phylogenetic analysis with Dof members from diverse plant species showed that MaDof genes can be classified into four subgroups (StDof I, II, III, and IV). The detailed genomic information of the MaDof gene homologs in the present study provides opportunities for functional analyses to unravel the exact role of the genes in plant growth and development.
Dissociation of Paramyxovirus Interferon Evasion Activities: Universal and Virus-Specific Requirements for Conserved V Protein Amino Acids in MDA5 Interference

PubMed Central

Ramachandran, Aparna; Horvath, Curt M.

2010-01-01

The V protein of the paramyxovirus subfamily Paramyxovirinae is an important virulence factor that can interfere with host innate immunity by inactivating the cytosolic pathogen recognition receptor MDA5. This interference is a result of a protein-protein interaction between the highly conserved carboxyl-terminal domain of the V protein and the helicase domain of MDA5. The V protein C-terminal domain (CTD) is an evolutionarily conserved 49- to 68-amino-acid region that coordinates two zinc atoms per protein chain. Site-directed mutagenesis of conserved residues in the V protein CTD has revealed both universal and virus-specific requirements for zinc coordination in MDA5 engagement and has also identified other conserved residues as critical for MDA5 interaction and interference. Mutation of these residues produces V proteins that are specifically defective for MDA5 interference and not impaired in targeting STAT1 for proteasomal degradation via the VDC ubiquitin ligase complex. Results demonstrate that mutation of conserved charged residues in the V proteins of Nipah virus, measles virus, and mumps virus also abolishes MDA5 interaction. These findings clearly define molecular determinants for MDA5 inhibition by the paramyxovirus V proteins. PMID:20719949
76 FR 19971 - Notice of Proposed Changes to the National Handbook of Conservation Practices for the Natural...

Federal Register 2010, 2011, 2012, 2013, 2014

2011-04-11

... 344), Silvopasture Establishment (Code 381), Tree/Shrub Establishment (Code 612), Waste Recycling... Criteria were added. Tree/Shrub Establishment (Code 612)--A new Purpose of ``Develop Renewable Energy...
A Global Overview of the Genetic and Functional Diversity in the Helicobacter pylori cag Pathogenicity Island

PubMed Central

Moodley, Yoshan; Uhr, Markus; Stamer, Christiana; Vauterin, Marc; Suerbaum, Sebastian; Achtman, Mark

2010-01-01

The Helicobacter pylori cag pathogenicity island (cagPAI) encodes a type IV secretion system. Humans infected with cagPAI–carrying H. pylori are at increased risk for sequelae such as gastric cancer. Housekeeping genes in H. pylori show considerable genetic diversity; but the diversity of virulence factors such as the cagPAI, which transports the bacterial oncogene CagA into host cells, has not been systematically investigated. Here we compared the complete cagPAI sequences for 38 representative isolates from all known H. pylori biogeographic populations. Their gene content and gene order were highly conserved. The phylogeny of most cagPAI genes was similar to that of housekeeping genes, indicating that the cagPAI was probably acquired only once by H. pylori, and its genetic diversity reflects the isolation by distance that has shaped this bacterial species since modern humans migrated out of Africa. Most isolates induced IL-8 release in gastric epithelial cells, indicating that the function of the Cag secretion system has been conserved despite some genetic rearrangements. More than one third of cagPAI genes, in particular those encoding cell-surface exposed proteins, showed signatures of diversifying (Darwinian) selection at more than 5% of codons. Several unknown gene products predicted to be under Darwinian selection are also likely to be secreted proteins (e.g. HP0522, HP0535). One of these, HP0535, is predicted to code for either a new secreted candidate effector protein or a protein which interacts with CagA because it contains two genetic lineages, similar to cagA. Our study provides a resource that can guide future research on the biological roles and host interactions of cagPAI proteins, including several whose function is still unknown. PMID:20808891
A global overview of the genetic and functional diversity in the Helicobacter pylori cag pathogenicity island.

PubMed

Olbermann, Patrick; Josenhans, Christine; Moodley, Yoshan; Uhr, Markus; Stamer, Christiana; Vauterin, Marc; Suerbaum, Sebastian; Achtman, Mark; Linz, Bodo

2010-08-19

The Helicobacter pylori cag pathogenicity island (cagPAI) encodes a type IV secretion system. Humans infected with cagPAI-carrying H. pylori are at increased risk for sequelae such as gastric cancer. Housekeeping genes in H. pylori show considerable genetic diversity; but the diversity of virulence factors such as the cagPAI, which transports the bacterial oncogene CagA into host cells, has not been systematically investigated. Here we compared the complete cagPAI sequences for 38 representative isolates from all known H. pylori biogeographic populations. Their gene content and gene order were highly conserved. The phylogeny of most cagPAI genes was similar to that of housekeeping genes, indicating that the cagPAI was probably acquired only once by H. pylori, and its genetic diversity reflects the isolation by distance that has shaped this bacterial species since modern humans migrated out of Africa. Most isolates induced IL-8 release in gastric epithelial cells, indicating that the function of the Cag secretion system has been conserved despite some genetic rearrangements. More than one third of cagPAI genes, in particular those encoding cell-surface exposed proteins, showed signatures of diversifying (Darwinian) selection at more than 5% of codons. Several unknown gene products predicted to be under Darwinian selection are also likely to be secreted proteins (e.g. HP0522, HP0535). One of these, HP0535, is predicted to code for either a new secreted candidate effector protein or a protein which interacts with CagA because it contains two genetic lineages, similar to cagA. Our study provides a resource that can guide future research on the biological roles and host interactions of cagPAI proteins, including several whose function is still unknown.
X-Ray Crystal Structure of the passenger domain of Plasmid encoded toxin(Pet), an Autotransporter Enterotoxin from enteroaggregative Escherichia coli (EAEC)

PubMed Central

Meza-Aguilar, J. Domingo; Fromme, Petra; Torres-Larios, Alfredo; Mendoza-Hernández, Guillermo; Hernandez-Chiñas, Ulises; Monteros, Roberto A. Arreguin-Espinosa de los; Campos, Carlos A. Eslava; Fromme, Raimund

2014-01-01

Autotransporters (ATs) represent a superfamily of proteins produced by a variety of pathogenic bacteria, which include the pathogenic groups of Escherichia coli (E. coli) associated with gastrointestinal and urinary tract infections. We present the first X-ray structure, at 2.3 Å resolution, of the passenger domain from the Plasmid-encoded toxin (Pet), a 100 kDa protein that is a cause of acute diarrhea in both developing and industrialized countries. Pet is a cytoskeleton-altering toxin that induces loss of actin stress fibers. While Pet (pdb code: 4OM9) shows a sequence identity of only 50% compared to the closest related protein sequence, extracellular serine protease plasmid (EspP), the structural features of both proteins are conserved. A closer structural look reveals that Pet contains a β-pleated sheet at the sequence region of residues 181-190, whereas the corresponding structural domain in EspP consists of a coiled loop. Second, the Pet passenger domain features a more pronounced beta sheet between residues 135-143 compared to the structure of EspP. PMID:24530907
Crystal Structures of SlyA Protein, a Master Virulence Regulator of Salmonella, in Free and DNA-bound States

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dolan, Kyle T.; Duguid, Erica M.; He, Chuan

2011-11-17

SlyA is a master virulence regulator that controls the transcription of numerous genes in Salmonella enterica. We present here crystal structures of SlyA by itself and bound to a high-affinity DNA operator sequence in the slyA gene. SlyA interacts with DNA through direct recognition of a guanine base by Arg-65, as well as interactions between conserved Arg-86 and the minor groove and a large network of non-base-specific contacts with the sugar phosphate backbone. Our structures, together with an unpublished structure of SlyA bound to the small molecule effector salicylate (Protein Data Bank code 3DEU), reveal that, unlike many other MarR family proteins, SlyA dissociates from DNA without large conformational changes when bound to this effector. We propose that SlyA and other MarR global regulators rely more on indirect readout of DNA sequence to exert control over many genes, in contrast to proteins (such as OhrR) that recognize a single operator.
Mitoregulin: A lncRNA-Encoded Microprotein that Supports Mitochondrial Supercomplexes and Respiratory Efficiency.

PubMed

Stein, Colleen S; Jadiya, Pooja; Zhang, Xiaoming; McLendon, Jared M; Abouassaly, Gabrielle M; Witmer, Nathan H; Anderson, Ethan J; Elrod, John W; Boudreau, Ryan L

2018-06-26

Mitochondria are composed of many small proteins that control protein synthesis, complex assembly, metabolism, and ion and reactive oxygen species (ROS) handling. We show that a skeletal muscle- and heart-enriched long non-coding RNA, LINC00116, encodes a highly conserved 56-amino-acid microprotein that we named mitoregulin (Mtln). Mtln localizes to the inner mitochondrial membrane, where it binds cardiolipin and influences protein complex assembly. In cultured cells, Mtln overexpression increases mitochondrial membrane potential, respiration rates, and Ca2+ retention capacity while decreasing mitochondrial ROS and matrix-free Ca2+. Mtln-knockout mice display perturbations in mitochondrial respiratory (super)complex formation and activity, fatty acid oxidation, tricarboxylic acid (TCA) cycle enzymes, and Ca2+ retention capacity. Blue-native gel electrophoresis revealed that Mtln co-migrates alongside several complexes, including the complex I assembly module, complex V, and supercomplexes. Under denaturing conditions, Mtln remains in high-molecular-weight complexes, supporting its role as a sticky molecular tether that enhances respiratory efficiency by bolstering protein complex assembly and/or stability. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
The complete genome sequence of the Atlantic salmon paramyxovirus (ASPV)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nylund, Stian; Karlsen, Marius; Nylund, Are

2008-03-30

The complete RNA genome of the Atlantic salmon paramyxovirus (ASPV), isolated from Atlantic salmon suffering from proliferative gill inflammation (PGI), has been determined. The genome is 16,965 nucleotides in length and consists of six nonoverlapping genes in the order 3'- N - P/C/V - M - F - HN - L -5', coding for the nucleocapsid, phospho-, matrix, fusion, hemagglutinin-neuraminidase and large polymerase proteins, respectively. The gene junctions contain highly conserved transcription start and stop signal sequences and trinucleotide intergenic regions similar to those of other Paramyxoviridae. The ASPV P-gene expression strategy is like that of the respiro- and morbilliviruses, which express the phosphoprotein from the primary transcript, and edit a portion of the mRNA to encode the accessory proteins V and W. It also encodes the C-protein by ribosomal choice of translation initiation. Pairwise comparisons of amino acid identities, and phylogenetic analysis of deduced ASPV protein sequences with homologous sequences from other Paramyxoviridae, show that ASPV has an affinity for the genus Respirovirus, but may represent a new genus within the subfamily Paramyxovirinae.
EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation

PubMed Central

Amidi, Afshine; Megalooikonomou, Vasileios; Paragios, Nikos

2018-01-01

During the past decade, with the significant progress of computational power as well as ever-rising data availability, deep learning techniques became increasingly popular due to their excellent performance on computer vision problems. The size of the Protein Data Bank (PDB) has increased more than 15-fold since 1999, which enabled the expansion of models that aim at predicting enzymatic function via their amino acid composition. Amino acid sequence, however, is less conserved in nature than protein structure and therefore considered a less reliable predictor of protein function. This paper presents EnzyNet, a novel 3D convolutional neural networks classifier that predicts the Enzyme Commission number of enzymes based only on their voxel-based spatial structure. The spatial distribution of biochemical properties was also examined as complementary information. The two-layer architecture was investigated on a large dataset of 63,558 enzymes from the PDB and achieved an accuracy of 78.4% by exploiting only the binary representation of the protein shape. Code and datasets are available at https://github.com/shervinea/enzynet. PMID:29740518
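The "binary representation of the protein shape" mentioned in this abstract can be pictured with a minimal sketch. The function below is a hypothetical illustration, not code from the EnzyNet repository: it assumes atom coordinates are already available as (x, y, z) tuples, centers them, scales them by the largest distance from the centroid, and records which cells of a cubic grid are occupied. The name `voxelize` and the default grid size are assumptions made for the example.

```python
import math

def voxelize(coords, grid_size=32):
    """Return the set of occupied (i, j, k) voxels for a list of 3D points.

    Hypothetical sketch: points are centered on their centroid and scaled
    so that the farthest point still falls inside the grid.
    """
    n = len(coords)
    center = [sum(p[a] for p in coords) / n for a in range(3)]
    centered = [[p[a] - center[a] for a in range(3)] for p in coords]
    # Largest distance from the centroid; fall back to 1.0 for a single point.
    radius = max(math.sqrt(sum(c * c for c in p)) for p in centered) or 1.0
    occupied = set()
    for p in centered:
        # Map each coordinate from [-radius, radius] to [0, grid_size - 1].
        idx = tuple(round((c / radius + 1.0) / 2.0 * (grid_size - 1)) for c in p)
        occupied.add(idx)
    return occupied

# Toy usage with four made-up points standing in for atom positions.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
voxels = voxelize(atoms, grid_size=8)
```

A set of occupied indices is used here instead of a dense array only to keep the sketch dependency-free; a 3D convolutional network would normally consume a dense 0/1 tensor built from the same indices.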
Polymorphism of the Pv200L Fragment of Merozoite Surface Protein-1 of Plasmodium vivax in Clinical Isolates from the Pacific Coast of Colombia

PubMed Central

Valderrama-Aguirre, Augusto; Zúñiga-Soto, Evelin; Mariño-Ramírez, Leonardo; Moreno, Luz Ángela; Escalante, Ananías A.; Arévalo-Herrera, Myriam; Herrera, Sócrates

2011-01-01

Merozoite surface protein 1 (MSP-1) is a polymorphic malaria protein with functional domains involved in parasite erythrocyte interaction. Plasmodium vivax MSP-1 has a fragment (Pv200L) that has been identified as a potential subunit vaccine because it is highly immunogenic and induces partial protection against infectious parasite challenge in vaccinated monkeys. To determine the extent of genetic polymorphism and its effect on the translated protein, we sequenced the Pv200L coding region from isolates of 26 P. vivax-infected patients in a malaria-endemic area of Colombia. The extent of nucleotide diversity (π) in these isolates (0.061 ± 0.004) was significantly lower (P ≤ 0.001) than that observed in Thai and Brazilian isolates (0.083 ± 0.006 and 0.090 ± 0.006, respectively). We found two new alleles, several previously unidentified dimorphic substitutions, and significant size polymorphism. The presence of highly conserved blocks in this fragment has important implications for the development of Pv200L as a subunit vaccine candidate. PMID:21292880
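Nucleotide diversity (π), as reported in this abstract, is commonly defined as the average number of pairwise differences per site among aligned sequences. The sketch below shows that definition under simplifying assumptions: sequences are pre-aligned, equal in length, and free of gaps or ambiguity codes. The function name and toy data are hypothetical, and the study's own estimator and its standard errors are not reproduced here.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Average pairwise differences per site for equal-length aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites for every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)

# Toy alignment: three 4-nt sequences, two of which are identical.
pi = nucleotide_diversity(["ACGT", "ACGA", "ACGT"])
```

In the toy alignment, two of the three pairs differ at one site each, so π is 2 differences over 3 pairs times 4 sites.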
EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation.

PubMed

Amidi, Afshine; Amidi, Shervine; Vlachakis, Dimitrios; Megalooikonomou, Vasileios; Paragios, Nikos; Zacharaki, Evangelia I

2018-01-01

During the past decade, with the significant progress of computational power as well as ever-rising data availability, deep learning techniques became increasingly popular due to their excellent performance on computer vision problems. The size of the Protein Data Bank (PDB) has increased more than 15-fold since 1999, which enabled the expansion of models that aim at predicting enzymatic function via their amino acid composition. Amino acid sequence, however, is less conserved in nature than protein structure and therefore considered a less reliable predictor of protein function. This paper presents EnzyNet, a novel 3D convolutional neural networks classifier that predicts the Enzyme Commission number of enzymes based only on their voxel-based spatial structure. The spatial distribution of biochemical properties was also examined as complementary information. The two-layer architecture was investigated on a large dataset of 63,558 enzymes from the PDB and achieved an accuracy of 78.4% by exploiting only the binary representation of the protein shape. Code and datasets are available at https://github.com/shervinea/enzynet.
The central nervous system transcriptome of the weakly electric brown ghost knifefish (Apteronotus leptorhynchus): de novo assembly, annotation, and proteomics validation.

PubMed

Salisbury, Joseph P; Sîrbulescu, Ruxandra F; Moran, Benjamin M; Auclair, Jared R; Zupanc, Günther K H; Agar, Jeffrey N

2015-03-11

The brown ghost knifefish (Apteronotus leptorhynchus) is a weakly electric teleost fish of particular interest as a versatile model system for a variety of research areas in neuroscience and biology. The comprehensive information available on the neurophysiology and neuroanatomy of this organism has enabled significant advances in such areas as the study of the neural basis of behavior, the development of adult-born neurons in the central nervous system and their involvement in the regeneration of nervous tissue, as well as brain aging and senescence. Despite substantial scientific interest in this species, no genomic resources are currently available. Here, we report the de novo assembly and annotation of the A. leptorhynchus transcriptome. After evaluating several trimming and transcript reconstruction strategies, de novo assembly using Trinity uncovered 42,459 unique contigs containing at least a partial protein-coding sequence based on alignment to a reference set of known Actinopterygii sequences. As many as 11,847 of these contigs contained full or near-full length protein sequences, providing broad coverage of the proteome. A variety of non-coding RNA sequences were also identified and annotated, including conserved long intergenic non-coding RNA and other long non-coding RNA observed previously to be expressed in adult zebrafish (Danio rerio) brain, as well as a variety of miRNA, snRNA, and snoRNA. Shotgun proteomics confirmed translation of open reading frames from over 2,000 transcripts, including alternative splice variants. Assignment of tandem mass spectra was greatly improved by use of the assembly compared to databases of sequences from closely related organisms. The assembly and raw reads have been deposited at DDBJ/EMBL/GenBank under the accession number GBKR00000000. Tandem mass spectrometry data is available via ProteomeXchange with identifier PXD001285. 
Presented here is the first release of an annotated de novo transcriptome assembly from Apteronotus leptorhynchus, providing a broad overview of RNA expressed in central nervous system tissue. The assembly, which includes substantial coverage of a wide variety of both protein coding and non-coding transcripts, will allow the development of better tools to understand the mechanisms underlying unique characteristics of the knifefish model system, such as their tremendous regenerative capacity and negligible brain senescence.
Coiled-coil length: Size does matter.

PubMed

Surkont, Jaroslaw; Diekmann, Yoan; Ryder, Pearl V; Pereira-Leal, Jose B

2015-12-01

Protein evolution is governed by processes that alter not only the primary sequence but also the length of proteins. Protein length may change in different ways, but insertions, deletions and duplications are the most common. An optimal protein size is a trade-off between sequence extension, which may change protein stability or lead to acquisition of a new function, and shrinkage, which decreases the metabolic cost of protein synthesis. Despite the general tendency for length conservation across orthologous proteins, the propensity to accept insertions and deletions is heterogeneous along the sequence. For example, protein regions rich in repetitive peptide motifs are well known to vary extensively in length across species. Here, we analyze length conservation of coiled-coils, domains formed by a ubiquitous, repetitive peptide motif that is present in all domains of life and frequently plays a structural role in the cell. We observed that, despite their repetitive nature, the length of coiled-coil domains is generally highly conserved throughout the tree of life, even when the remaining parts of the protein change, including globular domains. Length conservation is independent of primary amino acid sequence variation, and represents a conservation of domain physical size. This suggests that the conservation of domain size is due to functional constraints. © 2015 Wiley Periodicals, Inc.
The impact of rare variation on gene expression across tissues.

PubMed

Li, Xin; Kim, Yungil; Tsang, Emily K; Davis, Joe R; Damani, Farhan N; Chiang, Colby; Hess, Gaelen T; Zappala, Zachary; Strober, Benjamin J; Scott, Alexandra J; Li, Amy; Ganna, Andrea; Bassik, Michael C; Merker, Jason D; Hall, Ira M; Battle, Alexis; Montgomery, Stephen B

2017-10-11

Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles, but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues, but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.
Silencing of X-Linked MicroRNAs by Meiotic Sex Chromosome Inactivation

PubMed Central

Royo, Hélène; Seitz, Hervé; ElInati, Elias; Peters, Antoine H. F. M.; Stadler, Michael B.; Turner, James M. A.

2015-01-01

During the pachytene stage of meiosis in male mammals, the X and Y chromosomes are transcriptionally silenced by Meiotic Sex Chromosome Inactivation (MSCI). MSCI is conserved in therian mammals and is essential for normal male fertility. Transcriptomics approaches have demonstrated that in mice, most or all protein-coding genes on the X chromosome are subject to MSCI. However, it is unclear whether X-linked non-coding RNAs behave in a similar manner. The X chromosome is enriched in microRNA (miRNA) genes, with many exhibiting testis-biased expression. Importantly, high expression levels of X-linked miRNAs (X-miRNAs) have been reported in pachytene spermatocytes, indicating that these genes may escape MSCI, and perhaps play a role in the XY-silencing process. Here we use RNA FISH to examine X-miRNA expression in the male germ line. We find that, like protein-coding X-genes, X-miRNAs are expressed prior to prophase I and are thereafter silenced during pachynema. X-miRNA silencing does not occur in mouse models with defective MSCI. Furthermore, X-miRNAs are expressed at pachynema when present as autosomally integrated transgenes. Thus, we conclude that silencing of X-miRNAs during pachynema in wild type males is MSCI-dependent. Importantly, misexpression of X-miRNAs during pachynema causes spermatogenic defects. We propose that MSCI represents a chromosomal mechanism by which X-miRNAs, and other potential X-encoded repressors, can be silenced, thereby regulating genes with critical late spermatogenic functions. PMID:26509798
Mitochondrial genome of the African lion Panthera leo leo.

PubMed

Ma, Yue-ping; Wang, Shuo

2015-01-01

In this study, the complete mitochondrial genome sequence of the African lion P. leo leo was reported. The total length of the mitogenome was 17,054 bp. It contained the typical mitochondrial structure, including 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes and 1 control region; 21 of the tRNA genes folded into typical cloverleaf secondary structure except for tRNASe. The overall composition of the mitogenome was A (32.0%), G (14.5%), C (26.5%) and T (27.0%). The new sequence will provide molecular genetic information for conservation genetics study of this important large carnivore.

Complete mitochondrial genome sequence of northeastern sika deer (Cervus nippon hortulorum).

PubMed

Shao, Yuanchen; Zha, Daiming; Xing, Xiumei; Su, Weilin; Liu, Huamiao; Zhang, Ranran

2016-01-01

The complete mitochondrial genome of the northeastern sika deer, Cervus nippon hortulorum, was determined by accurate polymerase chain reaction. The entire genome is 16,434 bp in length and contains 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and 1 control region, all of which are arranged in a typical vertebrate manner. The overall base composition of the northeastern sika deer's mitochondrial genome is 33.3% of A, 24.5% of C, 28.7% of T and 13.5% of G. A termination associated sequence and several conserved central sequence block domains were discovered within the control region.
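Base-composition figures like those quoted for these mitogenomes are simple per-base percentages. The sketch below shows one way to compute them, assuming the sequence is a plain string and that characters other than A, C, G and T (for example N) are ignored. The function name and toy sequence are hypothetical.

```python
from collections import Counter

def base_composition(seq):
    """Return the percentage of A, C, G and T among unambiguous bases."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no unambiguous bases in sequence")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}

# Toy usage on a made-up 6-nt sequence.
composition = base_composition("AACGTT")
```

A quick sanity check on any reported composition is that the four percentages sum to roughly 100; the sika deer values above (33.3, 24.5, 28.7 and 13.5) do.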
Complete mitochondrial genome of the Freshwater Catfish Rita rita (Siluriformes, Bagridae).

PubMed

Lashari, Punhal; Laghari, Muhammad Younis; Xu, Peng; Zhao, Zixia; Jiang, Li; Narejo, Naeem Tariq; Deng, Yulin; Sun, Xiaowen; Zhang, Yan

2015-01-01

The complete mitochondrial genome of the catfish Rita rita, a Red Listed species classified as Critically Endangered, was amplified by LA PCR (TaKaRa LA Taq, Dalian, China) and sequenced by Sanger's method. The complete mitogenome was 16,449 bp in length and contains 13 typical vertebrate protein-coding genes, 2 rRNA and 22 tRNA genes. The whole genome base composition was estimated to be 33.40% A, 27.43% C, 14.26% G and 24.89% T. The complete mitochondrial genome of the catfish Rita rita provides a basis for genetic breeding and conservation studies.
75 FR 46903 - Notice of Proposed Changes to the National Handbook of Conservation Practices for the Natural...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-08-04

... Treatment (Code 521D), Pond Sealing or Lining--Soil Dispersant Treatment (Code 521B), Salinity and Sodic Soil Management (Code 610), Stream Habitat Improvement and Management (Code 395), Vertical Drain (Code... the criteria section; an expansion of the considerations section to include fish and wildlife and soil...
Variations in the non-coding transcriptome as a driver of inter-strain divergence and physiological adaptation in bacteria

PubMed Central

Kopf, Matthias; Klähn, Stephan; Scholz, Ingeborg; Hess, Wolfgang R.; Voß, Björn

2015-01-01

In all studied organisms, a substantial portion of the transcriptome consists of non-coding RNAs that frequently execute regulatory functions. Here, we have compared the primary transcriptomes of the cyanobacteria Synechocystis sp. PCC 6714 and PCC 6803 under 10 different conditions. These strains share 2854 protein-coding genes and a 16S rRNA identity of 99.4%, indicating their close relatedness. Conserved major transcriptional start sites (TSSs) give rise to non-coding transcripts within the sigB gene, from the 5′UTRs of cmpA and isiA, and 168 loci in antisense orientation. Distinct differences include single nucleotide polymorphisms rendering promoters inactive in one of the strains, e.g., for cmpR and for the asRNA PsbA2R. Based on the genome-wide mapped location, regulation and classification of TSSs, non-coding transcripts were identified as the most dynamic component of the transcriptome. We identified a class of mRNAs that originate by read-through from an sRNA that accumulates as a discrete and abundant transcript while also serving as the 5′UTR. Such an sRNA/mRNA structure, which we name ‘actuaton’, represents another way for bacteria to remodel their transcriptional network. Our findings support the hypothesis that variations in the non-coding transcriptome constitute a major evolutionary element of inter-strain divergence and capability for physiological adaptation. PMID:25902393
Highly conserved elements discovered in vertebrates are present in non-syntenic loci of tunicates, act as enhancers and can be transcribed during development

PubMed Central

Sanges, Remo; Hadzhiev, Yavor; Gueroult-Bellone, Marion; Roure, Agnes; Ferg, Marco; Meola, Nicola; Amore, Gabriele; Basu, Swaraj; Brown, Euan R.; De Simone, Marco; Petrera, Francesca; Licastro, Danilo; Strähle, Uwe; Banfi, Sandro; Lemaire, Patrick; Birney, Ewan; Müller, Ferenc; Stupka, Elia

2013-01-01

Co-option of cis-regulatory modules has been suggested as a mechanism for the evolution of expression sites during development. However, the extent and mechanisms involved in mobilization of cis-regulatory modules remain elusive. To trace the history of non-coding elements, which may represent candidate ancestral cis-regulatory modules affirmed during chordate evolution, we have searched for conserved elements in tunicate and vertebrate (Olfactores) genomes. We identified, for the first time, 183 non-coding sequences that are highly conserved between the two groups. Our results show that all but one element are conserved in non-syntenic regions between vertebrate and tunicate genomes, while being syntenic among vertebrates. Nevertheless, in all the groups, they are significantly associated with transcription factors showing specific functions fundamental to animal development, such as multicellular organism development and sequence-specific DNA binding. The majority of these regions map onto ultraconserved elements and we demonstrate that they can act as functional enhancers within the organism of origin, as well as in cross-transgenesis experiments, and that they are transcribed in extant species of Olfactores. We refer to the elements as ‘Olfactores conserved non-coding elements’. PMID:23393190
Abundant RNA editing sites of chloroplast protein-coding genes in Ginkgo biloba and an evolutionary pattern analysis.

PubMed

He, Peng; Huang, Sheng; Xiao, Guanghui; Zhang, Yuzhou; Yu, Jianing

2016-12-01

RNA editing is a posttranscriptional modification process that alters the RNA sequence so that it deviates from the genomic DNA sequence. RNA editing mainly occurs in chloroplast and mitochondrial genomes, and the number of editing sites varies in terrestrial plants. Why and how RNA editing systems evolved remains a mystery. Ginkgo biloba is one of the oldest seed plants and has an important evolutionary position. Determining the patterns and distribution of RNA editing in this ancient plant provides insights into the evolutionary trend of RNA editing and helps us to further understand its biological significance. In this paper, we investigated 82 protein-coding genes in the chloroplast genome of G. biloba and identified 255 editing sites, which is the highest number of RNA editing events reported in a gymnosperm. All of the editing sites were C-to-U conversions, which mainly occurred in the second codon position, were biased towards the U_A context, and caused an increase in hydrophobic amino acids. RNA editing could change the secondary structures of 82 proteins, and create or eliminate a transmembrane region in five proteins as determined in silico. Finally, the evolutionary tendencies of RNA editing in different gene groups were estimated using the nonsynonymous-synonymous substitution rate selection mode. The G. biloba chloroplast genome possesses the highest number of RNA editing events reported so far in a seed plant. Most of the RNA editing sites can restore amino acid conservation, increase hydrophobicity, and even influence protein structures. Similar purifying selections constitute the dominant evolutionary force at the editing sites of essential genes, such as the psa, some psb and pet groups, and a positive selection occurred in the editing sites of nonessential genes, such as most ndh and a few psb genes.
The Puf family of RNA-binding proteins in plants: phylogeny, structural modeling, activity and subcellular localization

PubMed Central

2010-01-01

Background: Puf proteins have important roles in controlling gene expression at the post-transcriptional level by promoting RNA decay and repressing translation. The Pumilio homology domain (PUM-HD) is a conserved region within Puf proteins that binds to RNA with sequence specificity. Although Puf proteins have been well characterized in animal and fungal systems, little is known about the structural and functional characteristics of Puf-like proteins in plants. Results: The Arabidopsis and rice genomes code for 26 and 19 Puf-like proteins, respectively, each possessing eight or fewer Puf repeats in their PUM-HD. Key amino acids in the PUM-HD of several of these proteins are conserved with those of animal and fungal homologs, whereas other plant Puf proteins demonstrate extensive variability in these amino acids. Three-dimensional modeling revealed that the predicted structure of this domain in plant Puf proteins provides a suitable surface for binding RNA. Electrophoretic gel mobility shift experiments showed that the Arabidopsis AtPum2 PUM-HD binds with high affinity to BoxB of the Drosophila Nanos Response Element I (NRE1) RNA, whereas a point mutation in the core of the NRE1 resulted in a significant reduction in binding affinity. Transient expression of several of the Arabidopsis Puf proteins as fluorescent protein fusions revealed a dynamic, punctate cytoplasmic pattern of localization for most of these proteins. The presence of predicted nuclear export signals and accumulation of AtPuf proteins in the nucleus after treatment of cells with leptomycin B demonstrated that shuttling between the cytosol and nucleus is common among these proteins. In addition to the cytoplasmically enriched AtPum proteins, two AtPum proteins showed nuclear targeting with enrichment in the nucleolus. Conclusions: The Puf family of RNA-binding proteins in plants has more members than that of any other model species studied to date. This, along with the amino acid variability observed within their PUM-HDs, suggests that these proteins may be involved in a wide range of post-transcriptional regulatory events that are important in providing plants with the ability to respond rapidly to changes in environmental conditions and throughout development. PMID:20214804
78 FR 49202 - Energy Conservation Program for Certain Commercial and Industrial Equipment: Proposed...

Federal Register 2010, 2011, 2012, 2013, 2014

2013-08-13

.... EERE-2013-BT-STD-0030] RIN 1904-AD01 Energy Conservation Program for Certain Commercial and Industrial... efficiency of certain industrial equipment to conserve the energy resources of the Nation. DATES: DOE will... codification in the U.S. Code, establishes the ``Energy Conservation Program for Certain Industrial Equipment...
A low-dispersion, exactly energy-charge-conserving semi-implicit relativistic particle-in-cell algorithm

NASA Astrophysics Data System (ADS)

Chen, Guangye; Luis, Chacon; Bird, Robert; Stark, David; Yin, Lin; Albright, Brian

2017-10-01

Leap-frog based explicit algorithms, either "energy-conserving" or "momentum-conserving", do not conserve energy discretely. Time-centered fully implicit algorithms can conserve discrete energy exactly, but introduce large dispersion errors in the light-wave modes, regardless of timestep sizes. This can lead to intolerable simulation errors where highly accurate light propagation is needed (e.g. laser-plasma interactions, LPI). In this study, we selectively combine the leap-frog and Crank-Nicolson methods to produce a low-dispersion, exactly energy-and-charge-conserving PIC algorithm. Specifically, we employ the leap-frog method for Maxwell equations, and the Crank-Nicolson method for particle equations. Such an algorithm admits exact global energy conservation, exact local charge conservation, and preserves the dispersion properties of the leap-frog method for the light wave. The algorithm has been implemented in a code named iVPIC, based on the VPIC code developed at LANL. We will present numerical results that demonstrate the properties of the scheme with sample test problems (e.g. Weibel instability run for 10^7 timesteps, and LPI applications).
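The contrast drawn here between leap-frog and Crank-Nicolson time stepping can be seen on a toy problem. The sketch below is not the iVPIC scheme: it applies both integrators to a single harmonic oscillator (x' = v, v' = -ω²x), an assumption chosen only because the time-centered Crank-Nicolson update can be solved in closed form there. All function names are hypothetical.

```python
def crank_nicolson_step(x, v, dt, omega=1.0):
    """Time-centered implicit update, solved analytically for the linear force."""
    a = 0.5 * dt
    b = 0.5 * dt * omega ** 2
    # From v_new = v - b*(x + x_new) and x_new = x + a*(v + v_new).
    v_new = ((1.0 - a * b) * v - 2.0 * b * x) / (1.0 + a * b)
    x_new = x + a * (v + v_new)
    return x_new, v_new

def leapfrog_step(x, v, dt, omega=1.0):
    """Explicit kick-drift-kick leap-frog (velocity Verlet) update."""
    v_half = v - 0.5 * dt * omega ** 2 * x
    x_new = x + dt * v_half
    v_new = v_half - 0.5 * dt * omega ** 2 * x_new
    return x_new, v_new

def energy(x, v, omega=1.0):
    """Kinetic plus potential energy of the unit-mass oscillator."""
    return 0.5 * v * v + 0.5 * (omega * x) ** 2

def max_energy_drift(step, n_steps=1000, dt=0.1):
    """Largest absolute deviation from the initial energy over a run."""
    x, v = 1.0, 0.0
    e0 = energy(x, v)
    worst = 0.0
    for _ in range(n_steps):
        x, v = step(x, v, dt)
        worst = max(worst, abs(energy(x, v) - e0))
    return worst

cn_drift = max_energy_drift(crank_nicolson_step)
lf_drift = max_energy_drift(leapfrog_step)
```

For this linear problem the Crank-Nicolson energy error stays at round-off level, while the leap-frog energy oscillates with an amplitude that scales with dt². A real particle-in-cell code must additionally couple particles to fields, which this sketch does not attempt.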
Prunus necrotic ringspot ilarvirus: nucleotide sequence of RNA3 and the relationship to other ilarviruses based on coat protein comparison.

PubMed

Guo, D; Maiss, E; Adam, G; Casper, R

1995-05-01

The RNA3 of prunus necrotic ringspot ilarvirus (PNRSV) has been cloned and its entire sequence determined. The RNA3 consists of 1943 nucleotides (nt) and possesses two large open reading frames (ORFs) separated by an intergenic region of 74 nt. The 5' proximal ORF is 855 nt in length and codes for a protein of molecular mass 31.4 kDa which has homologies with the putative movement protein of other members of the Bromoviridae. The 3' proximal ORF of 675 nt is the cistron for the coat protein (CP) and has a predicted molecular mass of 24.9 kDa. The sequence of the 3' non-coding region (NCR) of PNRSV RNA3 showed a high degree of similarity with those of tobacco streak virus (TSV), prune dwarf virus (PDV), apple mosaic virus (ApMV) and also alfalfa mosaic virus (AlMV). In addition it contained potential stem-loop structures with interspersed AUGC motifs characteristic for ilar- and alfamoviruses. This conserved primary and secondary structure in all 3' NCRs may be responsible for the interaction with homologous and heterologous CPs and subsequent activation of genome replication. The CP gene of an ApMV isolate (ApMV-G) of 657 nt has also been cloned and sequenced. Although ApMV and PNRSV have a distant serological relationship, the deduced amino acid sequences of their CPs have an identity of only 51.8%. The N termini of PNRSV and ApMV CPs have in common a zinc-finger motif and the potential to form an amphipathic helix.
Termination and read-through proteins encoded by genome segment 9 of Colorado tick fever virus.

PubMed

Mohd Jaafar, Fauziah; Attoui, Houssam; De Micco, Philippe; De Lamballerie, Xavier

2004-08-01

Genome segment 9 (Seg-9) of Colorado tick fever virus (CTFV) is 1884 bp long and contains a large open reading frame (ORF; 1845 nt in length overall), although a single in-frame stop codon (at nt 1052-1054) reduces the ORF coding capacity by approximately 40%. However, analyses of highly conserved RNA sequences in the vicinity of the stop codon indicate that it belongs to a class of 'leaky terminators'. The third nucleotide positions in codons situated both before and after the stop codon show the highest variability, suggesting that both regions are translated during virus replication. This also suggests that the stop signal is functionally leaky, allowing read-through translation to occur. Indeed, both the truncated 'termination' protein and the full-length 'read-through' protein (VP9 and VP9', respectively) were detected in CTFV-infected cells, in cells transfected with a plasmid expressing only Seg-9 protein products, and in the in vitro translation products from undenatured Seg-9 ssRNA. The ratios of full-length and truncated proteins generated suggest that read-through may be down-regulated by other viral proteins. Western blot analysis of infected cells and purified CTFV showed that VP9 is a structural component of the virion, while VP9' is a non-structural protein.
The RtcB RNA ligase is an essential component of the metazoan unfolded protein response

PubMed Central

Kosmaczewski, Sara Guckian; Edwards, Tyson James; Han, Sung Min; Eckwahl, Matthew J; Meyer, Benjamin Isaiah; Peach, Sally; Hesselberth, Jay R; Wolin, Sandra L; Hammarlund, Marc

2014-01-01

RNA ligation can regulate RNA function by altering RNA sequence, structure and coding potential. For example, the function of XBP1 in mediating the unfolded protein response requires RNA ligation, as does the maturation of some tRNAs. Here, we describe a novel in vivo model in Caenorhabditis elegans for the conserved RNA ligase RtcB and show that RtcB ligates the xbp-1 mRNA during the IRE-1 branch of the unfolded protein response. Without RtcB, protein stress results in the accumulation of unligated xbp-1 mRNA fragments, defects in the unfolded protein response, and decreased lifespan. RtcB also ligates endogenous pre-tRNA halves, and RtcB mutants have defects in growth and lifespan that can be bypassed by expression of pre-spliced tRNAs. In addition, animals that lack RtcB have defects that are independent of tRNA maturation and the unfolded protein response. Thus, RNA ligation by RtcB is required for the function of multiple endogenous target RNAs including both xbp-1 and tRNAs. RtcB is uniquely capable of performing these ligation functions, and RNA ligation by RtcB mediates multiple essential processes in vivo. PMID:25366321
Evolutionary and biophysical relationships among the papillomavirus E2 proteins.

PubMed

Blakaj, Dukagjin M; Fernandez-Fuentes, Narcis; Chen, Zigui; Hegde, Rashmi; Fiser, Andras; Burk, Robert D; Brenowitz, Michael

2009-01-01

Infection by human papillomavirus (HPV) may result in clinical conditions ranging from benign warts to invasive cancer. The HPV E2 protein represses oncoprotein transcription and is required for viral replication. HPV E2 binds to palindromic DNA sites consisting of highly conserved four-base-pair sequences flanking a variable 'spacer' of identical length. E2 proteins directly contact the conserved but not the spacer DNA. Variation in naturally occurring spacer sequences results in differential protein affinity that is dependent on their sensitivity to the spacer DNA's unique conformational and/or dynamic properties. This article explores the biophysical character of this core viral protein with the goal of identifying characteristics that are associated with risk of virally caused malignancy. The amino acid sequence, 3D structure and electrostatic features of the E2 protein DNA binding domain are highly conserved; specific interactions with DNA binding sites have also been conserved. In contrast, the E2 protein's transactivation domain does not have extensive surfaces of highly conserved residues. Rather, regions of high conservation are localized to small surface patches. Implications for cancer biology are discussed.
Recent applications of the transonic wing analysis computer code, TWING

NASA Technical Reports Server (NTRS)

Subramanian, N. R.; Holst, T. L.; Thomas, S. D.

1982-01-01

An evaluation of the transonic-wing-analysis computer code TWING is given. TWING utilizes a fully implicit approximate factorization iteration scheme to solve the full potential equation in conservative form. A numerical elliptic-solver grid-generation scheme is used to generate the required finite-difference mesh. Several wing configurations were analyzed, and the limits of applicability of this code were evaluated. Comparisons of computed results were made with available experimental data. Results indicate that the code is robust, accurate (when significant viscous effects are not present), and efficient. TWING generally produces solutions an order of magnitude faster than other conservative full potential codes using successive-line overrelaxation. The present method is applicable to a wide range of isolated wing configurations including high-aspect-ratio transport wings and low-aspect-ratio, high-sweep, fighter configurations.
Coevolutionary modeling of protein sequences: Predicting structure, function, and mutational landscapes

NASA Astrophysics Data System (ADS)

Weigt, Martin

Over the last years, biological research has been revolutionized by experimental high-throughput techniques, in particular by next-generation sequencing technology. Unprecedented amounts of data are accumulating, and there is a growing demand for computational methods unveiling the information hidden in raw data, thereby increasing our understanding of complex biological systems. Statistical-physics models based on the maximum-entropy principle have, in the last few years, played an important role in this context. To give a specific example, proteins and many non-coding RNA show a remarkable degree of structural and functional conservation in the course of evolution, despite a large variability in amino acid sequences. We have developed a statistical-mechanics inspired inference approach, called Direct-Coupling Analysis, to link this sequence variability (easy to observe in sequence alignments, which are available in public sequence databases) to bio-molecular structure and function. In my presentation I will show how this methodology can be used (i) to infer contacts between residues and thus to guide tertiary and quaternary protein structure prediction and RNA structure prediction, (ii) to discriminate interacting from non-interacting protein families, and thus to infer conserved protein-protein interaction networks, and (iii) to reconstruct mutational landscapes and thus to predict the phenotypic effect of mutations. References [1] M. Figliuzzi, H. Jacquier, A. Schug, O. Tenaillon and M. Weigt, "Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase TEM-1", Mol. Biol. Evol. (2015), doi: 10.1093/molbev/msv211 [2] E. De Leonardis, B. Lutz, S. Ratz, S. Cocco, R. Monasson, A. Schug, M. Weigt, "Direct-Coupling Analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction", Nucleic Acids Research (2015), doi: 10.1093/nar/gkv932 [3] F. Morcos, A. Pagnani, B. Lunt, A. Bertolino, D. Marks, C. Sander, R. Zecchina, J.N. Onuchic, T. Hwa, M. Weigt, "Direct-coupling analysis of residue co-evolution captures native contacts across many protein families", Proc. Natl. Acad. Sci. 108, E1293-E1301 (2011).
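The idea of reading residue contacts out of alignment covariation can be illustrated with a much simpler statistic than Direct-Coupling Analysis. The sketch below computes the mutual information between two alignment columns. This is not DCA itself: DCA fits a global maximum-entropy model in order to separate direct from indirect correlations, whereas mutual information is a purely local, pairwise measure. The toy alignment and the function names are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter


def column(alignment, i):
    """Return the residues found at position i of every aligned sequence."""
    return [seq[i] for seq in alignment]


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    A local covariation score only; it is NOT Direct-Coupling Analysis.
    """
    n = len(alignment)
    col_i, col_j = column(alignment, i), column(alignment, j)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 1 covary perfectly,
# column 2 is invariant, column 3 varies independently of column 0.
toy_alignment = ["AKLE", "AKLD", "GRLE", "GRLD"]

print(mutual_information(toy_alignment, 0, 1))  # covarying pair
print(mutual_information(toy_alignment, 0, 3))  # independent pair
```

In this toy case the covarying pair scores 1 bit and the independent pair scores 0 bits; on real alignments such raw scores are confounded by phylogeny and by indirect correlations, which is the problem global models are meant to address.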
cDNA cloning and characterization of the human THRAP2 gene which maps to chromosome 12q24, and its mouse ortholog Thrap2.

PubMed

Musante, Luciana; Bartsch, Oliver; Ropers, Hans-Hilger; Kalscheuer, Vera M

2004-05-12

Characterization of a balanced t(2;12)(q37;q24) translocation in a patient with suspicion of Noonan syndrome revealed that the chromosome 12 breakpoint lies in the vicinity of a novel human gene, thyroid hormone receptor-associated protein 2 (THRAP2). We therefore characterized this gene and its mouse counterpart in more detail. Human and mouse THRAP2/Thrap2 span a genomic region of about 310 and >170 kilobases (kb), and both contain 31 exons. Corresponding transcripts are approximately 9.5 kb long. Their open reading frames code for proteins of 2210 and 2203 amino acids, which are 93% identical. By northern blot analysis, human and mouse THRAP2/Thrap2 genes showed ubiquitous expression. Transcripts were most abundant in human skeletal muscle and in mouse heart. THRAP2 protein is 56% identical to human TRAP240, which belongs to the thyroid hormone receptor associated protein (TRAP) complex and is evolutionarily conserved up to yeast. This complex is involved in transcriptional regulation and is believed to serve as an adapting interface between regulatory proteins bound to specific DNA sequences and RNA polymerase II.
De novo truncating variants in the AHDC1 gene encoding the AT-hook DNA-binding motif-containing protein 1 are associated with intellectual disability and developmental delay.

PubMed

Yang, Hui; Douglas, Ganka; Monaghan, Kristin G; Retterer, Kyle; Cho, Megan T; Escobar, Luis F; Tucker, Megan E; Stoler, Joan; Rodan, Lance H; Stein, Diane; Marks, Warren; Enns, Gregory M; Platt, Julia; Cox, Rachel; Wheeler, Patricia G; Crain, Carrie; Calhoun, Amy; Tryon, Rebecca; Richard, Gabriele; Vitazka, Patrik; Chung, Wendy K

2015-10-01

Whole-exome sequencing (WES) represents a significant breakthrough in clinical genetics, and identifies a genetic etiology in up to 30% of cases of intellectual disability (ID). Using WES, we identified seven unrelated patients with a similar clinical phenotype of severe intellectual disability or neurodevelopmental delay who were all heterozygous for de novo truncating variants in the AT-hook DNA-binding motif-containing protein 1 (AHDC1). The patients were all minimally verbal or nonverbal and had variable neurological problems including spastic quadriplegia, ataxia, nystagmus, seizures, autism, and self-injurious behaviors. Additional common clinical features include dysmorphic facial features and feeding difficulties associated with failure to thrive and short stature. The AHDC1 gene has only one coding exon, and the protein contains conserved regions including AT-hook motifs and a PDZ binding domain. We postulate that all seven variants detected in these patients result in a truncated protein missing critical functional domains, disrupting interactions with other proteins important for brain development. Our study demonstrates that truncating variants in AHDC1 are associated with ID and are primarily associated with a neurodevelopmental phenotype.
Robust Translation of the Nucleoid Protein Fis Requires a Remote Upstream AU Element and Is Enhanced by RNA Secondary Structure

PubMed Central

Nafissi, Maryam; Chau, Jeannette; Xu, Jimin

2012-01-01

Synthesis of the Fis nucleoid protein rapidly increases in response to nutrient upshifts, and Fis is one of the most abundant DNA binding proteins in Escherichia coli under nutrient-rich growth conditions. Previous work has shown that control of Fis synthesis occurs at transcription initiation of the dusB-fis operon. We show here that while translation of the dihydrouridine synthase gene dusB is low, unusual mechanisms operate to enable robust translation of fis. At least two RNA sequence elements located within the dusB coding region are responsible for high fis translation. The most important is an AU element centered 35 nucleotides (nt) upstream of the fis AUG, which may function as a binding site for ribosomal protein S1. In addition, a 44-nt segment located upstream of the AU element and predicted to form a stem-loop secondary structure plays a prominent role in enhancing fis translation. On the other hand, mutations close to the AUG, including over a potential Shine-Dalgarno sequence, have little effect on Fis protein levels. The AU element and stem-loop regions are phylogenetically conserved within dusB-fis operons of representative enteric bacteria. PMID:22389479
The genomic structure of the human Charcot-Leyden crystal protein gene is analogous to those of the galectin genes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dyer, K.D.; Handen, J.S.; Rosenberg, H.F.

The Charcot-Leyden crystal (CLC) protein, or eosinophil lysophospholipase, is a characteristic protein of human eosinophils and basophils; recent work has demonstrated that the CLC protein is both structurally and functionally related to the galectin family of β-galactoside binding proteins. The galectins as a group share a number of features in common, including a linear ligand binding site encoded on a single exon. In this work, we demonstrate that the intron-exon structure of the gene encoding CLC is analogous to those encoding the galectins. The coding sequence of the CLC gene is divided into four exons, with the entire β-galactoside binding site encoded by exon III. We have isolated CLC β-galactoside binding sites from both orangutan (Pongo pygmaeus) and murine (Mus musculus) genomic DNAs, both encoded on single exons, and noted conservation of the amino acids shown to interact directly with the β-galactoside ligand. The most likely interpretation of these results suggests the occurrence of one or more exon duplication and insertion events, resulting in the distribution of this lectin domain to CLC as well as to the multiple galectin genes. 35 refs., 3 figs.
Structure of Lmaj006129AAA, a hypothetical protein from Leishmania major

DOE Office of Scientific and Technical Information (OSTI.GOV)

Arakaki, Tracy; Le Trong, Isolde; Structural Genomics of Pathogenic Protozoa

2006-03-01

The crystal structure of a conserved hypothetical protein from L. major, Pfam sequence family PF04543, structural genomics target ID Lmaj006129AAA, has been determined at a resolution of 1.6 Å. The gene product of structural genomics target Lmaj006129 from Leishmania major codes for a 164-residue protein of unknown function. When SeMet expression of the full-length gene product failed, several truncation variants were created with the aid of Ginzu, a domain-prediction method. Eleven truncations were selected for expression, purification and crystallization based upon secondary-structure elements and disorder. The structure of one of these variants, Lmaj006129AAH, was solved by multiple-wavelength anomalous diffraction (MAD) using ELVES, an automatic protein crystal structure-determination system. This model was then successfully used as a molecular-replacement probe for the parent full-length target, Lmaj006129AAA. The final structure of Lmaj006129AAA was refined to an R value of 0.185 (R_free = 0.229) at 1.60 Å resolution. Structure and sequence comparisons based on Lmaj006129AAA suggest that proteins belonging to Pfam sequence families PF04543 and PF01878 may share a common ligand-binding motif.

RBFOX and PTBP1 proteins regulate the alternative splicing of micro-exons in human brain transcripts

PubMed Central

Sanchez-Pulido, Luis; Haerty, Wilfried

2015-01-01

Ninety-four percent of mammalian protein-coding exons exceed 51 nucleotides (nt) in length. The paucity of micro-exons (≤ 51 nt) suggests that their recognition and correct processing by the splicing machinery present greater challenges than for longer exons. Yet, because thousands of human genes harbor processed micro-exons, specialized mechanisms may be in place to promote their splicing. Here, we survey deep genomic data sets to define 13,085 micro-exons and to study their splicing mechanisms and molecular functions. More than 60% of annotated human micro-exons exhibit a high level of sequence conservation, an indicator of functionality. While most human micro-exons require splicing-enhancing genomic features to be processed, the splicing of hundreds of micro-exons is enhanced by the adjacent binding of splice factors in the introns of pre-messenger RNAs. Notably, splicing of a significant number of micro-exons was found to be facilitated by the binding of RBFOX proteins, which promote their inclusion in the brain, muscle, and heart. Our analyses suggest that accurate regulation of micro-exon inclusion by RBFOX proteins and PTBP1 plays an important role in the maintenance of tissue-specific protein–protein interactions. PMID:25524026
General Relativistic Smoothed Particle Hydrodynamics code developments: A progress report

NASA Astrophysics Data System (ADS)

Faber, Joshua; Silberman, Zachary; Rizzo, Monica

2017-01-01

We report on our progress in developing a new general relativistic Smoothed Particle Hydrodynamics (SPH) code, which will be appropriate for studying the properties of accretion disks around black holes as well as compact object binary mergers and their ejecta. We will discuss in turn the relativistic formalisms being used to handle the evolution, our techniques for dealing with conservative and primitive variables, as well as those used to ensure proper conservation of various physical quantities. Code tests and performance metrics will be discussed, as will the prospects for including smoothed particle hydrodynamics codes within other numerical relativity codebases, particularly the publicly available Einstein Toolkit. We acknowledge support from NSF award ACI-1550436 and an internal RIT D-RIG grant.
RAZORBACK - A Research Reactor Transient Analysis Code Version 1.0 - Volume 3: Verification and Validation Report.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Talley, Darren G.

2017-04-01

This report describes the work and results of the verification and validation (V&V) of the version 1.0 release of the Razorback code. Razorback is a computer code designed to simulate the operation of a research reactor (such as the Annular Core Research Reactor (ACRR)) by a coupled numerical solution of the point reactor kinetics equations, the energy conservation equation for fuel element heat transfer, the equation of motion for fuel element thermal expansion, and the mass, momentum, and energy conservation equations for the water cooling of the fuel elements. This V&V effort was intended to confirm that the code shows good agreement between simulation and actual ACRR operations.
Numerical study of supersonic combustors by multi-block grids with mismatched interfaces

NASA Technical Reports Server (NTRS)

Moon, Young J.

1990-01-01

A three dimensional, finite rate chemistry, Navier-Stokes code was extended to a multi-block code with mismatched interface for practical calculations of supersonic combustors. To ensure global conservation, a conservative algorithm was used for the treatment of mismatched interfaces. The extended code was checked against one test case, i.e., a generic supersonic combustor with transverse fuel injection, examining solution accuracy, convergence, and local mass flux error. After testing, the code was used to simulate the chemically reacting flow fields in a scramjet combustor with parallel fuel injectors (unswept and swept ramps). Computational results were compared with experimental shadowgraph and pressure measurements. Fuel-air mixing characteristics of the unswept and swept ramps were compared and investigated.
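The abstract above does not spell out its conservative interface algorithm. As a rough one-dimensional illustration of what "conservative" means at a mismatched block interface, the sketch below transfers a piecewise-constant flux density from the faces of one block to the differently spaced faces of a neighbouring block by weighting with the overlap length, so that the total flux crossing the interface is unchanged. The function names, grids and values are hypothetical, and the actual three-dimensional scheme in the report may differ substantially.

```python
def overlap(a0, a1, b0, b1):
    """Length of the intersection of intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def transfer_flux(src_edges, src_flux_density, dst_edges):
    """Integrate the source flux density over each destination face.

    src_edges / dst_edges: increasing face boundaries along the interface.
    src_flux_density: one constant value per source face.
    Returns the integrated flux received by each destination face.
    """
    src_faces = list(zip(src_edges[:-1], src_edges[1:]))
    dst_flux = []
    for d0, d1 in zip(dst_edges[:-1], dst_edges[1:]):
        total = 0.0
        for (s0, s1), q in zip(src_faces, src_flux_density):
            total += q * overlap(s0, s1, d0, d1)
        dst_flux.append(total)
    return dst_flux


# Hypothetical mismatched interface: 2 faces on one side, 3 on the other.
src_edges = [0.0, 0.5, 1.0]
src_density = [2.0, 4.0]
dst_edges = [0.0, 0.25, 0.75, 1.0]

received = transfer_flux(src_edges, src_density, dst_edges)
sent = sum(q * (s1 - s0)
           for q, s0, s1 in zip(src_density, src_edges[:-1], src_edges[1:]))
print(received, sum(received), sent)
```

Because every piece of each source face is assigned to exactly one destination face, the sum received equals the sum sent, which is the global conservation property the abstract refers to.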
10 CFR 434.99 - Explanation of numbering system for codes.

Code of Federal Regulations, 2011 CFR

2011-01-01

... 10 Energy 3 2011-01-01 2011-01-01 false Explanation of numbering system for codes. 434.99 Section 434.99 Energy DEPARTMENT OF ENERGY ENERGY CONSERVATION ENERGY CODE FOR NEW FEDERAL COMMERCIAL AND MULTI-FAMILY HIGH RISE RESIDENTIAL BUILDINGS § 434.99 Explanation of numbering system for codes. (a) For...
10 CFR 434.99 - Explanation of numbering system for codes.

Code of Federal Regulations, 2010 CFR

2010-01-01

... 10 Energy 3 2010-01-01 2010-01-01 false Explanation of numbering system for codes. 434.99 Section 434.99 Energy DEPARTMENT OF ENERGY ENERGY CONSERVATION ENERGY CODE FOR NEW FEDERAL COMMERCIAL AND MULTI-FAMILY HIGH RISE RESIDENTIAL BUILDINGS § 434.99 Explanation of numbering system for codes. (a) For...
Functional Identification and Structure Determination of Two Novel Prolidases from cog1228 in the Amidohydrolase Superfamily

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xiang, Dao Feng; Patskovsky, Yury; Xu, Chengfu

2010-12-07

Two uncharacterized enzymes from the amidohydrolase superfamily belonging to cog1228 were cloned, expressed, and purified to homogeneity. The two proteins, Sgx9260c (gi|44242006) and Sgx9260b (gi|44479596), were derived from environmental DNA samples originating from the Sargasso Sea. The catalytic function and substrate profiles for Sgx9260c and Sgx9260b were determined using a comprehensive library of dipeptides and N-acyl derivatives of L-amino acids. Sgx9260c catalyzes the hydrolysis of Gly-L-Pro, L-Ala-L-Pro, and N-acyl derivatives of L-Pro. The best substrate identified to date is N-acetyl-L-Pro with a value of kcat/Km of 3 × 10^5 M^-1 s^-1. Sgx9260b catalyzes the hydrolysis of L-hydrophobic L-Pro dipeptides and N-acyl derivatives of L-Pro. The best substrate identified to date is N-propionyl-L-Pro with a value of kcat/Km of 1 × 10^5 M^-1 s^-1. Three-dimensional structures of both proteins were determined by X-ray diffraction methods (PDB codes 3MKV and 3FEQ). These proteins fold as distorted (β/α)8-barrels with two divalent cations in the active site. The structure of Sgx9260c was also determined as a complex with the N-methylphosphonate derivative of L-Pro (PDB code 3N2C). In this structure the phosphonate moiety bridges the binuclear metal center, and one oxygen atom interacts with His-140. The α-carboxylate of the inhibitor interacts with Tyr-231. The proline side chain occupies a small substrate binding cavity formed by residues contributed from the loop that follows β-strand 7 within the (β/α)8-barrel. A total of 38 other proteins from cog1228 are predicted to have the same substrate profile based on conservation of the substrate binding residues. The structure of an evolutionarily related protein, Cc2672 from Caulobacter crescentus, was determined as a complex with the N-methylphosphonate derivative of L-arginine (PDB code 3MTW).
Complementary DNA characterization and chromosomal localization of a human gene related to the poliovirus receptor-encoding gene.

PubMed

Lopez, M; Eberlé, F; Mattei, M G; Gabert, J; Birg, F; Bardin, F; Maroc, C; Dubreuil, P

1995-04-03

The human poliovirus (PV) receptor (PVR) is a member of the immunoglobulin (Ig) superfamily with unknown cellular function. We have isolated a human PVR-related (PRR) cDNA. The deduced amino acid (aa) sequence of PRR showed, in the extracellular region, 51.7 and 54.3% similarity with human PVR and with the murine PVR homolog, respectively. The cDNA coding sequence is 1.6-kb long and encodes a deduced 57-kDa protein; this protein has a structural organization analogous to that of PVR, that is, one V- and two C-set Ig domains, with a conserved number of aa. Northern blot analysis indicated that a major 5.9-kb transcript is present in all normal human tissues tested. In situ hybridization showed that the PRR gene is located at bands q23-q24 of human chromosome 11.
Monitoring Autophagy in the Model Green Microalga Chlamydomonas reinhardtii.

PubMed

Pérez-Pérez, María Esther; Couso, Inmaculada; Heredia-Martínez, Luis G; Crespo, José L

2017-10-22

Autophagy is an intracellular catabolic system that delivers cytoplasmic constituents and organelles to the vacuole. This degradative process is mediated by a group of proteins coded by autophagy-related (ATG) genes that are widely conserved from yeasts to plants and mammals. Homologs of ATG genes have also been identified in algal genomes, including that of the unicellular model green alga Chlamydomonas reinhardtii. The development of specific tools to monitor autophagy in Chlamydomonas has expanded our current knowledge about the regulation and function of this process in algae. Recent findings indicated that autophagy is regulated by redox signals and the TOR network in Chlamydomonas and revealed that this process may play an important role in the control of lipid metabolism and ribosomal protein turnover in this alga. Here, we will describe the different techniques and approaches that have been reported to study autophagy and autophagic flux in Chlamydomonas.
Initial sequencing and comparative analysis of the mouse genome

DOE Office of Scientific and Technical Information (OSTI.GOV)

Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan

2002-12-15

The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.
Fluconazole Resistance Associated with Drug Efflux and Increased Transcription of a Drug Transporter Gene, PDH1, in Candida glabrata

PubMed Central

Miyazaki, Haruko; Miyazaki, Yoshitsugu; Geber, Antonia; Parkinson, Tanya; Hitchcock, Christopher; Falconer, Derek J.; Ward, Douglas J.; Marsden, Katherine; Bennett, John E.

1998-01-01

Sequential Candida glabrata isolates were obtained from the mouth of a patient infected with human immunodeficiency virus type 1 who was receiving high doses of fluconazole for oropharyngeal thrush. Fluconazole-susceptible colonies were replaced by resistant colonies that exhibited both increased fluconazole efflux and increased transcripts of a gene which codes for a protein with 72.5% identity to Pdr5p, an ABC multidrug transporter in Saccharomyces cerevisiae. The deduced protein had a molecular mass of 175 kDa and was composed of two homologous halves, each with six putative transmembrane domains and highly conserved sequences of ATP-binding domains. When the earliest and most azole-susceptible isolate of C. glabrata from this patient was exposed to fluconazole, increased transcripts of the PDR5 homolog appeared, linking azole exposure to regulation of this gene. PMID:9661006
Decelerated genome evolution in modern vertebrates revealed by analysis of multiple lancelet genomes

PubMed Central

Huang, Shengfeng; Chen, Zelin; Yan, Xinyu; Yu, Ting; Huang, Guangrui; Yan, Qingyu; Pontarotti, Pierre Antoine; Zhao, Hongchen; Li, Jie; Yang, Ping; Wang, Ruihua; Li, Rui; Tao, Xin; Deng, Ting; Wang, Yiquan; Li, Guang; Zhang, Qiujin; Zhou, Sisi; You, Leiming; Yuan, Shaochun; Fu, Yonggui; Wu, Fenfang; Dong, Meiling; Chen, Shangwu; Xu, Anlong

2014-01-01

Vertebrates diverged from other chordates ~500 Myr ago and experienced successful innovations and adaptations, but the genomic basis underlying vertebrate origins is not fully understood. Here we suggest, through comparison with multiple lancelet (amphioxus) genomes, that ancient vertebrates experienced high rates of protein evolution, genome rearrangement and domain shuffling and that these rates greatly slowed down after the divergence of jawed and jawless vertebrates. Compared with lancelets, modern vertebrates retain, at least relatively, less protein diversity, fewer nucleotide polymorphisms, domain combinations and conserved non-coding elements (CNE). Modern vertebrates also lost substantial transposable element (TE) diversity, whereas lancelets preserve high TE diversity that includes even the long-sought RAG transposon. Lancelets also exhibit rapid gene turnover, pervasive transcription, the fastest exon shuffling in metazoans and substantial TE methylation not observed in other invertebrates. These new lancelet genome sequences provide new insights into the chordate ancestral state and vertebrate evolution. PMID:25523484
Decelerated genome evolution in modern vertebrates revealed by analysis of multiple lancelet genomes.

PubMed

Huang, Shengfeng; Chen, Zelin; Yan, Xinyu; Yu, Ting; Huang, Guangrui; Yan, Qingyu; Pontarotti, Pierre Antoine; Zhao, Hongchen; Li, Jie; Yang, Ping; Wang, Ruihua; Li, Rui; Tao, Xin; Deng, Ting; Wang, Yiquan; Li, Guang; Zhang, Qiujin; Zhou, Sisi; You, Leiming; Yuan, Shaochun; Fu, Yonggui; Wu, Fenfang; Dong, Meiling; Chen, Shangwu; Xu, Anlong

2014-12-19

Vertebrates diverged from other chordates ~500 Myr ago and experienced successful innovations and adaptations, but the genomic basis underlying vertebrate origins is not fully understood. Here we suggest, through comparison with multiple lancelet (amphioxus) genomes, that ancient vertebrates experienced high rates of protein evolution, genome rearrangement and domain shuffling and that these rates greatly slowed down after the divergence of jawed and jawless vertebrates. Compared with lancelets, modern vertebrates retain, at least relatively, less protein diversity, fewer nucleotide polymorphisms, domain combinations and conserved non-coding elements (CNE). Modern vertebrates also lost substantial transposable element (TE) diversity, whereas lancelets preserve high TE diversity that includes even the long-sought RAG transposon. Lancelets also exhibit rapid gene turnover, pervasive transcription, the fastest exon shuffling in metazoans and substantial TE methylation not observed in other invertebrates. These new lancelet genome sequences provide new insights into the chordate ancestral state and vertebrate evolution.
The minisatellite of the GPI/AMF/NLK/MF gene: interspecies conservation and transcriptional activity.

PubMed

Williams, R R; Hassan-Walker, A F; Lavender, F L; Morgan, M; Faik, P; Ragoussis, J

2001-05-16

Minisatellites are tandemly repeated DNA sequences found throughout the genomes of all eukaryotes. They are regions often prone to instability and hence hypervariability; thus repeat unit sequence is generally not conserved beyond closely related species. We have studied the minisatellite located in intron 9 of the human glucose phosphate isomerase (GPI) gene (also known as neuroleukin, autocrine motility factor, maturation and differentiation factor) and have found, by Zoo blotting coupled with PCR amplification and DNA sequencing, that similar repeat units are present in seven other species of mammal. There is also evidence for the presence of the minisatellite in chicken. The repeat unit does not appear to be present at any other locus in these genomes. Minisatellite DNA has been reported to be involved in recombination activity, control of gene expression of nearby gene(s) (both transcriptional and translational), whilst others form protein coding regions. The high level of conservation exhibited by the GPI minisatellite, coupled with the unique location, strongly suggests a functional role. Our results from transient and stable transfections using luciferase reporter constructs have shown that the GPI minisatellite region can act to increase transcription from the SV40 promoter, CMV promoter and the human GPI promoter.
Molecular Characterization of a Catalase from Hydra vulgaris

PubMed Central

Dash, Bhagirathi; Phillips, Timothy D.

2012-01-01

Catalase, an antioxidant and hydroperoxidase enzyme, protects the cellular environment from harmful effects of hydrogen peroxide by facilitating its degradation to oxygen and water. Molecular information on a cnidarian catalase and/or peroxidase is, however, limited. In this work an apparent full length cDNA sequence coding for a catalase (HvCatalase) was isolated from Hydra vulgaris using 3'- and 5'- (RLM) RACE approaches. The 1859 bp HvCatalase cDNA included an open reading frame of 1518 bp encoding a putative protein of 505 amino acids with a predicted molecular mass of 57.44 kDa. The deduced amino acid sequence of HvCatalase contained several highly conserved motifs including the heme-ligand signature sequence RLFSYGDTH and the active site signature FXRERIPERVVHAKGXGA. A comparative analysis showed the presence of conserved catalytic amino acids [His(71), Asn(145), and Tyr(354)] in HvCatalase as well. Homology modeling indicated the presence of the conserved features of the mammalian catalase fold. Hydrae exposed to thermal, starvation, metal and oxidative stress responded by regulating their catalase mRNA transcription. These results indicated that the HvCatalase gene is involved in the cellular stress response and (anti)oxidative processes triggered by stressor and contaminant exposure. PMID:22521743
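The ORF and protein lengths quoted in the abstract above can be cross-checked with simple codon arithmetic. The sketch below assumes that the reported 1518 bp ORF includes the stop codon, and uses the common rule of thumb of roughly 110 Da per amino acid residue; neither assumption comes from the abstract, and the function names are hypothetical.

```python
# Rule-of-thumb average residue mass; an assumption, not a value from the abstract.
AVG_RESIDUE_MASS_DA = 110.0


def orf_to_protein_length(orf_nt, includes_stop=True):
    """Number of amino acids encoded by an ORF of orf_nt nucleotides."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_nt // 3
    # The stop codon occupies one codon but adds no residue.
    return codons - 1 if includes_stop else codons


def approx_mass_kda(n_residues):
    """Rough molecular mass estimate from residue count."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


hv_length = orf_to_protein_length(1518)  # HvCatalase ORF, stop codon assumed included
print(hv_length)                         # 505, matching the abstract
print(approx_mass_kda(hv_length))        # rough estimate, near the reported 57.44 kDa
```

The residue count agrees with the 505 amino acids reported. The rule-of-thumb mass (about 55.6 kDa) lands within a few percent of the predicted 57.44 kDa, as expected since the true value depends on the actual amino acid composition.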
N6-methyladenine: a conserved and dynamic DNA mark

PubMed Central

O’Brown, Zach Klapholz; Greer, Eric Lieberman

2017-01-01

Chromatin, consisting of deoxyribonucleic acid (DNA) wrapped around histone proteins, facilitates DNA compaction and allows identical DNA code to confer many different cellular phenotypes. This biological versatility is accomplished in large part by post-translational modifications to histones and chemical modifications to DNA. These modifications direct the cellular machinery to expand or compact specific chromatin regions, and mark regions of the DNA as important for cellular functions. While each of the four bases that make up DNA can be modified (Iyer et al. 2011), this chapter will focus on methylation of the 6th position on adenines (6mA), as this modification has been poorly characterized in recently evolved eukaryotes but shows promise as a new conserved layer of epigenetic regulation. 6mA was previously thought to be restricted to unicellular organisms, but recent work has revealed its presence in more recently evolved metazoa. Here, we will briefly describe the history of 6mA, examine its evolutionary conservation, and evaluate the current methods for detecting 6mA. We will discuss the enzymes that bind and regulate this mark and finally examine known and potential functions of 6mA in eukaryotes. PMID:27826841
Long Non-Coding RNAs Responsive to Salt and Boron Stress in the Hyper-Arid Lluteño Maize from Atacama Desert

PubMed Central

Huanca-Mamani, Wilson; Arias-Carrasco, Raúl; Cárdenas-Ninasivincha, Steffany; Rojas-Herrera, Marcelo; Sepúlveda-Hermosilla, Gonzalo; Caris-Maldonado, José Carlos; Bastías, Elizabeth; Maracaja-Coutinho, Vinicius

2018-01-01

Long non-coding RNAs (lncRNAs) have been defined as transcripts longer than 200 nucleotides, which lack significant protein coding potential and possess critical roles in diverse cellular processes. Long non-coding RNAs have recently been functionally characterized in plant stress–response mechanisms. In the present study, we perform a comprehensive identification of lncRNAs in response to combined stress induced by salinity and excess of boron in the Lluteño maize, a tolerant maize landrace from the Atacama Desert, Chile. We use deep RNA sequencing to identify a set of 48,345 different lncRNAs, of which 28,012 (58.1%) are conserved with other maize (B73, Mo17 or Palomero), with the remaining 41.9% belonging to potentially Lluteño exclusive lncRNA transcripts. According to the B73 maize reference genome sequence, most Lluteño lncRNAs correspond to intergenic transcripts. Interestingly, Lluteño lncRNAs present an unusually high overall expression compared to protein coding genes under exposure to stressed conditions. In total, we identified 1710 lncRNAs putatively responsive to the combined stressed conditions of salt and boron exposure. We also identified a set of 848 stress responsive potential trans natural antisense transcript (trans-NAT) lncRNAs, which seem to regulate genes associated with regulation of transcription, response to stress and response to abiotic stimulus, and genes participating in the nicotianamine metabolic process. Reverse transcription-quantitative PCR (RT-qPCR) experiments were performed in a subset of lncRNAs, validating their existence and expression patterns. Our results suggest that a diverse set of maize lncRNAs from leaves and roots is responsive to combined salt and boron stress; this is the first effort to identify lncRNAs from a maize landrace adapted to extreme conditions such as the Atacama Desert. 
The information generated is a starting point to understand the genomic adaptations undergone by this maize to survive this extremely stressed environment. PMID:29558449
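The working definition used in the abstract above (transcripts longer than 200 nucleotides that lack significant protein coding potential) can be illustrated with a toy filter. This is only a sketch: the abstract does not describe how coding potential was assessed, the 300 nt ORF cutoff is a hypothetical threshold chosen for illustration, and only the forward strand is scanned. Published pipelines typically rely on dedicated coding-potential tools instead.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_nt(seq):
    """Length in nt of the longest ATG-to-stop ORF on the forward strand (stop included)."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def is_candidate_lncrna(seq, min_len=200, max_orf_nt=300):
    """True if the transcript is longer than min_len and has no ORF above max_orf_nt.

    max_orf_nt is an illustrative assumption, not a value taken from the study.
    """
    return len(seq) > min_len and longest_orf_nt(seq) <= max_orf_nt


# Hypothetical toy transcripts
short_orf = "ATG" + "GCT" * 5 + "TAA"                 # 21 nt ORF
coding_like = "ATG" + "GCT" * 120 + "TAA" + "A" * 100  # 366 nt ORF in a 466 nt transcript
print(longest_orf_nt(short_orf))          # 21
print(is_candidate_lncrna("A" * 250))     # True
print(is_candidate_lncrna(coding_like))   # False
```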
ProClaT, a new bioinformatics tool for in silico protein reclassification: case study of DraB, a protein coded from the draTGB operon in Azospirillum brasilense.

PubMed

Rubel, Elisa Terumi; Raittz, Roberto Tadeu; Coimbra, Nilson Antonio da Rocha; Gehlen, Michelly Alves Coutinho; Pedrosa, Fábio de Oliveira

2016-12-15

Azospirillum brasilense is a plant-growth promoting nitrogen-fixing bacterium that is used as a bio-fertilizer in agriculture. Since nitrogen fixation has a high energy demand, the reduction of N2 to NH4+ by nitrogenase occurs only under limiting conditions of NH4+ and O2. Moreover, the synthesis and activity of nitrogenase are highly regulated to prevent energy waste. In A. brasilense nitrogenase activity is regulated by the products of draG and draT. The product of the draB gene, located downstream in the draTGB operon, may be involved in the regulation of nitrogenase activity by an as yet unknown mechanism. A deep in silico analysis of the product of draB was undertaken, aiming to suggest its possible function and involvement with DraT and DraG in the regulation of nitrogenase activity in A. brasilense. In this work, we present a new artificial intelligence strategy for protein classification, named ProClaT. The features used by the pattern recognition model were derived from the primary structure of the DraB homologous proteins, calculated by a ProClaT internal algorithm. ProClaT was applied to this case study and the results revealed that the A. brasilense draB gene codes for a protein highly similar to the nitrogenase associated NifO protein of Azotobacter vinelandii. This tool allowed the reclassification of DraB/NifO homologous proteins, including those annotated as hypothetical, conserved hypothetical or putative arsenate reductase (ArsC), as NifO-like. An analysis of co-occurrence of draB, draT, draG and of other nif genes was performed, suggesting the involvement of draB (nifO) in nitrogen fixation, although without defining a specific function.
Isolation, Cloning, and Expression of an Acid Phosphatase Containing Phosphotyrosyl Phosphatase Activity from Prevotella intermedia

PubMed Central

Chen, Xiaochi; Ansai, Toshihiro; Awano, Shuji; Iida, Toshiya; Barik, Sailen; Takehara, Tadamichi

1999-01-01

A novel acid phosphatase containing phosphotyrosyl phosphatase (PTPase) activity, designated PiACP, from Prevotella intermedia ATCC 25611, an anaerobe implicated in progressive periodontal disease, has been purified and characterized. PiACP, a monomer with an apparent molecular mass of 30 kDa, did not require divalent metal cations for activity and was sensitive to orthovanadate but highly resistant to okadaic acid. The enzyme exhibited substantial activity against tyrosine phosphate-containing peptides derived from the epidermal growth factor receptor. On the basis of N-terminal and internal amino acid sequences of purified PiACP, the gene coding for PiACP was isolated and sequenced. The PiACP gene consisted of 792 bp and coded for a basic protein with an Mr of 29,164. The deduced amino acid sequence exhibited striking similarity (25 to 64%) to those of members of class A bacterial acid phosphatases, including PhoC of Morganella morganii, and included a conserved phosphatase sequence motif that is shared among several lipid phosphatases and the mammalian glucose-6-phosphatases. The highly conserved motif HCXAGXXR in the active domain of PTPases was not found in PiACP. Mutagenesis of recombinant PiACP showed that His-170 and His-209 were essential for activity. Thus, the class A bacterial acid phosphatases including PiACP may function as atypical PTPases, the biological functions of which remain to be determined. PMID:10559178
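Checking a protein for a signature such as HCXAGXXR amounts to pattern matching in which X stands for any residue. The sketch below shows one way to do that with a regular expression; the helper names and the toy sequences are hypothetical and are not taken from the study, which does not describe the software used.

```python
import re


def motif_to_regex(motif):
    """Translate a motif in which 'X' means 'any residue' into a regex string."""
    return "".join("." if ch == "X" else re.escape(ch) for ch in motif.upper())


def find_motif(motif, protein):
    """Return 0-based start positions of all (possibly overlapping) motif matches."""
    # A zero-width lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(protein.upper())]


# Hypothetical toy sequences, for illustration only
print(find_motif("HCXAGXXR", "MKHCSAGVGRTT"))  # [2]
print(find_motif("HCXAGXXR", "MKHCSAAVGRTT"))  # []
```

An empty result, as in the second call, corresponds to the situation reported for PiACP, where the PTPase signature was not found.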

Molecular characterization of dihydroneopterin aldolase and aminodeoxychorismate synthase in common bean-genes coding for enzymes in the folate synthesis pathway.

PubMed

Xie, Weilong; Perry, Gregory; Martin, C Joe; Shim, Youn-Seb; Navabi, Alireza; Pauls, K Peter

2017-07-01

Common beans (Phaseolus vulgaris) are excellent sources of dietary folates, but different varieties contain different amounts of these compounds. Genes coding for dihydroneopterin aldolase (DHNA) and aminodeoxychorismate synthase (ADCS) of the folate synthesis pathway were characterized by PCR amplification, BAC clone sequencing, and whole genome sequencing. All DHNA and ADCS genes in the Mesoamerican cultivar OAC Rex were isolated and compared with those genes in the genome of Andean genotype G19833. Both genotypes have two functional DHNA genes and one pseudogene. PvDHNA1 and PvDHNA2 proteins share similar secondary structures and conserved residues with DHNA homologs in Staphylococcus aureus and Arabidopsis. Sequence analysis and synteny mapping indicated that PvDHNA1 might be a duplicated and transposed copy of PvDHNA2. Only one ADCS gene (PvADCS) was identified in the bean genome, and it is identical in OAC Rex and G19833. PvADCS has the conserved motifs required for catalytic activity, similar to other plant ADCS homologs. DHNA and ADCS gene-specific markers were developed, mapped, and compared to their physical locations on chromosomes 1 and 7, respectively. The gene-specific markers developed in this study should be useful for detection and selection of varieties with enhanced folate contents in bean breeding programs.
Identification of long non-coding RNAs in two anthozoan species and their possible implications for coral bleaching.

PubMed

Huang, Chen; Morlighem, Jean-Étienne R L; Cai, Jing; Liao, Qiwen; Perez, Carlos Daniel; Gomes, Paula Braga; Guo, Min; Rádis-Baptista, Gandhi; Lee, Simon Ming-Yuen

2017-07-13

Long non-coding RNAs (lncRNAs) have been shown to play regulatory roles in a diverse range of biological processes and are associated with the outcomes of various diseases. The majority of studies about lncRNAs focus on model organisms, with less investigation in non-model organisms to date. Herein, we have undertaken an investigation of lncRNAs in two zoanthids (cnidarians): Protopalythoa variabilis and Palythoa caribaeorum. A total of 11,206 and 13,240 lncRNAs were detected in the P. variabilis and P. caribaeorum transcriptomes, respectively. Comparison using the NONCODE database indicated that the majority of these lncRNAs are taxonomically species-restricted with no identifiable orthologs. Even so, we found cases in which short regions of P. caribaeorum's lncRNAs were similar to vertebrate species' lncRNAs, and could be associated with conserved lncRNA regulatory functions. Consequently, some high-confidence lncRNA-mRNA interactions were predicted based on such conserved regions, revealing possible involvement of lncRNAs in posttranscriptional processing and regulation in anthozoans. Moreover, investigation of differentially expressed lncRNAs, in healthy colonies and colonial individuals undergoing natural bleaching, indicated that some up-regulated lncRNAs in P. caribaeorum could posttranscriptionally regulate the mRNAs encoding proteins of the Ras-mediated signal transduction pathway and components of the innate immune system, which could contribute to the molecular response to coral bleaching.
The Arabidopsis SRR1 gene mediates phyB signaling and is required for normal circadian clock function

PubMed Central

Staiger, Dorothee; Allenbach, Laure; Salathia, Neeraj; Fiechter, Vincent; Davis, Seth J.; Millar, Andrew J.; Chory, Joanne; Fankhauser, Christian

2003-01-01

Plants possess several photoreceptors to sense the light environment. In Arabidopsis, cryptochromes and phytochromes play roles in photomorphogenesis and in the light input pathways that synchronize the circadian clock with the external world. We have identified SRR1 (sensitivity to red light reduced), a gene that plays an important role in phytochrome B (phyB)-mediated light signaling. The recessive srr1 null allele and phyB mutants display a number of similar phenotypes, indicating that SRR1 is required for normal phyB signaling. Genetic analysis suggests that SRR1 works both in the phyB pathway and independently of phyB. srr1 mutants are affected in multiple outputs of the circadian clock in continuous light conditions, including leaf movement and expression of the clock components CCA1 and TOC1. Clock-regulated gene expression is also impaired during day–night cycles and in constant darkness. The circadian phenotypes of srr1 mutants in all three conditions suggest that SRR1 activity is required for normal oscillator function. The SRR1 gene was identified and shown to code for a protein conserved in numerous eukaryotes including mammals and flies, implying a conserved role for this protein in both the animal and plant kingdoms. PMID:12533513
The sequence of camelpox virus shows it is most closely related to variola virus, the cause of smallpox.

PubMed

Gubser, Caroline; Smith, Geoffrey L

2002-04-01

Camelpox virus (CMPV) and variola virus (VAR) are orthopoxviruses (OPVs) that share several biological features and cause high mortality and morbidity in their single host species. The sequence of a virulent CMPV strain was determined; it is 202182 bp long, with inverted terminal repeats (ITRs) of 6045 bp and has 206 predicted open reading frames (ORFs). As for other poxviruses, the genes are tightly packed with little non-coding sequence. Most genes within 25 kb of each terminus are transcribed outwards towards the terminus, whereas genes within the centre of the genome are transcribed from either DNA strand. The central region of the genome contains genes that are highly conserved in other OPVs and 87 of these are conserved in all sequenced chordopoxviruses. In contrast, genes towards either terminus are more variable and encode proteins involved in host range, virulence or immunomodulation. In some cases, these are broken versions of genes found in other OPVs. The relationship of CMPV to other OPVs was analysed by comparisons of DNA and predicted protein sequences, repeats within the ITRs and arrangement of ORFs within the terminal regions. Each comparison gave the same conclusion: CMPV is the closest known virus to variola virus, the cause of smallpox.
Molecular cloning and characterization of sea bass (Dicentrarchus labrax, L.) Tapasin.

PubMed

Pinto, Rute D; da Silva, Diogo V; Pereira, Pedro J B; dos Santos, Nuno M S

2012-01-01

Mammalian tapasin (TPN) is a key member of the major histocompatibility complex (MHC) class I antigen presentation pathway, being part of the multi-protein complex called the peptide loading complex (PLC). Several studies describe its important roles in stabilizing empty MHC class I complexes, facilitating peptide loading and editing the repertoire of bound peptides, with impact on CD8(+) T cell immune responses. In this work, the gene and cDNA of the sea bass (Dicentrarchus labrax) glycoprotein TPN have been isolated and characterized. The coding sequence has a 1329 bp ORF encoding a 442-residue precursor protein with a predicted 24-amino acid leader peptide, generating a 418-amino acid mature form that retains a conserved N-glycosylation site, three conserved mammalian tapasin motifs, two Ig superfamily domains, a transmembrane domain and an ER-retention di-lysine motif at the C-terminus, suggestive of a function similar to mammalian tapasins. Similar to the human counterpart, the sea bass TPN gene comprises 8 exons, some of which correspond to separate functional domains of the protein. A three-dimensional homology model of sea bass tapasin was calculated and is consistent with the structural features described for the human molecule. Together, these results support the concept that the basic structure of TPN has been maintained through evolution. Moreover, the present data provides information that will allow further studies on cell-mediated immunity and class I antigen presentation pathway in particular, in this important fish species. Copyright © 2011 Elsevier Ltd. All rights reserved.
Genetic organization of plasmid pXF51 from the plant pathogen Xylella fastidiosa.

PubMed

Marques, M V; da Silva, A M; Gomes, S L

2001-05-01

The sequence of plasmid pXF51 from the plant pathogen Xylella fastidiosa, the causal agent of citrus variegated chlorosis, has been analyzed. This plasmid codes for 65 open reading frames (ORFs), organized into four main regions, containing genes related to replication, mobilization, and conjugative transfer. Twenty-five ORFs have no counterparts in the public sequence databases, and 7 are similar to conserved hypothetical proteins from other bacteria. A pXF51 incompatibility group has not been determined, as we could not find a typical replication origin. One cluster of conjugation-related genes (trb) seems to be incomplete in pXF51, and a copy of this sequence is found in the chromosome, suggesting it was generated by a duplication event. A second cluster (tra) contains all genes necessary for conjugation transfer to occur, showing a conserved organization with other conjugative plasmids. An identifiable origin of transfer similar to oriT from IncP plasmids is found adjacent to genes encoding two mobilization proteins. None of the ORFs with putative assigned function could be predicted as having a role in pathogenesis, except for a virulence-associated protein D homolog. These results indicate that even though pXF51 appears not to have a direct role in Xylella pathogenesis, it is a conjugative plasmid that could be important for lateral gene transfer in this bacterium. This property may be of great importance for future development of transformation techniques in X. fastidiosa.
Structure of genes for dermaseptins B, antimicrobial peptides from frog skin. Exon 1-encoded prepropeptide is conserved in genes for peptides of highly different structures and activities.

PubMed

Vouille, V; Amiche, M; Nicolas, P

1997-09-01

We cloned the genes of two members of the dermaseptin family, broad-spectrum antimicrobial peptides isolated from the skin of the arboreal frog Phyllomedusa bicolor. The dermaseptin gene Drg2 has a 2-exon coding structure interrupted by a small 137-bp intron, wherein exon 1 encodes a 22-residue hydrophobic signal peptide and the first three amino acids of the acidic propiece; exon 2 contains the 18 additional acidic residues of the propiece plus a typical prohormone processing signal Lys-Arg and a 32-residue dermaseptin progenitor sequence. The dermaseptin genes Drg2 and Drg1g2 have conserved sequences at both untranslated ends and in the first and second coding exons. In contrast, Drg1g2 comprises a third coding exon for a short version of the acidic propiece and a second dermaseptin progenitor sequence. Structural conservation between the two genes suggests that Drg1g2 arose recently from an ancestral Drg2-like gene through amplification of part of the second coding exon and 3'-untranslated region. Analysis of the cDNAs coding precursors for several frog skin peptides of highly different structures and activities demonstrates that the signal peptides and part of the acidic propieces are encoded by conserved nucleotides encompassed by the first coding exon of the dermaseptin genes. The organization of the genes that belong to this family, with the signal peptide and the progenitor sequence on separate exons, permits strikingly different peptides to be directed into the secretory pathway. The recruitment of such a homologous 'secretory' exon by otherwise non-homologous genes may have been an early event in the evolution of amphibians.
Impacts of phylogenetic nomenclature on the efficacy of the U.S. Endangered Species Act.

PubMed

Leslie, Matthew S

2015-02-01

Cataloging biodiversity is critical to conservation efforts because accurate taxonomy is often a precondition for protection under laws designed for species conservation, such as the U.S. Endangered Species Act (ESA). Traditional nomenclatural codes governing the taxonomic process have recently come under scrutiny because taxon names are more closely linked to hierarchical ranks than to the taxa themselves. A new approach to naming biological groups, called phylogenetic nomenclature (PN), explicitly names taxa by defining their names in terms of ancestry and descent. PN has the potential to increase nomenclatural stability and decrease confusion induced by the rank-based codes. But proponents of PN have struggled with whether species and infraspecific taxa should be governed by the same rules as other taxa or should have special rules. Some proponents advocate the wholesale abandonment of rank labels (including species); this could have consequences for the implementation of taxon-based conservation legislation. I examined the principles of PN as embodied in the PhyloCode (an alternative to traditional rank-based nomenclature that names biological groups based on the results of phylogenetic analyses and does not associate taxa with ranks) and assessed how this novel approach to naming taxa might affect the implementation of species-based legislation by providing a case study of the ESA. The latest version of the PhyloCode relies on the traditional rank-based codes to name species and infraspecific taxa; thus, little will change regarding the main targets of the ESA because they will retain rank labels. For this reason, and because knowledge of evolutionary relationships is of greater importance than nomenclatural procedures for initial protection of endangered taxa under the ESA, I conclude that PN under the PhyloCode will have little impact on implementation of the ESA. © 2014 Society for Conservation Biology.
78 FR 59728 - Notice of Permit Applications Received Under the Antarctic Conservation Act of 1978

Federal Register 2010, 2011, 2012, 2013, 2014

2013-09-27

... Conservation Act of 1978 AGENCY: National Science Foundation. ACTION: Notice. SUMMARY: The National Science... regulated under the Antarctic Conservation Act of 1978. NSF has published regulations under the Antarctic Conservation Act at Title 45 Part 670 of the Code of Federal Regulations. This is the required notice of permit...
Nucleotide sequence of the 3' terminal region of lettuce mosaic potyvirus RNA shows a Gln/Val dipeptide at the cleavage site between the polymerase and the coat protein.

PubMed

Dinant, S; Lot, H; Albouy, J; Kuziak, C; Meyer, M; Astier-Manifacier, S

1991-01-01

DNA complementary to the 3' terminal 1651 nucleotides of the genome of the common strain of lettuce mosaic virus (LMV-O) has been cloned and sequenced. Microsequencing of the N-terminus enabled localization of the coat protein gene in this sequence. It showed also that the LMV coat protein coding region is at the 3' end of the genome, and that the coat protein is processed from a larger protein by cleavage at an unusual Q/V dipeptide between the polymerase and the coat protein. This is the first report of such a site for cleavage of a potyvirus polyprotein, where only Q/A, Q/S, and Q/G cleavage sites have been reported. The LMV coat protein gene encodes a 278 amino acid polypeptide with a calculated Mr of 31,171 and is flanked by a region which has a high degree of homology with the putative polymerase and a 3' untranslated region of 211 nucleotides in length. Percentage of homology with the coat protein of other potyviruses confirms that LMV is a distinct member of this group. Moreover, amino acid homologies noticed with the coat protein of potexvirus, bymovirus, and carlavirus elongated plant viruses suggest a functional significance for the conserved domains.
Wetting of nonconserved residue-backbones: A feature indicative of aggregation associated regions of proteins.

PubMed

Pradhan, Mohan R; Pal, Arumay; Hu, Zhongqiao; Kannan, Srinivasaraghavan; Chee Keong, Kwoh; Lane, David P; Verma, Chandra S

2016-02-01

Aggregation is an irreversible form of protein complexation and often toxic to cells. The process entails partial or major unfolding that is largely driven by hydration. We model the role of hydration in aggregation using "Dehydrons." "Dehydrons" are unsatisfied backbone hydrogen bonds in proteins that seek shielding from water molecules by associating with ligands or proteins. We find that the residues at aggregation interfaces have hydrated backbones, and in contrast to other forms of protein-protein interactions, are under less evolutionary pressure to be conserved. Combining evolutionary conservation of residues and extent of backbone hydration allows us to distinguish regions on proteins associated with aggregation (non-conserved dehydron-residues) from other interaction interfaces (conserved dehydron-residues). This novel feature can complement the existing strategies used to investigate protein aggregation/complexation. © 2015 Wiley Periodicals, Inc.
Characterization and Analysis of Whole Transcriptome of Giant Panda Spleens: Implying Critical Roles of Long Non-Coding RNAs in Immunity.

PubMed

Peng, Rui; Liu, Yuliang; Cai, Zhigang; Shen, Fujun; Chen, Jiasong; Hou, Rong; Zou, Fangdong

2018-01-01

Giant pandas, an endangered species, are a powerful symbol of species conservation. Giant pandas may suffer from a variety of diseases. Owing to their highly specialized diet of bamboo, giant pandas are thought to have a relatively weak ability to resist diseases. The spleen is the largest organ in the lymphatic system. However, little is known about the giant panda spleen at a molecular level. Thus, clarifying the regulatory mechanisms of the spleen could help us further understand the immune system of the giant panda and support its conservation. The two giant panda spleens were from two male individuals, one newborn and one adult, in a non-pathological condition. The whole transcriptomes of mRNA, lncRNA, miRNA, and circRNA in the two spleens were sequenced using the Illumina HiSeq platform. EBseq and IDEG6 were used to identify the differentially expressed genes (DEGs) between these two spleens. Gene Ontology and KEGG analyses were used to annotate the function of DEGs. Furthermore, networks between non-coding RNAs and protein-coding genes were constructed to investigate the relationship between non-coding RNAs and immune-associated genes. By comparative analysis of the whole transcriptomes of these two spleens, we found that one of the major roles of lncRNAs could be the regulation of immune responses in giant panda spleens. In addition, our results revealed that microRNAs and circRNAs may have evolved to regulate a large set of biological processes in giant panda spleens, and circRNAs may function as miRNA sponges. To our knowledge, this is the first report of lncRNAs and circRNAs in the giant panda, which could be a useful resource for further giant panda research. Our study reveals the potential functional roles of miRNAs, lncRNAs, and circRNAs in giant panda spleen. © 2018 The Author(s). Published by S. Karger AG, Basel.
Prediction of plant lncRNA by ensemble machine learning classifiers.

PubMed

Simopoulos, Caitlin M A; Weretilnyk, Elizabeth A; Golding, G Brian

2018-05-02

In plants, long non-protein coding RNAs are believed to have essential roles in development and stress responses. However, relative to advances on discerning biological roles for long non-protein coding RNAs in animal systems, this RNA class in plants is largely understudied. With comparatively few validated plant long non-coding RNAs, research on this potentially critical class of RNA is hindered by a lack of appropriate prediction tools and databases. Supervised learning models trained on data sets of mostly non-validated, non-coding transcripts have been previously used to identify this enigmatic RNA class with applications largely focused on animal systems. Our approach uses a training set comprised only of empirically validated long non-protein coding RNAs from plant, animal, and viral sources to predict and rank candidate long non-protein coding gene products for future functional validation. Individual stochastic gradient boosting and random forest classifiers trained on only empirically validated long non-protein coding RNAs were constructed. In order to use the strengths of multiple classifiers, we combined multiple models into a single stacking meta-learner. This ensemble approach benefits from the diversity of several learners to effectively identify putative plant long non-coding RNAs from transcript sequence features. When the predicted genes identified by the ensemble classifier were compared to those listed in GreeNC, an established plant long non-coding RNA database, overlap for predicted genes from Arabidopsis thaliana, Oryza sativa and Eutrema salsugineum ranged from 51 to 83% with the highest agreement in Eutrema salsugineum. Most of the highest ranking predictions from Arabidopsis thaliana were annotated as potential natural antisense genes, pseudogenes, transposable elements, or simply computationally predicted hypothetical protein. 
Due to the nature of this tool, the model can be updated as new long non-protein coding transcripts are identified and functionally verified. This ensemble classifier is an accurate tool that can be used to rank long non-protein coding RNA predictions for use in conjunction with gene expression studies. Selection of plant transcripts with a high potential for regulatory roles as long non-protein coding RNAs will advance research in the elucidation of long non-protein coding RNA function.
Phylogenetic distribution of plant snoRNA families.

PubMed

Patra Bhattacharya, Deblina; Canzler, Sebastian; Kehr, Stephanie; Hertel, Jana; Grosse, Ivo; Stadler, Peter F

2016-11-24

Small nucleolar RNAs (snoRNAs) are one of the most ancient families amongst non-protein-coding RNAs. They are ubiquitous in Archaea and Eukarya but absent in bacteria. Their main function is to target chemical modifications of ribosomal RNAs. They fall into two classes, box C/D snoRNAs and box H/ACA snoRNAs, which are clearly distinguished by conserved sequence motifs and the type of chemical modification that they govern. Similarly to microRNAs, snoRNAs appear in distinct families of homologs that affect homologous targets. In animals, snoRNAs and their evolution have been studied in much detail. In plants, however, their evolution has attracted comparably little attention. In order to chart the phylogenetic distribution of individual snoRNA families in plants, we applied a sophisticated approach for identifying homologs of known plant snoRNAs across the plant kingdom. In response to the relatively fast evolution of snoRNAs, information on conserved sequence boxes, target sequences, and secondary structure is combined to identify additional snoRNAs. We identified 296 families of snoRNAs in 24 species and traced their evolution throughout the plant kingdom. Many of the plant snoRNA families comprise paralogs. We also found that targets are well-conserved for most snoRNA families. The sequence conservation of snoRNAs is sufficient to establish homologies between phyla. The degree of this conservation tapers off, however, between land plants and algae. Plant snoRNAs are frequently organized in highly conserved spatial clusters. As a resource for further investigations we provide carefully curated and annotated alignments for each snoRNA family under investigation.
Identification and validation of Asteraceae miRNAs by the expressed sequence tag analysis.

PubMed

Monavar Feshani, Aboozar; Mohammadi, Saeed; Frazier, Taylor P; Abbasi, Abbas; Abedini, Raha; Karimi Farsad, Laleh; Ehya, Farveh; Salekdeh, Ghasem Hosseini; Mardi, Mohsen

2012-02-10

MicroRNAs (miRNAs) are small non-coding RNA molecules that play a vital role in the regulation of gene expression. Despite their identification in hundreds of plant species, few miRNAs have been identified in the Asteraceae, a large family that comprises approximately one tenth of all flowering plants. In this study, we used expressed sequence tag (EST) analysis to identify potential conserved miRNAs and their putative target genes in the Asteraceae. We applied quantitative Real-Time PCR (qRT-PCR) to confirm the expression of eight potential miRNAs in Carthamus tinctorius and Helianthus annuus. We also performed qRT-PCR analysis to investigate the differential expression pattern of five newly identified miRNAs during five different cotyledon growth stages in safflower. Using these methods, we successfully identified and characterized 151 potentially conserved miRNAs, belonging to 26 miRNA families, in 11 genera of Asteraceae. EST analysis predicted that the newly identified conserved Asteraceae miRNAs target 130 total protein-coding ESTs in sunflower and safflower, as well as 433 additional target genes in other plant species. We experimentally confirmed the existence of seven predicted miRNAs (miR156, miR159, miR160, miR162, miR166, miR396, and miR398) in safflower and sunflower seedlings. We also observed that five out of eight miRNAs are differentially expressed during cotyledon development. Our results indicate that miRNAs may be involved in the regulation of gene expression during seed germination and the formation of the cotyledons in the Asteraceae. The findings of this study might ultimately help in the understanding of miRNA-mediated gene regulation in important crop species. Copyright © 2011 Elsevier B.V. All rights reserved.
Microfluidic affinity and ChIP-seq analyses converge on a conserved FOXP2-binding motif in chimp and human, which enables the detection of evolutionarily novel targets.

PubMed

Nelson, Christopher S; Fuller, Chris K; Fordyce, Polly M; Greninger, Alexander L; Li, Hao; DeRisi, Joseph L

2013-07-01

The transcription factor forkhead box P2 (FOXP2) is believed to be important in the evolution of human speech. A mutation in its DNA-binding domain causes severe speech impairment. Humans have acquired two coding changes relative to the conserved mammalian sequence. Despite intense interest in FOXP2, it has remained an open question whether the human protein's DNA-binding specificity and chromatin localization are conserved. Previous in vitro and ChIP-chip studies have provided conflicting consensus sequences for the FOXP2-binding site. Using MITOMI 2.0 microfluidic affinity assays, we describe the binding site of FOXP2 and its affinity profile in base-specific detail for all substitutions of the strongest binding site. We find that human and chimp FOXP2 have similar binding sites that are distinct from previously suggested consensus binding sites. Additionally, through analysis of FOXP2 ChIP-seq data from cultured neurons, we find strong overrepresentation of a motif that matches our in vitro results and identifies a set of genes with FOXP2 binding sites. The FOXP2-binding sites tend to be conserved, yet we identified 38 instances of evolutionarily novel sites in humans. Combined, these data present a comprehensive portrait of FOXP2's binding properties and imply that although its sequence specificity has been conserved, some of its genomic binding sites are newly evolved.
Characterization of Conserved Tandem Donor Sites and Intronic Motifs Required for Alternative Splicing in Corticosteroid Receptor Genes

PubMed Central

Qian, Xiaoxiao; Matthews, Laura; Lightman, Stafford; Ray, David; Norman, Michael

2015-01-01

Alternative splicing events from tandem donor sites result in mRNA variants coding for additional amino acids in the DNA binding domain of both the glucocorticoid (GR) and mineralocorticoid (MR) receptors. We now show that expression of both splice variants is extensively conserved in mammalian species, providing strong evidence for their functional significance. An exception to the conservation of the MR tandem splice site (an A at position +5 of the MR+12 donor site in the mouse) was predicted to decrease U1 small nuclear RNA binding. In accord with this prediction, we were unable to detect the MR+12 variant in this species. The one exception to the conservation of the GR tandem splice site, an A at position +3 of the platypus GRγ donor site that was predicted to enhance binding of U1 snRNA, was unexpectedly associated with decreased expression of the variant from the endogenous gene as well as a minigene. An intronic pyrimidine motif present in both GR and MR genes was found to be critical for usage of the downstream donor site, and overexpression of TIA1/TIAL1 RNA binding proteins, which are known to bind such motifs, led to a marked increase in the proportion of GRγ and MR+12. These results provide striking evidence for conservation of a complex splicing mechanism that involves processes other than stochastic spliceosome binding and identify a mechanism that would allow regulation of variant expression. PMID:19819975
Subscale Development of Advanced ABM Graphite/Epoxy Composite Structure

DTIC Science & Technology

1978-01-01

laminate analysis computer code (Reference 5). The output of this code yields lamina stresses and strains, equivalent elastic and shear moduli for the...was not accounted for. Therefore the net effect was that the analysis tended to yield conservative results. For design purposes, this conservative...extracted using a Soxhlet Extraction apparatus, recycling the solvent at least 4 to 10 times every hour for a minimum of 6 hours. (4) All samples are
Structural Code Considerations for Solar Rooftop Installations.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dwyer, Stephen F.; Dwyer, Brian P.; Sanchez, Alfred

2014-12-01

Residential rooftop solar panel installations are limited in part by the high cost of structural related code requirements for field installation. Permitting solar installations is difficult because there is a belief among residential permitting authorities that typical residential rooftops may be structurally inadequate to support the additional load associated with a photovoltaic (PV) solar installation. Typical engineering methods utilized to calculate stresses on a roof structure involve simplifying assumptions that render a complex non-linear structure to a basic determinate beam. This method of analysis neglects the composite action of the entire roof structure, yielding a conservative analysis based on a rafter or top chord of a truss. Consequently, the analysis can result in an overly conservative structural analysis. A literature review was conducted to gain a better understanding of the conservative nature of the regulations and codes governing residential construction and the associated structural system calculations.
The complete mitochondrial genome of the central chimpanzee, Pan troglodytes troglodytes.

PubMed

Liu, Bang; Hu, Xiao-di; Gao, Li-Zhi

2016-07-01

This study is the first to report the complete mitochondrial genome sequence of the central chimpanzee, Pan troglodytes troglodytes. The genome was a total of 16 556 bp in length and had a base composition of A (31.05%), G (12.95%), C (30.84%), and T (25.16%), indicating that the percentage of A + T (56.21%) is higher than G + C (43.79%). Similar to other primates, it possessed a typically conserved structure, including 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes and 1 control region (D-loop). Most of these genes were located on the H-strand, except for the ND6 gene and 8 tRNA genes. The phylogenetic analysis showed that the P. t. troglodytes mitochondrial genome formed a cluster with the other three Pan troglodytes genomes and that the genus Pan is closely related to the genus Homo. This mitochondrial genome sequence will supply useful genetic resources to aid the conservation management of primate germplasm and to help uncover hominoid evolution.
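The A + T and G + C percentages quoted above follow directly from the per-base composition. A minimal Python sketch of that arithmetic, assuming the sequence is available as a plain string; the function name `base_composition` is illustrative and not part of the study:

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of each base (A, C, G, T) in seq.

    Characters other than A/C/G/T (e.g. N) are ignored.
    """
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


# Composition reported in the abstract (percent); copied from the text.
reported = {"A": 31.05, "G": 12.95, "C": 30.84, "T": 25.16}
at_content = round(reported["A"] + reported["T"], 2)  # A + T
gc_content = round(reported["G"] + reported["C"], 2)  # G + C
```

Summing the reported values reproduces the stated 56.21% (A + T) and 43.79% (G + C).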

Dynamic landscape and regulation of RNA editing in mammals

PubMed Central

Tan, Meng How; Li, Qin; Shanmugam, Raghuvaran; Piskol, Robert; Kohler, Jennefer; Young, Amy N.; Liu, Kaiwen Ivy; Zhang, Rui; Ramaswami, Gokul; Ariyoshi, Kentaro; Gupte, Ankita; Keegan, Liam P.; George, Cyril X.; Ramu, Avinash; Huang, Ni; Pollina, Elizabeth A.; Leeman, Dena S.; Rustighi, Alessandra; Sharon Goh, Y. P.; Chawla, Ajay; Del Sal, Giannino; Peltz, Gary; Brunet, Anne; Conrad, Donald F.; Samuel, Charles E.; O’Connell, Mary A.; Walkley, Carl R.; Nishikura, Kazuko; Li, Jin Billy

2017-01-01

Adenosine-to-inosine (A-to-I) RNA editing is a conserved post-transcriptional mechanism mediated by ADAR enzymes that diversifies the transcriptome by altering selected nucleotides in RNA molecules. Although many editing sites have recently been discovered, the extent to which most sites are edited and how the editing is regulated in different biological contexts are not fully understood. Here we report dynamic spatiotemporal patterns and new regulators of RNA editing, discovered through an extensive profiling of A-to-I RNA editing in 8,551 human samples (representing 53 body sites from 552 individuals) from the Genotype-Tissue Expression (GTEx) project and in hundreds of other primate and mouse samples. We show that editing levels in non-repetitive coding regions vary more between tissues than editing levels in repetitive regions. Globally, ADAR1 is the primary editor of repetitive sites and ADAR2 is the primary editor of non-repetitive coding sites, whereas the catalytically inactive ADAR3 predominantly acts as an inhibitor of editing. Cross-species analysis of RNA editing in several tissues revealed that species, rather than tissue type, is the primary determinant of editing levels, suggesting stronger cis-directed regulation of RNA editing for most sites, although the small set of conserved coding sites is under stronger trans-regulation. In addition, we curated an extensive set of ADAR1 and ADAR2 targets and showed that many editing sites display distinct tissue-specific regulation by the ADAR enzymes in vivo. Further analysis of the GTEx data revealed several potential regulators of editing, such as AIMP2, which reduces editing in muscles by enhancing the degradation of the ADAR proteins. Collectively, our work provides insights into the complex cis- and trans-regulation of A-to-I editing. PMID:29022589
GenomeVista

DOE Office of Scientific and Technical Information (OSTI.GOV)

Poliakov, Alexander; Couronne, Olivier

2002-11-04

Aligning large vertebrate genomes that are structurally complex poses a variety of problems not encountered on smaller scales. Such genomes are rich in repetitive elements and contain multiple segmental duplications, which increases the difficulty of identifying true orthologous DNA segments in alignments. The sizes of the sequences make many alignment algorithms designed for comparing single proteins extremely inefficient when processing large genomic intervals. We integrated both local and global alignment tools and developed a suite of programs for automatically aligning large vertebrate genomes and identifying conserved non-coding regions in the alignments. Our method uses the BLAT local alignment program to find anchors on the base genome to identify regions of possible homology for a query sequence. These regions are postprocessed to find the best candidates, which are then globally aligned using the AVID global alignment program. In the last step conserved non-coding segments are identified using VISTA. Our methods are fast and the resulting alignments exhibit a high degree of sensitivity, covering more than 90% of known coding exons in the human genome. The GenomeVISTA software is a suite of Perl programs that is built on a MySQL database platform. The scheduler gets control data from the database, builds a queue of jobs, and dispatches them to a PC cluster for execution. The main program, running on each node of the cluster, processes individual sequences. A Perl library acts as an interface between the database and the above programs. The use of a separate library allows the programs to function independently of the database schema. The library also improves on the standard Perl MySQL database interface package by providing auto-reconnect functionality and improved error handling.
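The last step of the pipeline described above, identifying conserved segments in a global alignment, can be illustrated with a sliding-window identity scan. This is a minimal Python sketch and not the VISTA implementation (the software itself is written in Perl); the function names, the window length, and the identity cutoff are placeholder assumptions rather than parameters taken from the record:

```python
def conserved_windows(aln_a, aln_b, window=100, min_identity=0.70):
    """Return (start, end, identity) for each alignment window whose
    fraction of identical, non-gap columns is at least min_identity.

    aln_a and aln_b are the two rows of a pairwise global alignment
    (equal length, '-' marks a gap). Values are hypothetical defaults.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    n = len(aln_a)
    if n < window:
        return []
    # 1 where the column is an identical, non-gap match, else 0.
    matches = [
        1 if a == b and a != "-" else 0
        for a, b in zip(aln_a.upper(), aln_b.upper())
    ]
    hits = []
    running = sum(matches[:window])
    for start in range(n - window + 1):
        if start > 0:
            # Slide the window one column to the right.
            running += matches[start + window - 1] - matches[start - 1]
        identity = running / window
        if identity >= min_identity:
            hits.append((start, start + window, identity))
    return hits


def merge_windows(hits):
    """Merge overlapping or adjacent windows into (start, end) segments."""
    segments = []
    for start, end, _ in hits:
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], max(segments[-1][1], end))
        else:
            segments.append((start, end))
    return segments
```

In a real pipeline the merged segments would then be filtered against exon annotation to keep only the non-coding ones; that step is omitted here.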
Dynamic landscape and regulation of RNA editing in mammals.

PubMed

Tan, Meng How; Li, Qin; Shanmugam, Raghuvaran; Piskol, Robert; Kohler, Jennefer; Young, Amy N; Liu, Kaiwen Ivy; Zhang, Rui; Ramaswami, Gokul; Ariyoshi, Kentaro; Gupte, Ankita; Keegan, Liam P; George, Cyril X; Ramu, Avinash; Huang, Ni; Pollina, Elizabeth A; Leeman, Dena S; Rustighi, Alessandra; Goh, Y P Sharon; Chawla, Ajay; Del Sal, Giannino; Peltz, Gary; Brunet, Anne; Conrad, Donald F; Samuel, Charles E; O'Connell, Mary A; Walkley, Carl R; Nishikura, Kazuko; Li, Jin Billy

2017-10-11

Adenosine-to-inosine (A-to-I) RNA editing is a conserved post-transcriptional mechanism mediated by ADAR enzymes that diversifies the transcriptome by altering selected nucleotides in RNA molecules. Although many editing sites have recently been discovered, the extent to which most sites are edited and how the editing is regulated in different biological contexts are not fully understood. Here we report dynamic spatiotemporal patterns and new regulators of RNA editing, discovered through an extensive profiling of A-to-I RNA editing in 8,551 human samples (representing 53 body sites from 552 individuals) from the Genotype-Tissue Expression (GTEx) project and in hundreds of other primate and mouse samples. We show that editing levels in non-repetitive coding regions vary more between tissues than editing levels in repetitive regions. Globally, ADAR1 is the primary editor of repetitive sites and ADAR2 is the primary editor of non-repetitive coding sites, whereas the catalytically inactive ADAR3 predominantly acts as an inhibitor of editing. Cross-species analysis of RNA editing in several tissues revealed that species, rather than tissue type, is the primary determinant of editing levels, suggesting stronger cis-directed regulation of RNA editing for most sites, although the small set of conserved coding sites is under stronger trans-regulation. In addition, we curated an extensive set of ADAR1 and ADAR2 targets and showed that many editing sites display distinct tissue-specific regulation by the ADAR enzymes in vivo. Further analysis of the GTEx data revealed several potential regulators of editing, such as AIMP2, which reduces editing in muscles by enhancing the degradation of the ADAR proteins. Collectively, our work provides insights into the complex cis- and trans-regulation of A-to-I editing.
A circadian gene expression atlas in mammals: implications for biology and medicine.

PubMed

Zhang, Ray; Lahens, Nicholas F; Ballance, Heather I; Hughes, Michael E; Hogenesch, John B

2014-11-11

To characterize the role of the circadian clock in mouse physiology and behavior, we used RNA-seq and DNA arrays to quantify the transcriptomes of 12 mouse organs over time. We found 43% of all protein coding genes showed circadian rhythms in transcription somewhere in the body, largely in an organ-specific manner. In most organs, we noticed the expression of many oscillating genes peaked during transcriptional "rush hours" preceding dawn and dusk. Looking at the genomic landscape of rhythmic genes, we saw that they clustered together, were longer, and had more spliceforms than nonoscillating genes. Systems-level analysis revealed intricate rhythmic orchestration of gene pathways throughout the body. We also found oscillations in the expression of more than 1,000 known and novel noncoding RNAs (ncRNAs). Supporting their potential role in mediating clock function, ncRNAs conserved between mouse and human showed rhythmic expression in similar proportions as protein coding genes. Importantly, we also found that the majority of best-selling drugs and World Health Organization essential medicines directly target the products of rhythmic genes. Many of these drugs have short half-lives and may benefit from timed dosage. In sum, this study highlights critical, systemic, and surprising roles of the mammalian circadian clock and provides a blueprint for advancement in chronotherapy.
FOXP2 variation in great ape populations offers insight into the evolution of communication skills.

PubMed

Staes, Nicky; Sherwood, Chet C; Wright, Katharine; de Manuel, Marc; Guevara, Elaine E; Marques-Bonet, Tomas; Krützen, Michael; Massiah, Michael; Hopkins, William D; Ely, John J; Bradley, Brenda J

2017-12-04

The gene coding for the forkhead box protein P2 (FOXP2) is associated with human language disorders. Evolutionary changes in this gene are hypothesized to have contributed to the emergence of speech and language in the human lineage. Although FOXP2 is highly conserved across most mammals, humans differ at two functional amino acid substitutions from chimpanzees, bonobos and gorillas, with an additional fixed substitution found in orangutans. However, FOXP2 has been characterized in only a small number of apes and no publication to date has examined the degree of natural variation in large samples of unrelated great apes. Here, we analyzed the genetic variation in the FOXP2 coding sequence in 63 chimpanzees, 11 bonobos, 48 gorillas, 37 orangutans and 2 gibbons and observed undescribed variation in great apes. We identified two variable polyglutamine microsatellites in chimpanzees and orangutans and found three nonsynonymous single nucleotide polymorphisms, one in chimpanzees, one in gorillas and one in orangutans with derived allele frequencies of 0.01, 0.26 and 0.29, respectively. Structural and functional protein modeling indicate a biochemical effect of the substitution in orangutans, and because of its presence solely in the Sumatran orangutan species, the mutation may be associated with reported population differences in vocalizations.
Polar bears, antibiotics, and the evolving ribosome (Nobel Lecture).

PubMed

Yonath, Ada

2010-06-14

High-resolution structures of ribosomes, the cellular machines that translate the genetic code into proteins, revealed the decoding mechanism, detected the mRNA path, identified the sites of the tRNA molecules in the ribosome, elucidated the position and the nature of the nascent proteins exit tunnel, illuminated the interactions of the ribosome with non-ribosomal factors, such as the initiation, release and recycling factors, and provided valuable information on ribosomal antibiotics, their binding sites, modes of action, principles of selectivity and the mechanisms leading to their resistance. Notably, these structures proved that the ribosome is a ribozyme whose active site, namely where the peptide bonds are being formed, is situated within a universal symmetrical region that is embedded in the otherwise asymmetric ribosome structure. As this symmetrical region is highly conserved and provides the machinery required for peptide bond formation and for ribosome polymerase activity, it may be the remnant of the proto-ribosome, a dimeric prebiotic machine that formed peptide bonds and non-coded polypeptide chains. Structures of complexes of ribosomes with antibiotics targeting them revealed the principles allowing for their clinical use, identified resistance mechanisms and showed the structural bases for discriminating pathogenic bacteria from hosts, hence providing valuable structural information for antibiotics improvement and for the design of novel compounds that can serve as antibiotics.
The complete mitochondrial genome of the butterfly Apatura metis (Lepidoptera: Nymphalidae).

PubMed

Zhang, Min; Nie, Xinping; Cao, Tianwen; Wang, Juping; Li, Tao; Zhang, Xiaonan; Guo, Yaping; Ma, Enbo; Zhong, Yang

2012-06-01

Apatura metis, known as Freyer's purple emperor, is an important pest of the slender-leaved willow (Salix alba); its mitochondrial genome is 15,236 bp long. The 22 tRNA genes, two ribosomal RNA (rrnL and rrnS) genes, 13 protein-coding genes (PCGs), and control region of the A. metis mitochondrial genome are highly homologous to those of other lepidopteran species. The mitochondrial genome of A. metis is biased toward a high A + T content (A + T = 80.5%). All protein-coding genes start with a typical ATN initiation codon, except for COI, which begins with the CGA codon as observed in other lepidopterans. All tRNAs show the classic clover-leaf structure, except that the dihydrouridine (DHU) arm of tRNA(Ser(AGN)) forms a simple loop. The A. metis A + T-rich region contains some conserved structures, including a structure combining the motif 'ATAGA' and a 19 bp poly (T) stretch, which is similar to those found in other lepidopteran mitogenomes. The phylogenetic analyses of lepidopterans based on mitogenome sequences demonstrate that each of the six superfamilies is monophyletic, and the relationship among them is (((Noctuoidea + (Geometroidea + Bombycoidea)) + Pyraloidea) + Papilionoidea) + Tortricoidea. Within Papilionoidea, our results support the relationship ((Lycaenidae + Pieridae) + Nymphalidae) + Papilionidae.
Mitochondrial Genome of the Stonefly Kamimuria wangi (Plecoptera: Perlidae) and Phylogenetic Position of Plecoptera Based on Mitogenomes

PubMed Central

Yu-Han, Qian; Hai-Yan, Wu; Xiao-Yu, Ji; Wei-Wei, Yu; Yu-Zhou, Du

2014-01-01

This study determined the mitochondrial genome sequence of the stonefly, Kamimuria wangi. In order to investigate the relatedness of stonefly to other members of Neoptera, a phylogenetic analysis was undertaken based on 13 protein-coding genes of mitochondrial genomes in 13 representative insects. The mitochondrial genome of the stonefly is a circular molecule consisting of 16,179 nucleotides and contains the 37 genes typically found in other insects. A 10-bp poly-T stretch was observed in the A+T-rich region of the K. wangi mitochondrial genome. Downstream of the poly-T stretch, two regions were located with potential ability to form stem-loop structures; these were designated stem-loop 1 (positions 15848–15651) and stem-loop 2 (15965–15998). The arrangement of genes and nucleotide composition of the K. wangi mitogenome are similar to those in Pteronarcys princeps, suggesting a conserved genome evolution within the Plecoptera. Phylogenetic analysis using maximum likelihood and Bayesian inference of 13 protein-coding genes supported a novel relationship between the Plecoptera and Ephemeroptera. The results contradict the existence of a monophyletic Plectoptera and Plecoptera as sister taxa to Embiidina, and thus require further analyses with additional mitogenome sampling at the base of the Neoptera. PMID:24466028
Mitochondrial genome of the stonefly Kamimuria wangi (Plecoptera: Perlidae) and phylogenetic position of plecoptera based on mitogenomes.

PubMed

Yu-Han, Qian; Hai-Yan, Wu; Xiao-Yu, Ji; Wei-Wei, Yu; Yu-Zhou, Du

2014-01-01

This study determined the mitochondrial genome sequence of the stonefly, Kamimuria wangi. In order to investigate the relatedness of stonefly to other members of Neoptera, a phylogenetic analysis was undertaken based on 13 protein-coding genes of mitochondrial genomes in 13 representative insects. The mitochondrial genome of the stonefly is a circular molecule consisting of 16,179 nucleotides and contains the 37 genes typically found in other insects. A 10-bp poly-T stretch was observed in the A+T-rich region of the K. wangi mitochondrial genome. Downstream of the poly-T stretch, two regions were located with potential ability to form stem-loop structures; these were designated stem-loop 1 (positions 15848-15651) and stem-loop 2 (15965-15998). The arrangement of genes and nucleotide composition of the K. wangi mitogenome are similar to those in Pteronarcys princeps, suggesting a conserved genome evolution within the Plecoptera. Phylogenetic analysis using maximum likelihood and Bayesian inference of 13 protein-coding genes supported a novel relationship between the Plecoptera and Ephemeroptera. The results contradict the existence of a monophyletic Plectoptera and Plecoptera as sister taxa to Embiidina, and thus require further analyses with additional mitogenome sampling at the base of the Neoptera.
The complete mitochondrial genome of Arctic Calanus hyperboreus (Copepoda, Calanoida) reveals characteristic patterns in calanoid mitochondrial genome.

PubMed

Kim, Sanghee; Lim, Byung-Jin; Min, Gi-Sik; Choi, Han-Gu

2013-05-10

Copepoda is the most diverse and abundant group of crustaceans, but its phylogenetic relationships are ambiguous. Mitochondrial (mt) genomes are useful for studying evolutionary history, but only six complete Copepoda mt genomes have been made available and these have extremely rearranged genome structures. This study determined the mt genome of Calanus hyperboreus, making it the first reported Arctic copepod mt genome and the first complete mt genome of a calanoid copepod. The mt genome of C. hyperboreus is 17,910 bp in length and it contains the entire set of 37 mt genes, including 13 protein-coding genes, 2 rRNAs, and 22 tRNAs. It has a very unusual gene structure, including the longest control region reported for a crustacean, a large tRNA gene cluster, and reversed GC skews in 11 out of 13 protein-coding genes (84.6%). Despite the unusual features, comparing this genome to published copepod genomes revealed retained pan-crustacean features, as well as a conserved calanoid-specific pattern. Our data provide a foundation for exploring the calanoid pattern and the mechanisms of mt gene rearrangement in the evolutionary history of the copepod mt genome. Copyright © 2012 Elsevier B.V. All rights reserved.
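GC skew, mentioned above, is conventionally computed on one strand as (G − C) / (G + C), so a "reversed" skew is one whose sign is opposite to the usual pattern. A minimal Python sketch of the calculation and of the 11-of-13 percentage; the function name `gc_skew` is illustrative, and the sketch assumes each gene sequence is given as a plain string:

```python
def gc_skew(seq):
    """Return (G - C) / (G + C) for seq, or 0.0 if it has no G or C."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


# Share of protein-coding genes with reversed skew, as reported: 11 of 13.
reversed_genes, total_genes = 11, 13
reversed_pct = round(100 * reversed_genes / total_genes, 1)
```

The rounded share matches the 84.6% stated in the abstract.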
Novel Accurate Bacterial Discrimination by MALDI-Time-of-Flight MS Based on Ribosomal Proteins Coding in S10-spc-alpha Operon at Strain Level S10-GERMS

NASA Astrophysics Data System (ADS)

Tamura, Hiroto; Hotta, Yudai; Sato, Hiroaki

2013-08-01

Matrix-assisted laser-desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) is one of the most widely used mass-based approaches for bacterial identification and classification because of the simple sample preparation and extremely rapid analysis within a few minutes. To establish an accurate MALDI-TOF MS bacterial discrimination method at the strain level, the ribosomal subunit proteins coded in the S10-spc-alpha operon, which encodes half of the ribosomal subunit proteins and is highly conserved in eubacterial genomes, were selected as reliable biomarkers. This method, named the S10-GERMS method, revealed that the strains of genus Pseudomonas were successfully identified and discriminated at species and strain levels, respectively; therefore, the S10-GERMS method was further applied to discriminate the pathovar of P. syringae. The eight selected biomarkers (L24, L30, S10, S12, S14, S16, S17, and S19) suggested the rapid discrimination of P. syringae at the strain (pathovar) level. The S10-GERMS method appears to be a powerful tool for rapid and reliable bacterial discrimination and successful phylogenetic characterization. In this article, an overview of the utilization of results from the S10-GERMS method is presented, highlighting the characterization of the Lactobacillus casei group and discrimination of the bacteria of genera Bacillus and Sphingopyxis despite only two and one base difference in the 16S rRNA gene sequence, respectively.
Identification of a new genotype H wild-type mumps virus strain and its molecular relatedness to other virulent and attenuated strains.

PubMed

Amexis, Georgios; Rubin, Steven; Chatterjee, Nando; Carbone, Kathryn; Chumakov, Kostantin

2003-06-01

A single clinical isolate of mumps virus designated 88-1961 was obtained from a patient hospitalized with a clinical history of upper respiratory tract infection, parotitis, severe headache, fever and lymphadenopathy. We have sequenced the full-length genome of 88-1961 and compared it against all available full-length sequences of mumps virus. Based upon its nucleotide sequence of the SH gene 88-1961 was identified as a genotype H mumps strain. The overall extent of nucleotide and amino acid differences between each individual gene and protein of 88-1961 and the full-length mumps samples showed that the missense to silent ratios were unevenly distributed. Upon evaluation of the consensus sequence of 88-1961, four positions were found to be clearly heterogeneous at the nucleotide level (NP 315C/T, NP 318C/T, F 271A/C, and HN 855C/T). Sequence analysis revealed that the amino acid sequences for the NP, M, and the L protein were the most conserved, whereas the SH protein exhibited the highest variability among the compared mumps genotypes A, B, and G. No identifying molecular patterns in the non-coding (intergenic) or coding regions of 88-1961 were found when we compared it against relatively virulent (Urabe AM9 B, Glouc1/UK96, 87-1004 and 87-1005) and non-virulent mumps strains (Jeryl Lynn and all Urabe Am9 A substrains). Copyright 2003 Wiley-Liss, Inc.
Mutations in the Promoter Region of the Aldolase B Gene that cause Hereditary Fructose Intolerance

PubMed Central

Coffee, Erin M.; Tolan, Dean R.

2010-01-01

SUMMARY Hereditary fructose intolerance (HFI) is a potentially fatal inherited metabolic disease caused by a deficiency of aldolase B activity in the liver and kidney. Over 40 disease-causing mutations are known in the protein-coding region of ALDOB. Mutations upstream of the protein-coding portion of ALDOB are reported here for the first time. DNA sequence analysis of 61 HFI patients revealed single base mutations in the promoter, intronic enhancer, and the first exon, which is entirely untranslated. One mutation, g.–132G>A, is located within the promoter at an evolutionarily conserved nucleotide within a transcription factor-binding site. A second mutation, IVS1+1G>C, is at the donor splice site of the first exon. In vitro electrophoretic mobility shift assays show a decrease in nuclear extract-protein binding at the g.–132G>A mutant site. The promoter mutation results in decreased transcription using luciferase reporter plasmids. Analysis of cDNA from cells transfected with plasmids harboring the IVS1+1G>C mutation results in aberrant splicing leading to complete retention of the first intron (~ 5 kb). The IVS1+1G>C splicing mutation results in loss of luciferase activity from a reporter plasmid. These novel mutations in ALDOB represent 2% of alleles in American HFI patients, with IVS1+1G>C representing a significantly higher allele frequency (6%) among HFI patients of Hispanic and African-American ethnicity. PMID:20882353
Comparative protein modeling of methionine S-adenosyltransferase (MAT) enzyme from Mycobacterium tuberculosis: a potential target for antituberculosis drug discovery.

PubMed

Khedkar, Santosh A; Malde, Alpeshkumar K; Coutinho, Evans C

2005-01-01

Mycobacterium tuberculosis (Mtb) is a successful pathogen that overcomes the numerous challenges presented by the immune system of the host. In the last 40 years few anti-TB drugs have been developed, while the drug-resistance problem is increasing; there is thus a pressing need to develop new anti-TB drugs active against both the acute and chronic growth phases of the mycobacterium. Methionine S-adenosyltransferase (MAT) is an enzyme involved in the synthesis of S-adenosylmethionine (SAM), a methyl donor essential for mycolipid biosynthesis. As an anti-TB drug target, Mtb-MAT has been well validated. A homology model of MAT has been constructed by comparative protein modeling, using the X-ray structures of E. coli MAT (PDB code: 1MXA) and rat MAT (PDB code: 1QM4) as templates. The resulting model has the correct stereochemistry as gauged from the Ramachandran plot and good three-dimensional (3D) structure compatibility as assessed by the Profiles-3D score. The structurally and functionally important residues (active site) of Mtb-MAT have been identified using the E. coli and rat MAT crystal structures and the reported point mutation data. The homology model conserves the topological and active site features of the MAT family of proteins. The differences in the molecular electrostatic potentials (MEP) of Mtb and human MAT provide evidence that selective and specific Mtb-MAT inhibitors can be designed from the homology model using structure-based drug design approaches.
RNA polymerase II conserved protein domains as platforms for protein-protein interactions

PubMed Central

García-López, M Carmen

2011-01-01

RNA polymerase II establishes many protein-protein interactions with transcriptional regulators to coordinate gene expression, but little is known about protein domains involved in the contact with them. We use a new approach to look for conserved regions of the RNA pol II of S. cerevisiae located at the surface of the structure of the complex, hypothesizing that they might be involved in the interaction with transcriptional regulators. We defined five different conserved domains and demonstrate that all of them make contact with transcriptional regulators. PMID:21922063
An efficient algorithm for pairwise local alignment of protein interaction networks

DOE PAGES

Chen, Wenbin; Schmidt, Matthew; Tian, Wenhong; ...

2015-04-01

Recently, researchers seeking to understand, modify, and create beneficial traits in organisms have looked for evolutionarily conserved patterns of protein interactions. Their conservation likely means that the proteins of these conserved functional modules are important to the trait's expression. In this paper, we formulate the problem of identifying these conserved patterns as a graph optimization problem, and develop a fast heuristic algorithm for this problem. We compare the performance of our network alignment algorithm to that of the MaWISh algorithm [Koyuturk M, Kim Y, Topkara U, Subramaniam S, Szpankowski W, Grama A, Pairwise alignment of protein interaction networks, J Comput Biol 13(2): 182-199, 2006.], which bases its search algorithm on a related decision problem formulation. We find that our algorithm discovers conserved modules with a larger number of proteins in an order of magnitude less time. In conclusion, the protein sets found by our algorithm correspond to known conserved functional modules at comparable precision and recall rates as those produced by the MaWISh algorithm.
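As a rough illustration of what a "conserved pattern of protein interactions" can mean in the record above, the sketch below counts interactions in one network whose endpoints map to interacting proteins in a second network. It is not the authors' heuristic and not MaWISh: the fixed one-to-one ortholog mapping, the function name `conserved_edges`, and the toy protein identifiers are all assumptions made for this example.

```python
def conserved_edges(edges_a, edges_b, orthologs):
    """Return the interactions of network A that are also present in network B.

    edges_a, edges_b: iterables of (protein, protein) pairs, undirected.
    orthologs: dict mapping a protein of A to its assumed ortholog in B.
    """
    # Store B's interactions as unordered pairs so (x, y) matches (y, x).
    edges_b_set = {frozenset(edge) for edge in edges_b}
    conserved = []
    for u, v in edges_a:
        # An interaction can only be conserved if both endpoints are mapped.
        if u in orthologs and v in orthologs:
            if frozenset((orthologs[u], orthologs[v])) in edges_b_set:
                conserved.append((u, v))
    return conserved


# Toy data (hypothetical identifiers).
network_a = [("a1", "a2"), ("a2", "a3"), ("a1", "a3")]
network_b = [("b2", "b1"), ("b2", "b3")]
mapping = {"a1": "b1", "a2": "b2", "a3": "b3"}

result = conserved_edges(network_a, network_b, mapping)
print(result)
```

A real network aligner has to search over mappings and tolerate missing or spurious interactions; this sketch only scores one given mapping.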
The Evolution and Expression Pattern of Human Overlapping lncRNA and Protein-coding Gene Pairs.

PubMed

Ning, Qianqian; Li, Yixue; Wang, Zhen; Zhou, Songwen; Sun, Hong; Yu, Guangjun

2017-03-27

A long non-coding RNA overlapping with a protein-coding gene (lncRNA-coding pair) is a special type of overlapping gene pair. Protein-coding overlapping genes have been well studied, and increasing attention has been paid to lncRNAs. By studying lncRNA-coding pairs in the human genome, we showed that lncRNA-coding pairs were more likely to be generated by overprinting and that the retention of genes in lncRNA-coding pairs was given higher priority than that of non-overlapping genes. In addition, the preference for overlapping configurations preserved during evolution depended on the origin of the lncRNA-coding pairs. Further investigation showed that the promotion of splicing of embedded protein-coding partners by lncRNAs was a unilateral interaction, whereas the improvement of gene expression by overlapping partners was bidirectional, and the effect decreased with increasing evolutionary age of the genes. Additionally, the expression of lncRNA-coding pairs showed an overall positive correlation, and the expression correlation was associated with their overlapping configurations, local genomic environment and the evolutionary age of the genes. Comparison of the expression correlation of lncRNA-coding pairs between normal and cancer samples indicated that lineage-specific pairs that include old protein-coding genes may play an important role in tumorigenesis. This work presents a systematic and comprehensive understanding of the evolution and expression pattern of human lncRNA-coding pairs.
Cloning and characterization of the Drosophila homolog of the xeroderma pigmentosum complementation-group B correcting gene, ERCC3.

PubMed Central

Koken, M H; Vreeken, C; Bol, S A; Cheng, N C; Jaspers-Dekker, I; Hoeijmakers, J H; Eeken, J C; Weeda, G; Pastink, A

1992-01-01

Previously the human nucleotide excision repair gene ERCC3 was shown to be responsible for a rare combination of the autosomal recessive DNA repair disorders xeroderma pigmentosum (complementation group B) and Cockayne's syndrome (complementation group C). The human and mouse ERCC3 proteins contain several sequence motifs suggesting that ERCC3 is a nucleic acid or chromatin binding helicase. To study the significance of these domains and the overall evolutionary conservation of the gene, the homolog from Drosophila melanogaster was isolated by low stringency hybridizations using two flanking probes of the human ERCC3 cDNA. The flanking probe strategy selects for long stretches of nucleotide sequence homology, and avoids isolation of small regions with fortuitous homology. In situ hybridization localized the gene onto chromosome III 67E3/4, a region devoid of known D. melanogaster mutagen sensitive mutants. Northern blot analysis showed that the gene is continuously expressed in all stages of fly development. A slight increase (2-3 times) of ERCC3Dm transcript was observed in the later stages. Two almost full length cDNAs were isolated, which have different 5' untranslated regions (UTR). The SD4 cDNA harbours only one long open reading frame (ORF) coding for ERCC3Dm. Another clone (SD2), however, has the potential to encode two proteins: a 170-amino-acid polypeptide starting at the optimal first ATG, which has no detectable homology with any other protein currently in the databases, and another ORF beginning at the suboptimal second start codon, which is identical to that of SD4. Comparison of the encoded ERCC3Dm protein with the homologous proteins of mouse and man shows a strong amino acid conservation (71% identity), especially in the postulated DNA binding region and seven 'helicase' domains. The ERCC3Dm sequence is fully consistent with the presumed functions and the high conservation of these regions strengthens their functional significance.
Microinjection and DNA transfection of ERCC3Dm into human xeroderma pigmentosum (c.g. B) fibroblasts and group 3 rodent mutants did not yield detectable correction. One possible explanation for these negative findings is that the D. melanogaster protein may be unable to function in a mammalian repair context. PMID:1454518
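A figure such as the 71% amino acid identity quoted above is typically computed from a pairwise alignment. The sketch below shows one common convention, assumed here rather than taken from the record: count identical residues over the aligned columns in which neither sequence has a gap. The function name and the short toy sequences are hypothetical, and other conventions (for example dividing by the full alignment length) give different values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over gap-free columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        # Skip columns where either sequence has a gap.
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy alignment (not real ERCC3 sequence): 4 identical residues in 5 gap-free columns.
identity = percent_identity("MKT-LV", "MRTALV")
print(identity)
```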
Salivary agglutinin/glycoprotein-340/DMBT1: a single molecule with variable composition and with different functions in infection, inflammation and cancer.

PubMed

Ligtenberg, Antoon J M; Veerman, Enno C I; Nieuw Amerongen, Arie V; Mollenhauer, Jan

2007-12-01

Salivary agglutinin (SAG), lung glycoprotein-340 (gp-340) and Deleted in Malignant Brain Tumours 1 (DMBT1) are three names for identical proteins encoded by the dmbt1 gene. DMBT1/SAG/gp-340 belongs to the scavenger receptor cysteine-rich (SRCR) superfamily of proteins, a superfamily of secreted or membrane-bound proteins with SRCR domains that are highly conserved down to sponges, the most ancient metazoa. On the one hand, DMBT1 may represent an innate defence factor acting as a pattern recognition molecule. It interacts with a broad range of pathogens, including cariogenic streptococci and Helicobacter pylori, influenza viruses and HIV, but also with mucosal defence proteins, such as IgA, surfactant proteins and MUC5B. Stimulation of alveolar macrophage migration, suppression of neutrophil oxidative burst and activation of the complement cascade point further to an important role in the regulation of inflammatory responses. On the other hand, DMBT1 has been demonstrated to play a role in epithelial and stem cell differentiation. Inactivation of the gene coding for this protein may lead to disturbed differentiation, possibly resulting in tumour formation. These data strongly point to a role for DMBT1 as a molecule linking innate immune processes with regenerative processes.
RBFOX and PTBP1 proteins regulate the alternative splicing of micro-exons in human brain transcripts.

PubMed

Li, Yang I; Sanchez-Pulido, Luis; Haerty, Wilfried; Ponting, Chris P

2015-01-01

Ninety-four percent of mammalian protein-coding exons exceed 51 nucleotides (nt) in length. The paucity of micro-exons (≤ 51 nt) suggests that their recognition and correct processing by the splicing machinery present greater challenges than for longer exons. Yet, because thousands of human genes harbor processed micro-exons, specialized mechanisms may be in place to promote their splicing. Here, we survey deep genomic data sets to define 13,085 micro-exons and to study their splicing mechanisms and molecular functions. More than 60% of annotated human micro-exons exhibit a high level of sequence conservation, an indicator of functionality. While most human micro-exons require splicing-enhancing genomic features to be processed, the splicing of hundreds of micro-exons is enhanced by the adjacent binding of splice factors in the introns of pre-messenger RNAs. Notably, splicing of a significant number of micro-exons was found to be facilitated by the binding of RBFOX proteins, which promote their inclusion in the brain, muscle, and heart. Our analyses suggest that accurate regulation of micro-exon inclusion by RBFOX proteins and PTBP1 plays an important role in the maintenance of tissue-specific protein-protein interactions. © 2015 Li et al.; Published by Cold Spring Harbor Laboratory Press.
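The length criterion in the record above (micro-exons are exons of at most 51 nt) can be sketched as a simple filter. The coordinate convention (1-based, inclusive start and end), the function names, and the toy coordinates are assumptions for this example; the abstract does not describe the authors' pipeline at this level.

```python
MICRO_EXON_MAX_NT = 51  # threshold stated in the abstract (micro-exons are <= 51 nt)


def exon_length(start, end):
    """Length in nt of an exon given 1-based inclusive coordinates (assumed convention)."""
    return end - start + 1


def split_micro_exons(exons):
    """Partition (start, end) exons into micro-exons and longer exons."""
    micro = [e for e in exons if exon_length(*e) <= MICRO_EXON_MAX_NT]
    longer = [e for e in exons if exon_length(*e) > MICRO_EXON_MAX_NT]
    return micro, longer


# Toy coordinates: lengths are 51, 52 and 27 nt respectively.
exons = [(100, 150), (200, 251), (300, 326)]
micro, longer = split_micro_exons(exons)
print(micro, longer)
```

With 0-based half-open coordinates (as in BED files) the length would be `end - start` instead, so the convention should be checked before applying the threshold.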

The RtcB RNA ligase is an essential component of the metazoan unfolded protein response.

PubMed

Kosmaczewski, Sara Guckian; Edwards, Tyson James; Han, Sung Min; Eckwahl, Matthew J; Meyer, Benjamin Isaiah; Peach, Sally; Hesselberth, Jay R; Wolin, Sandra L; Hammarlund, Marc

2014-12-01

RNA ligation can regulate RNA function by altering RNA sequence, structure and coding potential. For example, the function of XBP1 in mediating the unfolded protein response requires RNA ligation, as does the maturation of some tRNAs. Here, we describe a novel in vivo model in Caenorhabditis elegans for the conserved RNA ligase RtcB and show that RtcB ligates the xbp-1 mRNA during the IRE-1 branch of the unfolded protein response. Without RtcB, protein stress results in the accumulation of unligated xbp-1 mRNA fragments, defects in the unfolded protein response, and decreased lifespan. RtcB also ligates endogenous pre-tRNA halves, and RtcB mutants have defects in growth and lifespan that can be bypassed by expression of pre-spliced tRNAs. In addition, animals that lack RtcB have defects that are independent of tRNA maturation and the unfolded protein response. Thus, RNA ligation by RtcB is required for the function of multiple endogenous target RNAs including both xbp-1 and tRNAs. RtcB is uniquely capable of performing these ligation functions, and RNA ligation by RtcB mediates multiple essential processes in vivo. © 2014 The Authors.
Conservation and diversification of Msx protein in metazoan evolution.

PubMed

Takahashi, Hirokazu; Kamiya, Akiko; Ishiguro, Akira; Suzuki, Atsushi C; Saitou, Naruya; Toyoda, Atsushi; Aruga, Jun

2008-01-01

Msx (/msh) family genes encode homeodomain (HD) proteins that control ontogeny in many animal species. We compared the structures of Msx genes from a wide range of Metazoa (Porifera, Cnidaria, Nematoda, Arthropoda, Tardigrada, Platyhelminthes, Mollusca, Brachiopoda, Annelida, Echiura, Echinodermata, Hemichordata, and Chordata) to gain an understanding of the role of these genes in phylogeny. Exon-intron boundary analysis suggested that the position of the intron located N-terminally to the HDs was widely conserved in all the genes examined, including those of cnidarians. Amino acid (aa) sequence comparison revealed 3 new evolutionarily conserved domains, as well as very strong conservation of the HDs. Two of the three domains were associated with Groucho-like protein binding in both a vertebrate and a cnidarian Msx homolog, suggesting that the interaction between Groucho-like proteins and Msx proteins was established in eumetazoan ancestors. Pairwise comparison among the collected HDs and their C-flanking aa sequences revealed that the degree of sequence conservation varied depending on the animal taxa from which the sequences were derived. Highly conserved Msx genes were identified in the Vertebrata, Cephalochordata, Hemichordata, Echinodermata, Mollusca, Brachiopoda, and Anthozoa. The wide distribution of the conserved sequences in the animal phylogenetic tree suggested that metazoan ancestors had already acquired a set of conserved domains of the current Msx family genes. Interestingly, although strongly conserved sequences were recovered from the Vertebrata, Cephalochordata, and Anthozoa, the sequences from the Urochordata and Hydrozoa showed weak conservation. Because the Vertebrata-Cephalochordata-Urochordata and Anthozoa-Hydrozoa represent sister groups in the Chordata and Cnidaria, respectively, Msx sequence diversification may have occurred differentially in the course of evolution. 
We speculate that selective loss of the conserved domains in Msx family proteins contributed to the diversification of animal body organization.
Complete mitochondrial genome of the larch hawk moth, Sphinx morio (Lepidoptera: Sphingidae).

PubMed

Kim, Min Jee; Choi, Sei-Woong; Kim, Iksoo

2013-12-01

The larch hawk moth, Sphinx morio, belongs to the lepidopteran family Sphingidae, which has long been studied as a family of model insects in diverse fields. In this study, we describe the complete mitochondrial genome (mitogenome) sequence of the species in terms of general genomic features and characteristic short repetitive sequences found in the A + T-rich region. The 15,299-bp-long genome consisted of a typical set of genes (13 protein-coding genes, 2 rRNA genes, and 22 tRNA genes) and one major non-coding A + T-rich region, with the typical arrangement found in Lepidoptera. The 316-bp-long A + T-rich region located between srRNA and tRNA(Met) harbored the conserved sequence blocks that are typically found in lepidopteran insects. Additionally, the A + T-rich region of S. morio contained three characteristic repeat sequences that are rarely found in Lepidoptera: two identical 12-bp repeats, three identical 5-bp-long tandem repeats, and six nearly identical 5-6 bp long repeat sequences.
Perspectives on the mechanism of transcriptional regulation by long non-coding RNAs.

PubMed

Roberts, Thomas C; Morris, Kevin V; Weinberg, Marc S

2014-01-01

Long non-coding RNAs (lncRNAs) are increasingly being recognized as epigenetic regulators of gene transcription. The diversity and complexity of lncRNA genes means that they exert their regulatory effects by a variety of mechanisms. Although there is still much to be learned about the mechanism of lncRNA function, general principles are starting to emerge. In particular, the application of high throughput (deep) sequencing methodologies has greatly advanced our understanding of lncRNA gene function. lncRNAs function as adaptors that link specific chromatin loci with chromatin-remodeling complexes and transcription factors. lncRNAs can act in cis or trans to guide epigenetic-modifier complexes to distinct genomic sites, or act as scaffolds which recruit multiple proteins simultaneously, thereby coordinating their activities. In this review we discuss the genomic organization of lncRNAs, the importance of RNA secondary structure to lncRNA functionality, the multitude of ways in which they interact with the genome, and what evolutionary conservation tells us about their function.
A Potato cDNA Encoding a Homologue of Mammalian Multidrug Resistant P-Glycoprotein

NASA Technical Reports Server (NTRS)

Wang, W.; Takezawa, D.; Poovaiah, B. W.

1996-01-01

A homologue of the multidrug resistance (MDR) gene was obtained while screening a potato stolon tip cDNA expression library with 35S-labeled calmodulin. The mammalian MDR gene codes for a membrane-bound P-glycoprotein (170-180 kDa) which imparts multidrug resistance to cancerous cells. The potato cDNA (PMDR1) codes for a polypeptide of 1313 amino acid residues (ca. 144 kDa) and its structural features are very similar to those of the MDR P-glycoprotein. The N-terminal half of the PMDR1-encoded protein shares striking homology with its C-terminal half, and each half contains a conserved ATP-binding site and six putative transmembrane domains. Southern blot analysis indicated that potato has one or two MDR-like genes. PMDR1 mRNA is constitutively expressed in all organs studied, with higher expression in the stem and stolon tip. PMDR1 expression was highest during tuber initiation and decreased during tuber development.
In Vitro Anti-Echinococcal and Metabolic Effects of Metformin Involve Activation of AMP-Activated Protein Kinase in Larval Stages of Echinococcus granulosus

PubMed Central

Loos, Julia A.; Cumino, Andrea C.

2015-01-01

Metformin (Met) is a biguanide anti-hyperglycemic agent, which also exerts antiproliferative effects on cancer cells. This drug inhibits complex I of the mitochondrial electron transport chain, inducing a fall in the cell energy charge and leading to 5'-AMP-activated protein kinase (AMPK) activation. AMPK is a highly conserved heterotrimeric complex that coordinates metabolic and growth pathways in order to maintain energy homeostasis and cell survival, mainly under nutritional stress conditions, in a Liver Kinase B1 (LKB1)-dependent manner. This work describes for the first time the in vitro anti-echinococcal effect of Met on Echinococcus granulosus larval stages, as well as the molecular characterization of AMPK (Eg-AMPK) in this parasite of clinical importance. The drug exerted a dose-dependent effect on the viability of both larval stages. Based on this, we proceeded with the identification of the genes encoding the different subunits of Eg-AMPK. We cloned one gene coding for the catalytic subunit (Eg-ampkɑ) and two genes coding for the regulatory subunits (Eg-ampkβ and Eg-ampkγ), all of them constitutively transcribed in E. granulosus protoscoleces and metacestodes. Their deduced amino acid sequences show all the conserved functional domains, including key amino acids involved in catalytic activity and protein-protein interactions. In protoscoleces, the drug induced the activation of AMPK (Eg-AMPKɑ-P176), possibly as a consequence of cellular energy charge depletion evidenced by assays with the fluorescent indicator JC-1. Met also led to carbohydrate starvation: it increased glycogenolysis and homolactic fermentation, and decreased transcription of intermediary metabolism genes. By in toto immunolocalization assays, we detected Eg-AMPKɑ-P176 expression both in the nucleus and the cytoplasm of cells, as well as in the larval tegument, the posterior bladder and the calcareous corpuscles of control and Met-treated protoscoleces.
Interestingly, expression of Eg-AMPKɑ was observed in the developmental structures during the de-differentiation process from protoscoleces to microcysts. Therefore, the Eg-AMPK expression during the asexual development of E. granulosus, as well as the in vitro synergic therapeutic effects observed in presence of Met plus albendazole sulfoxide (ABZSO), suggest the importance of carrying out chemoprophylactic and clinical efficacy studies combining Met with conventional anti-echinococcal agents to test the potential use of this drug in hydatidosis therapy. PMID:25965910
An RNA electrophoretic mobility shift and mutational analysis of rnp-4f 5′-UTR intron splicing regulatory proteins in Drosophila reveals a novel new role for a dADAR protein isoform

PubMed Central

Lakshmi, G. Girija; Ghosh, Sushmita; Jones, Gabriel P.; Parikh, Roshni; Rawlins, Bridgette A.; Vaughn, Jack C.

2014-01-01

Alternative splicing greatly enhances the diversity of proteins encoded by eukaryotic genomes, and is also important in gene expression control. In contrast to the great depth of knowledge as to molecular mechanisms in the splicing pathway itself, relatively little is known about the regulatory events behind this process. The 5′-UTR and 3′-UTR in pre-mRNAs play a variety of roles in controlling eukaryotic gene expression, including translational modulation, and nearly 4,000 of the roughly 14,000 protein coding genes in Drosophila contain introns of unknown functional significance in their 5′-UTR. Here we report the results of an RNA electrophoretic mobility shift analysis of Drosophila rnp-4f 5′-UTR intron 0 splicing regulatory proteins. The pre-mRNA potential regulatory element consists of an evolutionarily-conserved 177-nt stem-loop arising from pairing of intron 0 with part of adjacent exon 2. Incubation of in vitro transcribed probe with embryo protein extract is shown to result in two shifted RNA-protein bands, and protein extract from a dADAR null mutant fly line results in only one shifted band. A mutated stem-loop in which the conserved exon 2 primary sequence is changed but secondary structure maintained by introducing compensatory base changes results in diminished band shifts. To test the hypothesis that dADAR plays a role in intron splicing regulation in vivo, levels of unspliced rnp-4f mRNA in dADAR mutant were compared to wild-type via real-time qRT-PCR. The results show that during embryogenesis unspliced rnp-4f mRNA levels fall by up to 85% in the mutant, in support of the hypothesis. Taken together, these results demonstrate a novel role for dADAR protein in rnp-4f 5′-UTR alternative intron splicing regulation which is consistent with a previously proposed model. PMID:23026215
Conservation of coevolving protein interfaces bridges prokaryote-eukaryote homologies in the twilight zone.

PubMed

Rodriguez-Rivas, Juan; Marsili, Simone; Juan, David; Valencia, Alfonso

2016-12-27

Protein-protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein-protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein-protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein-protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach.
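The abstract above does not name the statistic used to detect residue coevolution. As a simpler stand-in, the sketch below computes the mutual information between two columns of a multiple sequence alignment, a classic covariation measure; current interface-prediction methods generally rely on global statistical models rather than pairwise mutual information. The function name and the toy columns are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def column_mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    col_a, col_b: equal-length sequences of residues, one entry per aligned
    sequence (for a protein pair, rows must be paired by organism).
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_a)
    if n == 0:
        return 0.0
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (x, y), count in freq_ab.items():
        p_xy = count / n
        # Compare the joint frequency with the product of the marginals.
        mi += p_xy * log2(p_xy / ((freq_a[x] / n) * (freq_b[y] / n)))
    return mi


# Toy columns: the first pair covaries perfectly, the second is independent.
covarying = column_mutual_information("AAGG", "LLVV")
independent = column_mutual_information("AAGG", "LVLV")
print(covarying, independent)
```

Raw mutual information is inflated by small alignments and by phylogenetic relatedness, which is one reason practical pipelines apply corrections or use global models instead.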
Genome-wide identification of conserved intronic non-coding sequences using a Bayesian segmentation approach.

PubMed

Algama, Manjula; Tasker, Edward; Williams, Caitlin; Parslow, Adam C; Bryson-Richardson, Robert J; Keith, Jonathan M

2017-03-27

Computational identification of non-coding RNAs (ncRNAs) is a challenging problem. We describe a genome-wide analysis using Bayesian segmentation to identify intronic elements highly conserved between three evolutionarily distant vertebrate species: human, mouse and zebrafish. We investigate the extent to which these elements include ncRNAs (or conserved domains of ncRNAs) and regulatory sequences. We identified 655 deeply conserved intronic sequences in a genome-wide analysis. We also performed a pathway-focussed analysis on genes involved in muscle development, detecting 27 intronic elements, of which 22 were not detected in the genome-wide analysis. At least 87% of the genome-wide and 70% of the pathway-focussed elements have existing annotations indicative of conserved RNA secondary structure. The expression of 26 of the pathway-focused elements was examined using RT-PCR, providing confirmation that they include expressed ncRNAs. Consistent with previous studies, these elements are significantly over-represented in the introns of transcription factors. This study demonstrates a novel, highly effective, Bayesian approach to identifying conserved non-coding sequences. Our results complement previous findings that these sequences are enriched in transcription factors. However, in contrast to previous studies which suggest the majority of conserved sequences are regulatory factor binding sites, the majority of conserved sequences identified using our approach contain evidence of conserved RNA secondary structures, and our laboratory results suggest most are expressed. Functional roles at DNA and RNA levels are not mutually exclusive, and many of our elements possess evidence of both. Moreover, ncRNAs play roles in transcriptional and post-transcriptional regulation, and this may contribute to the over-representation of these elements in introns of transcription factors. 
We attribute the higher sensitivity of the pathway-focussed analysis compared to the genome-wide analysis to improved alignment quality, suggesting that enhanced genomic alignments may reveal many more conserved intronic sequences.
Early Evolution of Conserved Regulatory Sequences Associated with Development in Vertebrates

PubMed Central

McEwen, Gayle K.; Goode, Debbie K.; Parker, Hugo J.; Woolfe, Adam; Callaway, Heather; Elgar, Greg

2009-01-01

Comparisons between diverse vertebrate genomes have uncovered thousands of highly conserved non-coding sequences, an increasing number of which have been shown to function as enhancers during early development. Despite their extreme conservation over 500 million years from humans to cartilaginous fish, these elements appear to be largely absent in invertebrates, and, to date, there has been little understanding of their mode of action or the evolutionary processes that have modelled them. We have now exploited emerging genomic sequence data for the sea lamprey, Petromyzon marinus, to explore the depth of conservation of this type of element in the earliest diverging extant vertebrate lineage, the jawless fish (agnathans). We searched for conserved non-coding elements (CNEs) at 13 human gene loci and identified lamprey elements associated with all but two of these gene regions. Although markedly shorter and less well conserved than within jawed vertebrates, identified lamprey CNEs are able to drive specific patterns of expression in zebrafish embryos, which are almost identical to those driven by the equivalent human elements. These CNEs are therefore a unique and defining characteristic of all vertebrates. Furthermore, alignment of lamprey and other vertebrate CNEs should permit the identification of persistent sequence signatures that are responsible for common patterns of expression and contribute to the elucidation of the regulatory language in CNEs. Identifying the core regulatory code for development, common to all vertebrates, provides a foundation upon which regulatory networks can be constructed and might also illuminate how large conserved regulatory sequence blocks evolve and become fixed in genomic DNA. PMID:20011110
Genome-wide identification and characterization of the SBP-box gene family in Petunia.

PubMed

Zhou, Qin; Zhang, Sisi; Chen, Feng; Liu, Baojun; Wu, Lan; Li, Fei; Zhang, Jiaqi; Bao, Manzhu; Liu, Guofeng

2018-03-12

SQUAMOSA PROMOTER BINDING PROTEIN (SBP)-box genes encode a family of plant-specific transcription factors (TFs) that play important roles in many growth and development processes including phase transition, leaf initiation, shoot and inflorescence branching, fruit development and ripening etc. The SBP-box gene family has been identified and characterized in many species, but has not been well studied in Petunia, an important ornamental genus. We identified 21 putative SPL genes of Petunia axillaris and P. inflata from the reference genome of P. axillaris N and P. inflata S6, respectively, which were supported by the transcriptome data. For further confirmation, all the 21 genes were also cloned from P. hybrida line W115 (Mitchel diploid). Phylogenetic analysis based on the highly conserved SBP domains arranged PhSPLs in eight groups, analogous to those from Arabidopsis and tomato. Furthermore, the Petunia SPL genes had similar exon-intron structure and the deduced proteins contained very similar conserved motifs within the same subgroup. Out of 21 PhSPL genes, fourteen were predicted to be potential targets of PhmiR156/157, and the putative miR156/157 response elements (MREs) were located in the coding region of group IV, V, VII and VIII genes, but in the 3'-UTR regions of group VI genes. SPL genes were also identified from another two wild Petunia species, P. integrifolia and P. exserta, based on their transcriptome databases to investigate the origin of PhSPLs. Phylogenetic analysis and multiple alignments of the coding sequences of PhSPLs and their orthologs from wild species indicated that PhSPLs were originated mainly from P. axillaris. qRT-PCR analysis demonstrated differential spatiotemperal expression patterns of PhSPL genes in petunia and many were expressed predominantly in the axillary buds and/or inflorescences. 
In addition, overexpression of PhSPL9a and PhSPL9b in Arabidopsis suggested that these genes play a conserved role in promoting the vegetative-to-reproductive phase transition. The Petunia genome contains at least 21 SPL genes, and most of the genes are expressed in different tissues. The PhSPL genes may play conserved and diverse roles in plant growth and development, including flowering regulation, leaf initiation, and axillary bud and inflorescence development. This work provides a comprehensive understanding of the SBP-box gene family in Petunia and lays a significant foundation for future studies on the function and evolution of SPL genes in petunia.
An UPF3-based nonsense-mediated decay in Paramecium.

PubMed

Contreras, Julia; Begley, Victoria; Macias, Sandra; Villalobo, Eduardo

2014-12-01

Nonsense-mediated decay recognises mRNAs containing premature termination codons. One of its components, UPF3, binds the exon junction complex and thereby acts as a molecular link between nonsense-mediated decay and splicing. In protists, UPF3 has not yet been identified. We report that Paramecium tetraurelia bears a UPF3 gene and that it has a role in nonsense-mediated decay. Interestingly, the identified UPF3 has not conserved the essential amino acids required to bind the exon junction complex. However, our data indicate that this ciliate bears genes coding for core proteins of the exon junction complex. Copyright © 2014 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.
The complete chloroplast genome of two Brassica species, Brassica nigra and B. oleracea.

PubMed

Seol, Young-Joo; Kim, Kyunghee; Kang, Sang-Ho; Perumal, Sampath; Lee, Jonghoon; Kim, Chang-Kug

2017-03-01

The two Brassica species, Brassica nigra and Brassica oleracea, are important agronomic crops. The chloroplast genome sequences were generated by de novo assembly using whole genome next-generation sequences. The chloroplast genomes of B. nigra and B. oleracea were 153 633 bp and 153 366 bp in size, respectively, and showed the conserved, typical chloroplast structure. Both chloroplast genomes contained a total of 114 genes, including 80 protein-coding genes, 30 tRNA genes, and 4 rRNA genes. Phylogenetic analysis revealed that B. oleracea is closely related to B. rapa and B. napus but B. nigra is more diverse than the neighbor species Raphanus sativus.
Structure and chromosomal localization of the human PD-1 gene (PDCD1)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shinohara, T.; Ishida, Y.; Kawaichi, M.

1994-10-01

A cDNA encoding mouse PD-1, a member of the immunoglobulin superfamily, was previously isolated from apoptosis-induced cells by subtractive hybridization. To determine the structure and chromosomal location of the human PD-1 gene, we screened a human T cell cDNA library with a mouse PD-1 probe and isolated a cDNA coding for the human PD-1 protein. The deduced amino acid sequence of human PD-1 was 60% identical to the mouse counterpart, and a putative tyrosine kinase-association motif was well conserved. The human PD-1 gene was mapped to 2q37.3 by chromosomal in situ hybridization. 7 refs., 3 figs.
The metazoan Mediator co-activator complex as an integrative hub for transcriptional regulation.

PubMed

Malik, Sohail; Roeder, Robert G

2010-11-01

The Mediator is an evolutionarily conserved, multiprotein complex that is a key regulator of protein-coding genes. In metazoan cells, multiple pathways that are responsible for homeostasis, cell growth and differentiation converge on the Mediator through transcriptional activators and repressors that target one or more of the almost 30 subunits of this complex. Besides interacting directly with RNA polymerase II, Mediator has multiple functions and can interact with and coordinate the action of numerous other co-activators and co-repressors, including those acting at the level of chromatin. These interactions ultimately allow the Mediator to deliver outputs that range from maximal activation of genes to modulation of basal transcription to long-term epigenetic silencing.
Uridine Affects Liver Protein Glycosylation, Insulin Signaling, and Heme Biosynthesis

PubMed Central

Urasaki, Yasuyo; Pizzorno, Giuseppe; Le, Thuc T.

2014-01-01

Purines and pyrimidines are complementary bases of the genetic code. The roles of purines and their derivatives in cellular signal transduction and energy metabolism are well-known. In contrast, the roles of pyrimidines and their derivatives in cellular function remain poorly understood. In this study, the roles of uridine, a pyrimidine nucleoside, in liver metabolism are examined in mice. We report that short-term uridine administration in C57BL/6J mice increases liver protein glycosylation profiles, reduces phosphorylation level of insulin signaling proteins, and activates the HRI-eIF-2α-ATF4 heme-deficiency stress response pathway. Short-term uridine administration is also associated with reduced liver hemin level and reduced ability for insulin-stimulated blood glucose removal during an insulin tolerance test. Some of the short-term effects of exogenous uridine in C57BL/6J mice are conserved in transgenic UPase1 −/− mice with long-term elevation of endogenous uridine level. UPase1 −/− mice exhibit activation of the liver HRI-eIF-2α-ATF4 heme-deficiency stress response pathway. UPase1 −/− mice also exhibit impaired ability for insulin-stimulated blood glucose removal. However, other short-term effects of exogenous uridine in C57BL/6J mice are not conserved in UPase1 −/− mice. UPase1 −/− mice exhibit normal phosphorylation level of liver insulin signaling proteins and increased liver hemin concentration compared to untreated control C57BL/6J mice. Contrasting short-term and long-term consequences of uridine on liver metabolism suggest that uridine exerts transient effects and elicits adaptive responses. Taken together, our data support potential roles of pyrimidines and their derivatives in the regulation of liver metabolism. PMID:24918436
ROOT HAIR DEFECTIVE SIX-LIKE Class I Genes Promote Root Hair Development in the Grass Brachypodium distachyon

PubMed Central

Kim, Chul Min

2016-01-01

Genes encoding ROOT HAIR DEFECTIVE SIX-LIKE (RSL) class I basic helix-loop-helix proteins are expressed in future root hair cells of the Arabidopsis thaliana root meristem, where they positively regulate root hair cell development. Here we show that there are three RSL class I protein coding genes in the Brachypodium distachyon genome, BdRSL1, BdRSL2 and BdRSL3, and each is expressed in developing root hair cells after the asymmetric cell division that forms root hair cells and hairless epidermal cells. Expression of BdRSL class I genes is sufficient for root hair cell development: ectopic overexpression of any of the three RSL class I genes induces the development of root hairs in every cell of the root epidermis. Expression of BdRSL class I genes in root hairless Arabidopsis thaliana root hair defective 6 (Atrhd6) Atrsl1 double mutants, devoid of RSL class I function, restores root hair development, indicating that the function of these proteins has been conserved. However, neither AtRSL nor BdRSL class I genes is sufficient for root hair development in A. thaliana. These data demonstrate that the spatial pattern of class I RSL activity can account for the pattern of root hair cell differentiation in B. distachyon. However, the spatial pattern of class I RSL activity cannot account for the spatial pattern of root hair cells in A. thaliana. Taken together, these data indicate that the functions of RSL class I proteins have been conserved among most angiosperms (monocots and eudicots) despite the dramatically different patterns of root hair cell development. PMID:27494519
FoxK1 splice variants show developmental stage-specific plasticity of expression with temperature in the tiger pufferfish.

PubMed

Fernandes, Jorge M O; MacKenzie, Matthew G; Kinghorn, James R; Johnston, Ian A

2007-10-01

FoxK1 is a member of the highly conserved forkhead/winged helix (Fox) family of transcription factors and it is known to play a key role in mammalian muscle development and myogenic stem cell function. The tiger pufferfish (Takifugu rubripes) orthologue of mammalian FoxK1 (TFoxK1) has seven exons and is located in a region of conserved synteny between pufferfish and mouse. TFoxK1 is expressed as three alternative transcripts: TFoxK1-alpha, TFoxK1-gamma and TFoxK1-delta. TFoxK1-alpha is the orthologue of mouse FoxK1-alpha, coding for a putative protein of 558 residues that contains the forkhead and forkhead-associated domains typical of Fox proteins and shares 53% global identity with its mammalian homologue. TFoxK1-gamma and TFoxK1-delta arise from intron retention events and these transcripts translate into the same 344-amino acid protein with a truncated forkhead domain. Neither is an orthologue of mouse FoxK1-beta. In adult fish, the TFoxK1 splice variants were differentially expressed between fast and slow myotomal muscle, as well as other tissues, and the FoxK1-alpha protein was expressed in myogenic progenitor cells of fast myotomal muscle. During embryonic development, TFoxK1 was transiently expressed in the developing somites, heart, brain and eye. The relative expression of TFoxK1-alpha and the other two alternative transcripts varied with the incubation temperature regime for equivalent embryonic stages and the differences were particularly marked at later developmental stages. The developmental expression pattern of TFoxK1 and its localisation to mononuclear myogenic progenitor cells in adult fast muscle indicate that it may play an essential role in myogenesis in T. rubripes.
A Fast Alignment-Free Approach for De Novo Detection of Protein Conserved Regions

PubMed Central

Abnousi, Armen; Broschat, Shira L.; Kalyanaraman, Ananth

2016-01-01

Background: Identifying conserved regions in protein sequences is a fundamental operation, occurring in numerous sequence-driven analysis pipelines. It is used as a way to decode domain-rich regions within proteins, to compute protein clusters, to annotate sequence function, and to compute evolutionary relationships among protein sequences. A number of approaches exist for identifying and characterizing protein families based on their domains, and because domains represent conserved portions of a protein sequence, the primary computation involved in protein family characterization is identification of such conserved regions. However, identifying conserved regions from large collections (millions) of protein sequences presents significant challenges. Methods: In this paper we present a new, alignment-free method for detecting conserved regions in protein sequences called NADDA (No-Alignment Domain Detection Algorithm). Our method exploits the abundance of exact matching short subsequences (k-mers) to quickly detect conserved regions, and the power of machine learning is used to improve the prediction accuracy of detection. We present a parallel implementation of NADDA using the MapReduce framework and show that our method is highly scalable. Results: We have compared NADDA with Pfam and InterPro databases. For known domains annotated by Pfam, accuracy is 83%, sensitivity 96%, and specificity 44%. For sequences with new domains not present in the training set, an average accuracy of 63% is achieved when compared to Pfam. A boost in results in comparison with InterPro demonstrates the ability of NADDA to capture conserved regions beyond those present in Pfam. We have also compared NADDA with ADDA and MKDOM2, assuming Pfam as ground-truth. On average NADDA shows comparable accuracy, more balanced sensitivity and specificity, and, being alignment-free, is significantly faster.
Excluding the one-time cost of training, runtimes on a single processor were 49s, 10,566s, and 456s for NADDA, ADDA, and MKDOM2, respectively, for a data set comprised of approximately 2500 sequences. PMID:27552220
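The abstract above describes using exact-matching k-mers shared across sequences to locate conserved regions without alignment. The following is a minimal Python sketch of that general idea only; it is not the NADDA implementation, which also applies machine learning and a MapReduce parallelization that are not reproduced here. The function names, the toy sequences, and the thresholds (`k`, `min_support`, `min_len`) are all illustrative assumptions.

```python
from collections import defaultdict


def kmer_support(sequences, k=3):
    """Map each k-mer to the set of sequence indices that contain it."""
    support = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for i in range(len(seq) - k + 1):
            support[seq[i:i + k]].add(idx)
    return support


def conservation_profile(seq, support, k=3):
    """Score each residue by the best-supported k-mer that covers it."""
    profile = [0] * len(seq)
    for i in range(len(seq) - k + 1):
        count = len(support.get(seq[i:i + k], ()))
        for j in range(i, i + k):
            profile[j] = max(profile[j], count)
    return profile


def conserved_regions(profile, min_support, min_len):
    """Return half-open (start, end) runs where the score is >= min_support."""
    regions = []
    start = None
    for i, score in enumerate(profile):
        if score >= min_support:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(profile) - start >= min_len:
        regions.append((start, len(profile)))
    return regions


# Toy input (hypothetical sequences sharing the motif "MKVLH").
toy_sequences = ["AAAMKVLHGGG", "CCMKVLHTT", "MKVLHPPPP"]
toy_support = kmer_support(toy_sequences, k=3)
toy_profile = conservation_profile(toy_sequences[0], toy_support, k=3)
toy_regions = conserved_regions(toy_profile, min_support=3, min_len=5)
for begin, end in toy_regions:
    print(begin, end, toy_sequences[0][begin:end])
```

Because the profile is built from exact k-mer lookups rather than pairwise alignment, the cost of this sketch grows roughly linearly with total sequence length, which is the property the abstract credits for the speed advantage of alignment-free detection.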
