Sample records for poor sequence conservation

  1. Conserved sequence-specific lincRNA-steroid receptor interactions drive transcriptional repression and direct cell fate

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hudson, William H.; Pickard, Mark R.; de Vera, Ian Mitchelle S.

    2014-12-23

    The majority of the eukaryotic genome is transcribed, generating a significant number of long intergenic noncoding RNAs (lincRNAs). Although lincRNAs represent the most poorly understood product of transcription, recent work has shown lincRNAs fulfill important cellular functions. In addition to low sequence conservation, poor understanding of structural mechanisms driving lincRNA biology hinders systematic prediction of their function. Here we report the molecular requirements for the recognition of steroid receptors (SRs) by the lincRNA growth arrest-specific 5 (Gas5), which regulates steroid-mediated transcriptional regulation, growth arrest and apoptosis. We identify the functional Gas5-SR interface and generate point mutations that ablate the SR-Gas5more » lincRNA interaction, altering Gas5-driven apoptosis in cancer cell lines. Further, we find that the Gas5 SR-recognition sequence is conserved among haplorhines, with its evolutionary origin as a splice acceptor site. This study demonstrates that lincRNAs can recognize protein targets in a conserved, sequence-specific manner in order to affect critical cell functions.« less

  2. A conserved post-transcriptional BMP2 switch in lung cells.

    PubMed

    Jiang, Shan; Fritz, David T; Rogers, Melissa B

    2010-05-15

    An ultra-conserved sequence in the bone morphogenetic protein 2 (BMP2) 3' untranslated region (UTR) markedly represses BMP2 expression in non-transformed lung cells. In contrast, the ultra-conserved sequence stimulates BMP2 expression in transformed lung cells. The ultra-conserved sequence functions as a post-transcriptional cis-regulatory switch. A common single-nucleotide polymorphism (SNP, rs15705, +A1123C), which has been shown to influence human morphology, disrupts a conserved element within the ultra-conserved sequence and altered reporter gene activity in non-transformed lung cells. This polymorphism changed the affinity of the BMP2 RNA for several proteins including nucleolin, which has an increased affinity for the C allele. Elevated BMP2 synthesis is associated with increased malignancy in mouse models of lung cancer and poor lung cancer patient prognosis. Understanding the cis- and trans-regulatory factors that control BMP2 synthesis is relevant to the initiation or progression of pathologies associated with abnormal BMP2 levels. (c) 2010 Wiley-Liss, Inc.

  3. A comparative genomics strategy for targeted discovery of single-nucleotide polymorphisms and conserved-noncoding sequences in orphan crops.

    PubMed

    Feltus, F A; Singh, H P; Lohithaswa, H C; Schulze, S R; Silva, T D; Paterson, A H

    2006-04-01

    Completed genome sequences provide templates for the design of genome analysis tools in orphan species lacking sequence information. To demonstrate this principle, we designed 384 PCR primer pairs to conserved exonic regions flanking introns, using Sorghum/Pennisetum expressed sequence tag alignments to the Oryza genome. Conserved-intron scanning primers (CISPs) amplified single-copy loci at 37% to 80% success rates in taxa that sample much of the approximately 50-million years of Poaceae divergence. While the conserved nature of exons fostered cross-taxon amplification, the lesser evolutionary constraints on introns enhanced single-nucleotide polymorphism detection. For example, in eight rice (Oryza sativa) genotypes, polymorphism averaged 12.1 per kb in introns but only 3.6 per kb in exons. Curiously, among 124 CISPs evaluated across Oryza, Sorghum, Pennisetum, Cynodon, Eragrostis, Zea, Triticum, and Hordeum, 23 (18.5%) seemed to be subject to rigid intron size constraints that were independent of per-nucleotide DNA sequence variation. Furthermore, we identified 487 conserved-noncoding sequence motifs in 129 CISP loci. A large CISP set (6,062 primer pairs, amplifying introns from 1,676 genes) designed using an automated pipeline showed generally higher abundance in recombinogenic than in nonrecombinogenic regions of the rice genome, thus providing relatively even distribution along genetic maps. CISPs are an effective means to explore poorly characterized genomes for both DNA polymorphism and noncoding sequence conservation on a genome-wide or candidate gene basis, and also provide anchor points for comparative genomics across a diverse range of species.

  4. A Comparative Genomics Strategy for Targeted Discovery of Single-Nucleotide Polymorphisms and Conserved-Noncoding Sequences in Orphan Crops1[W

    PubMed Central

    Feltus, F.A.; Singh, H.P.; Lohithaswa, H.C.; Schulze, S.R.; Silva, T.D.; Paterson, A.H.

    2006-01-01

    Completed genome sequences provide templates for the design of genome analysis tools in orphan species lacking sequence information. To demonstrate this principle, we designed 384 PCR primer pairs to conserved exonic regions flanking introns, using Sorghum/Pennisetum expressed sequence tag alignments to the Oryza genome. Conserved-intron scanning primers (CISPs) amplified single-copy loci at 37% to 80% success rates in taxa that sample much of the approximately 50-million years of Poaceae divergence. While the conserved nature of exons fostered cross-taxon amplification, the lesser evolutionary constraints on introns enhanced single-nucleotide polymorphism detection. For example, in eight rice (Oryza sativa) genotypes, polymorphism averaged 12.1 per kb in introns but only 3.6 per kb in exons. Curiously, among 124 CISPs evaluated across Oryza, Sorghum, Pennisetum, Cynodon, Eragrostis, Zea, Triticum, and Hordeum, 23 (18.5%) seemed to be subject to rigid intron size constraints that were independent of per-nucleotide DNA sequence variation. Furthermore, we identified 487 conserved-noncoding sequence motifs in 129 CISP loci. A large CISP set (6,062 primer pairs, amplifying introns from 1,676 genes) designed using an automated pipeline showed generally higher abundance in recombinogenic than in nonrecombinogenic regions of the rice genome, thus providing relatively even distribution along genetic maps. CISPs are an effective means to explore poorly characterized genomes for both DNA polymorphism and noncoding sequence conservation on a genome-wide or candidate gene basis, and also provide anchor points for comparative genomics across a diverse range of species. PMID:16607031

  5. The crystal structure of Erwinia amylovora AmyR, a member of the YbjN protein family, shows similarity to type III secretion chaperones but suggests different cellular functions

    PubMed Central

    Bartho, Joseph D.; Bellini, Dom; Wuerges, Jochen; Demitri, Nicola; Toccafondi, Mirco; Schmitt, Armin O.; Zhao, Youfu; Walsh, Martin A.

    2017-01-01

    AmyR is a stress and virulence associated protein from the plant pathogenic Enterobacteriaceae species Erwinia amylovora, and is a functionally conserved ortholog of YbjN from Escherichia coli. The crystal structure of E. amylovora AmyR reveals a class I type III secretion chaperone-like fold, despite the lack of sequence similarity between these two classes of protein and lacking any evidence of a secretion-associated role. The results indicate that AmyR, and YbjN proteins in general, function through protein-protein interactions without any enzymatic action. The YbjN proteins of Enterobacteriaceae show remarkably low sequence similarity with other members of the YbjN protein family in Eubacteria, yet a high level of structural conservation is observed. Across the YbjN protein family sequence conservation is limited to residues stabilising the protein core and dimerization interface, while interacting regions are only conserved between closely related species. This study presents the first structure of a YbjN protein from Enterobacteriaceae, the most highly divergent and well-studied subgroup of YbjN proteins, and an in-depth sequence and structural analysis of this important but poorly understood protein family. PMID:28426806

  6. The crystal structure of Erwinia amylovora AmyR, a member of the YbjN protein family, shows similarity to type III secretion chaperones but suggests different cellular functions.

    PubMed

    Bartho, Joseph D; Bellini, Dom; Wuerges, Jochen; Demitri, Nicola; Toccafondi, Mirco; Schmitt, Armin O; Zhao, Youfu; Walsh, Martin A; Benini, Stefano

    2017-01-01

    AmyR is a stress and virulence associated protein from the plant pathogenic Enterobacteriaceae species Erwinia amylovora, and is a functionally conserved ortholog of YbjN from Escherichia coli. The crystal structure of E. amylovora AmyR reveals a class I type III secretion chaperone-like fold, despite the lack of sequence similarity between these two classes of protein and lacking any evidence of a secretion-associated role. The results indicate that AmyR, and YbjN proteins in general, function through protein-protein interactions without any enzymatic action. The YbjN proteins of Enterobacteriaceae show remarkably low sequence similarity with other members of the YbjN protein family in Eubacteria, yet a high level of structural conservation is observed. Across the YbjN protein family sequence conservation is limited to residues stabilising the protein core and dimerization interface, while interacting regions are only conserved between closely related species. This study presents the first structure of a YbjN protein from Enterobacteriaceae, the most highly divergent and well-studied subgroup of YbjN proteins, and an in-depth sequence and structural analysis of this important but poorly understood protein family.

  7. Primary structures of ribosomal proteins from the archaebacterium Halobacterium marismortui and the eubacterium Bacillus stearothermophilus.

    PubMed

    Arndt, E; Scholzen, T; Krömer, W; Hatakeyama, T; Kimura, M

    1991-06-01

    Approximately 40 ribosomal proteins from each Halobacterium marismortui and Bacillus stearothermophilus have been sequenced either by direct protein sequence analysis or by DNA sequence analysis of the appropriate genes. The comparison of the amino acid sequences from the archaebacterium H marismortui with the available ribosomal proteins from the eubacterial and eukaryotic kingdoms revealed four different groups of proteins: 24 proteins are related to both eubacterial as well as eukaryotic proteins. Eleven proteins are exclusively related to eukaryotic counterparts. For three proteins only eubacterial relatives-and for another three proteins no counterpart-could be found. The similarities of the halobacterial ribosomal proteins are in general somewhat higher to their eukaryotic than to their eubacterial counterparts. The comparison of B stearothermophilus proteins with their E coli homologues showed that the proteins evolved at different rates. Some proteins are highly conserved with 64-76% identity, others are poorly conserved with only 25-34% identical amino acid residues.

  8. The PRC2-binding long non-coding RNAs in human and mouse genomes are associated with predictive sequence features

    NASA Astrophysics Data System (ADS)

    Tu, Shiqi; Yuan, Guo-Cheng; Shao, Zhen

    2017-01-01

    Recently, long non-coding RNAs (lncRNAs) have emerged as an important class of molecules involved in many cellular processes. One of their primary functions is to shape epigenetic landscape through interactions with chromatin modifying proteins. However, mechanisms contributing to the specificity of such interactions remain poorly understood. Here we took the human and mouse lncRNAs that were experimentally determined to have physical interactions with Polycomb repressive complex 2 (PRC2), and systematically investigated the sequence features of these lncRNAs by developing a new computational pipeline for sequences composition analysis, in which each sequence is considered as a series of transitions between adjacent nucleotides. Through that, PRC2-binding lncRNAs were found to be associated with a set of distinctive and evolutionarily conserved sequence features, which can be utilized to distinguish them from the others with considerable accuracy. We further identified fragments of PRC2-binding lncRNAs that are enriched with these sequence features, and found they show strong PRC2-binding signals and are more highly conserved across species than the other parts, implying their functional importance.

  9. Conservative Forgetful Scholars: How People Learn Causal Structure through Sequences of Interventions

    ERIC Educational Resources Information Center

    Bramley, Neil R.; Lagnado, David A.; Speekenbrink, Maarten

    2015-01-01

    Interacting with a system is key to uncovering its causal structure. A computational framework for interventional causal learning has been developed over the last decade, but how real causal learners might achieve or approximate the computations entailed by this framework is still poorly understood. Here we describe an interactive computer task in…

  10. Stellar Rotation on the Main Sequence

    NASA Astrophysics Data System (ADS)

    Soderblom, D.; Murdin, P.

    2000-11-01

    The conservation of ANGULAR MOMENTUM is the one effective counterbalance to the inexorable pull of gravity in the universe, and so everything rotates. Stars acquire their angular momentum when they form, and, indeed, the manner in which nearly all this initial angular momentum is dissipated remains poorly understood, but without substantial angular momentum loss an interstellar cloud could never ...

  11. Protection of CpG islands from DNA methylation is DNA-encoded and evolutionarily conserved

    PubMed Central

    Long, Hannah K.; King, Hamish W.; Patient, Roger K.; Odom, Duncan T.; Klose, Robert J.

    2016-01-01

    DNA methylation is a repressive epigenetic modification that covers vertebrate genomes. Regions known as CpG islands (CGIs), which are refractory to DNA methylation, are often associated with gene promoters and play central roles in gene regulation. Yet how CGIs in their normal genomic context evade the DNA methylation machinery and whether these mechanisms are evolutionarily conserved remains enigmatic. To address these fundamental questions we exploited a transchromosomic animal model and genomic approaches to understand how the hypomethylated state is formed in vivo and to discover whether mechanisms governing CGI formation are evolutionarily conserved. Strikingly, insertion of a human chromosome into mouse revealed that promoter-associated CGIs are refractory to DNA methylation regardless of host species, demonstrating that DNA sequence plays a central role in specifying the hypomethylated state through evolutionarily conserved mechanisms. In contrast, elements distal to gene promoters exhibited more variable methylation between host species, uncovering a widespread dependence on nucleotide frequency and occupancy of DNA-binding transcription factors in shaping the DNA methylation landscape away from gene promoters. This was exemplified by young CpG rich lineage-restricted repeat sequences that evaded DNA methylation in the absence of co-evolved mechanisms targeting methylation to these sequences, and species specific DNA binding events that protected against DNA methylation in CpG poor regions. Finally, transplantation of mouse chromosomal fragments into the evolutionarily distant zebrafish uncovered the existence of a mechanistically conserved and DNA-encoded logic which shapes CGI formation across vertebrate species. PMID:27084945

  12. PASS2: an automated database of protein alignments organised as structural superfamilies.

    PubMed

    Bhaduri, Anirban; Pugalenthi, Ganesan; Sowdhamini, Ramanathan

    2004-04-02

    The functional selection and three-dimensional structural constraints of proteins in nature often relates to the retention of significant sequence similarity between proteins of similar fold and function despite poor sequence identity. Organization of structure-based sequence alignments for distantly related proteins, provides a map of the conserved and critical regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination. The Protein Alignment organised as Structural Superfamily (PASS2) database represents continuously updated, structural alignments for evolutionary related, sequentially distant proteins. An automated and updated version of PASS2 is, in direct correspondence with SCOP 1.63, consisting of sequences having identity below 40% among themselves. Protein domains have been grouped into 628 multi-member superfamilies and 566 single member superfamilies. Structure-based sequence alignments for the superfamilies have been obtained using COMPARER, while initial equivalencies have been derived from a preliminary superposition using LSQMAN or STAMP 4.0. The final sequence alignments have been annotated for structural features using JOY4.0. The database is supplemented with sequence relatives belonging to different genomes, conserved spatially interacting and structural motifs, probabilistic hidden markov models of superfamilies based on the alignments and useful links to other databases. Probabilistic models and sensitive position specific profiles obtained from reliable superfamily alignments aid annotation of remote homologues and are useful tools in structural and functional genomics. PASS2 presents the phylogeny of its members both based on sequence and structural dissimilarities. Clustering of members allows us to understand diversification of the family members. The search engine has been improved for simpler browsing of the database. The database resolves alignments among the structural domains consisting of evolutionarily diverged set of sequences. Availability of reliable sequence alignments of distantly related proteins despite poor sequence identity and single-member superfamilies permit better sampling of structures in libraries for fold recognition of new sequences and for the understanding of protein structure-function relationships of individual superfamilies. PASS2 is accessible at http://www.ncbs.res.in/~faculty/mini/campass/pass2.html

  13. Sturgeon conservation genomics: SNP discovery and validation using RAD sequencing.

    PubMed

    Ogden, R; Gharbi, K; Mugue, N; Martinsohn, J; Senn, H; Davey, J W; Pourkazemi, M; McEwing, R; Eland, C; Vidotto, M; Sergeev, A; Congiu, L

    2013-06-01

    Caviar-producing sturgeons belonging to the genus Acipenser are considered to be one of the most endangered species groups in the world. Continued overfishing in spite of increasing legislation, zero catch quotas and extensive aquaculture production have led to the collapse of wild stocks across Europe and Asia. The evolutionary relationships among Adriatic, Russian, Persian and Siberian sturgeons are complex because of past introgression events and remain poorly understood. Conservation management, traceability and enforcement suffer a lack of appropriate DNA markers for the genetic identification of sturgeon at the species, population and individual level. This study employed RAD sequencing to discover and characterize single nucleotide polymorphism (SNP) DNA markers for use in sturgeon conservation in these four tetraploid species over three biological levels, using a single sequencing lane. Four population meta-samples and eight individual samples from one family were barcoded separately before sequencing. Analysis of 14.4 Gb of paired-end RAD data focused on the identification of SNPs in the paired-end contig, with subsequent in silico and empirical validation of candidate markers. Thousands of putatively informative markers were identified including, for the first time, SNPs that show population-wide differentiation between Russian and Persian sturgeons, representing an important advance in our ability to manage these cryptic species. The results highlight the challenges of genotyping-by-sequencing in polyploid taxa, while establishing the potential genetic resources for developing a new range of caviar traceability and enforcement tools. © 2013 John Wiley & Sons Ltd.

  14. Structure-Related Roles for the Conservation of the HIV-1 Fusion Peptide Sequence Revealed by Nuclear Magnetic Resonance.

    PubMed

    Serrano, Soraya; Huarte, Nerea; Rujas, Edurne; Andreu, David; Nieva, José L; Jiménez, María Angeles

    2017-10-17

    Despite extensive characterization of the human immunodeficiency virus type 1 (HIV-1) hydrophobic fusion peptide (FP), the structure-function relationships underlying its extraordinary degree of conservation remain poorly understood. Specifically, the fact that the tandem repeat of the FLGFLG tripeptide is absolutely conserved suggests that high hydrophobicity may not suffice to unleash FP function. Here, we have compared the nuclear magnetic resonance (NMR) structures adopted in nonpolar media by two FP surrogates, wtFP-tag and scrFP-tag, which had equal hydrophobicity but contained wild-type and scrambled core sequences LFLGFLG and FGLLGFL, respectively. In addition, these peptides were tagged at their C-termini with an epitope sequence that folded independently, thereby allowing Western blot detection without interfering with FP structure. We observed similar α-helical FP conformations for both specimens dissolved in the low-polarity medium 25% (v/v) 1,1,1,3,3,3-hexafluoro-2-propanol (HFIP), but important differences in contact with micelles of the membrane mimetic dodecylphosphocholine (DPC). Thus, whereas wtFP-tag preserved a helix displaying a Gly-rich ridge, the scrambled sequence lost in great part the helical structure upon being solubilized in DPC. Western blot analyses further revealed the capacity of wtFP-tag to assemble trimers in membranes, whereas membrane oligomers were not observed in the case of the scrFP-tag sequence. We conclude that, beyond hydrophobicity, preserving sequence order is an important feature for defining the secondary structures and oligomeric states adopted by the HIV FP in membranes.

  15. Protection of CpG islands from DNA methylation is DNA-encoded and evolutionarily conserved.

    PubMed

    Long, Hannah K; King, Hamish W; Patient, Roger K; Odom, Duncan T; Klose, Robert J

    2016-08-19

    DNA methylation is a repressive epigenetic modification that covers vertebrate genomes. Regions known as CpG islands (CGIs), which are refractory to DNA methylation, are often associated with gene promoters and play central roles in gene regulation. Yet how CGIs in their normal genomic context evade the DNA methylation machinery and whether these mechanisms are evolutionarily conserved remains enigmatic. To address these fundamental questions we exploited a transchromosomic animal model and genomic approaches to understand how the hypomethylated state is formed in vivo and to discover whether mechanisms governing CGI formation are evolutionarily conserved. Strikingly, insertion of a human chromosome into mouse revealed that promoter-associated CGIs are refractory to DNA methylation regardless of host species, demonstrating that DNA sequence plays a central role in specifying the hypomethylated state through evolutionarily conserved mechanisms. In contrast, elements distal to gene promoters exhibited more variable methylation between host species, uncovering a widespread dependence on nucleotide frequency and occupancy of DNA-binding transcription factors in shaping the DNA methylation landscape away from gene promoters. This was exemplified by young CpG rich lineage-restricted repeat sequences that evaded DNA methylation in the absence of co-evolved mechanisms targeting methylation to these sequences, and species specific DNA binding events that protected against DNA methylation in CpG poor regions. Finally, transplantation of mouse chromosomal fragments into the evolutionarily distant zebrafish uncovered the existence of a mechanistically conserved and DNA-encoded logic which shapes CGI formation across vertebrate species. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. Regulation of G-protein coupled receptor traffic by an evolutionary conserved hydrophobic signal.

    PubMed

    Angelotti, Tim; Daunt, David; Shcherbakova, Olga G; Kobilka, Brian; Hurt, Carl M

    2010-04-01

    Plasma membrane (PM) expression of G-protein coupled receptors (GPCRs) is required for activation by extracellular ligands; however, mechanisms that regulate PM expression of GPCRs are poorly understood. For some GPCRs, such as alpha2c-adrenergic receptors (alpha(2c)-ARs), heterologous expression in non-native cells results in limited PM expression and extensive endoplasmic reticulum (ER) retention. Recently, ER export/retentions signals have been proposed to regulate cellular trafficking of several GPCRs. By utilizing a chimeric alpha(2a)/alpha(2c)-AR strategy, we identified an evolutionary conserved hydrophobic sequence (ALAAALAAAAA) in the extracellular amino terminal region that is responsible in part for alpha(2c)-AR subtype-specific trafficking. To our knowledge, this is the first luminal ER retention signal reported for a GPCR. Removal or disruption of the ER retention signal dramatically increased PM expression and decreased ER retention. Conversely, transplantation of this hydrophobic sequence into alpha(2a)-ARs reduced their PM expression and increased ER retention. This evolutionary conserved hydrophobic trafficking signal within alpha(2c)-ARs serves as a regulator of GPCR trafficking.

  17. ChIP-seq Identification of Weakly Conserved Heart Enhancers

    PubMed Central

    Blow, Matthew J.; McCulley, David J.; Li, Zirong; Zhang, Tao; Akiyama, Jennifer A.; Holt, Amy; Plajzer-Frick, Ingrid; Shoukry, Malak; Wright, Crystal; Chen, Feng; Afzal, Veena; Bristow, James; Ren, Bing; Black, Brian L.; Rubin, Edward M.; Visel, Axel; Pennacchio, Len A.

    2011-01-01

    Accurate control of tissue-specific gene expression plays a pivotal role in heart development, but few cardiac transcriptional enhancers have thus far been identified. Extreme non-coding sequence conservation successfully predicts enhancers active in many tissues, but fails to identify substantial numbers of heart enhancers. Here we used ChIP-seq with the enhancer-associated protein p300 from mouse embryonic day 11.5 heart tissue to identify over three thousand candidate heart enhancers genome-wide. Compared to other tissues studied at this time-point, most candidate heart enhancers are less deeply conserved in vertebrate evolution. Nevertheless, the testing of 130 candidate regions in a transgenic mouse assay revealed that most of them reproducibly function as enhancers active in the heart, irrespective of their degree of evolutionary constraint. These results provide evidence for a large population of poorly conserved heart enhancers and suggest that the evolutionary constraint of embryonic enhancers can vary depending on tissue type. PMID:20729851

  18. Molecular Evolution of the Non-Coding Eosinophil Granule Ontogeny Transcript

    PubMed Central

    Rose, Dominic; Stadler, Peter F.

    2011-01-01

    Eukaryotic genomes are pervasively transcribed. A large fraction of the transcriptional output consists of long, mRNA-like, non-protein-coding transcripts (mlncRNAs). The evolutionary history of mlncRNAs is still largely uncharted territory. In this contribution, we explore in detail the evolutionary traces of the eosinophil granule ontogeny transcript (EGOT), an experimentally confirmed representative of an abundant class of totally intronic non-coding transcripts (TINs). EGOT is located antisense to an intron of the ITPR1 gene. We computationally identify putative EGOT orthologs in the genomes of 32 different amniotes, including orthologs from primates, rodents, ungulates, carnivores, afrotherians, and xenarthrans, as well as putative candidates from basal amniotes, such as opossum or platypus. We investigate the EGOT gene phylogeny, analyze patterns of sequence conservation, and the evolutionary conservation of the EGOT gene structure. We show that EGO-B, the spliced isoform, may be present throughout the placental mammals, but most likely dates back even further. We demonstrate here for the first time that the whole EGOT locus is highly structured, containing several evolutionary conserved, and thermodynamic stable secondary structures. Our analyses allow us to postulate novel functional roles of a hitherto poorly understood region at the intron of EGO-B which is highly conserved at the sequence level. The region contains a novel ITPR1 exon and also conserved RNA secondary structures together with a conserved TATA-like element, which putatively acts as a promoter of an independent regulatory element. PMID:22303364

  19. Arthropod phylogenetics in light of three novel millipede (myriapoda: diplopoda) mitochondrial genomes with comments on the appropriateness of mitochondrial genome sequence data for inferring deep level relationships.

    PubMed

    Brewer, Michael S; Swafford, Lynn; Spruill, Chad L; Bond, Jason E

    2013-01-01

    Arthropods are the most diverse group of eukaryotic organisms, but their phylogenetic relationships are poorly understood. Herein, we describe three mitochondrial genomes representing orders of millipedes for which complete genomes had not been characterized. Newly sequenced genomes are combined with existing data to characterize the protein coding regions of myriapods and to attempt to reconstruct the evolutionary relationships within the Myriapoda and Arthropoda. The newly sequenced genomes are similar to previously characterized millipede sequences in terms of synteny and length. Unique translocations occurred within the newly sequenced taxa, including one half of the Appalachioria falcifera genome, which is inverted with respect to other millipede genomes. Across myriapods, amino acid conservation levels are highly dependent on the gene region. Additionally, individual loci varied in the level of amino acid conservation. Overall, most gene regions showed low levels of conservation at many sites. Attempts to reconstruct the evolutionary relationships suffered from questionable relationships and low support values. Analyses of phylogenetic informativeness show the lack of signal deep in the trees (i.e., genes evolve too quickly). As a result, the myriapod tree resembles previously published results but lacks convincing support, and, within the arthropod tree, well established groups were recovered as polyphyletic. The novel genome sequences described herein provide useful genomic information concerning millipede groups that had not been investigated. Taken together with existing sequences, the variety of compositions and evolution of myriapod mitochondrial genomes are shown to be more complex than previously thought. Unfortunately, the use of mitochondrial protein-coding regions in deep arthropod phylogenetics appears problematic, a result consistent with previously published studies. Lack of phylogenetic signal renders the resulting tree topologies as suspect. As such, these data are likely inappropriate for investigating such ancient relationships.

  20. Two estrogen response element sequences near the PCNA gene are not responsible for its estrogen-enhanced expression in MCF7 cells.

    PubMed

    Wang, Cheng; Yu, Jie; Kallen, Caleb B

    2008-01-01

    The proliferating cell nuclear antigen (PCNA) is an essential component of DNA replication, cell cycle regulation, and epigenetic inheritance. High expression of PCNA is associated with poor prognosis in patients with breast cancer. The 5'-region of the PCNA gene contains two computationally-detected estrogen response element (ERE) sequences, one of which is evolutionarily conserved. Both of these sequences are of undocumented cis-regulatory function. We recently demonstrated that estradiol (E2) enhances PCNA mRNA expression in MCF7 breast cancer cells. MCF7 cells proliferate in response to E2. Here, we demonstrate that E2 rapidly enhanced PCNA mRNA and protein expression in a process that requires ERalpha as well as de novo protein synthesis. One of the two upstream ERE sequences was specifically bound by ERalpha-containing protein complexes, in vitro, in gel shift analysis. Yet, each ERE sequence, when cloned as a single copy, or when engineered as two tandem copies of the ERE-containing sequence, was not capable of activating a luciferase reporter construct in response to E2. In MCF7 cells, neither ERE-containing genomic region demonstrated E2-dependent recruitment of ERalpha by sensitive ChIP-PCR assays. We conclude that E2 enhances PCNA gene expression by an indirect process and that computational detection of EREs, even when evolutionarily conserved and when near E2-responsive genes, requires biochemical validation.

  1. Genome-wide comparative analysis reveals human-mouse regulatory landscape and evolution.

    PubMed

    Denas, Olgert; Sandstrom, Richard; Cheng, Yong; Beal, Kathryn; Herrero, Javier; Hardison, Ross C; Taylor, James

    2015-02-14

    Because species-specific gene expression is driven by species-specific regulation, understanding the relationship between sequence and function of the regulatory regions in different species will help elucidate how differences among species arise. Despite active experimental and computational research, relationships among sequence, conservation, and function are still poorly understood. We compared transcription factor occupied segments (TFos) for 116 human and 35 mouse TFs in 546 human and 125 mouse cell types and tissues from the Human and the Mouse ENCODE projects. We based the map between human and mouse TFos on a one-to-one nucleotide cross-species mapper, bnMapper, that utilizes whole genome alignments (WGA). Our analysis shows that TFos are under evolutionary constraint, but a substantial portion (25.1% of mouse and 25.85% of human on average) of the TFos does not have a homologous sequence on the other species; this portion varies among cell types and TFs. Furthermore, 47.67% and 57.01% of the homologous TFos sequence shows binding activity on the other species for human and mouse respectively. However, 79.87% and 69.22% is repurposed such that it binds the same TF in different cells or different TFs in the same cells. Remarkably, within the set of repurposed TFos, the corresponding genome regions in the other species are preferred locations of novel TFos. These events suggest exaptation of some functional regulatory sequences into new function. Despite TFos repurposing, we did not find substantial changes in their predicted target genes, suggesting that CRMs buffer evolutionary events allowing little or no change in the TFos - target gene associations. Thus, the small portion of TFos with strictly conserved occupancy underestimates the degree of conservation of regulatory interactions. We mapped regulatory sequences from an extensive number of TFs and cell types between human and mouse using WGA. A comparative analysis of this correspondence unveiled the extent of the shared regulatory sequence across TFs and cell types under study. Importantly, a large part of the shared regulatory sequence is repurposed on the other species. This sequence, fueled by turnover events, provides a strong case for exaptation in regulatory elements.

  2. The sequence, structure and evolutionary features of HOTAIR in mammals

    PubMed Central

    2011-01-01

    Background An increasing number of long noncoding RNAs (lncRNAs) have been identified recently. Different from all the others that function in cis to regulate local gene expression, the newly identified HOTAIR is located between HoxC11 and HoxC12 in the human genome and regulates HoxD expression in multiple tissues. Like the well-characterised lncRNA Xist, HOTAIR binds to polycomb proteins to methylate histones at multiple HoxD loci, but unlike Xist, many details of its structure and function, as well as the trans regulation, remain unclear. Moreover, HOTAIR is involved in the aberrant regulation of gene expression in cancer. Results To identify conserved domains in HOTAIR and study the phylogenetic distribution of this lncRNA, we searched the genomes of 10 mammalian and 3 non-mammalian vertebrates for matches to its 6 exons and the two conserved domains within the 1800 bp exon6 using Infernal. There was just one high-scoring hit for each mammal, but many low-scoring hits were found in both mammals and non-mammalian vertebrates. These hits and their flanking genes in four placental mammals and platypus were examined to determine whether HOTAIR contained elements shared by other lncRNAs. Several of the hits were within unknown transcripts or ncRNAs, many were within introns of, or antisense to, protein-coding genes, and conservation of the flanking genes was observed only between human and chimpanzee. Phylogenetic analysis revealed discrete evolutionary dynamics for orthologous sequences of HOTAIR exons. Exon1 at the 5' end and a domain in exon6 near the 3' end, which contain domains that bind to multiple proteins, have evolved faster in primates than in other mammals. Structures were predicted for exon1, two domains of exon6 and the full HOTAIR sequence. The sequence and structure of two fragments, in exon1 and the domain B of exon6 respectively, were identified to robustly occur in predicted structures of exon1, domain B of exon6 and the full HOTAIR in mammals. Conclusions HOTAIR exists in mammals, has poorly conserved sequences and considerably conserved structures, and has evolved faster than nearby HoxC genes. Exons of HOTAIR show distinct evolutionary features, and a 239 bp domain in the 1804 bp exon6 is especially conserved. These features, together with the absence of some exons and sequences in mouse, rat and kangaroo, suggest ab initio generation of HOTAIR in marsupials. Structure prediction identifies two fragments in the 5' end exon1 and the 3' end domain B of exon6, with sequence and structure invariably occurring in various predicted structures of exon1, the domain B of exon6 and the full HOTAIR. PMID:21496275

  3. Sequencing Conservation Actions Through Threat Assessments in the Southeastern United States

    Treesearch

    Robert D. Sutter; Christopher C. Szell

    2006-01-01

    The identification of conservation priorities is one of the leading issues in conservation biology. We present a project of The Nature Conservancy, called Sequencing Conservation Actions, which prioritizes conservation areas and identifies foci for crosscutting strategies at various geographic scales. We use the term “Sequencing” to mean an ordering of actions over...

  4. On the relationship between residue structural environment and sequence conservation in proteins.

    PubMed

    Liu, Jen-Wei; Lin, Jau-Ji; Cheng, Chih-Wen; Lin, Yu-Feng; Hwang, Jenn-Kang; Huang, Tsun-Tsao

    2017-09-01

    Residues that are crucial to protein function or structure are usually evolutionarily conserved. To identify the important residues in protein, sequence conservation is estimated, and current methods rely upon the unbiased collection of homologous sequences. Surprisingly, our previous studies have shown that the sequence conservation is closely correlated with the weighted contact number (WCN), a measure of packing density for residue's structural environment, calculated only based on the C α positions of a protein structure. Moreover, studies have shown that sequence conservation is correlated with environment-related structural properties calculated based on different protein substructures, such as a protein's all atoms, backbone atoms, side-chain atoms, or side-chain centroid. To know whether the C α atomic positions are adequate to show the relationship between residue environment and sequence conservation or not, here we compared C α atoms with other substructures in their contributions to the sequence conservation. Our results show that C α positions are substantially equivalent to the other substructures in calculations of various measures of residue environment. As a result, the overlapping contributions between C α atoms and the other substructures are high, yielding similar structure-conservation relationship. Take the WCN as an example, the average overlapping contribution to sequence conservation is 87% between C α and all-atom substructures. These results indicate that only C α atoms of a protein structure could reflect sequence conservation at the residue level. © 2017 Wiley Periodicals, Inc.

  5. Repetitive sequences: the hidden diversity of heterochromatin in prochilodontid fish

    PubMed Central

    Terencio, Maria L.; Schneider, Carlos H.; Gross, Maria C.; do Carmo, Edson Junior; Nogaroto, Viviane; de Almeida, Mara Cristina; Artoni, Roberto Ferreira; Vicari, Marcelo R.; Feldberg, Eliana

    2015-01-01

    Abstract The structure and organization of repetitive elements in fish genomes are still relatively poorly understood, although most of these elements are believed to be located in heterochromatic regions. Repetitive elements are considered essential in evolutionary processes as hotspots for mutations and chromosomal rearrangements, among other functions – thus providing new genomic alternatives and regulatory sites for gene expression. The present study sought to characterize repetitive DNA sequences in the genomes of Semaprochilodus insignis (Jardine & Schomburgk, 1841) and Semaprochilodus taeniurus (Valenciennes, 1817) and identify regions of conserved syntenic blocks in this genome fraction of three species of Prochilodontidae (Semaprochilodus insignis, Semaprochilodus taeniurus, and Prochilodus lineatus (Valenciennes, 1836) by cross-FISH using Cot-1 DNA (renaturation kinetics) probes. We found that the repetitive fractions of the genomes of Semaprochilodus insignis and Semaprochilodus taeniurus have significant amounts of conserved syntenic blocks in hybridization sites, but with low degrees of similarity between them and the genome of Prochilodus lineatus, especially in relation to B chromosomes. The cloning and sequencing of the repetitive genomic elements of Semaprochilodus insignis and Semaprochilodus taeniurus using Cot-1 DNA identified 48 fragments that displayed high similarity with repetitive sequences deposited in public DNA databases and classified as microsatellites, transposons, and retrotransposons. The repetitive fractions of the Semaprochilodus insignis and Semaprochilodus taeniurus genomes exhibited high degrees of conserved syntenic blocks in terms of both the structures and locations of hybridization sites, but a low degree of similarity with the syntenic blocks of the Prochilodus lineatus genome. Future comparative analyses of other prochilodontidae species will be needed to advance our understanding of the organization and evolution of the genomes in this group of fish. PMID:26752156

  6. Revisiting the phylogeny of Zoanthidea (Cnidaria: Anthozoa): Staggered alignment of hypervariable sequences improves species tree inference.

    PubMed

    Swain, Timothy D

    2018-01-01

    The recent rapid proliferation of novel taxon identification in the Zoanthidea has been accompanied by a parallel propagation of gene trees as a tool of species discovery, but not a corresponding increase in our understanding of phylogeny. This disparity is caused by the trade-off between the capabilities of automated DNA sequence alignment and data content of genes applied to phylogenetic inference in this group. Conserved genes or segments are easily aligned across the order, but produce poorly resolved trees; hypervariable genes or segments contain the evolutionary signal necessary for resolution and robust support, but sequence alignment is daunting. Staggered alignments are a form of phylogeny-informed sequence alignment composed of a mosaic of local and universal regions that allow phylogenetic inference to be applied to all nucleotides from both hypervariable and conserved gene segments. Comparisons between species tree phylogenies inferred from all data (staggered alignment) and hypervariable-excluded data (standard alignment) demonstrate improved confidence and greater topological agreement with other sources of data for the complete-data tree. This novel phylogeny is the most comprehensive to date (in terms of taxa and data) and can serve as an expandable tool for evolutionary hypothesis testing in the Zoanthidea. Spanish language abstract available in Text S1. Translation by L. O. Swain, DePaul University, Chicago, Illinois, 60604, USA. Copyright © 2017 Elsevier Inc. All rights reserved.

  7. Identifying functionally informative evolutionary sequence profiles.

    PubMed

    Gil, Nelson; Fiser, Andras

    2018-04-15

    Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases. Availability and implementation: A freely accessible program, including source code, implementing SAMMI is available at https://github.com/nelsongil92/SAMMI.git. andras.fiser@einstein.yu.edu. Supplementary data are available at Bioinformatics online.

  8. Coevolution Pattern and Functional Conservation or Divergence of miR167s and their targets across Diverse Plant Species

    PubMed Central

    Barik, Suvakanta; Kumar, Ashutosh; Sarkar Das, Shabari; Yadav, Sandeep; Gautam, Vibhav; Singh, Archita; Singh, Sharmila; Sarkar, Ananda K.

    2015-01-01

    microRNAs (miRNAs), a class of endogenously produced small non-coding RNAs of 20–21 nt length, processed from precursor miRNAs, regulate many developmental processes by negatively regulating the target genes in both animals and plants. The coevolutionary pattern of a miRNA family and their targets underscores its functional conservation or diversification. The miR167 regulates various aspects of plant development in Arabidopsis by targeting ARF6 and ARF8. The evolutionary conservation or divergence of miR167s and their target genes are poorly understood till now. Here we show the evolutionary relationship among 153 MIR167 genes obtained from 33 diverse plant species. We found that out of the 153 of miR167 sequences retrieved from the “miRBase”, 27 have been annotated to be processed from the 3′ end, and have diverged distinctively from the other miR167s produced from 5′ end. Our analysis reveals that gma-miR167h/i and mdm-miR167a are processed from 3′ end and have evolved separately, diverged most resulting in novel targets other than their known ones, and thus led to functional diversification, especially in apple and soybean. We also show that mostly conserved miR167 sequences and their target AUXIN RESPONSE FACTORS (ARFs) have gone through parallel evolution leading to functional diversification among diverse plant species. PMID:26459056

  9. Coevolution Pattern and Functional Conservation or Divergence of miR167s and their targets across Diverse Plant Species.

    PubMed

    Barik, Suvakanta; Kumar, Ashutosh; Sarkar Das, Shabari; Yadav, Sandeep; Gautam, Vibhav; Singh, Archita; Singh, Sharmila; Sarkar, Ananda K

    2015-10-13

    microRNAs (miRNAs), a class of endogenously produced small non-coding RNAs of 20-21 nt length, processed from precursor miRNAs, regulate many developmental processes by negatively regulating the target genes in both animals and plants. The coevolutionary pattern of a miRNA family and their targets underscores its functional conservation or diversification. The miR167 regulates various aspects of plant development in Arabidopsis by targeting ARF6 and ARF8. The evolutionary conservation or divergence of miR167s and their target genes are poorly understood till now. Here we show the evolutionary relationship among 153 MIR167 genes obtained from 33 diverse plant species. We found that out of the 153 of miR167 sequences retrieved from the "miRBase", 27 have been annotated to be processed from the 3' end, and have diverged distinctively from the other miR167s produced from 5' end. Our analysis reveals that gma-miR167h/i and mdm-miR167a are processed from 3' end and have evolved separately, diverged most resulting in novel targets other than their known ones, and thus led to functional diversification, especially in apple and soybean. We also show that mostly conserved miR167 sequences and their target AUXIN RESPONSE FACTORS (ARFs) have gone through parallel evolution leading to functional diversification among diverse plant species.

  10. The Murine Norovirus Core Subgenomic RNA Promoter Consists of a Stable Stem-Loop That Can Direct Accurate Initiation of RNA Synthesis

    PubMed Central

    Yunus, Muhammad Amir; Lin, Xiaoyan; Bailey, Dalan; Karakasiliotis, Ioannis; Chaudhry, Yasmin; Vashist, Surender; Zhang, Guo; Thorne, Lucy; Kao, C. Cheng

    2014-01-01

    ABSTRACT All members of the Caliciviridae family of viruses produce a subgenomic RNA during infection. The subgenomic RNA typically encodes only the major and minor capsid proteins, but in murine norovirus (MNV), the subgenomic RNA also encodes the VF1 protein, which functions to suppress host innate immune responses. To date, the mechanism of norovirus subgenomic RNA synthesis has not been characterized. We have previously described the presence of an evolutionarily conserved RNA stem-loop structure on the negative-sense RNA, the complementary sequence of which codes for the viral RNA-dependent RNA polymerase (NS7). The conserved stem-loop is positioned 6 nucleotides 3′ of the start site of the subgenomic RNA in all caliciviruses. We demonstrate that the conserved stem-loop is essential for MNV viability. Mutant MNV RNAs with substitutions in the stem-loop replicated poorly until they accumulated mutations that revert to restore the stem-loop sequence and/or structure. The stem-loop sequence functions in a noncoding context, as it was possible to restore the replication of an MNV mutant by introducing an additional copy of the stem-loop between the NS7- and VP1-coding regions. Finally, in vitro biochemical data suggest that the stem-loop sequence is sufficient for the initiation of viral RNA synthesis by the recombinant MNV RNA-dependent RNA polymerase, confirming that the stem-loop forms the core of the norovirus subgenomic promoter. IMPORTANCE Noroviruses are a significant cause of viral gastroenteritis, and it is important to understand the mechanism of norovirus RNA synthesis. Here we describe the identification of an RNA stem-loop structure that functions as the core of the norovirus subgenomic RNA promoter in cells and in vitro. This work provides new insights into the molecular mechanisms of norovirus RNA synthesis and the sequences that determine the recognition of viral RNA by the RNA-dependent RNA polymerase. PMID:25392209

  11. The murine norovirus core subgenomic RNA promoter consists of a stable stem-loop that can direct accurate initiation of RNA synthesis.

    PubMed

    Yunus, Muhammad Amir; Lin, Xiaoyan; Bailey, Dalan; Karakasiliotis, Ioannis; Chaudhry, Yasmin; Vashist, Surender; Zhang, Guo; Thorne, Lucy; Kao, C Cheng; Goodfellow, Ian

    2015-01-15

    All members of the Caliciviridae family of viruses produce a subgenomic RNA during infection. The subgenomic RNA typically encodes only the major and minor capsid proteins, but in murine norovirus (MNV), the subgenomic RNA also encodes the VF1 protein, which functions to suppress host innate immune responses. To date, the mechanism of norovirus subgenomic RNA synthesis has not been characterized. We have previously described the presence of an evolutionarily conserved RNA stem-loop structure on the negative-sense RNA, the complementary sequence of which codes for the viral RNA-dependent RNA polymerase (NS7). The conserved stem-loop is positioned 6 nucleotides 3' of the start site of the subgenomic RNA in all caliciviruses. We demonstrate that the conserved stem-loop is essential for MNV viability. Mutant MNV RNAs with substitutions in the stem-loop replicated poorly until they accumulated mutations that revert to restore the stem-loop sequence and/or structure. The stem-loop sequence functions in a noncoding context, as it was possible to restore the replication of an MNV mutant by introducing an additional copy of the stem-loop between the NS7- and VP1-coding regions. Finally, in vitro biochemical data suggest that the stem-loop sequence is sufficient for the initiation of viral RNA synthesis by the recombinant MNV RNA-dependent RNA polymerase, confirming that the stem-loop forms the core of the norovirus subgenomic promoter. Noroviruses are a significant cause of viral gastroenteritis, and it is important to understand the mechanism of norovirus RNA synthesis. Here we describe the identification of an RNA stem-loop structure that functions as the core of the norovirus subgenomic RNA promoter in cells and in vitro. This work provides new insights into the molecular mechanisms of norovirus RNA synthesis and the sequences that determine the recognition of viral RNA by the RNA-dependent RNA polymerase. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

  12. Ancient Himalayan wolf (Canis lupus chanco) lineage in Upper Mustang of the Annapurna Conservation Area, Nepal

    PubMed Central

    Chetri, Madhu; Jhala, Yadvendradev V.; Jnawali, Shant R.; Subedi, Naresh; Dhakal, Maheshwar; Yumnam, Bibek

    2016-01-01

    Abstract The taxonomic status of the wolf (Canis lupus) in Nepal’s Trans-Himalaya is poorly understood. Recent genetic studies have revealed the existence of three lineages of wolves in the Indian sub-continent. Of these, the Himalayan wolf, Canis lupus chanco, has been reported to be the most ancient lineage historically distributed within the Nepal Himalaya. These wolves residing in the Trans-Himalayan region have been suggested to be smaller and very different from the European wolf. During October 2011, six fecal samples suspected to have originated from wolves were collected from Upper Mustang in the Annapurna Conservation Area of Nepal. DNA extraction and amplification of the mitochondrial (mt) control region (CR) locus yielded sequences from five out of six samples. One sample matched domestic dog sequences in GenBank, while the remaining four samples were aligned within the monophyletic and ancient Himalayan wolf clade. These four sequences which matched each other, were new and represented a novel Himalayan wolf haplotype. This result confirms that the endangered ancient Himalayan wolf is extant in Nepal. Detailed genomic study covering Nepal’s entire Himalayan landscape is recommended in order to understand their distribution, taxonomy and, genetic relatedness with other wolves potentially sharing the same landscape. PMID:27199590

  13. Functional specificity of a Hox protein mediated by the recognition of minor groove structure.

    PubMed

    Joshi, Rohit; Passner, Jonathan M; Rohs, Remo; Jain, Rinku; Sosinsky, Alona; Crickmore, Michael A; Jacob, Vinitha; Aggarwal, Aneel K; Honig, Barry; Mann, Richard S

    2007-11-02

    The recognition of specific DNA-binding sites by transcription factors is a critical yet poorly understood step in the control of gene expression. Members of the Hox family of transcription factors bind DNA by making nearly identical major groove contacts via the recognition helices of their homeodomains. In vivo specificity, however, often depends on extended and unstructured regions that link Hox homeodomains to a DNA-bound cofactor, Extradenticle (Exd). Using a combination of structure determination, computational analysis, and in vitro and in vivo assays, we show that Hox proteins recognize specific Hox-Exd binding sites via residues located in these extended regions that insert into the minor groove but only when presented with the correct DNA sequence. Our results suggest that these residues, which are conserved in a paralog-specific manner, confer specificity by recognizing a sequence-dependent DNA structure instead of directly reading a specific DNA sequence.

  14. Conservation and diversification of Msx protein in metazoan evolution.

    PubMed

    Takahashi, Hirokazu; Kamiya, Akiko; Ishiguro, Akira; Suzuki, Atsushi C; Saitou, Naruya; Toyoda, Atsushi; Aruga, Jun

    2008-01-01

    Msx (/msh) family genes encode homeodomain (HD) proteins that control ontogeny in many animal species. We compared the structures of Msx genes from a wide range of Metazoa (Porifera, Cnidaria, Nematoda, Arthropoda, Tardigrada, Platyhelminthes, Mollusca, Brachiopoda, Annelida, Echiura, Echinodermata, Hemichordata, and Chordata) to gain an understanding of the role of these genes in phylogeny. Exon-intron boundary analysis suggested that the position of the intron located N-terminally to the HDs was widely conserved in all the genes examined, including those of cnidarians. Amino acid (aa) sequence comparison revealed 3 new evolutionarily conserved domains, as well as very strong conservation of the HDs. Two of the three domains were associated with Groucho-like protein binding in both a vertebrate and a cnidarian Msx homolog, suggesting that the interaction between Groucho-like proteins and Msx proteins was established in eumetazoan ancestors. Pairwise comparison among the collected HDs and their C-flanking aa sequences revealed that the degree of sequence conservation varied depending on the animal taxa from which the sequences were derived. Highly conserved Msx genes were identified in the Vertebrata, Cephalochordata, Hemichordata, Echinodermata, Mollusca, Brachiopoda, and Anthozoa. The wide distribution of the conserved sequences in the animal phylogenetic tree suggested that metazoan ancestors had already acquired a set of conserved domains of the current Msx family genes. Interestingly, although strongly conserved sequences were recovered from the Vertebrata, Cephalochordata, and Anthozoa, the sequences from the Urochordata and Hydrozoa showed weak conservation. Because the Vertebrata-Cephalochordata-Urochordata and Anthozoa-Hydrozoa represent sister groups in the Chordata and Cnidaria, respectively, Msx sequence diversification may have occurred differentially in the course of evolution. We speculate that selective loss of the conserved domains in Msx family proteins contributed to the diversification of animal body organization.

  15. Functionally conserved enhancers with divergent sequences in distant vertebrates

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Song; Oksenberg, Nir; Takayama, Sachiko

    To examine the contributions of sequence and function conservation in the evolution of enhancers, we systematically identified enhancers whose sequences are not conserved among distant groups of vertebrate species, but have homologous function and are likely to be derived from a common ancestral sequence. In conclusion, our approach combined comparative genomics and epigenomics to identify potential enhancer sequences in the genomes of three groups of distantly related vertebrate species.

  16. Functionally conserved enhancers with divergent sequences in distant vertebrates

    DOE PAGES

    Yang, Song; Oksenberg, Nir; Takayama, Sachiko; ...

    2015-10-30

    To examine the contributions of sequence and function conservation in the evolution of enhancers, we systematically identified enhancers whose sequences are not conserved among distant groups of vertebrate species, but have homologous function and are likely to be derived from a common ancestral sequence. In conclusion, our approach combined comparative genomics and epigenomics to identify potential enhancer sequences in the genomes of three groups of distantly related vertebrate species.

  17. Analysis of conserved noncoding DNA in Drosophila reveals similar constraints in intergenic and intronic sequences.

    PubMed

    Bergman, C M; Kreitman, M

    2001-08-01

    Comparative genomic approaches to gene and cis-regulatory prediction are based on the principle that differential DNA sequence conservation reflects variation in functional constraint. Using this principle, we analyze noncoding sequence conservation in Drosophila for 40 loci with known or suspected cis-regulatory function encompassing >100 kb of DNA. We estimate the fraction of noncoding DNA conserved in both intergenic and intronic regions and describe the length distribution of ungapped conserved noncoding blocks. On average, 22%-26% of noncoding sequences surveyed are conserved in Drosophila, with median block length approximately 19 bp. We show that point substitution in conserved noncoding blocks exhibits transition bias as well as lineage effects in base composition, and occurs more than an order of magnitude more frequently than insertion/deletion (indel) substitution. Overall, patterns of noncoding DNA structure and evolution differ remarkably little between intergenic and intronic conserved blocks, suggesting that the effects of transcription per se contribute minimally to the constraints operating on these sequences. The results of this study have implications for the development of alignment and prediction algorithms specific to noncoding DNA, as well as for models of cis-regulatory DNA sequence evolution.

  18. Rapid Evolution of Virulence and Drug Resistance in the Emerging Zoonotic Pathogen Streptococcus suis

    PubMed Central

    Holden, Matthew T. G.; Hauser, Heidi; Sanders, Mandy; Ngo, Thi Hoa; Cherevach, Inna; Cronin, Ann; Goodhead, Ian; Mungall, Karen; Quail, Michael A.; Price, Claire; Rabbinowitsch, Ester; Sharp, Sarah; Croucher, Nicholas J.; Chieu, Tran Bich; Thi Hoang Mai, Nguyen; Diep, To Song; Chinh, Nguyen Tran; Kehoe, Michael; Leigh, James A.; Ward, Philip N.; Dowson, Christopher G.; Whatmore, Adrian M.; Chanter, Neil; Iversen, Pernille; Gottschalk, Marcelo; Slater, Josh D.; Smith, Hilde E.; Spratt, Brian G.; Xu, Jianguo; Ye, Changyun; Bentley, Stephen; Barrell, Barclay G.; Schultsz, Constance; Maskell, Duncan J.; Parkhill, Julian

    2009-01-01

    Background Streptococcus suis is a zoonotic pathogen that infects pigs and can occasionally cause serious infections in humans. S. suis infections occur sporadically in human Europe and North America, but a recent major outbreak has been described in China with high levels of mortality. The mechanisms of S. suis pathogenesis in humans and pigs are poorly understood. Methodology/Principal Findings The sequencing of whole genomes of S. suis isolates provides opportunities to investigate the genetic basis of infection. Here we describe whole genome sequences of three S. suis strains from the same lineage: one from European pigs, and two from human cases from China and Vietnam. Comparative genomic analysis was used to investigate the variability of these strains. S. suis is phylogenetically distinct from other Streptococcus species for which genome sequences are currently available. Accordingly, ∼40% of the ∼2 Mb genome is unique in comparison to other Streptococcus species. Finer genomic comparisons within the species showed a high level of sequence conservation; virtually all of the genome is common to the S. suis strains. The only exceptions are three ∼90 kb regions, present in the two isolates from humans, composed of integrative conjugative elements and transposons. Carried in these regions are coding sequences associated with drug resistance. In addition, small-scale sequence variation has generated pseudogenes in putative virulence and colonization factors. Conclusions/Significance The genomic inventories of genetically related S. suis strains, isolated from distinct hosts and diseases, exhibit high levels of conservation. However, the genomes provide evidence that horizontal gene transfer has contributed to the evolution of drug resistance. PMID:19603075

  19. Biological function in the twilight zone of sequence conservation.

    PubMed

    Ponting, Chris P

    2017-08-16

    Strong DNA conservation among divergent species is an indicator of enduring functionality. With weaker sequence conservation we enter a vast 'twilight zone' in which sequence subject to transient or lower constraint cannot be distinguished easily from neutrally evolving, non-functional sequence. Twilight zone functional sequence is illuminated instead by principles of selective constraint and positive selection using genomic data acquired from within a species' population. Application of these principles reveals that despite being biochemically active, most twilight zone sequence is not functional.

  20. A strategy for detecting the conservation of folding-nucleus residues in protein superfamilies.

    PubMed

    Michnick, S W; Shakhnovich, E

    1998-01-01

    Nucleation-growth theory predicts that fast-folding peptide sequences fold to their native structure via structures in a transition-state ensemble that share a small number of native contacts (the folding nucleus). Experimental and theoretical studies of proteins suggest that residues participating in folding nuclei are conserved among homologs. We attempted to determine if this is true in proteins with highly diverged sequences but identical folds (superfamilies). We describe a strategy based on comparisons of residue conservation in natural superfamily sequences with simulated sequences (generated with a Monte-Carlo sequence design strategy) for the same proteins. The basic assumptions of the strategy were that natural sequences will conserve residues needed for folding and stability plus function, the simulated sequences contain no functional conservation, and nucleus residues make native contacts with each other. Based on these assumptions, we identified seven potential nucleus residues in ubiquitin superfamily members. Non-nucleus conserved residues were also identified; these are proposed to be involved in stabilizing native interactions. We found that all superfamily members conserved the same potential nucleus residue positions, except those for which the structural topology is significantly different. Our results suggest that the conservation of the nucleus of a specific fold can be predicted by comparing designed simulated sequences with natural highly diverged sequences that fold to the same structure. We suggest that such a strategy could be used to help plan protein folding and design experiments, to identify new superfamily members, and to subdivide superfamilies further into classes having a similar folding mechanism.

  1. Fine-tuning structural RNA alignments in the twilight zone.

    PubMed

    Bremges, Andreas; Schirmer, Stefanie; Giegerich, Robert

    2010-04-30

    A widely used method to find conserved secondary structure in RNA is to first construct a multiple sequence alignment, and then fold the alignment, optimizing a score based on thermodynamics and covariance. This method works best around 75% sequence similarity. However, in a "twilight zone" below 55% similarity, the sequence alignment tends to obscure the covariance signal used in the second phase. Therefore, while the overall shape of the consensus structure may still be found, the degree of conservation cannot be estimated reliably. Based on a combination of available methods, we present a method named planACstar for improving structure conservation in structural alignments in the twilight zone. After constructing a consensus structure by alignment folding, planACstar abandons the original sequence alignment, refolds the sequences individually, but consistent with the consensus, aligns the structures, irrespective of sequence, by a pure structure alignment method, and derives an improved sequence alignment from the alignment of structures, to be re-submitted to alignment folding, etc.. This circle may be iterated as long as structural conservation improves, but normally, one step suffices. Employing the tools ClustalW, RNAalifold, and RNAforester, we find that for sequences with 30-55% sequence identity, structural conservation can be improved by 10% on average, with a large variation, measured in terms of RNAalifold's own criterion, the structure conservation index.

  2. J chain in the nurse shark: implications for function in a lower vertebrate.

    PubMed

    Hohman, Valerie S; Stewart, Sue E; Rumfelt, Lynn L; Greenberg, Andrew S; Avila, David W; Flajnik, Martin F; Steiner, Lisa A

    2003-06-15

    J chain is a small polypeptide covalently attached to polymeric IgA and IgM. In humans and mice, it plays a role in binding Ig to the polymeric Ig receptor for transport into secretions. The putative orthologue of mammalian J chain has been identified in the nurse shark by sequence analysis of cDNA and the polypeptide isolated from IgM. Conservation with J chains from other species is relatively poor, especially in the carboxyl-terminal portion, and, unlike other J chains, the shark protein is not acidic. The only highly conserved segment in all known J chains is a block of residues surrounding an N-linked glycosylation site. Of the eight half-cystine residues that are conserved in mammalian J chains, three are lacking in the nurse shark, including two in the carboxyl-terminal segment that have been reported to be required for binding of human J chain-containing IgA to secretory component. Taken together with these data, the relative abundance of J chain transcripts in the spleen and their absence in the spiral valve (intestine) suggest that J chain in nurse sharks may not have a role in Ig secretion. Analysis of J chain sequences in diverse species is in agreement with accepted phylogenetic relationships, with the exception of the earthworm, suggesting that the reported presence of J chain in invertebrates should be reassessed.

  3. The liverwort Pellia endiviifolia shares microtranscriptomic traits that are common to green algae and land plants

    PubMed Central

    Alaba, Sylwia; Piszczalka, Pawel; Pietrykowska, Halina; Pacak, Andrzej M; Sierocka, Izabela; Nuc, Przemyslaw W; Singh, Kashmir; Plewka, Patrycja; Sulkowska, Aleksandra; Jarmolowski, Artur; Karlowski, Wojciech M; Szweykowska-Kulinska, Zofia

    2015-01-01

    Liverworts are the most basal group of extant land plants. Nonetheless, the molecular biology of liverworts is poorly understood. Gene expression has been studied in only one species, Marchantia polymorpha. In particular, no microRNA (miRNA) sequences from liverworts have been reported. Here, Illumina-based next-generation sequencing was employed to identify small RNAs, and analyze the transcriptome and the degradome of Pellia endiviifolia. Three hundred and eleven conserved miRNA plant families were identified, and 42 new liverwort-specific miRNAs were discovered. The RNA degradome analysis revealed that target mRNAs of only three miRNAs (miR160, miR166, and miR408) have been conserved between liverworts and other land plants. New targets were identified for the remaining conserved miRNAs. Moreover, the analysis of the degradome permitted the identification of targets for 13 novel liverwort-specific miRNAs. Interestingly, three of the liverwort microRNAs show high similarity to previously reported miRNAs from Chlamydomonas reinhardtii. This is the first observation of miRNAs that exist both in a representative alga and in the liverwort P. endiviifolia but are not present in land plants. The results of the analysis of the P. endivifolia microtranscriptome support the conclusions of previous studies that placed liverworts at the root of the land plant evolutionary tree of life. PMID:25530158

  4. Comparative functional characterization of the CSR-1 22G-RNA pathway in Caenorhabditis nematodes

    PubMed Central

    Tu, Shikui; Wu, Monica Z.; Wang, Jie; Cutter, Asher D.; Weng, Zhiping; Claycomb, Julie M.

    2015-01-01

    As a champion of small RNA research for two decades, Caenorhabditis elegans has revealed the essential Argonaute CSR-1 to play key nuclear roles in modulating chromatin, chromosome segregation and germline gene expression via 22G-small RNAs. Despite CSR-1 being preserved among diverse nematodes, the conservation and divergence in function of the targets of small RNA pathways remains poorly resolved. Here we apply comparative functional genomic analysis between C. elegans and Caenorhabditis briggsae to characterize the CSR-1 pathway, its targets and their evolution. C. briggsae CSR-1-associated small RNAs that we identified by immunoprecipitation-small RNA sequencing overlap with 22G-RNAs depleted in cbr-csr-1 RNAi-treated worms. By comparing 22G-RNAs and target genes between species, we defined a set of CSR-1 target genes with conserved germline expression, enrichment in operons and more slowly evolving coding sequences than other genes, along with a small group of evolutionarily labile targets. We demonstrate that the association of CSR-1 with chromatin is preserved, and show that depletion of cbr-csr-1 leads to chromosome segregation defects and embryonic lethality. This first comparative characterization of a small RNA pathway in Caenorhabditis establishes a conserved nuclear role for CSR-1 and highlights its key role in germline gene regulation across multiple animal species. PMID:25510497

  5. Dynamics of actin evolution in dinoflagellates.

    PubMed

    Kim, Sunju; Bachvaroff, Tsvetan R; Handy, Sara M; Delwiche, Charles F

    2011-04-01

    Dinoflagellates have unique nuclei and intriguing genome characteristics with very high DNA content making complete genome sequencing difficult. In dinoflagellates, many genes are found in multicopy gene families, but the processes involved in the establishment and maintenance of these gene families are poorly understood. Understanding the dynamics of gene family evolution in dinoflagellates requires comparisons at different evolutionary scales. Studies of closely related species provide fine-scale information relative to species divergence, whereas comparisons of more distantly related species provides broad context. We selected the actin gene family as a highly expressed conserved gene previously studied in dinoflagellates. Of the 142 sequences determined in this study, 103 were from the two closely related species, Dinophysis acuminata and D. caudata, including full length and partial cDNA sequences as well as partial genomic amplicons. For these two Dinophysis species, at least three types of sequences could be identified. Most copies (79%) were relatively similar and in nucleotide trees, the sequences formed two bushy clades corresponding to the two species. In comparisons within species, only eight to ten nucleotide differences were found between these copies. The two remaining types formed clades containing sequences from both species. One type included the most similar sequences in between-species comparisons with as few as 12 nucleotide differences between species. The second type included the most divergent sequences in comparisons between and within species with up to 93 nucleotide differences between sequences. In all the sequences, most variation occurred in synonymous sites or the 5' UnTranslated Region (UTR), although there was still limited amino acid variation between most sequences. Several potential pseudogenes were found (approximately 10% of all sequences depending on species) with incomplete open reading frames due to frameshifts or early stop codons. Overall, variation in the actin gene family fits best with the "birth and death" model of evolution based on recent duplications, pseudogenes, and incomplete lineage sorting. Divergence between species was similar to variation within species, so that actin may be too conserved to be useful for phylogenetic estimation of closely related species.

  6. StralSV: assessment of sequence variability within similar 3D structures and application to polio RNA-dependent RNA polymerase.

    PubMed

    Zemla, Adam T; Lang, Dorothy M; Kostova, Tanya; Andino, Raul; Ecale Zhou, Carol L

    2011-06-02

    Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory--still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could help overcome these difficulties by facilitating the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV (structure-alignment sequence variability), a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus, and we demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique, or that share structural similarity with proteins that would be considered distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local structural alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position. StralSV is provided as a web service at http://proteinmodel.org/AS2TS/STRALSV/.

  7. Fast discovery and visualization of conserved regions in DNA sequences using quasi-alignment

    PubMed Central

    2013-01-01

    Background Next Generation Sequencing techniques are producing enormous amounts of biological sequence data and analysis becomes a major computational problem. Currently, most analysis, especially the identification of conserved regions, relies heavily on Multiple Sequence Alignment and its various heuristics such as progressive alignment, whose run time grows with the square of the number and the length of the aligned sequences and requires significant computational resources. In this work, we present a method to efficiently discover regions of high similarity across multiple sequences without performing expensive sequence alignment. The method is based on approximating edit distance between segments of sequences using p-mer frequency counts. Then, efficient high-throughput data stream clustering is used to group highly similar segments into so called quasi-alignments. Quasi-alignments have numerous applications such as identifying species and their taxonomic class from sequences, comparing sequences for similarities, and, as in this paper, discovering conserved regions across related sequences. Results In this paper, we show that quasi-alignments can be used to discover highly similar segments across multiple sequences from related or different genomes efficiently and accurately. Experiments on a large number of unaligned 16S rRNA sequences obtained from the Greengenes database show that the method is able to identify conserved regions which agree with known hypervariable regions in 16S rRNA. Furthermore, the experiments show that the proposed method scales well for large data sets with a run time that grows only linearly with the number and length of sequences, whereas for existing multiple sequence alignment heuristics the run time grows super-linearly. Conclusion Quasi-alignment-based algorithms can detect highly similar regions and conserved areas across multiple sequences. Since the run time is linear and the sequences are converted into a compact clustering model, we are able to identify conserved regions fast or even interactively using a standard PC. Our method has many potential applications such as finding characteristic signature sequences for families of organisms and studying conserved and variable regions in, for example, 16S rRNA. PMID:24564200

  8. Fast discovery and visualization of conserved regions in DNA sequences using quasi-alignment.

    PubMed

    Nagar, Anurag; Hahsler, Michael

    2013-01-01

    Next Generation Sequencing techniques are producing enormous amounts of biological sequence data and analysis becomes a major computational problem. Currently, most analysis, especially the identification of conserved regions, relies heavily on Multiple Sequence Alignment and its various heuristics such as progressive alignment, whose run time grows with the square of the number and the length of the aligned sequences and requires significant computational resources. In this work, we present a method to efficiently discover regions of high similarity across multiple sequences without performing expensive sequence alignment. The method is based on approximating edit distance between segments of sequences using p-mer frequency counts. Then, efficient high-throughput data stream clustering is used to group highly similar segments into so called quasi-alignments. Quasi-alignments have numerous applications such as identifying species and their taxonomic class from sequences, comparing sequences for similarities, and, as in this paper, discovering conserved regions across related sequences. In this paper, we show that quasi-alignments can be used to discover highly similar segments across multiple sequences from related or different genomes efficiently and accurately. Experiments on a large number of unaligned 16S rRNA sequences obtained from the Greengenes database show that the method is able to identify conserved regions which agree with known hypervariable regions in 16S rRNA. Furthermore, the experiments show that the proposed method scales well for large data sets with a run time that grows only linearly with the number and length of sequences, whereas for existing multiple sequence alignment heuristics the run time grows super-linearly. Quasi-alignment-based algorithms can detect highly similar regions and conserved areas across multiple sequences. Since the run time is linear and the sequences are converted into a compact clustering model, we are able to identify conserved regions fast or even interactively using a standard PC. Our method has many potential applications such as finding characteristic signature sequences for families of organisms and studying conserved and variable regions in, for example, 16S rRNA.

  9. The Most Deeply Conserved Noncoding Sequences in Plants Serve Similar Functions to Those in Vertebrates Despite Large Differences in Evolutionary Rates[W

    PubMed Central

    Burgess, Diane; Freeling, Michael

    2014-01-01

    In vertebrates, conserved noncoding elements (CNEs) are functionally constrained sequences that can show striking conservation over >400 million years of evolutionary distance and frequently are located megabases away from target developmental genes. Conserved noncoding sequences (CNSs) in plants are much shorter, and it has been difficult to detect conservation among distantly related genomes. In this article, we show not only that CNS sequences can be detected throughout the eudicot clade of flowering plants, but also that a subset of 37 CNSs can be found in all flowering plants (diverging ∼170 million years ago). These CNSs are functionally similar to vertebrate CNEs, being highly associated with transcription factor and development genes and enriched in transcription factor binding sites. Some of the most highly conserved sequences occur in genes encoding RNA binding proteins, particularly the RNA splicing–associated SR genes. Differences in sequence conservation between plants and animals are likely to reflect differences in the biology of the organisms, with plants being much more able to tolerate genomic deletions and whole-genome duplication events due, in part, to their far greater fecundity compared with vertebrates. PMID:24681619

  10. PUTATIVE GENE PROMOTER SEQUENCES IN THE CHLORELLA VIRUSES

    PubMed Central

    Fitzgerald, Lisa A.; Boucher, Philip T.; Yanai-Balser, Giane; Suhre, Karsten; Graves, Michael V.; Van Etten, James L.

    2008-01-01

    Three short (7 to 9 nucleotides) highly conserved nucleotide sequences were identified in the putative promoter regions (150 bp upstream and 50 bp downstream of the ATG translation start site) of three members of the genus Chlorovirus, family Phycodnaviridae. Most of these sequences occurred in similar locations within the defined promoter regions. The sequence and location of the motifs were often conserved among homologous ORFs within the Chlorovirus family. One of these conserved sequences (AATGACA) is predominately associated with genes expressed early in virus replication. PMID:18768195

  11. BEAUTY: an enhanced BLAST-based search tool that integrates multiple biological information resources into sequence similarity search results.

    PubMed

    Worley, K C; Wiese, B A; Smith, R F

    1995-09-01

    BEAUTY (BLAST enhanced alignment utility) is an enhanced version of the NCBI's BLAST data base search tool that facilitates identification of the functions of matched sequences. We have created new data bases of conserved regions and functional domains for protein sequences in NCBI's Entrez data base, and BEAUTY allows this information to be incorporated directly into BLAST search results. A Conserved Regions Data Base, containing the locations of conserved regions within Entrez protein sequences, was constructed by (1) clustering the entire data base into families, (2) aligning each family using our PIMA multiple sequence alignment program, and (3) scanning the multiple alignments to locate the conserved regions within each aligned sequence. A separate Annotated Domains Data Base was constructed by extracting the locations of all annotated domains and sites from sequences represented in the Entrez, PROSITE, BLOCKS, and PRINTS data bases. BEAUTY performs a BLAST search of those Entrez sequences with conserved regions and/or annotated domains. BEAUTY then uses the information from the Conserved Regions and Annotated Domains data bases to generate, for each matched sequence, a schematic display that allows one to directly compare the relative locations of (1) the conserved regions, (2) annotated domains and sites, and (3) the locally aligned regions matched in the BLAST search. In addition, BEAUTY search results include World-Wide Web hypertext links to a number of external data bases that provide a variety of additional types of information on the function of matched sequences. This convenient integration of protein families, conserved regions, annotated domains, alignment displays, and World-Wide Web resources greatly enhances the biological informativeness of sequence similarity searches. BEAUTY searches can be performed remotely on our system using the "BCM Search Launcher" World-Wide Web pages (URL is < http:/ /gc.bcm.tmc.edu:8088/ search-launcher/launcher.html > ).

  12. A conserved mechanism for replication origin recognition and binding in archaea.

    PubMed

    Majerník, Alan I; Chong, James P J

    2008-01-15

    To date, methanogens are the only group within the archaea where firing DNA replication origins have not been demonstrated in vivo. In the present study we show that a previously identified cluster of ORB (origin recognition box) sequences do indeed function as an origin of replication in vivo in the archaeon Methanothermobacter thermautotrophicus. Although the consensus sequence of ORBs in M. thermautotrophicus is somewhat conserved when compared with ORB sequences in other archaea, the Cdc6-1 protein from M. thermautotrophicus (termed MthCdc6-1) displays sequence-specific binding that is selective for the MthORB sequence and does not recognize ORBs from other archaeal species. Stabilization of in vitro MthORB DNA binding by MthCdc6-1 requires additional conserved sequences 3' to those originally described for M. thermautotrophicus. By testing synthetic sequences bearing mutations in the MthORB consensus sequence, we show that Cdc6/ORB binding is critically dependent on the presence of an invariant guanine found in all archaeal ORB sequences. Mutation of a universally conserved arginine residue in the recognition helix of the winged helix domain of archaeal Cdc6-1 shows that specific origin sequence recognition is dependent on the interaction of this arginine residue with the invariant guanine. Recognition of a mutated origin sequence can be achieved by mutation of the conserved arginine residue to a lysine or glutamine residue. Thus despite a number of differences in protein and DNA sequences between species, the mechanism of origin recognition and binding appears to be conserved throughout the archaea.

  13. Fine-tuning structural RNA alignments in the twilight zone

    PubMed Central

    2010-01-01

    Background A widely used method to find conserved secondary structure in RNA is to first construct a multiple sequence alignment, and then fold the alignment, optimizing a score based on thermodynamics and covariance. This method works best around 75% sequence similarity. However, in a "twilight zone" below 55% similarity, the sequence alignment tends to obscure the covariance signal used in the second phase. Therefore, while the overall shape of the consensus structure may still be found, the degree of conservation cannot be estimated reliably. Results Based on a combination of available methods, we present a method named planACstar for improving structure conservation in structural alignments in the twilight zone. After constructing a consensus structure by alignment folding, planACstar abandons the original sequence alignment, refolds the sequences individually, but consistent with the consensus, aligns the structures, irrespective of sequence, by a pure structure alignment method, and derives an improved sequence alignment from the alignment of structures, to be re-submitted to alignment folding, etc.. This circle may be iterated as long as structural conservation improves, but normally, one step suffices. Conclusions Employing the tools ClustalW, RNAalifold, and RNAforester, we find that for sequences with 30-55% sequence identity, structural conservation can be improved by 10% on average, with a large variation, measured in terms of RNAalifold's own criterion, the structure conservation index. PMID:20433706

  14. Development of a general method for detection and quantification of the P35S promoter based on assessment of existing methods

    PubMed Central

    Wu, Yuhua; Wang, Yulei; Li, Jun; Li, Wei; Zhang, Li; Li, Yunjing; Li, Xiaofei; Li, Jun; Zhu, Li; Wu, Gang

    2014-01-01

    The Cauliflower mosaic virus (CaMV) 35S promoter (P35S) is a commonly used target for detection of genetically modified organisms (GMOs). There are currently 24 reported detection methods, targeting different regions of the P35S promoter. Initial assessment revealed that due to the absence of primer binding sites in the P35S sequence, 19 of the 24 reported methods failed to detect P35S in MON88913 cotton, and the other two methods could only be applied to certain GMOs. The rest three reported methods were not suitable for measurement of P35S in some testing events, because SNPs in binding sites of the primer/probe would result in abnormal amplification plots and poor linear regression parameters. In this study, we discovered a conserved region in the P35S sequence through sequencing of P35S promoters from multiple transgenic events, and developed new qualitative and quantitative detection systems targeting this conserved region. The qualitative PCR could detect the P35S promoter in 23 unique GMO events with high specificity and sensitivity. The quantitative method was suitable for measurement of P35S promoter, exhibiting good agreement between the amount of template and Ct values for each testing event. This study provides a general P35S screening method, with greater coverage than existing methods. PMID:25483893

  15. Genome-wide identification of conserved intronic non-coding sequences using a Bayesian segmentation approach.

    PubMed

    Algama, Manjula; Tasker, Edward; Williams, Caitlin; Parslow, Adam C; Bryson-Richardson, Robert J; Keith, Jonathan M

    2017-03-27

    Computational identification of non-coding RNAs (ncRNAs) is a challenging problem. We describe a genome-wide analysis using Bayesian segmentation to identify intronic elements highly conserved between three evolutionarily distant vertebrate species: human, mouse and zebrafish. We investigate the extent to which these elements include ncRNAs (or conserved domains of ncRNAs) and regulatory sequences. We identified 655 deeply conserved intronic sequences in a genome-wide analysis. We also performed a pathway-focussed analysis on genes involved in muscle development, detecting 27 intronic elements, of which 22 were not detected in the genome-wide analysis. At least 87% of the genome-wide and 70% of the pathway-focussed elements have existing annotations indicative of conserved RNA secondary structure. The expression of 26 of the pathway-focused elements was examined using RT-PCR, providing confirmation that they include expressed ncRNAs. Consistent with previous studies, these elements are significantly over-represented in the introns of transcription factors. This study demonstrates a novel, highly effective, Bayesian approach to identifying conserved non-coding sequences. Our results complement previous findings that these sequences are enriched in transcription factors. However, in contrast to previous studies which suggest the majority of conserved sequences are regulatory factor binding sites, the majority of conserved sequences identified using our approach contain evidence of conserved RNA secondary structures, and our laboratory results suggest most are expressed. Functional roles at DNA and RNA levels are not mutually exclusive, and many of our elements possess evidence of both. Moreover, ncRNAs play roles in transcriptional and post-transcriptional regulation, and this may contribute to the over-representation of these elements in introns of transcription factors. We attribute the higher sensitivity of the pathway-focussed analysis compared to the genome-wide analysis to improved alignment quality, suggesting that enhanced genomic alignments may reveal many more conserved intronic sequences.

  16. Characterization and Expression of the Lucina pectinata Oxygen and Sulfide Binding Hemoglobin Genes

    PubMed Central

    López-Garriga, Juan; Cadilla, Carmen L.

    2016-01-01

    The clam Lucina pectinata lives in sulfide-rich muds and houses intracellular symbiotic bacteria that need to be supplied with hydrogen sulfide and oxygen. This clam possesses three hemoglobins: hemoglobin I (HbI), a sulfide-reactive protein, and hemoglobin II (HbII) and III (HbIII), which are oxygen-reactive. We characterized the complete gene sequence and promoter regions for the oxygen reactive hemoglobins and the partial structure and promoters of the HbI gene from Lucina pectinata. We show that HbI has two mRNA variants, where the 5’end had either a sequence of 96 bp (long variant) or 37 bp (short variant). The gene structure of the oxygen reactive Hbs is defined by having 4-exons/3-introns with conservation of intron location at B12.2 and G7.0 and the presence of pre-coding introns, while the partial gene structure of HbI has the same intron conservation but appears to have a 5-exon/ 4-intron structure. A search for putative transcription factor binding sites (TFBSs) was done with the promoters for HbII, HbIII, HbI short and HbI long. The HbII, HbIII and HbI long promoters showed similar predicted TFBSs. We also characterized MITE-like elements in the HbI and HbII gene promoters and intronic regions that are similar to sequences found in other mollusk genomes. The gene expression levels of the clam Hbs, from sulfide-rich and sulfide-poor environments showed a significant decrease of expression in the symbiont-containing tissue for those clams in a sulfide-poor environment, suggesting that the sulfide concentration may be involved in the regulation of these proteins. Gene expression evaluation of the two HbI mRNA variants indicated that the longer variant is expressed at higher levels than the shorter variant in both environments. PMID:26824233

  17. DNA sequence analysis of ARS elements from chromosome III of Saccharomyces cerevisiae: identification of a new conserved sequence.

    PubMed Central

    Palzkill, T G; Oliver, S G; Newlon, C S

    1986-01-01

    Four fragments of Saccharomyces cerevisiae chromosome III DNA which carry ARS elements have been sequenced. Each fragment contains multiple copies of sequences that have at least 10 out of 11 bases of homology to a previously reported 11 bp core consensus sequence. A survey of these new ARS sequences and previously reported sequences revealed the presence of an additional 11 bp conserved element located on the 3' side of the T-rich strand of the core consensus. Subcloning analysis as well as deletion and transposon insertion mutagenesis of ARS fragments support a role for 3' conserved sequence in promoting ARS activity. PMID:3529036

  18. Multiple network alignment via multiMAGNA+.

    PubMed

    Vijayan, Vipin; Milenkovic, Tijana

    2017-08-21

    Network alignment (NA) aims to find a node mapping that identifies topologically or functionally similar network regions between molecular networks of different species. Analogous to genomic sequence alignment, NA can be used to transfer biological knowledge from well- to poorly-studied species between aligned network regions. Pairwise NA (PNA) finds similar regions between two networks while multiple NA (MNA) can align more than two networks. We focus on MNA. Existing MNA methods aim to maximize total similarity over all aligned nodes (node conservation). Then, they evaluate alignment quality by measuring the amount of conserved edges, but only after the alignment is constructed. Directly optimizing edge conservation during alignment construction in addition to node conservation may result in superior alignments. Thus, we present a novel MNA method called multiMAGNA++ that can achieve this. Indeed, multiMAGNA++ outperforms or is on par with existing MNA methods, while often completing faster than existing methods. That is, multiMAGNA++ scales well to larger network data and can be parallelized effectively. During method evaluation, we also introduce new MNA quality measures to allow for more fair MNA method comparison compared to the existing alignment quality measures. MultiMAGNA++ code is available on the method's web page at http://nd.edu/~cone/multiMAGNA++/.

  19. StralSV: assessment of sequence variability within similar 3D structures and application to polio RNA-dependent RNA polymerase

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zemla, A; Lang, D; Kostova, T

    2010-11-29

    Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory - still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could overcome these difficulties and facilitatemore » the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV, a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus and demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique or that shared structural similarity with structures that are distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position.« less

  20. Strong minor groove base conservation in sequence logos implies DNA distortion or base flipping during replication and transcription initiation.

    PubMed

    Schneider, T D

    2001-12-01

    The sequence logo for DNA binding sites of the bacteriophage P1 replication protein RepA shows unusually high sequence conservation ( approximately 2 bits) at a minor groove that faces RepA. However, B-form DNA can support only 1 bit of sequence conservation via contacts into the minor groove. The high conservation in RepA sites therefore implies a distorted DNA helix with direct or indirect contacts to the protein. Here I show that a high minor groove conservation signature also appears in sequence logos of sites for other replication origin binding proteins (Rts1, DnaA, P4 alpha, EBNA1, ORC) and promoter binding proteins (sigma(70), sigma(D) factors). This finding implies that DNA binding proteins generally use non-B-form DNA distortion such as base flipping to initiate replication and transcription.

  1. Germ-line variants identified by next generation sequencing in a panel of estrogen and cancer associated genes correlate with poor clinical outcome in Lynch syndrome patients.

    PubMed

    Jóri, Balazs; Kamps, Rick; Xanthoulea, Sofia; Delvoux, Bert; Blok, Marinus J; Van de Vijver, Koen K; de Koning, Bart; Oei, Felicia Trups; Tops, Carli M; Speel, Ernst Jm; Kruitwagen, Roy F; Gomez-Garcia, Encarna B; Romano, Andrea

    2015-12-01

    The risk to develop colorectal and endometrial cancers among subjects testing positive for a pathogenic Lynch syndrome mutation varies, making the risk prediction difficult. Genetic risk modifiers alter the risk conferred by inherited Lynch syndrome mutations, and their identification can improve genetic counseling. We aimed at identifying rare genetic modifiers of the risk of Lynch syndrome endometrial cancer. A family based approach was used to assess the presence of genetic risk modifiers among 35 Lynch syndrome mutation carriers having either a poor clinical phenotype (early age of endometrial cancer diagnosis or multiple cancers) or a neutral clinical phenotype. Putative genetic risk modifiers were identified by Next Generation Sequencing among a panel of 154 genes involved in endometrial physiology and carcinogenesis. A simple pipeline, based on an allele frequency lower than 0.001 and on predicted non-conservative amino-acid substitutions returned 54 variants that were considered putative risk modifiers. The presence of two or more risk modifying variants in women carrying a pathogenic Lynch syndrome mutation was associated with a poor clinical phenotype. A gene-panel is proposed that comprehends genes that can carry variants with putative modifying effects on the risk of Lynch syndrome endometrial cancer. Validation in further studies is warranted before considering the possible use of this tool in genetic counseling.

  2. G-quadruplex prediction in E. coli genome reveals a conserved putative G-quadruplex-Hairpin-Duplex switch.

    PubMed

    Kaplan, Oktay I; Berber, Burak; Hekim, Nezih; Doluca, Osman

    2016-11-02

    Many studies show that short non-coding sequences are widely conserved among regulatory elements. More and more conserved sequences are being discovered since the development of next generation sequencing technology. A common approach to identify conserved sequences with regulatory roles relies on topological changes such as hairpin formation at the DNA or RNA level. G-quadruplexes, non-canonical nucleic acid topologies with little established biological roles, are increasingly considered for conserved regulatory element discovery. Since the tertiary structure of G-quadruplexes is strongly dependent on the loop sequence which is disregarded by the generally accepted algorithm, we hypothesized that G-quadruplexes with similar topology and, indirectly, similar interaction patterns, can be determined using phylogenetic clustering based on differences in the loop sequences. Phylogenetic analysis of 52 G-quadruplex forming sequences in the Escherichia coli genome revealed two conserved G-quadruplex motifs with a potential regulatory role. Further analysis revealed that both motifs tend to form hairpins and G quadruplexes, as supported by circular dichroism studies. The phylogenetic analysis as described in this work can greatly improve the discovery of functional G-quadruplex structures and may explain unknown regulatory patterns. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. Conservation and variability of West Nile virus proteins.

    PubMed

    Koo, Qi Ying; Khan, Asif M; Jung, Keun-Ok; Ramdas, Shweta; Miotto, Olivo; Tan, Tin Wee; Brusic, Vladimir; Salmon, Jerome; August, J Thomas

    2009-01-01

    West Nile virus (WNV) has emerged globally as an increasingly important pathogen for humans and domestic animals. Studies of the evolutionary diversity of the virus over its known history will help to elucidate conserved sites, and characterize their correspondence to other pathogens and their relevance to the immune system. We describe a large-scale analysis of the entire WNV proteome, aimed at identifying and characterizing evolutionarily conserved amino acid sequences. This study, which used 2,746 WNV protein sequences collected from the NCBI GenPept database, focused on analysis of peptides of length 9 amino acids or more, which are immunologically relevant as potential T-cell epitopes. Entropy-based analysis of the diversity of WNV sequences, revealed the presence of numerous evolutionarily stable nonamer positions across the proteome (entropy value of < or = 1). The representation (frequency) of nonamers variant to the predominant peptide at these stable positions was, generally, low (< or = 10% of the WNV sequences analyzed). Eighty-eight fragments of length 9-29 amino acids, representing approximately 34% of the WNV polyprotein length, were identified to be identical and evolutionarily stable in all analyzed WNV sequences. Of the 88 completely conserved sequences, 67 are also present in other flaviviruses, and several have been associated with the functional and structural properties of viral proteins. Immunoinformatic analysis revealed that the majority (78/88) of conserved sequences are potentially immunogenic, while 44 contained experimentally confirmed human T-cell epitopes. This study identified a comprehensive catalogue of completely conserved WNV sequences, many of which are shared by other flaviviruses, and majority are potential epitopes. The complete conservation of these immunologically relevant sequences through the entire recorded WNV history suggests they will be valuable as components of peptide-specific vaccines or other therapeutic applications, for sequence-specific diagnosis of a wide-range of Flavivirus infections, and for studies of homologous sequences among other flaviviruses.

  4. Discovery of SCORs: Anciently derived, highly conserved gene-associated repeats in stony corals.

    PubMed

    Qiu, Huan; Zelzion, Ehud; Putnam, Hollie M; Gates, Ruth D; Wagner, Nicole E; Adams, Diane K; Bhattacharya, Debashish

    2017-10-01

    Stony coral (Scleractinia) genomes are still poorly explored and many questions remain about their evolution and contribution to the success and longevity of reefs. We analyzed transcriptome and genome data from Montipora capitata, Acropora digitifera, and transcriptome data from 20 other coral species. To our surprise, we found highly conserved, anciently derived, Scleractinia COral-specific Repeat families (SCORs) that are abundant in all the studied lineages. SCORs form complex secondary structures and are located in untranslated regions and introns, but most abundant in intergenic DNA. These repeat families have undergone frequent duplication and degradation, suggesting a 'boom and bust' cycle of invasion and loss. We speculate that due to their surprisingly high sequence identities across deeply diverged corals, physical association with genes, and dynamic evolution, SCORs might have adaptive functions in corals that need to be explored using population genomic and function-based approaches. Copyright © 2017 Elsevier Inc. All rights reserved.

  5. Gene family innovation, conservation and loss on the animal stem lineage.

    PubMed

    Richter, Daniel J; Fozouni, Parinaz; Eisen, Michael; King, Nicole

    2018-05-31

    Choanoflagellates, the closest living relatives of animals, can provide unique insights into the changes in gene content that preceded the origin of animals. However, only two choanoflagellate genomes are currently available, providing poor coverage of their diversity. We sequenced transcriptomes of 19 additional choanoflagellate species to produce a comprehensive reconstruction of the gains and losses that shaped the ancestral animal gene repertoire. We identified ~1,944 gene families that originated on the animal stem lineage, of which only 39 are conserved across all animals in our study. In addition, ~372 gene families previously thought to be animal-specific, including Notch, Delta, and homologs of the animal Toll-like receptor genes, instead evolved prior to the animal-choanoflagellate divergence. Our findings contribute to an increasingly detailed portrait of the gene families that defined the biology of the Urmetazoan and that may underpin core features of extant animals. © 2018, Richter et al.

  6. Functional genome analysis of Bifidobacterium breve UCC2003 reveals type IVb tight adherence (Tad) pili as an essential and conserved host-colonization factor

    PubMed Central

    O'Connell Motherway, Mary; Zomer, Aldert; Leahy, Sinead C.; Reunanen, Justus; Bottacini, Francesca; Claesson, Marcus J.; O'Brien, Frances; Flynn, Kiera; Casey, Patrick G.; Moreno Munoz, Jose Antonio; Kearney, Breda; Houston, Aileen M.; O'Mahony, Caitlin; Higgins, Des G.; Shanahan, Fergus; Palva, Airi; de Vos, Willem M.; Fitzgerald, Gerald F.; Ventura, Marco; O'Toole, Paul W.; van Sinderen, Douwe

    2011-01-01

    Development of the human gut microbiota commences at birth, with bifidobacteria being among the first colonizers of the sterile newborn gastrointestinal tract. To date, the genetic basis of Bifidobacterium colonization and persistence remains poorly understood. Transcriptome analysis of the Bifidobacterium breve UCC2003 2.42-Mb genome in a murine colonization model revealed differential expression of a type IVb tight adherence (Tad) pilus-encoding gene cluster designated “tad2003.” Mutational analysis demonstrated that the tad2003 gene cluster is essential for efficient in vivo murine gut colonization, and immunogold transmission electron microscopy confirmed the presence of Tad pili at the poles of B. breve UCC2003 cells. Conservation of the Tad pilus-encoding locus among other B. breve strains and among sequenced Bifidobacterium genomes supports the notion of a ubiquitous pili-mediated host colonization and persistence mechanism for bifidobacteria. PMID:21690406

  7. Functional genome analysis of Bifidobacterium breve UCC2003 reveals type IVb tight adherence (Tad) pili as an essential and conserved host-colonization factor.

    PubMed

    O'Connell Motherway, Mary; Zomer, Aldert; Leahy, Sinead C; Reunanen, Justus; Bottacini, Francesca; Claesson, Marcus J; O'Brien, Frances; Flynn, Kiera; Casey, Patrick G; Munoz, Jose Antonio Moreno; Kearney, Breda; Houston, Aileen M; O'Mahony, Caitlin; Higgins, Des G; Shanahan, Fergus; Palva, Airi; de Vos, Willem M; Fitzgerald, Gerald F; Ventura, Marco; O'Toole, Paul W; van Sinderen, Douwe

    2011-07-05

    Development of the human gut microbiota commences at birth, with bifidobacteria being among the first colonizers of the sterile newborn gastrointestinal tract. To date, the genetic basis of Bifidobacterium colonization and persistence remains poorly understood. Transcriptome analysis of the Bifidobacterium breve UCC2003 2.42-Mb genome in a murine colonization model revealed differential expression of a type IVb tight adherence (Tad) pilus-encoding gene cluster designated "tad(2003)." Mutational analysis demonstrated that the tad(2003) gene cluster is essential for efficient in vivo murine gut colonization, and immunogold transmission electron microscopy confirmed the presence of Tad pili at the poles of B. breve UCC2003 cells. Conservation of the Tad pilus-encoding locus among other B. breve strains and among sequenced Bifidobacterium genomes supports the notion of a ubiquitous pili-mediated host colonization and persistence mechanism for bifidobacteria.

  8. A Fast Alignment-Free Approach for De Novo Detection of Protein Conserved Regions

    PubMed Central

    Abnousi, Armen; Broschat, Shira L.; Kalyanaraman, Ananth

    2016-01-01

    Background Identifying conserved regions in protein sequences is a fundamental operation, occurring in numerous sequence-driven analysis pipelines. It is used as a way to decode domain-rich regions within proteins, to compute protein clusters, to annotate sequence function, and to compute evolutionary relationships among protein sequences. A number of approaches exist for identifying and characterizing protein families based on their domains, and because domains represent conserved portions of a protein sequence, the primary computation involved in protein family characterization is identification of such conserved regions. However, identifying conserved regions from large collections (millions) of protein sequences presents significant challenges. Methods In this paper we present a new, alignment-free method for detecting conserved regions in protein sequences called NADDA (No-Alignment Domain Detection Algorithm). Our method exploits the abundance of exact matching short subsequences (k-mers) to quickly detect conserved regions, and the power of machine learning is used to improve the prediction accuracy of detection. We present a parallel implementation of NADDA using the MapReduce framework and show that our method is highly scalable. Results We have compared NADDA with Pfam and InterPro databases. For known domains annotated by Pfam, accuracy is 83%, sensitivity 96%, and specificity 44%. For sequences with new domains not present in the training set an average accuracy of 63% is achieved when compared to Pfam. A boost in results in comparison with InterPro demonstrates the ability of NADDA to capture conserved regions beyond those present in Pfam. We have also compared NADDA with ADDA and MKDOM2, assuming Pfam as ground-truth. On average NADDA shows comparable accuracy, more balanced sensitivity and specificity, and being alignment-free, is significantly faster. Excluding the one-time cost of training, runtimes on a single processor were 49s, 10,566s, and 456s for NADDA, ADDA, and MKDOM2, respectively, for a data set comprised of approximately 2500 sequences. PMID:27552220

  9. Two rapidly evolving genes contribute to male fitness in Drosophila

    PubMed Central

    Reinhardt, Josephine A; Jones, Corbin D

    2013-01-01

    Purifying selection often results in conservation of gene sequence and function. The most functionally conserved genes are also thought to be among the most biologically essential. These observations have led to the use of sequence conservation as a proxy for functional conservation. Here we describe two genes that are exceptions to this pattern. We show that lack of sequence conservation among orthologs of CG15460 and CG15323 – herein named jean-baptiste (jb) and karr respectively – does not necessarily predict lack of functional conservation. These two Drosophila melanogaster genes are among the most rapidly evolving protein-coding genes in this species, being nearly as diverged from their D. yakuba orthologs as random sequences are. jb and karr are both expressed at an elevated level in larval males and adult testes, but they are not accessory gland proteins and their loss does not affect male fertility. Instead, knockdown of these genes in D. melanogaster via RNA interference caused male-biased viability defects. These viability effects occur prior to the third instar for jb and during late pupation for karr. We show that putative orthologs to jb and karr are also expressed strongly in the testes of other Drosophila species and have similar gene structure across species despite low levels of sequence conservation. While standard molecular evolution tests could not reject neutrality, other data hint at a role for natural selection. Together these data provide a clear case where a lack of sequence conservation does not imply a lack of conservation of expression or function. PMID:24221639

  10. Response of heat shock protein genes of the oriental fruit moth under diapause and thermal stress reveals multiple patterns dependent on the nature of stress exposure.

    PubMed

    Zhang, Bo; Peng, Yu; Zheng, Jincheng; Liang, Lina; Hoffmann, Ary A; Ma, Chun-Sen

    2016-07-01

    Heat shock protein gene (Hsp) families are thought to be important in thermal adaptation, but their expression patterns under various thermal stresses have still been poorly characterized outside of model systems. We have therefore characterized Hsp genes and their stress responses in the oriental fruit moth (OFM), Grapholita molesta, a widespread global orchard pest, and compared patterns of expression in this species to that of other insects. Genes from four Hsp families showed variable expression levels among tissues and developmental stages. Members of the Hsp40, 70, and 90 families were highly expressed under short exposures to heat and cold. Expression of Hsp40, 70, and Hsc70 family members increased in OFM undergoing diapause, while Hsp90 was downregulated. We found that there was strong sequence conservation of members of large Hsp families (Hsp40, Hsp60, Hsp70, Hsc70) across taxa, but this was not always matched by conservation of expression patterns. When the large Hsps as well as small Hsps from OFM were compared under acute and ramping heat stress, two groups of sHsps expression patterns were apparent, depending on whether expression increased or decreased immediately after stress exposure. These results highlight potential differences in conservation of function as opposed to sequence in this gene family and also point to Hsp genes potentially useful as bioindicators of diapause and thermal stress in OFM.

  11. Comparative functional characterization of the CSR-1 22G-RNA pathway in Caenorhabditis nematodes.

    PubMed

    Tu, Shikui; Wu, Monica Z; Wang, Jie; Cutter, Asher D; Weng, Zhiping; Claycomb, Julie M

    2015-01-01

    As a champion of small RNA research for two decades, Caenorhabditis elegans has revealed the essential Argonaute CSR-1 to play key nuclear roles in modulating chromatin, chromosome segregation and germline gene expression via 22G-small RNAs. Despite CSR-1 being preserved among diverse nematodes, the conservation and divergence in function of the targets of small RNA pathways remains poorly resolved. Here we apply comparative functional genomic analysis between C. elegans and Caenorhabditis briggsae to characterize the CSR-1 pathway, its targets and their evolution. C. briggsae CSR-1-associated small RNAs that we identified by immunoprecipitation-small RNA sequencing overlap with 22G-RNAs depleted in cbr-csr-1 RNAi-treated worms. By comparing 22G-RNAs and target genes between species, we defined a set of CSR-1 target genes with conserved germline expression, enrichment in operons and more slowly evolving coding sequences than other genes, along with a small group of evolutionarily labile targets. We demonstrate that the association of CSR-1 with chromatin is preserved, and show that depletion of cbr-csr-1 leads to chromosome segregation defects and embryonic lethality. This first comparative characterization of a small RNA pathway in Caenorhabditis establishes a conserved nuclear role for CSR-1 and highlights its key role in germline gene regulation across multiple animal species. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. Genetic identification of Iberian rodent species using both mitochondrial and nuclear loci: application to noninvasive sampling.

    PubMed

    Barbosa, S; Pauperio, J; Searle, J B; Alves, P C

    2013-01-01

    Species identification through noninvasive sampling is increasingly used in animal conservation genetics, given that it obviates the need to handle free-living individuals. Noninvasive sampling is particularly valuable for elusive and small species such as rodents. Although rodents are not usually assumed to be the most obvious target for conservation, of the 21 species or near-species present in Iberia, three are considered endangered and declining, while several others are poorly studied. Here, we develop a genetic tool for identifying all rodent species in Iberia by noninvasive genetic sampling. To achieve this purpose, we selected one mitochondrial gene [cytochrome b (cyt-b)] and one nuclear gene [interphotoreceptor retinoid-binding protein (IRBP)], which we first sequenced using tissue samples. Both genes allow for the phylogenetic distinction of all species except the sibling species Microtus lusitanicus and Microtus duodecimcostatus. Overall, cyt-b showed higher resolution than IRBP, revealing a clear barcoding gap. To allow these markers to be applied to noninvasive samples, we selected a short highly diagnostic fragment from each gene, which we used to obtain sequences from faeces and bones from owl pellets. Amplification success for the cyt-b and IRBP fragment was 85% and 43% in faecal and 88% and 64% in owl-pellet DNA extractions, respectively. The method allows the unambiguous identification of the great majority of Iberian rodent species from noninvasive samples, with application in studies of distribution, spatial ecology and population dynamics, and for conservation. © 2012 Blackwell Publishing Ltd.

  13. CSTminer: a web tool for the identification of coding and noncoding conserved sequence tags through cross-species genome comparison

    PubMed Central

    Castrignanò, Tiziana; Canali, Alessandro; Grillo, Giorgio; Liuni, Sabino; Mignone, Flavio; Pesole, Graziano

    2004-01-01

    The identification and characterization of genome tracts that are highly conserved across species during evolution may contribute significantly to the functional annotation of whole-genome sequences. Indeed, such sequences are likely to correspond to known or unknown coding exons or regulatory motifs. Here, we present a web server implementing a previously developed algorithm that, by comparing user-submitted genome sequences, is able to identify statistically significant conserved blocks and assess their coding or noncoding nature through the measure of a coding potential score. The web tool, available at http://www.caspur.it/CSTminer/, is dynamically interconnected with the Ensembl genome resources and produces a graphical output showing a map of detected conserved sequences and annotated gene features. PMID:15215464

  14. CodonLogo: a sequence logo-based viewer for codon patterns.

    PubMed

    Sharma, Virag; Murphy, David P; Provan, Gregory; Baranov, Pavel V

    2012-07-15

    Conserved patterns across a multiple sequence alignment can be visualized by generating sequence logos. Sequence logos show each column in the alignment as stacks of symbol(s) where the height of a stack is proportional to its informational content, whereas the height of each symbol within the stack is proportional to its frequency in the column. Sequence logos use symbols of either nucleotide or amino acid alphabets. However, certain regulatory signals in messenger RNA (mRNA) act as combinations of codons. Yet no tool is available for visualization of conserved codon patterns. We present the first application which allows visualization of conserved regions in a multiple sequence alignment in the context of codons. CodonLogo is based on WebLogo3 and uses the same heuristics but treats codons as inseparable units of a 64-letter alphabet. CodonLogo can discriminate patterns of codon conservation from patterns of nucleotide conservation that appear indistinguishable in standard sequence logos. The CodonLogo source code and its implementation (in a local version of the Galaxy Browser) are available at http://recode.ucc.ie/CodonLogo and through the Galaxy Tool Shed at http://toolshed.g2.bx.psu.edu/.

  15. New families of site-specific repetitive DNA sequences that comprise constitutive heterochromatin of the Syrian hamster (Mesocricetus auratus, Cricetinae, Rodentia).

    PubMed

    Yamada, Kazuhiko; Kamimura, Eikichi; Kondo, Mariko; Tsuchiya, Kimiyuki; Nishida-Umehara, Chizuko; Matsuda, Yoichi

    2006-02-01

    We molecularly cloned new families of site-specific repetitive DNA sequences from BglII- and EcoRI-digested genomic DNA of the Syrian hamster (Mesocricetus auratus, Cricetrinae, Rodentia) and characterized them by chromosome in situ hybridization and filter hybridization. They were classified into six different types of repetitive DNA sequence families according to chromosomal distribution and genome organization. The hybridization patterns of the sequences were consistent with the distribution of C-positive bands and/or Hoechst-stained heterochromatin. The centromeric major satellite DNA and sex chromosome-specific and telomeric region-specific repetitive sequences were conserved in the same genus (Mesocricetus) but divergent in different genera. The chromosome-2-specific sequence was conserved in two genera, Mesocricetus and Cricetulus, and a low copy number of repetitive sequences on the heterochromatic chromosome arms were conserved in the subfamily Cricetinae but not in the subfamily Calomyscinae. By contrast, the other type of repetitive sequences on the heterochromatic chromosome arms, which had sequence similarities to a LINE sequence of rodents, was conserved through the three subfamilies, Cricetinae, Calomyscinae and Murinae. The nucleotide divergence of the repetitive sequences of heterochromatin was well correlated with the phylogenetic relationships of the Cricetinae species, and each sequence has been independently amplified and diverged in the same genome.

  16. SEPT9 Mutations and a Conserved 17q25 Sequence in Sporadic and Hereditary Brachial Plexus Neuropathy

    PubMed Central

    Klein, Christopher J.; Wu, Yanhong; Cunningham, Julie M.; Windebank, Anthony J.; Dyck, P. James B.; Friedenberg, Scott M.; Klein, Diane M.; Dyck, Peter J.

    2009-01-01

    Background The clinical characteristics of sporadic brachial plexus neuropathy (S-BPN) and hereditary brachial plexus neuropathy (H-BPN) are similar. At times of attack inflammation in brachial plexus nerves has been identified in both conditions. SEPT-9 mutations (Arg88Trp, Ser93Phe, 5UTR-131G to C) occur in some families with H-BPN. These mutations were not found in American H-BPN kindreds with a conserved 500 Kb sequence of DNA at 17q25 (the location of SEPT-9) where a founder mutation has been suggested. Objective To study 17q25 and SEPT-9 in S-BPN (56 patients) and H-BPN (13 kindreds). Methods Allele analysis at 17q25, SEPT-9 DNA sequencing and mRNA analysis from lymphoblast cultures. Results A conserved 17q25 sequence was found in 5 of 13 H-BPN kindreds and one S-BPN patient. This conserved sequence was not found in the family with a SEPT-9 mutation (Arg88Trp) or controls (182). SEPT-9 mRNA expression did not differ between forms of H-BPN and controls. No known mutations of SEPT-9 were found in S-BPN. Conclusions/Relevance Rare S-BPN patients have the same conserved 17q25 sequence found in many American H-BPN kindreds. BPN patients with this conserved sequence do not appear to have SEPT-9 mutations or alterations of its mRNA expression levels in lymphoblast cultures. BPN patients with this conserved sequence may have the most common genetic cause in the Americas by a founder effect mutation. PMID:19204161

  17. Evolutionary conservation of regulatory elements in vertebrate HOX gene clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Santini, Simona; Boore, Jeffrey L.; Meyer, Axel

    2003-12-31

    Due to their high degree of conservation, comparisons of DNA sequences among evolutionarily distantly-related genomes permit to identify functional regions in noncoding DNA. Hox genes are optimal candidate sequences for comparative genome analyses, because they are extremely conserved in vertebrates and occur in clusters. We aligned (Pipmaker) the nucleotide sequences of HoxA clusters of tilapia, pufferfish, striped bass, zebrafish, horn shark, human and mouse (over 500 million years of evolutionary distance). We identified several highly conserved intergenic sequences, likely to be important in gene regulation. Only a few of these putative regulatory elements have been previously described as being involvedmore » in the regulation of Hox genes, while several others are new elements that might have regulatory functions. The majority of these newly identified putative regulatory elements contain short fragments that are almost completely conserved and are identical to known binding sites for regulatory proteins (Transfac). The conserved intergenic regions located between the most rostrally expressed genes in the developing embryo are longer and better retained through evolution. We document that presumed regulatory sequences are retained differentially in either A or A clusters resulting from a genome duplication in the fish lineage. This observation supports both the hypothesis that the conserved elements are involved in gene regulation and the Duplication-Deletion-Complementation model.« less

  18. The Role of DNA Barcodes in Understanding and Conservation of Mammal Diversity in Southeast Asia

    PubMed Central

    Francis, Charles M.; Borisenko, Alex V.; Ivanova, Natalia V.; Eger, Judith L.; Lim, Burton K.; Guillén-Servent, Antonio; Kruskop, Sergei V.; Mackie, Iain; Hebert, Paul D. N.

    2010-01-01

    Background Southeast Asia is recognized as a region of very high biodiversity, much of which is currently at risk due to habitat loss and other threats. However, many aspects of this diversity, even for relatively well-known groups such as mammals, are poorly known, limiting ability to develop conservation plans. This study examines the value of DNA barcodes, sequences of the mitochondrial COI gene, to enhance understanding of mammalian diversity in the region and hence to aid conservation planning. Methodology and Principal Findings DNA barcodes were obtained from nearly 1900 specimens representing 165 recognized species of bats. All morphologically or acoustically distinct species, based on classical taxonomy, could be discriminated with DNA barcodes except four closely allied species pairs. Many currently recognized species contained multiple barcode lineages, often with deep divergence suggesting unrecognized species. In addition, most widespread species showed substantial genetic differentiation across their distributions. Our results suggest that mammal species richness within the region may be underestimated by at least 50%, and there are higher levels of endemism and greater intra-specific population structure than previously recognized. Conclusions DNA barcodes can aid conservation and research by assisting field workers in identifying species, by helping taxonomists determine species groups needing more detailed analysis, and by facilitating the recognition of the appropriate units and scales for conservation planning. PMID:20838635

  19. Function-based classification of carbohydrate-active enzymes by recognition of short, conserved peptide motifs.

    PubMed

    Busk, Peter Kamp; Lange, Lene

    2013-06-01

    Functional prediction of carbohydrate-active enzymes is difficult due to low sequence identity. However, similar enzymes often share a few short motifs, e.g., around the active site, even when the overall sequences are very different. To exploit this notion for functional prediction of carbohydrate-active enzymes, we developed a simple algorithm, peptide pattern recognition (PPR), that can divide proteins into groups of sequences that share a set of short conserved sequences. When this method was used on 118 glycoside hydrolase 5 proteins with 9% average pairwise identity and representing four characterized enzymatic functions, 97% of the proteins were sorted into groups correlating with their enzymatic activity. Furthermore, we analyzed 8,138 glycoside hydrolase 13 proteins including 204 experimentally characterized enzymes with 28 different functions. There was a 91% correlation between group and enzyme activity. These results indicate that the function of carbohydrate-active enzymes can be predicted with high precision by finding short, conserved motifs in their sequences. The glycoside hydrolase 61 family is important for fungal biomass conversion, but only a few proteins of this family have been functionally characterized. Interestingly, PPR divided 743 glycoside hydrolase 61 proteins into 16 subfamilies useful for targeted investigation of the function of these proteins and pinpointed three conserved motifs with putative importance for enzyme activity. Furthermore, the conserved sequences were useful for cloning of new, subfamily-specific glycoside hydrolase 61 proteins from 14 fungi. In conclusion, identification of conserved sequence motifs is a new approach to sequence analysis that can predict carbohydrate-active enzyme functions with high precision.

  20. Functional region prediction with a set of appropriate homologous sequences-an index for sequence selection by integrating structure and sequence information with spatial statistics

    PubMed Central

    2012-01-01

    Background The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions. Results We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is the maximum. The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence-based methods. Conclusions Appropriate homologous sequences are selected automatically and objectively by the index. Such sequence selection improved the performance of functional region prediction. As far as we know, this is the first approach in which spatial statistics have been applied to protein analyses. Such integration of structure and sequence information would be useful for other bioinformatics problems. PMID:22643026

  1. Transcriptional Activation Signals Found in the Epstein-Barr Virus (EBV) Latency C Promoter Are Conserved in the Latency C Promoter Sequences from Baboon and Rhesus Monkey EBV-Like Lymphocryptoviruses (Cercopithicine Herpesviruses 12 and 15)

    PubMed Central

    Fuentes-Pananá, Ezequiel M.; Swaminathan, Sankar; Ling, Paul D.

    1999-01-01

    The Epstein-Barr virus (EBV) EBNA2 protein is a transcriptional activator that controls viral latent gene expression and is essential for EBV-driven B-cell immortalization. EBNA2 is expressed from the viral C promoter (Cp) and regulates its own expression by activating Cp through interaction with the cellular DNA binding protein CBF1. Through regulation of Cp and EBNA2 expression, EBV controls the pattern of latent protein expression and the type of latency established. To gain further insight into the important regulatory elements that modulate Cp usage, we isolated and sequenced the Cp regions corresponding to nucleotides 10251 to 11479 of the EBV genome (−1079 to +144 relative to the transcription initiation site) from the EBV-like lymphocryptoviruses found in baboons (herpesvirus papio; HVP) and Rhesus macaques (RhEBV). Sequence comparison of the approximately 1,230-bp Cp regions from these primate viruses revealed that EBV and HVP Cp sequences are 64% conserved, EBV and RhEBV Cp sequences are 66% conserved, and HVP and RhEBV Cp sequences are 65% conserved relative to each other. Approximately 50% of the residues are conserved among all three sequences, yet all three viruses have retained response elements for glucocorticoids, two positionally conserved CCAAT boxes, and positionally conserved TATA boxes. The putative EBNA2 100-bp enhancers within these promoters contain 54 conserved residues, and the binding sites for CBF1 and CBF2 are well conserved. Cp usage in the HVP- and RhEBV-transformed cell lines was detected by S1 nuclease protection analysis. Transient-transfection analysis showed that promoters of both HVP and RhEBV are responsive to EBNA2 and that they bind CBF1 and CBF2 in gel mobility shift assays. These results suggest that similar mechanisms for regulation of latent gene expression are conserved among the EBV-related lymphocryptoviruses found in nonhuman primates. PMID:9847397

  2. Transcriptional activation signals found in the Epstein-Barr virus (EBV) latency C promoter are conserved in the latency C promoter sequences from baboon and Rhesus monkey EBV-like lymphocryptoviruses (cercopithicine herpesviruses 12 and 15).

    PubMed

    Fuentes-Pananá, E M; Swaminathan, S; Ling, P D

    1999-01-01

    The Epstein-Barr virus (EBV) EBNA2 protein is a transcriptional activator that controls viral latent gene expression and is essential for EBV-driven B-cell immortalization. EBNA2 is expressed from the viral C promoter (Cp) and regulates its own expression by activating Cp through interaction with the cellular DNA binding protein CBF1. Through regulation of Cp and EBNA2 expression, EBV controls the pattern of latent protein expression and the type of latency established. To gain further insight into the important regulatory elements that modulate Cp usage, we isolated and sequenced the Cp regions corresponding to nucleotides 10251 to 11479 of the EBV genome (-1079 to +144 relative to the transcription initiation site) from the EBV-like lymphocryptoviruses found in baboons (herpesvirus papio; HVP) and Rhesus macaques (RhEBV). Sequence comparison of the approximately 1,230-bp Cp regions from these primate viruses revealed that EBV and HVP Cp sequences are 64% conserved, EBV and RhEBV Cp sequences are 66% conserved, and HVP and RhEBV Cp sequences are 65% conserved relative to each other. Approximately 50% of the residues are conserved among all three sequences, yet all three viruses have retained response elements for glucocorticoids, two positionally conserved CCAAT boxes, and positionally conserved TATA boxes. The putative EBNA2 100-bp enhancers within these promoters contain 54 conserved residues, and the binding sites for CBF1 and CBF2 are well conserved. Cp usage in the HVP- and RhEBV-transformed cell lines was detected by S1 nuclease protection analysis. Transient-transfection analysis showed that promoters of both HVP and RhEBV are responsive to EBNA2 and that they bind CBF1 and CBF2 in gel mobility shift assays. These results suggest that similar mechanisms for regulation of latent gene expression are conserved among the EBV-related lymphocryptoviruses found in nonhuman primates.

  3. Combining protein sequence, structure, and dynamics: A novel approach for functional evolution analysis of PAS domain superfamily.

    PubMed

    Dong, Zheng; Zhou, Hongyu; Tao, Peng

    2018-02-01

    PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore functional evolutionary relationship among proteins in the PAS domain superfamily in view of the sequence-structure-dynamics-function relationship. We collected protein sequences and crystal structure data from RCSB Protein Data Bank of the PAS domain superfamily belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence-conserved residues and build phylogenetic tree. Three-dimensional structure alignment was also applied to obtain structure-conserved residues. The protein dynamics were analyzed using elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The result showed that the proteins with same function could be grouped by sequence similarity, and proteins in different functional groups displayed statistically significant difference in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have a lower fluctuation than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggested a direct connection in which the protein sequences were related to various functions through structural dynamics. This is a new attempt to delineate functional evolution of proteins using the integrated information of sequence, structure, and dynamics. © 2017 The Protein Society.

  4. Comparative genomics using microarrays reveals divergence and loss of virulence-associated genes in host-specific strains of the insect pathogen Metarhizium anisopliae.

    PubMed

    Wang, Sibao; Leclerque, Andreas; Pava-Ripoll, Monica; Fang, Weiguo; St Leger, Raymond J

    2009-06-01

    Many strains of Metarhizium anisopliae have broad host ranges, but others are specialists and adapted to particular hosts. Patterns of gene duplication, divergence, and deletion in three generalist and three specialist strains were investigated by heterologous hybridization of genomic DNA to genes from the generalist strain Ma2575. As expected, major life processes are highly conserved, presumably due to purifying selection. However, up to 7% of Ma2575 genes were highly divergent or absent in specialist strains. Many of these sequences are conserved in other fungal species, suggesting that there has been rapid evolution and loss in specialist Metarhizium genomes. Some poorly hybridizing genes in specialists were functionally coordinated, indicative of reductive evolution. These included several involved in toxin biosynthesis and sugar metabolism in root exudates, suggesting that specialists are losing genes required to live in alternative hosts or as saprophytes. Several components of mobile genetic elements were also highly divergent or lost in specialists. Exceptionally, the genome of the specialist cricket pathogen Ma443 contained extra insertion elements that might play a role in generating evolutionary novelty. This study throws light on the abundance of orphans in genomes, as 15% of orphan sequences were found to be rapidly evolving in the Ma2575 lineage.

  5. Genome-wide characterization of microRNA in foxtail millet (Setaria italica)

    PubMed Central

    2013-01-01

    Background MicroRNAs (miRNAs) are a class of short non-coding, endogenous RNAs that play key roles in many biological processes in both animals and plants. Although many miRNAs have been identified in a large number of organisms, the miRNAs in foxtail millet (Setaria italica) have, until now, been poorly understood. Results In this study, two replicate small RNA libraries from foxtail millet shoots were sequenced, and 40 million reads representing over 10 million unique sequences were generated. We identified 43 known miRNAs, 172 novel miRNAs and 2 mirtron precursor candidates in foxtail millet. Some miRNA*s of the known and novel miRNAs were detected as well. Further, eight novel miRNAs were validated by stem-loop RT-PCR. Potential targets of the foxtail millet miRNAs were predicted based on our strict criteria. Of the predicted target genes, 79% (351) had functional annotations in InterPro and GO analyses, indicating the targets of the miRNAs were involved in a wide range of regulatory functions and some specific biological processes. A total of 69 pairs of syntenic miRNA precursors that were conserved between foxtail millet and sorghum were found. Additionally, stem-loop RT-PCR was conducted to confirm the tissue-specific expression of some miRNAs in the four tissues identified by deep-sequencing. Conclusions We predicted, for the first time, 215 miRNAs and 447 miRNA targets in foxtail millet at a genome-wide level. The precursors, expression levels, miRNA* sequences, target functions, conservation, and evolution of miRNAs we identified were investigated. Some of the novel foxtail millet miRNAs and miRNA targets were validated experimentally. PMID:24330712

  6. Genome-wide characterization of microRNA in foxtail millet (Setaria italica).

    PubMed

    Yi, Fei; Xie, Shaojun; Liu, Yuwei; Qi, Xin; Yu, Jingjuan

    2013-12-13

    MicroRNAs (miRNAs) are a class of short non-coding, endogenous RNAs that play key roles in many biological processes in both animals and plants. Although many miRNAs have been identified in a large number of organisms, the miRNAs in foxtail millet (Setaria italica) have, until now, been poorly understood. In this study, two replicate small RNA libraries from foxtail millet shoots were sequenced, and 40 million reads representing over 10 million unique sequences were generated. We identified 43 known miRNAs, 172 novel miRNAs and 2 mirtron precursor candidates in foxtail millet. Some miRNA*s of the known and novel miRNAs were detected as well. Further, eight novel miRNAs were validated by stem-loop RT-PCR. Potential targets of the foxtail millet miRNAs were predicted based on our strict criteria. Of the predicted target genes, 79% (351) had functional annotations in InterPro and GO analyses, indicating the targets of the miRNAs were involved in a wide range of regulatory functions and some specific biological processes. A total of 69 pairs of syntenic miRNA precursors that were conserved between foxtail millet and sorghum were found. Additionally, stem-loop RT-PCR was conducted to confirm the tissue-specific expression of some miRNAs in the four tissues identified by deep-sequencing. We predicted, for the first time, 215 miRNAs and 447 miRNA targets in foxtail millet at a genome-wide level. The precursors, expression levels, miRNA* sequences, target functions, conservation, and evolution of miRNAs we identified were investigated. Some of the novel foxtail millet miRNAs and miRNA targets were validated experimentally.

  7. Quantifying the relationship between sequence and three-dimensional structure conservation in RNA

    PubMed Central

    2010-01-01

    Background In recent years, the number of available RNA structures has rapidly grown reflecting the increased interest on RNA biology. Similarly to the studies carried out two decades ago for proteins, which gave the fundamental grounds for developing comparative protein structure prediction methods, we are now able to quantify the relationship between sequence and structure conservation in RNA. Results Here we introduce an all-against-all sequence- and three-dimensional (3D) structure-based comparison of a representative set of RNA structures, which have allowed us to quantitatively confirm that: (i) there is a measurable relationship between sequence and structure conservation that weakens for alignments resulting in below 60% sequence identity, (ii) evolution tends to conserve more RNA structure than sequence, and (iii) there is a twilight zone for RNA homology detection. Discussion The computational analysis here presented quantitatively describes the relationship between sequence and structure for RNA molecules and defines a twilight zone region for detecting RNA homology. Our work could represent the theoretical basis and limitations for future developments in comparative RNA 3D structure prediction. PMID:20550657

  8. Differential sequence diversity at merozoite surface protein-1 locus of Plasmodium knowlesi from humans and macaques in Thailand.

    PubMed

    Putaporntip, Chaturong; Thongaree, Siriporn; Jongwutiwes, Somchai

    2013-08-01

    To determine the genetic diversity and potential transmission routes of Plasmodium knowlesi, we analyzed the complete nucleotide sequence of the gene encoding the merozoite surface protein-1 of this simian malaria (Pkmsp-1), an asexual blood-stage vaccine candidate, from naturally infected humans and macaques in Thailand. Analysis of Pkmsp-1 sequences from humans (n=12) and monkeys (n=12) reveals five conserved and four variable domains. Most nucleotide substitutions in conserved domains were dimorphic whereas three of four variable domains contained complex repeats with extensive sequence and size variation. Besides purifying selection in conserved domains, evidence of intragenic recombination scattering across Pkmsp-1 was detected. The number of haplotypes, haplotype diversity, nucleotide diversity and recombination sites of human-derived sequences exceeded that of monkey-derived sequences. Phylogenetic networks based on concatenated conserved sequences of Pkmsp-1 displayed a character pattern that could have arisen from sampling process or the presence of two independent routes of P. knowlesi transmission, i.e. from macaques to human and from human to humans in Thailand. Copyright © 2013 Elsevier B.V. All rights reserved.

  9. A Differential Abundance Analysis of Very Metal-poor Stars

    NASA Astrophysics Data System (ADS)

    O'Malley, Erin M.; McWilliam, Andrew; Chaboyer, Brian; Thompson, Ian

    2017-04-01

    We have performed a differential line-by-line chemical abundance analysis, ultimately relative to the Sun, of nine very metal-poor main-sequence (MS) halo stars, near [Fe/H] = -2 dex. Our abundances range from -2.66≤slant [{Fe}/{{H}}]≤slant -1.40 dex with conservative uncertainties of 0.07 dex. We find an average [α/Fe] = 0.34 ± 0.09 dex, typical of the Milky Way. While our spectroscopic atmosphere parameters provide good agreement with Hubble Space Telescope parallaxes, there is significant disagreement with temperature and gravity parameters indicated by observed colors and theoretical isochrones. Although a systematic underestimate of the stellar temperature by a few hundred degrees could explain this difference, it is not supported by current effective temperature studies and would create large uncertainties in the abundance determinations. Both 1D and < 3{{D}}> hydrodynamical models combined with separate 1D non-LTE effects do not yet account for the atmospheres of real metal-poor MS stars, but a fully 3D non-LTE treatment may be able to explain the ionization imbalance found in this work.

  10. Airway and Feeding Outcomes of Mandibular Distraction, Tongue-Lip Adhesion, and Conservative Management in Pierre Robin Sequence: A Prospective Study.

    PubMed

    Khansa, Ibrahim; Hall, Courtney; Madhoun, Lauren L; Splaingard, Mark; Baylis, Adriane; Kirschner, Richard E; Pearson, Gregory D

    2017-04-01

    Pierre Robin sequence is characterized by mandibular retrognathia and glossoptosis resulting in airway obstruction and feeding difficulties. When conservative management fails, mandibular distraction osteogenesis or tongue-lip adhesion may be required to avoid tracheostomy. The authors' goal was to prospectively evaluate the airway and feeding outcomes of their comprehensive approach to Pierre Robin sequence, which includes conservative management, mandibular distraction osteogenesis, and tongue-lip adhesion. A longitudinal study of newborns with Pierre Robin sequence treated at a pediatric academic medical center between 2010 and 2015 was performed. Baseline feeding and respiratory data were collected. Patients underwent conservative management if they demonstrated sustainable weight gain without tube feeds, and if their airway was stable with positioning alone. Patients who required surgery underwent tongue-lip adhesion or mandibular distraction osteogenesis based on family and surgeon preference. Postoperative airway and feeding data were collected. Twenty-eight patients with Pierre Robin sequence were followed prospectively. Thirty-two percent had a syndrome. Ten underwent mandibular distraction osteogenesis, eight underwent tongue-lip adhesion, and 10 were treated conservatively. There were no differences in days to extubation or discharge, change in weight percentile, requirement for gastrostomy tube, or residual obstructive sleep apnea between the three groups. No patients required tracheostomy. The greatest reduction in apnea-hypopnea index occurred with mandibular distraction osteogenesis, followed by tongue-lip adhesion and conservative management. Careful selection of which patients with Pierre Robin sequence need surgery, and of the most appropriate surgical procedure for each patient, can minimize the need for postprocedure tracheostomy. A comprehensive approach to Pierre Robin sequence that includes conservative management, mandibular distraction osteogenesis, and tongue-lip adhesion can result in excellent airway and feeding outcomes. Therapeutic, II.

  11. The Number, Organization, and Size of Polymorphic Membrane Protein Coding Sequences as well as the Most Conserved Pmp Protein Differ within and across Chlamydia Species.

    PubMed

    Van Lent, Sarah; Creasy, Heather Huot; Myers, Garry S A; Vanrompay, Daisy

    2016-01-01

    Variation is a central trait of the polymorphic membrane protein (Pmp) family. The number of pmp coding sequences differs between Chlamydia species, but it is unknown whether the number of pmp coding sequences is constant within a Chlamydia species. The level of conservation of the Pmp proteins has previously only been determined for Chlamydia trachomatis. As different Pmp proteins might be indispensible for the pathogenesis of different Chlamydia species, this study investigated the conservation of Pmp proteins both within and across C. trachomatis,C. pneumoniae,C. abortus, and C. psittaci. The pmp coding sequences were annotated in 16 C. trachomatis, 6 C. pneumoniae, 2 C. abortus, and 16 C. psittaci genomes. The number and organization of polymorphic membrane coding sequences differed within and across the analyzed Chlamydia species. The length of coding sequences of pmpA,pmpB, and pmpH was conserved among all analyzed genomes, while the length of pmpE/F and pmpG, and remarkably also of the subtype pmpD, differed among the analyzed genomes. PmpD, PmpA, PmpH, and PmpA were the most conserved Pmp in C. trachomatis,C. pneumoniae,C. abortus, and C. psittaci, respectively. PmpB was the most conserved Pmp across the 4 analyzed Chlamydia species. © 2016 S. Karger AG, Basel.

  12. Identification of microRNAs differentially expressed involved in male flower development.

    PubMed

    Wang, Zhengjia; Huang, Jianqin; Sun, Zhichao; Zheng, Bingsong

    2015-03-01

    Hickory (Carya cathayensis Sarg.) is one of the most economically important woody trees in eastern China, but its long flowering phase delays yield. Our understanding of the regulatory roles of microRNAs (miRNAs) in male flower development in hickory remains poor. Using high-throughput sequencing technology, we have pyrosequenced two small RNA libraries from two male flower differentiation stages in hickory. Analysis of the sequencing data identified 114 conserved miRNAs that belonged to 23 miRNA families, five novel miRNAs including their corresponding miRNA*s, and 22 plausible miRNA candidates. Differential expression analysis revealed 12 miRNA sequences that were upregulated in the later (reproductive) stage of male flower development. Quantitative real-time PCR showed similar expression trends as that of the deep sequencing. Novel miRNAs and plausible miRNA candidates were predicted using bioinformatic analysis methods. The miRNAs newly identified in this study have increased the number of known miRNAs in hickory, and the identification of differentially expressed miRNAs will provide new avenues for studies into miRNAs involved in the process of male flower development in hickory and other related trees.

  13. Evolutionarily conserved regions and hydrophobic contacts at the superfamily level: The case of the fold-type I, pyridoxal-5′-phosphate-dependent enzymes

    PubMed Central

    Paiardini, Alessandro; Bossa, Francesco; Pascarella, Stefano

    2004-01-01

    The wealth of biological information provided by structural and genomic projects opens new prospects of understanding life and evolution at the molecular level. In this work, it is shown how computational approaches can be exploited to pinpoint protein structural features that remain invariant upon long evolutionary periods in the fold-type I, PLP-dependent enzymes. A nonredundant set of 23 superposed crystallographic structures belonging to this superfamily was built. Members of this family typically display high-structural conservation despite low-sequence identity. For each structure, a multiple-sequence alignment of orthologous sequences was obtained, and the 23 alignments were merged using the structural information to obtain a comprehensive multiple alignment of 921 sequences of fold-type I enzymes. The structurally conserved regions (SCRs), the evolutionarily conserved residues, and the conserved hydrophobic contacts (CHCs) were extracted from this data set, using both sequence and structural information. The results of this study identified a structural pattern of hydrophobic contacts shared by all of the superfamily members of fold-type I enzymes and involved in native interactions. This profile highlights the presence of a nucleus for this fold, in which residues participating in the most conserved native interactions exhibit preferential evolutionary conservation, that correlates significantly (r = 0.70) with the extent of mean hydrophobic contact value of their apolar fraction. PMID:15498941

  14. Polymerase Chain Reaction (PCR)-based methods for detection and identification of mycotoxigenic Penicillium species using conserved genes

    USDA-ARS?s Scientific Manuscript database

    Polymerase chain reaction amplification of conserved genes and sequence analysis provides a very powerful tool for the identification of toxigenic as well as non-toxigenic Penicillium species. Sequences are obtained by amplification of the gene fragment, sequencing via capillary electrophoresis of d...

  15. RNA expression in a cartilaginous fish cell line reveals ancient 3′ noncoding regions highly conserved in vertebrates

    PubMed Central

    Forest, David; Nishikawa, Ryuhei; Kobayashi, Hiroshi; Parton, Angela; Bayne, Christopher J.; Barnes, David W.

    2007-01-01

    We have established a cartilaginous fish cell line [Squalus acanthias embryo cell line (SAE)], a mesenchymal stem cell line derived from the embryo of an elasmobranch, the spiny dogfish shark S. acanthias. Elasmobranchs (sharks and rays) first appeared >400 million years ago, and existing species provide useful models for comparative vertebrate cell biology, physiology, and genomics. Comparative vertebrate genomics among evolutionarily distant organisms can provide sequence conservation information that facilitates identification of critical coding and noncoding regions. Although these genomic analyses are informative, experimental verification of functions of genomic sequences depends heavily on cell culture approaches. Using ESTs defining mRNAs derived from the SAE cell line, we identified lengthy and highly conserved gene-specific nucleotide sequences in the noncoding 3′ UTRs of eight genes involved in the regulation of cell growth and proliferation. Conserved noncoding 3′ mRNA regions detected by using the shark nucleotide sequences as a starting point were found in a range of other vertebrate orders, including bony fish, birds, amphibians, and mammals. Nucleotide identity of shark and human in these regions was remarkably well conserved. Our results indicate that highly conserved gene sequences dating from the appearance of jawed vertebrates and representing potential cis-regulatory elements can be identified through the use of cartilaginous fish as a baseline. Because the expression of genes in the SAE cell line was prerequisite for their identification, this cartilaginous fish culture system also provides a physiologically valid tool to test functional hypotheses on the role of these ancient conserved sequences in comparative cell biology. PMID:17227856

  16. Sequence conservation on the Y chromosome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gibson, L.H.; Yang-Feng, L.; Lau, C.

    The Y chromosome is present in all mammals and is considered to be essential to sex determination. Despite intense genomic research, only a few genes have been identified and mapped to this chromosome in humans. Several of them, such as SRY and ZFY, have been demonstrated to be conserved and Y-located in other mammals. In order to address the issue of sequence conservation on the Y chromosome, we performed fluorescence in situ hybridization (FISH) with DNA from a human Y cosmid library as a probe to study the Y chromosomes from other mammalian species. Total DNA from 3,000-4,500 cosmid poolsmore » were labeled with biotinylated-dUTP and hybridized to metaphase chromosomes. For human and primate preparations, human cot1 DNA was included in the hybridization mixture to suppress the hybridization from repeat sequences. FISH signals were detected on the Y chromosomes of human, gorilla, orangutan and baboon (Old World monkey) and were absent on those of squirrel monkey (New World monkey), Indian munjac, wood lemming, Chinese hamster, rat and mouse. Since sequence analysis suggested that specific genes, e.g. SRY and ZFY, are conserved between these two groups, the lack of detectable hybridization in the latter group implies either that conservation of the human Y sequences is limited to the Y chromosomes of the great apes and Old World monkeys, or that the size of the syntenic segment is too small to be detected under the resolution of FISH, or that homologeous sequences have undergone considerable divergence. Further studies with reduced hybridization stringency are currently being conducted. Our results provide some clues as to Y-sequence conservation across species and demonstrate the limitations of FISH across species with total DNA sequences from a particular chromosome.« less

  17. Evolutionary and biophysical relationships among the papillomavirus E2 proteins.

    PubMed

    Blakaj, Dukagjin M; Fernandez-Fuentes, Narcis; Chen, Zigui; Hegde, Rashmi; Fiser, Andras; Burk, Robert D; Brenowitz, Michael

    2009-01-01

    Infection by human papillomavirus (HPV) may result in clinical conditions ranging from benign warts to invasive cancer. The HPV E2 protein represses oncoprotein transcription and is required for viral replication. HPV E2 binds to palindromic DNA sequences of highly conserved four base pair sequences flanking an identical length variable 'spacer'. E2 proteins directly contact the conserved but not the spacer DNA. Variation in naturally occurring spacer sequences results in differential protein affinity that is dependent on their sensitivity to the spacer DNA's unique conformational and/or dynamic properties. This article explores the biophysical character of this core viral protein with the goal of identifying characteristics that associated with risk of virally caused malignancy. The amino acid sequence, 3d structure and electrostatic features of the E2 protein DNA binding domain are highly conserved; specific interactions with DNA binding sites have also been conserved. In contrast, the E2 protein's transactivation domain does not have extensive surfaces of highly conserved residues. Rather, regions of high conservation are localized to small surface patches. Implications to cancer biology are discussed.

  18. HMMerThread: detecting remote, functional conserved domains in entire genomes by combining relaxed sequence-database searches with fold recognition.

    PubMed

    Bradshaw, Charles Richard; Surendranath, Vineeth; Henschel, Robert; Mueller, Matthias Stefan; Habermann, Bianca Hermine

    2011-03-10

    Conserved domains in proteins are one of the major sources of functional information for experimental design and genome-level annotation. Though search tools for conserved domain databases such as Hidden Markov Models (HMMs) are sensitive in detecting conserved domains in proteins when they share sufficient sequence similarity, they tend to miss more divergent family members, as they lack a reliable statistical framework for the detection of low sequence similarity. We have developed a greatly improved HMMerThread algorithm that can detect remotely conserved domains in highly divergent sequences. HMMerThread combines relaxed conserved domain searches with fold recognition to eliminate false positive, sequence-based identifications. With an accuracy of 90%, our software is able to automatically predict highly divergent members of conserved domain families with an associated 3-dimensional structure. We give additional confidence to our predictions by validation across species. We have run HMMerThread searches on eight proteomes including human and present a rich resource of remotely conserved domains, which adds significantly to the functional annotation of entire proteomes. We find ∼4500 cross-species validated, remotely conserved domain predictions in the human proteome alone. As an example, we find a DNA-binding domain in the C-terminal part of the A-kinase anchor protein 10 (AKAP10), a PKA adaptor that has been implicated in cardiac arrhythmias and premature cardiac death, which upon stress likely translocates from mitochondria to the nucleus/nucleolus. Based on our prediction, we propose that with this HLH-domain, AKAP10 is involved in the transcriptional control of stress response. Further remotely conserved domains we discuss are examples from areas such as sporulation, chromosome segregation and signalling during immune response. The HMMerThread algorithm is able to automatically detect the presence of remotely conserved domains in proteins based on weak sequence similarity. Our predictions open up new avenues for biological and medical studies. Genome-wide HMMerThread domains are available at http://vm1-hmmerthread.age.mpg.de.

  19. HMMerThread: Detecting Remote, Functional Conserved Domains in Entire Genomes by Combining Relaxed Sequence-Database Searches with Fold Recognition

    PubMed Central

    Bradshaw, Charles Richard; Surendranath, Vineeth; Henschel, Robert; Mueller, Matthias Stefan; Habermann, Bianca Hermine

    2011-01-01

    Conserved domains in proteins are one of the major sources of functional information for experimental design and genome-level annotation. Though search tools for conserved domain databases such as Hidden Markov Models (HMMs) are sensitive in detecting conserved domains in proteins when they share sufficient sequence similarity, they tend to miss more divergent family members, as they lack a reliable statistical framework for the detection of low sequence similarity. We have developed a greatly improved HMMerThread algorithm that can detect remotely conserved domains in highly divergent sequences. HMMerThread combines relaxed conserved domain searches with fold recognition to eliminate false positive, sequence-based identifications. With an accuracy of 90%, our software is able to automatically predict highly divergent members of conserved domain families with an associated 3-dimensional structure. We give additional confidence to our predictions by validation across species. We have run HMMerThread searches on eight proteomes including human and present a rich resource of remotely conserved domains, which adds significantly to the functional annotation of entire proteomes. We find ∼4500 cross-species validated, remotely conserved domain predictions in the human proteome alone. As an example, we find a DNA-binding domain in the C-terminal part of the A-kinase anchor protein 10 (AKAP10), a PKA adaptor that has been implicated in cardiac arrhythmias and premature cardiac death, which upon stress likely translocates from mitochondria to the nucleus/nucleolus. Based on our prediction, we propose that with this HLH-domain, AKAP10 is involved in the transcriptional control of stress response. Further remotely conserved domains we discuss are examples from areas such as sporulation, chromosome segregation and signalling during immune response. The HMMerThread algorithm is able to automatically detect the presence of remotely conserved domains in proteins based on weak sequence similarity. Our predictions open up new avenues for biological and medical studies. Genome-wide HMMerThread domains are available at http://vm1-hmmerthread.age.mpg.de. PMID:21423752

  20. Functional centromeres in Astragalus sinicus include a compact centromere-specific histone H3 and a 20-bp tandem repeat.

    PubMed

    Tek, Ahmet L; Kashihara, Kazunari; Murata, Minoru; Nagaki, Kiyotaka

    2011-11-01

    The centromere plays an essential role for proper chromosome segregation during cell division and usually harbors long arrays of tandem repeated satellite DNA sequences. Although this function is conserved among eukaryotes, the sequences of centromeric DNA repeats are variable. Most of our understanding of functional centromeres, which are defined by localization of a centromere-specific histone H3 (CENH3) protein, comes from model organisms. The components of the functional centromere in legumes are poorly known. The genus Astragalus is a member of the legumes and bears the largest numbers of species among angiosperms. Therefore, we studied the components of centromeres in Astragalus sinicus. We identified the CenH3 homolog of A. sinicus, AsCenH3 that is the most compact in size among higher eukaryotes. A CENH3-based assay revealed the functional centromeric DNA sequences from A. sinicus, called CentAs. The CentAs repeat is localized in A. sinicus centromeres, and comprises an AT-rich tandem repeat with a monomer size of 20 nucleotides.

  1. Conservation of coevolving protein interfaces bridges prokaryote–eukaryote homologies in the twilight zone

    PubMed Central

    Rodriguez-Rivas, Juan; Marsili, Simone; Juan, David; Valencia, Alfonso

    2016-01-01

    Protein–protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein–protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein–protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein–protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach. PMID:27965389

  2. Conservation of coevolving protein interfaces bridges prokaryote-eukaryote homologies in the twilight zone.

    PubMed

    Rodriguez-Rivas, Juan; Marsili, Simone; Juan, David; Valencia, Alfonso

    2016-12-27

    Protein-protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein-protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein-protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein-protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach.

  3. Targeting Conserved Genes in Penicillium Species.

    PubMed

    Peterson, Stephen W

    2017-01-01

    Polymerase chain reaction amplification of conserved genes and sequence analysis provides a very powerful tool for the identification of toxigenic as well as non-toxigenic Penicillium species. Sequences are obtained by amplification of the gene fragment, sequencing via capillary electrophoresis of dideoxynucleotide-labeled fragments or NGS. The sequences are compared to a database of validated isolates. Identification of species indicates the potential of the fungus to make particular mycotoxins.

  4. Nucleotide sequence of the ribosomal RNA gene of Physarum polycephalum: intron 2 and its flanking regions of the 26S rRNA gene.

    PubMed Central

    Nomiyama, H; Kuhara, S; Kukita, T; Otsuka, T; Sakaki, Y

    1981-01-01

    The 26S ribosomal RNA gene of Physarum polycephalum is interrupted by two introns, and we have previously determined the sequence of one of them (intron 1) (Nomiyama et al. Proc.Natl.Acad.Sci.USA 78, 1376-1380, 1981). In this study we sequenced the second intron (intron 2) of about 0.5 kb length and its flanking regions, and found that one nucleotide at each junction is identical in intron 1 and intron 2, though the junction regions share no other sequence homology. Comparison of the flanking exon sequences to E. coli 23S rRNA sequences shows that conserved sequences are interspersed with tracts having little homology. In particular, the region encompassing the intron 2 interruption site is highly conserved. The E. coli ribosomal protein L1 binding region is also conserved. Images PMID:6171776

  5. Nucleotide sequence determination of guinea-pig casein B mRNA reveals homology with bovine and rat alpha s1 caseins and conservation of the non-coding regions of the mRNA.

    PubMed Central

    Hall, L; Laird, J E; Craig, R K

    1984-01-01

    Nucleotide sequence analysis of cloned guinea-pig casein B cDNA sequences has identified two casein B variants related to the bovine and rat alpha s1 caseins. Amino acid homology was largely confined to the known bovine or predicted rat phosphorylation sites and within the 'signal' precursor sequence. Comparison of the deduced nucleotide sequence of the guinea-pig and rat alpha s1 casein mRNA species showed greater sequence conservation in the non-coding than in the coding regions, suggesting a functional and possibly regulatory role for the non-coding regions of casein mRNA. The results provide insight into the evolution of the casein genes, and raise questions as to the role of conserved nucleotide sequences within the non-coding regions of mRNA species. Images Fig. 1. PMID:6548375

  6. Molecular evolution of ependymin and the phylogenetic resolution of early divergences among euteleost fishes.

    PubMed

    Ortí, G; Meyer, A

    1996-04-01

    The rate and pattern of DNA evolution of ependymin, a single-copy gene coding for a highly expressed glycoprotein in the brain matrix of teleost fishes, is characterized and its phylogenetic utility for fish systematics is assessed. DNA sequences were determined from catfish, electric fish, and characiforms and compared with published ependymin sequences from cyprinids, salmon, pike, and herring. Among these groups, ependymin amino acid sequences were highly divergent (up to 60% sequence difference), but had surprisingly similar hydropathy profiles and invariant glycosylation sites, suggesting that functional properties of the proteins are conserved. Comparison of base composition at third codon positions and introns revealed AT-rich introns and GC-rich third codon positions, suggesting that the biased codon usage observed might not be due to mutational bias. Phylogenetic information content of third codon positions was surprisingly high and sufficient to recover the most basal nodes of the tree, in spite of the observation that pairwise distances (at third codon positions) were well above the presumed saturation level. This finding can be explained by the high proportion of phylogenetically informative nonsynonymous changes at third codon positions among these highly divergent proteins. Ependymin DNA sequences have established the first molecular evidence for the monophyly of a group containing salmonids and esociforms. In addition, ependymin suggests a sister group relationship of electric fish (Gymnotiformes) and Characiformes, constituting a significant departure from currently accepted classifications. However, relationships among characiform lineages were not completely resolved by ependymin sequences in spite of seemingly appropriate levels of variation among taxa and considerably low levels of homoplasy in the data (consistency index = 0.7). If the diversification of Characiformes took place in an "explosive" manner, over a relatively short period of time this pattern should also be observed using other phylogenetic markers. Poor conservation of ependymin's primary structure hinders the design of efficient primers for PCR that could be used in wide-ranging fish systematic studies. However, alternative methods like PCR amplification from cDNA used here should provide promising comparative sequence data for the resolution of phylogenetic relationships among other basal lineages of teleost fishes.

  7. The genome sequence of the model ascomycete fungus Podospora anserina.

    PubMed

    Espagne, Eric; Lespinet, Olivier; Malagnac, Fabienne; Da Silva, Corinne; Jaillon, Olivier; Porcel, Betina M; Couloux, Arnaud; Aury, Jean-Marc; Ségurens, Béatrice; Poulain, Julie; Anthouard, Véronique; Grossetete, Sandrine; Khalili, Hamid; Coppin, Evelyne; Déquard-Chablat, Michelle; Picard, Marguerite; Contamine, Véronique; Arnaise, Sylvie; Bourdais, Anne; Berteaux-Lecellier, Véronique; Gautheret, Daniel; de Vries, Ronald P; Battaglia, Evy; Coutinho, Pedro M; Danchin, Etienne Gj; Henrissat, Bernard; Khoury, Riyad El; Sainsard-Chanet, Annie; Boivin, Antoine; Pinan-Lucarré, Bérangère; Sellem, Carole H; Debuchy, Robert; Wincker, Patrick; Weissenbach, Jean; Silar, Philippe

    2008-01-01

    The dung-inhabiting ascomycete fungus Podospora anserina is a model used to study various aspects of eukaryotic and fungal biology, such as ageing, prions and sexual development. We present a 10X draft sequence of P. anserina genome, linked to the sequences of a large expressed sequence tag collection. Similar to higher eukaryotes, the P. anserina transcription/splicing machinery generates numerous non-conventional transcripts. Comparison of the P. anserina genome and orthologous gene set with the one of its close relatives, Neurospora crassa, shows that synteny is poorly conserved, the main result of evolution being gene shuffling in the same chromosome. The P. anserina genome contains fewer repeated sequences and has evolved new genes by duplication since its separation from N. crassa, despite the presence of the repeat induced point mutation mechanism that mutates duplicated sequences. We also provide evidence that frequent gene loss took place in the lineages leading to P. anserina and N. crassa. P. anserina contains a large and highly specialized set of genes involved in utilization of natural carbon sources commonly found in its natural biotope. It includes genes potentially involved in lignin degradation and efficient cellulose breakdown. The features of the P. anserina genome indicate a highly dynamic evolution since the divergence of P. anserina and N. crassa, leading to the ability of the former to use specific complex carbon sources that match its needs in its natural biotope.

  8. Global MLST of Salmonella Typhi Revisited in Post-genomic Era: Genetic Conservation, Population Structure, and Comparative Genomics of Rare Sequence Types.

    PubMed

    Yap, Kien-Pong; Ho, Wing S; Gan, Han M; Chai, Lay C; Thong, Kwai L

    2016-01-01

    Typhoid fever, caused by Salmonella enterica serovar Typhi, remains an important public health burden in Southeast Asia and other endemic countries. Various genotyping methods have been applied to study the genetic variations of this human-restricted pathogen. Multilocus sequence typing (MLST) is one of the widely accepted methods, and recently, there is a growing interest in the re-application of MLST in the post-genomic era. In this study, we provide the global MLST distribution of S. Typhi utilizing both publicly available 1,826 S. Typhi genome sequences in addition to performing conventional MLST on S. Typhi strains isolated from various endemic regions spanning over a century. Our global MLST analysis confirms the predominance of two sequence types (ST1 and ST2) co-existing in the endemic regions. Interestingly, S. Typhi strains with ST8 are currently confined within the African continent. Comparative genomic analyses of ST8 and other rare STs with genomes of ST1/ST2 revealed unique mutations in important virulence genes such as flhB, sipC, and tviD that may explain the variations that differentiate between seemingly successful (widespread) and unsuccessful (poor dissemination) S. Typhi populations. Large scale whole-genome phylogeny demonstrated evidence of phylogeographical structuring and showed that ST8 may have diverged from the earlier ancestral population of ST1 and ST2, which later lost some of its fitness advantages, leading to poor worldwide dissemination. In response to the unprecedented increase in genomic data, this study demonstrates and highlights the utility of large-scale genome-based MLST as a quick and effective approach to narrow the scope of in-depth comparative genomic analysis and consequently provide new insights into the fine scale of pathogen evolution and population structure.

  9. Gapless genome assembly of Colletotrichum higginsianum reveals chromosome structure and association of transposable elements with secondary metabolite gene clusters.

    PubMed

    Dallery, Jean-Félix; Lapalu, Nicolas; Zampounis, Antonios; Pigné, Sandrine; Luyten, Isabelle; Amselem, Joëlle; Wittenberg, Alexander H J; Zhou, Shiguo; de Queiroz, Marisa V; Robin, Guillaume P; Auger, Annie; Hainaut, Matthieu; Henrissat, Bernard; Kim, Ki-Tae; Lee, Yong-Hwan; Lespinet, Olivier; Schwartz, David C; Thon, Michael R; O'Connell, Richard J

    2017-08-29

    The ascomycete fungus Colletotrichum higginsianum causes anthracnose disease of brassica crops and the model plant Arabidopsis thaliana. Previous versions of the genome sequence were highly fragmented, causing errors in the prediction of protein-coding genes and preventing the analysis of repetitive sequences and genome architecture. Here, we re-sequenced the genome using single-molecule real-time (SMRT) sequencing technology and, in combination with optical map data, this provided a gapless assembly of all twelve chromosomes except for the ribosomal DNA repeat cluster on chromosome 7. The more accurate gene annotation made possible by this new assembly revealed a large repertoire of secondary metabolism (SM) key genes (89) and putative biosynthetic pathways (77 SM gene clusters). The two mini-chromosomes differed from the ten core chromosomes in being repeat- and AT-rich and gene-poor but were significantly enriched with genes encoding putative secreted effector proteins. Transposable elements (TEs) were found to occupy 7% of the genome by length. Certain TE families showed a statistically significant association with effector genes and SM cluster genes and were transcriptionally active at particular stages of fungal development. All 24 subtelomeres were found to contain one of three highly-conserved repeat elements which, by providing sites for homologous recombination, were probably instrumental in four segmental duplications. The gapless genome of C. higginsianum provides access to repeat-rich regions that were previously poorly assembled, notably the mini-chromosomes and subtelomeres, and allowed prediction of the complete SM gene repertoire. It also provides insights into the potential role of TEs in gene and genome evolution and host adaptation in this asexual pathogen.

  10. Phylum-Level Conservation of Regulatory Information in Nematodes despite Extensive Non-coding Sequence Divergence

    PubMed Central

    Gordon, Kacy L.; Arthur, Robert K.; Ruvinsky, Ilya

    2015-01-01

    Gene regulatory information guides development and shapes the course of evolution. To test conservation of gene regulation within the phylum Nematoda, we compared the functions of putative cis-regulatory sequences of four sets of orthologs (unc-47, unc-25, mec-3 and elt-2) from distantly-related nematode species. These species, Caenorhabditis elegans, its congeneric C. briggsae, and three parasitic species Meloidogyne hapla, Brugia malayi, and Trichinella spiralis, represent four of the five major clades in the phylum Nematoda. Despite the great phylogenetic distances sampled and the extensive sequence divergence of nematode genomes, all but one of the regulatory elements we tested are able to drive at least a subset of the expected gene expression patterns. We show that functionally conserved cis-regulatory elements have no more extended sequence similarity to their C. elegans orthologs than would be expected by chance, but they do harbor motifs that are important for proper expression of the C. elegans genes. These motifs are too short to be distinguished from the background level of sequence similarity, and while identical in sequence they are not conserved in orientation or position. Functional tests reveal that some of these motifs contribute to proper expression. Our results suggest that conserved regulatory circuitry can persist despite considerable turnover within cis elements. PMID:26020930

  11. Intracellular diversity of the V4 and V9 regions of the 18S rRNA in marine protists (radiolarians) assessed by high-throughput sequencing.

    PubMed

    Decelle, Johan; Romac, Sarah; Sasaki, Eriko; Not, Fabrice; Mahé, Frédéric

    2014-01-01

    Metabarcoding is a powerful tool for exploring microbial diversity in the environment, but its accurate interpretation is impeded by diverse technical (e.g. PCR and sequencing errors) and biological biases (e.g. intra-individual polymorphism) that remain poorly understood. To help interpret environmental metabarcoding datasets, we investigated the intracellular diversity of the V4 and V9 regions of the 18S rRNA gene from Acantharia and Nassellaria (radiolarians) using 454 pyrosequencing. Individual cells of radiolarians were isolated, and PCRs were performed with generalist primers to amplify the V4 and V9 regions. Different denoising procedures were employed to filter the pyrosequenced raw amplicons (Acacia, AmpliconNoise, Linkage method). For each of the six isolated cells, an average of 541 V4 and 562 V9 amplicons assigned to radiolarians were obtained, from which one numerically dominant sequence and several minor variants were found. At the 97% identity, a diversity metrics commonly used in environmental surveys, up to 5 distinct OTUs were detected in a single cell. However, most amplicons grouped within a single OTU whereas other OTUs contained very few amplicons. Different analytical methods provided evidence that most minor variants forming different OTUs correspond to PCR and sequencing artifacts. Duplicate PCR and sequencing from the same DNA extract of a single cell had only 9 to 16% of unique amplicons in common, and alignment visualization of V4 and V9 amplicons showed that most minor variants contained substitutions in highly-conserved regions. We conclude that intracellular variability of the 18S rRNA in radiolarians is very limited despite its multi-copy nature and the existence of multiple nuclei in these protists. Our study recommends some technical guidelines to conservatively discard artificial amplicons from metabarcoding datasets, and thus properly assess the diversity and richness of protists in the environment.

  12. Scop3D: three-dimensional visualization of sequence conservation.

    PubMed

    Vermeire, Tessa; Vermaere, Stijn; Schepens, Bert; Saelens, Xavier; Van Gucht, Steven; Martens, Lennart; Vandermarliere, Elien

    2015-04-01

    The integration of a protein's structure with its known sequence variation provides insight on how that protein evolves, for instance in terms of (changing) function or immunogenicity. Yet, collating the corresponding sequence variants into a multiple sequence alignment, calculating each position's conservation, and mapping this information back onto a relevant structure is not straightforward. We therefore built the Sequence Conservation on Protein 3D structure (scop3D) tool to perform these tasks automatically. The output consists of two modified PDB files in which the B-values for each position are replaced by the percentage sequence conservation, or the information entropy for each position, respectively. Furthermore, text files with absolute and relative amino acid occurrences for each position are also provided, along with snapshots of the protein from six distinct directions in space. The visualization provided by scop3D can for instance be used as an aid in vaccine development or to identify antigenic hotspots, which we here demonstrate based on an analysis of the fusion proteins of human respiratory syncytial virus and mumps virus. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  13. Sequencing Needs for Viral Diagnostics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gardner, S N; Lam, M; Mulakken, N J

    2004-01-26

    We built a system to guide decisions regarding the amount of genomic sequencing required to develop diagnostic DNA signatures, which are short sequences that are sufficient to uniquely identify a viral species. We used our existing DNA diagnostic signature prediction pipeline, which selects regions of a target species genome that are conserved among strains of the target (for reliability, to prevent false negatives) and unique relative to other species (for specificity, to avoid false positives). We performed simulations, based on existing sequence data, to assess the number of genome sequences of a target species and of close phylogenetic relatives (''nearmore » neighbors'') that are required to predict diagnostic signature regions that are conserved among strains of the target species and unique relative to other bacterial and viral species. For DNA viruses such as variola (smallpox), three target genomes provide sufficient guidance for selecting species-wide signatures. Three near neighbor genomes are critical for species specificity. In contrast, most RNA viruses require four target genomes and no near neighbor genomes, since lack of conservation among strains is more limiting than uniqueness. SARS and Ebola Zaire are exceptional, as additional target genomes currently do not improve predictions, but near neighbor sequences are urgently needed. Our results also indicate that double stranded DNA viruses are more conserved among strains than are RNA viruses, since in most cases there was at least one conserved signature candidate for the DNA viruses and zero conserved signature candidates for the RNA viruses.« less

  14. Cloning and sequence analysis of a cDNA encoding the alpha-subunit of mouse beta-N-acetylhexosaminidase and comparison with the human enzyme.

    PubMed Central

    Beccari, T; Hoade, J; Orlacchio, A; Stirling, J L

    1992-01-01

    cDNAs encoding the mouse beta-N-acetylhexosaminidase alpha-subunit were isolated from a mouse testis library. The longest of these (1.7 kb) was sequenced and showed 83% similarity with the human alpha-subunit cDNA sequence. The 5' end of the coding sequence was obtained from a genomic DNA clone. Alignment of the human and mouse sequences showed that all three putative N-glycosylation sites are conserved, but that the mouse alpha-subunit has an additional site towards the C-terminus. All eight cysteines in the human sequence are conserved in the mouse. There are an additional two cysteines in the mouse alpha-subunit signal peptide. All amino acids affected in Tay-Sachs-disease mutations are conserved in the mouse. Images Fig. 1. PMID:1379046

  15. Genome-wide identification of conserved microRNA and their response to drought stress in Dongxiang wild rice (Oryza rufipogon Griff.).

    PubMed

    Zhang, Fantao; Luo, Xiangdong; Zhou, Yi; Xie, Jiankun

    2016-04-01

    To identify drought stress-responsive conserved microRNA (miRNA) from Dongxiang wild rice (Oryza rufipogon Griff., DXWR) on a genome-wide scale, high-throughput sequencing technology was used to sequence libraries of DXWR samples, treated with and without drought stress. 505 conserved miRNAs corresponding to 215 families were identified. 17 were significantly down-regulated and 16 were up-regulated under drought stress. Stem-loop qRT-PCR revealed the same expression patterns as high-throughput sequencing, suggesting the accuracy of the sequencing result was high. Potential target genes of the drought-responsive miRNA were predicted to be involved in diverse biological processes. Furthermore, 16 miRNA families were first identified to be involved in drought stress response from plants. These results present a comprehensive view of the conserved miRNA and their expression patterns under drought stress for DXWR, which will provide valuable information and sequence resources for future basis studies.

  16. Conserved intergenic sequences revealed by CTAG-profiling in Salmonella: thermodynamic modeling for function prediction

    NASA Astrophysics Data System (ADS)

    Tang, Le; Zhu, Songling; Mastriani, Emilio; Fang, Xin; Zhou, Yu-Jie; Li, Yong-Guo; Johnston, Randal N.; Guo, Zheng; Liu, Gui-Rong; Liu, Shu-Lin

    2017-03-01

    Highly conserved short sequences help identify functional genomic regions and facilitate genomic annotation. We used Salmonella as the model to search the genome for evolutionarily conserved regions and focused on the tetranucleotide sequence CTAG for its potentially important functions. In Salmonella, CTAG is highly conserved across the lineages and large numbers of CTAG-containing short sequences fall in intergenic regions, strongly indicating their biological importance. Computer modeling demonstrated stable stem-loop structures in some of the CTAG-containing intergenic regions, and substitution of a nucleotide of the CTAG sequence would radically rearrange the free energy and disrupt the structure. The postulated degeneration of CTAG takes distinct patterns among Salmonella lineages and provides novel information about genomic divergence and evolution of these bacterial pathogens. Comparison of the vertically and horizontally transmitted genomic segments showed different CTAG distribution landscapes, with the genome amelioration process to remove CTAG taking place inward from both terminals of the horizontally acquired segment.

  17. Cloning and functional characterization of the Xenopus orthologue of the Treacher Collins syndrome (TCOF1) gene product.

    PubMed

    Gonzales, Bianca; Yang, Hushan; Henning, Dale; Valdez, Benigno C

    2005-10-10

    Treacher Collins syndrome (TCS) is an autosomal dominant disorder of craniofacial development caused by mutations in the TCOF1 gene, which encodes the nucleolar phosphoprotein treacle. We previously reported a function for mammalian treacle in ribosomal DNA gene transcription by its interaction with upstream binding factor. As an initial step in the development of a TCS model for frog the cDNA that encodes the Xenopus laevis treacle was cloned. Although the derived amino acid sequence shows a poor homology with its mammalian orthologues, Xenopus treacle has 11 highly homologous direct repeats near the center of the protein molecule similar to those present in its human, dog and mouse orthologues. Comparison of their amino acid compositions indicates conservation of predominant specific amino acid residues. Antisense-mediated down-regulation of treacle expression in X. laevis oocytes resulted in inhibition of rDNA gene transcription. The results suggest evolutionary conservation of the function of treacle in ribosomal RNA biogenesis in higher eukaryotes.

  18. Role of IRS-2 in insulin and cytokine signalling.

    PubMed

    Sun, X J; Wang, L M; Zhang, Y; Yenush, L; Myers, M G; Glasheen, E; Lane, W S; Pierce, J H; White, M F

    1995-09-14

    The protein IRS-1 acts as an interface between signalling proteins with Src-homology-2 domains (SH2 proteins) and the receptors for insulin, IGF-1, growth hormone, several interleukins (IL-4, IL-9, IL-13) and other cytokines. It regulates gene expression and stimulates mitogenesis, and appears to mediate insulin/IGF-1-stimulated glucose transport. Thus, survival of the IRS-1-/- mouse with only mild resistance to insulin was surprising. This dilemma is provisionally resolved with our discovery of a second IRS-signalling protein. We purified and cloned a likely candidate called 4PS from myeloid progenitor cells and, because of its resemblance to IRS-1, we designate it IRS-2. Alignment of the sequences of IRS-2 and IRS-1 revealed a highly conserved amino terminus containing a pleckstrin-homology domain and a phosphotyrosine-binding domain, and a poorly conserved carboxy terminus containing several tyrosine phosphorylation motifs. IRS-2 is expressed in many cells, including tissues from IRS-1-/- mice, and may be essential for signalling by several receptor systems.

  19. The impact of p53 protein core domain structural alteration on ovarian cancer survival.

    PubMed

    Rose, Stephen L; Robertson, Andrew D; Goodheart, Michael J; Smith, Brian J; DeYoung, Barry R; Buller, Richard E

    2003-09-15

    Although survival with a p53 missense mutation is highly variable, p53-null mutation is an independent adverse prognostic factor for advanced stage ovarian cancer. By evaluating ovarian cancer survival based upon a structure function analysis of the p53 protein, we tested the hypothesis that not all missense mutations are equivalent. The p53 gene was sequenced from 267 consecutive ovarian cancers. The effect of individual missense mutations on p53 structure was analyzed using the International Agency for Research on Cancer p53 Mutational Database, which specifies the effects of p53 mutations on p53 core domain structure. Mutations in the p53 core domain were classified as either explained or not explained in structural or functional terms by their predicted effects on protein folding, protein-DNA contacts, or mutation in highly conserved residues. Null mutations were classified by their mechanism of origin. Mutations were sequenced from 125 tumors. Effects of 62 of the 82 missense mutations (76%) could be explained by alterations in the p53 protein. Twenty-three (28%) of the explained mutations occurred in highly conserved regions of the p53 core protein. Twenty-two nonsense point mutations and 21 frameshift null mutations were sequenced. Survival was independent of missense mutation type and mechanism of null mutation. The hypothesis that not all missense mutations are equivalent is, therefore, rejected. Furthermore, p53 core domain structural alteration secondary to missense point mutation is not functionally equivalent to a p53-null mutation. The poor prognosis associated with p53-null mutation is independent of the mutation mechanism.

  20. Identification and characterization of ARS-like sequences as putative origin(s) of replication in human malaria parasite Plasmodium falciparum.

    PubMed

    Agarwal, Meetu; Bhowmick, Krishanu; Shah, Kushal; Krishnamachari, Annangarachari; Dhar, Suman Kumar

    2017-08-01

    DNA replication is a fundamental process in genome maintenance, and initiates from several genomic sites (origins) in eukaryotes. In Saccharomyces cerevisiae, conserved sequences known as autonomously replicating sequences (ARSs) provide a landing pad for the origin recognition complex (ORC), leading to replication initiation. Although origins from higher eukaryotes share some common sequence features, the definitive genomic organization of these sites remains elusive. The human malaria parasite Plasmodium falciparum undergoes multiple rounds of DNA replication; therefore, control of initiation events is crucial to ensure proper replication. However, the sites of DNA replication initiation and the mechanism by which replication is initiated are poorly understood. Here, we have identified and characterized putative origins in P. falciparum by bioinformatics analyses and experimental approaches. An autocorrelation measure method was initially used to search for regions with marked fluctuation (dips) in the chromosome, which we hypothesized might contain potential origins. Indeed, S. cerevisiae ARS consensus sequences were found in dip regions. Several of these P. falciparum sequences were validated with chromatin immunoprecipitation-quantitative PCR, nascent strand abundance and a plasmid stability assay. Subsequently, the same sequences were used in yeast to confirm their potential as origins in vivo. Our results identify the presence of functional ARSs in P. falciparum and provide meaningful insights into replication origins in these deadly parasites. These data could be useful in designing transgenic vectors with improved stability for transfection in P. falciparum. © 2017 Federation of European Biochemical Societies.

  1. Functionally conserved cis-regulatory elements of COL18A1 identified through zebrafish transgenesis.

    PubMed

    Kague, Erika; Bessling, Seneca L; Lee, Josephine; Hu, Gui; Passos-Bueno, Maria Rita; Fisher, Shannon

    2010-01-15

    Type XVIII collagen is a component of basement membranes, and expressed prominently in the eye, blood vessels, liver, and the central nervous system. Homozygous mutations in COL18A1 lead to Knobloch Syndrome, characterized by ocular defects and occipital encephalocele. However, relatively little has been described on the role of type XVIII collagen in development, and nothing is known about the regulation of its tissue-specific expression pattern. We have used zebrafish transgenesis to identify and characterize cis-regulatory sequences controlling expression of the human gene. Candidate enhancers were selected from non-coding sequence associated with COL18A1 based on sequence conservation among mammals. Although these displayed no overt conservation with orthologous zebrafish sequences, four regions nonetheless acted as tissue-specific transcriptional enhancers in the zebrafish embryo, and together recapitulated the major aspects of col18a1 expression. Additional post-hoc computational analysis on positive enhancer sequences revealed alignments between mammalian and teleost sequences, which we hypothesize predict the corresponding zebrafish enhancers; for one of these, we demonstrate functional overlap with the orthologous human enhancer sequence. Our results provide important insight into the biological function and regulation of COL18A1, and point to additional sequences that may contribute to complex diseases involving COL18A1. More generally, we show that combining functional data with targeted analyses for phylogenetic conservation can reveal conserved cis-regulatory elements in the large number of cases where computational alignment alone falls short. Copyright 2009 Elsevier Inc. All rights reserved.

  2. Coiled-coil length: Size does matter.

    PubMed

    Surkont, Jaroslaw; Diekmann, Yoan; Ryder, Pearl V; Pereira-Leal, Jose B

    2015-12-01

    Protein evolution is governed by processes that alter primary sequence but also the length of proteins. Protein length may change in different ways, but insertions, deletions and duplications are the most common. An optimal protein size is a trade-off between sequence extension, which may change protein stability or lead to acquisition of a new function, and shrinkage that decreases metabolic cost of protein synthesis. Despite the general tendency for length conservation across orthologous proteins, the propensity to accept insertions and deletions is heterogeneous along the sequence. For example, protein regions rich in repetitive peptide motifs are well known to extensively vary their length across species. Here, we analyze length conservation of coiled-coils, domains formed by an ubiquitous, repetitive peptide motif present in all domains of life, that frequently plays a structural role in the cell. We observed that, despite the repetitive nature, the length of coiled-coil domains is generally highly conserved throughout the tree of life, even when the remaining parts of the protein change, including globular domains. Length conservation is independent of primary amino acid sequence variation, and represents a conservation of domain physical size. This suggests that the conservation of domain size is due to functional constraints. © 2015 Wiley Periodicals, Inc.

  3. Protein Sectors: Statistical Coupling Analysis versus Conservation

    PubMed Central

    Teşileanu, Tiberiu; Colwell, Lucy J.; Leibler, Stanislas

    2015-01-01

    Statistical coupling analysis (SCA) is a method for analyzing multiple sequence alignments that was used to identify groups of coevolving residues termed “sectors”. The method applies spectral analysis to a matrix obtained by combining correlation information with sequence conservation. It has been asserted that the protein sectors identified by SCA are functionally significant, with different sectors controlling different biochemical properties of the protein. Here we reconsider the available experimental data and note that it involves almost exclusively proteins with a single sector. We show that in this case sequence conservation is the dominating factor in SCA, and can alone be used to make statistically equivalent functional predictions. Therefore, we suggest shifting the experimental focus to proteins for which SCA identifies several sectors. Correlations in protein alignments, which have been shown to be informative in a number of independent studies, would then be less dominated by sequence conservation. PMID:25723535

  4. Mapping the transcription start points of the Staphylococcus aureus eap, emp, and vwb promoters reveals a conserved octanucleotide sequence that is essential for expression of these genes.

    PubMed

    Harraghy, Niamh; Homerova, Dagmar; Herrmann, Mathias; Kormanec, Jan

    2008-01-01

    Mapping the transcription start points of the eap, emp, and vwb promoters revealed a conserved octanucleotide sequence (COS). Deleting this sequence abolished the expression of eap, emp, and vwb. However, electrophoretic mobility shift assays gave no evidence that this sequence was a binding site for SarA or SaeR, known regulators of eap and emp.

  5. Discovery and profiling of novel and conserved microRNAs during flower development in Carya cathayensis via deep sequencing.

    PubMed

    Wang, Zheng Jia; Huang, Jian Qin; Huang, You Jun; Li, Zheng; Zheng, Bing Song

    2012-08-01

    Hickory (Carya cathayensis Sarg.) is an economically important woody plant in China, but its long juvenile phase delays yield. MicroRNAs (miRNAs) are critical regulators of genes and important for normal plant development and physiology, including flower development. We used Solexa technology to sequence two small RNA libraries from two floral differentiation stages in hickory to identify miRNAs related to flower development. We identified 39 conserved miRNA sequences from 114 loci belonging to 23 families as well as two novel and ten potential novel miRNAs belonging to nine families. Moreover, 35 conserved miRNA*s and two novel miRNA*s were detected. Twenty miRNA sequences from 49 loci belonging to 11 families were differentially expressed; all were up-regulated at the later stage of flower development in hickory. Quantitative real-time PCR of 12 conserved miRNA sequences, five novel miRNA families, and two novel miRNA*s validated that all were expressed during hickory flower development, and the expression patterns were similar to those detected with Solexa sequencing. Finally, a total of 146 targets of the novel and conserved miRNAs were predicted. This study identified a diverse set of miRNAs that were closely related to hickory flower development and that could help in plant floral induction.

  6. Evolutionary growth process of highly conserved sequences in vertebrate genomes.

    PubMed

    Ishibashi, Minaka; Noda, Akiko Ogura; Sakate, Ryuichi; Imanishi, Tadashi

    2012-08-01

    Genome sequence comparison between evolutionarily distant species revealed ultraconserved elements (UCEs) among mammals under strong purifying selection. Most of them were also conserved among vertebrates. Because they tend to be located in the flanking regions of developmental genes, they would have fundamental roles in creating vertebrate body plans. However, the evolutionary origin and selection mechanism of these UCEs remain unclear. Here we report that UCEs arose in primitive vertebrates, and gradually grew in vertebrate evolution. We searched for UCEs in two teleost fishes, Tetraodon nigroviridis and Oryzias latipes, and found 554 UCEs with 100% identity over 100 bps. Comparison of teleost and mammalian UCEs revealed 43 pairs of common, jawed-vertebrate UCEs (jUCE) with high sequence identities, ranging from 83.1% to 99.2%. Ten of them retain lower similarities to the Petromyzon marinus genome, and the substitution rates of four non-exonic jUCEs were reduced after the teleost-mammal divergence, suggesting that robust conservation had been acquired in the jawed vertebrate lineage. Our results indicate that prototypical UCEs originated before the divergence of jawed and jawless vertebrates and have been frozen as perfect conserved sequences in the jawed vertebrate lineage. In addition, our comparative sequence analyses of UCEs and neighboring regions resulted in a discovery of lineage-specific conserved sequences. They were added progressively to prototypical UCEs, suggesting step-wise acquisition of novel regulatory roles. Our results indicate that conserved non-coding elements (CNEs) consist of blocks with distinct evolutionary history, each having been frozen since different evolutionary era along the vertebrate lineage. Copyright © 2012 Elsevier B.V. All rights reserved.

  7. cisprimertool: software to implement a comparative genomics strategy for the development of conserved intron scanning (CIS) markers.

    PubMed

    Jayashree, B; Jagadeesh, V T; Hoisington, D

    2008-05-01

    The availability of complete, annotated genomic sequence information in model organisms is a rich resource that can be extended to understudied orphan crops through comparative genomic approaches. We report here a software tool (cisprimertool) for the identification of conserved intron scanning regions using expressed sequence tag alignments to a completely sequenced model crop genome. The method used is based on earlier studies reporting the assessment of conserved intron scanning primers (called CISP) within relatively conserved exons located near exon-intron boundaries from onion, banana, sorghum and pearl millet alignments with rice. The tool is freely available to academic users at http://www.icrisat.org/gt-bt/CISPTool.htm. © 2007 ICRISAT.

  8. Conserved structures formed by heterogeneous RNA sequences drive silencing of an inflammation responsive post-transcriptional operon

    PubMed Central

    Basu, Abhijit; Jain, Niyati; Tolbert, Blanton S.; Komar, Anton A.

    2017-01-01

    Abstract RNA–protein interactions with physiological outcomes usually rely on conserved sequences within the RNA element. By contrast, activity of the diverse gamma-interferon-activated inhibitor of translation (GAIT)-elements relies on the conserved RNA folding motifs rather than the conserved sequence motifs. These elements drive the translational silencing of a group of chemokine (CC/CXC) and chemokine receptor (CCR) mRNAs, thereby helping to resolve physiological inflammation. Despite sequence dissimilarity, these RNA elements adopt common secondary structures (as revealed by 2D-1H NMR spectroscopy), providing a basis for their interaction with the RNA-binding GAIT complex. However, many of these elements (e.g. those derived from CCL22, CXCL13, CCR4 and ceruloplasmin (Cp) mRNAs) have substantially different affinities for GAIT complex binding. Toeprinting analysis shows that different positions within the overall conserved GAIT element structure contribute to differential affinities of the GAIT protein complex towards the elements. Thus, heterogeneity of GAIT elements may provide hierarchical fine-tuning of the resolution of inflammation. PMID:29069516

  9. Early Evolution of Conserved Regulatory Sequences Associated with Development in Vertebrates

    PubMed Central

    McEwen, Gayle K.; Goode, Debbie K.; Parker, Hugo J.; Woolfe, Adam; Callaway, Heather; Elgar, Greg

    2009-01-01

    Comparisons between diverse vertebrate genomes have uncovered thousands of highly conserved non-coding sequences, an increasing number of which have been shown to function as enhancers during early development. Despite their extreme conservation over 500 million years from humans to cartilaginous fish, these elements appear to be largely absent in invertebrates, and, to date, there has been little understanding of their mode of action or the evolutionary processes that have modelled them. We have now exploited emerging genomic sequence data for the sea lamprey, Petromyzon marinus, to explore the depth of conservation of this type of element in the earliest diverging extant vertebrate lineage, the jawless fish (agnathans). We searched for conserved non-coding elements (CNEs) at 13 human gene loci and identified lamprey elements associated with all but two of these gene regions. Although markedly shorter and less well conserved than within jawed vertebrates, identified lamprey CNEs are able to drive specific patterns of expression in zebrafish embryos, which are almost identical to those driven by the equivalent human elements. These CNEs are therefore a unique and defining characteristic of all vertebrates. Furthermore, alignment of lamprey and other vertebrate CNEs should permit the identification of persistent sequence signatures that are responsible for common patterns of expression and contribute to the elucidation of the regulatory language in CNEs. Identifying the core regulatory code for development, common to all vertebrates, provides a foundation upon which regulatory networks can be constructed and might also illuminate how large conserved regulatory sequence blocks evolve and become fixed in genomic DNA. PMID:20011110

  10. Phylogenetic analysis reveals conservation and diversification of micro RNA166 genes among diverse plant species.

    PubMed

    Barik, Suvakanta; SarkarDas, Shabari; Singh, Archita; Gautam, Vibhav; Kumar, Pramod; Majee, Manoj; Sarkar, Ananda K

    2014-01-01

    Similar to the majority of the microRNAs, mature miR166s are derived from multiple members of MIR166 genes (precursors) and regulate various aspects of plant development by negatively regulating their target genes (Class III HD-ZIP). The evolutionary conservation or functional diversification of miRNA166 family members remains elusive. Here, we show the phylogenetic relationships among MIR166 precursor and mature sequences from three diverse model plant species. Despite strong conservation, some mature miR166 sequences, such as ppt-miR166m, have undergone sequence variation. Critical sequence variation in ppt-miR166m has led to functional diversification, as it targets non-HD-ZIPIII gene transcript (s). MIR166 precursor sequences have diverged in a lineage specific manner, and both precursors and mature osa-miR166i/j are highly conserved. Interestingly, polycistronic MIR166s were present in Physcomitrella and Oryza but not in Arabidopsis. The nature of cis-regulatory motifs on the upstream promoter sequences of MIR166 genes indicates their possible contribution to the functional variation observed among miR166 species. Copyright © 2013 Elsevier Inc. All rights reserved.

  11. SeqFIRE: a web application for automated extraction of indel regions and conserved blocks from protein multiple sequence alignments.

    PubMed

    Ajawatanawong, Pravech; Atkinson, Gemma C; Watson-Haigh, Nathan S; Mackenzie, Bryony; Baldauf, Sandra L

    2012-07-01

    Analyses of multiple sequence alignments generally focus on well-defined conserved sequence blocks, while the rest of the alignment is largely ignored or discarded. This is especially true in phylogenomics, where large multigene datasets are produced through automated pipelines. However, some of the most powerful phylogenetic markers have been found in the variable length regions of multiple alignments, particularly insertions/deletions (indels) in protein sequences. We have developed Sequence Feature and Indel Region Extractor (SeqFIRE) to enable the automated identification and extraction of indels from protein sequence alignments. The program can also extract conserved blocks and identify fast evolving sites using a combination of conservation and entropy. All major variables can be adjusted by the user, allowing them to identify the sets of variables most suited to a particular analysis or dataset. Thus, all major tasks in preparing an alignment for further analysis are combined in a single flexible and user-friendly program. The output includes a numbered list of indels, alignments in NEXUS format with indels annotated or removed and indel-only matrices. SeqFIRE is a user-friendly web application, freely available online at www.seqfire.org/.

  12. Systems genetics identifies a convergent gene network for cognition and neurodevelopmental disease.

    PubMed

    Johnson, Michael R; Shkura, Kirill; Langley, Sarah R; Delahaye-Duriez, Andree; Srivastava, Prashant; Hill, W David; Rackham, Owen J L; Davies, Gail; Harris, Sarah E; Moreno-Moral, Aida; Rotival, Maxime; Speed, Doug; Petrovski, Slavé; Katz, Anaïs; Hayward, Caroline; Porteous, David J; Smith, Blair H; Padmanabhan, Sandosh; Hocking, Lynne J; Starr, John M; Liewald, David C; Visconti, Alessia; Falchi, Mario; Bottolo, Leonardo; Rossetti, Tiziana; Danis, Bénédicte; Mazzuferi, Manuela; Foerch, Patrik; Grote, Alexander; Helmstaedter, Christoph; Becker, Albert J; Kaminski, Rafal M; Deary, Ian J; Petretto, Enrico

    2016-02-01

    Genetic determinants of cognition are poorly characterized, and their relationship to genes that confer risk for neurodevelopmental disease is unclear. Here we performed a systems-level analysis of genome-wide gene expression data to infer gene-regulatory networks conserved across species and brain regions. Two of these networks, M1 and M3, showed replicable enrichment for common genetic variants underlying healthy human cognitive abilities, including memory. Using exome sequence data from 6,871 trios, we found that M3 genes were also enriched for mutations ascertained from patients with neurodevelopmental disease generally, and intellectual disability and epileptic encephalopathy in particular. M3 consists of 150 genes whose expression is tightly developmentally regulated, but which are collectively poorly annotated for known functional pathways. These results illustrate how systems-level analyses can reveal previously unappreciated relationships between neurodevelopmental disease-associated genes in the developed human brain, and provide empirical support for a convergent gene-regulatory network influencing cognition and neurodevelopmental disease.

  13. Analysis of phytoplasma-responsive sRNAs provide insight into the pathogenic mechanisms of mulberry yellow dwarf disease

    NASA Astrophysics Data System (ADS)

    Gai, Ying-Ping; Li, Yi-Qun; Guo, Fang-Yue; Yuan, Chuan-Zhong; Mo, Yao-Yao; Zhang, Hua-Liang; Wang, Hong; Ji, Xian-Ling

    2014-06-01

    The yellow dwarf disease associated with phytoplasmas is one of the most devastating diseases of mulberry and the pathogenesis involved in the disease is poorly understood. To analyze the molecular mechanisms mediating gene expression in mulberry-phytoplasma interaction, the comprehensive sRNA changes of mulberry leaf in response to phytoplasma-infection were examined. A total of 164 conserved miRNAs and 23 novel miRNAs were identified, and 62 conserved miRNAs and 13 novel miRNAs were found to be involved in the response to phytoplasma-infection. Meanwhile, target genes of the responsive miRNAs were identified by sequencing of the degradome library. In addition, the endogenous siRNAs were sequenced, and their expression profiles were characterized. Interestingly, we found that phytoplasma infection induced the accumulation of mul-miR393-5p which was resulted from the increased transcription of MulMIR393A, and mul-miR393-5p most likely initiate the biogenesis of siRNAs from TIR1 transcript. Based on the results, we can conclude that phytoplasma-responsive sRNAs modulate multiple hormone pathways and play crucial roles in the regulation of development and metabolism. These responsive sRNAs may work cooperatively in the response to phytoplasma-infection and be responsible for some symptoms in the infected plants.

  14. Principles of regulatory information conservation between mouse and human.

    PubMed

    Cheng, Yong; Ma, Zhihai; Kim, Bong-Hyun; Wu, Weisheng; Cayting, Philip; Boyle, Alan P; Sundaram, Vasavi; Xing, Xiaoyun; Dogan, Nergiz; Li, Jingjing; Euskirchen, Ghia; Lin, Shin; Lin, Yiing; Visel, Axel; Kawli, Trupti; Yang, Xinqiong; Patacsil, Dorrelyn; Keller, Cheryl A; Giardine, Belinda; Kundaje, Anshul; Wang, Ting; Pennacchio, Len A; Weng, Zhiping; Hardison, Ross C; Snyder, Michael P

    2014-11-20

    To broaden our understanding of the evolution of gene regulation mechanisms, we generated occupancy profiles for 34 orthologous transcription factors (TFs) in human-mouse erythroid progenitor, lymphoblast and embryonic stem-cell lines. By combining the genome-wide transcription factor occupancy repertoires, associated epigenetic signals, and co-association patterns, here we deduce several evolutionary principles of gene regulatory features operating since the mouse and human lineages diverged. The genomic distribution profiles, primary binding motifs, chromatin states, and DNA methylation preferences are well conserved for TF-occupied sequences. However, the extent to which orthologous DNA segments are bound by orthologous TFs varies both among TFs and with genomic location: binding at promoters is more highly conserved than binding at distal elements. Notably, occupancy-conserved TF-occupied sequences tend to be pleiotropic; they function in several tissues and also co-associate with many TFs. Single nucleotide variants at sites with potential regulatory functions are enriched in occupancy-conserved TF-occupied sequences.

  15. Functionally essential, invariant glutamate near the C-terminus of strand beta 5 in various (alpha/beta)8-barrel enzymes as a possible indicator of their evolutionary relatedness.

    PubMed

    Janecek, S; Baláz, S

    1995-08-01

    Twelve different (alpha/beta)8-barrel enzymes belonging to three structurally distinct families were found to contain, near the C-terminus of their strand beta 5, a conserved invariant glutamic acid residue that plays an important functional role in each of these enzymes. The search was based on the idea that a conserved sequence region of an (alpha/beta)8-barrel enzyme should be more or less conserved also in the equivalent part of the structure of the other enzymes with this folding motif owing to their mutual evolutionary relatedness. For this purpose, the sequence region around the well conserved fifth beta-strand of alpha-amylase containing catalytic glutamate (Glu230, Aspergillus oryzae alpha-amylase numbering), was used as the sequence-structural template. The isolated sequence stretches of the 12 (alpha/beta)8-barrels are discussed from both the sequence-structural and the evolutionary point of view, the invariant glutamate residue being proposed to be a joining feature of the studied group of enzymes remaining from their ancestral (alpha/beta)8-barrel.

  16. Protocol to obtain targeted transcript sequence data from snake venom samples collected in the Colombian field.

    PubMed

    Fonseca, Alejandra; Renjifo-Ibáñez, Camila; Renjifo, Juan Manuel; Cabrera, Rodrigo

    2018-03-21

    Snake venoms are a mixture of different molecules that can be used in the design of drugs for various diseases. The study of these venoms has relied on strategies that use complete venom extracted from animals in captivity or from venom glands that require the sacrifice of the animals. Colombia, a country with political and geographical conflicts has difficult access to certain regions. A strategy that can prevent the sacrifice of animals and could allow the study of samples collected in the field is necessary. We report the use of lyophilized venom from Crotalus durissus cumanensis as a model to test, for the first time, a protocol for the amplification of complete toxins from Colombian venom samples collected in the field. In this protocol, primers were designed from conserved region from Crotalus sp. mRNA and EST regions to maximize the likelihood of coding sequence amplification. We obtained the sequences of Metalloproteinases II, Disintegrins, Disintegrin-Like, Phospholipases A 2, C-type Lectins and Serine proteinases from Crotalus durissus cumanensis and compared them to different Crotalus sp sequences available on databases obtaining concordance between the toxins amplified and those reported. Our strategy allows the use of lyophilized venom to obtain complete toxin sequences from samples collected in the field and the study of poorly characterized venoms in challenging environments. Copyright © 2018 Elsevier Ltd. All rights reserved.

  17. [Soil propagule bank of ectomycorrhizal fungi in natural forest of Pinus bungeana].

    PubMed

    Zhao, Nan Xing; Han, Qi Sheng; Huang, Jian

    2017-12-01

    To conserve and restore the forest of Pinu bungeana, we investigated the soil propagule bank of ectomycorrhizal (ECM) fungi in a severely disturbed natural forest of P. bungeana in Shaanxi Province, China. We used a seedling-bioassay method to bait the ECM fungal propagules in the soils collected from the forest site. ECM was identified by combining morph typing with ITS-PCR-sequencing. We obtained 73 unique sequences from the ECM associated with P. bungeana seedlings, and assigned them into 12 ECM fungal OTUs at the threshold of 97% based on the sequence similarity. Rarefaction curve displayed almost all ECM fungi in the propagule bank were detected. The most frequent OTU (80%) showed poor similarity (75%) with existing sequences in the online database, which suggested it might be a new species. Cenococcum geophilum, Tomentella sp., Tuber sp. were common species in the propagule bank. Although C. geophilum and Tomentella sp. were frequently detected in other soil propagule banks of pine forest, the most frequent OTU was not assigned to known genus or family, which indicated the host-specif of ECM propagule banks associa-ted with P. bungeana. This result confirmed the importance of the special ECM propagule banks associated with P. bungeana for natural forest restoration.

  18. EffectorP: predicting fungal effector proteins from secretomes using machine learning.

    PubMed

    Sperschneider, Jana; Gardiner, Donald M; Dodds, Peter N; Tini, Francesco; Covarelli, Lorenzo; Singh, Karam B; Manners, John M; Taylor, Jennifer M

    2016-04-01

    Eukaryotic filamentous plant pathogens secrete effector proteins that modulate the host cell to facilitate infection. Computational effector candidate identification and subsequent functional characterization delivers valuable insights into plant-pathogen interactions. However, effector prediction in fungi has been challenging due to a lack of unifying sequence features such as conserved N-terminal sequence motifs. Fungal effectors are commonly predicted from secretomes based on criteria such as small size and cysteine-rich, which suffers from poor accuracy. We present EffectorP which pioneers the application of machine learning to fungal effector prediction. EffectorP improves fungal effector prediction from secretomes based on a robust signal of sequence-derived properties, achieving sensitivity and specificity of over 80%. Features that discriminate fungal effectors from secreted noneffectors are predominantly sequence length, molecular weight and protein net charge, as well as cysteine, serine and tryptophan content. We demonstrate that EffectorP is powerful when combined with in planta expression data for predicting high-priority effector candidates. EffectorP is the first prediction program for fungal effectors based on machine learning. Our findings will facilitate functional fungal effector studies and improve our understanding of effectors in plant-pathogen interactions. EffectorP is available at http://effectorp.csiro.au. © 2015 CSIRO New Phytologist © 2015 New Phytologist Trust.

  19. Comparative analysis of the small RNA transcriptomes of Pinus contorta and Oryza sativa

    PubMed Central

    Morin, Ryan D.; Aksay, Gozde; Dolgosheina, Elena; Ebhardt, H. Alexander; Magrini, Vincent; Mardis, Elaine R.; Sahinalp, S. Cenk; Unrau, Peter J.

    2008-01-01

    The diversity of microRNAs and small-interfering RNAs has been extensively explored within angiosperms by focusing on a few key organisms such as Oryza sativa and Arabidopsis thaliana. A deeper division of the plants is defined by the radiation of the angiosperms and gymnosperms, with the latter comprising the commercially important conifers. The conifers are expected to provide important information regarding the evolution of highly conserved small regulatory RNAs. Deep sequencing provides the means to characterize and quantitatively profile small RNAs in understudied organisms such as these. Pyrosequencing of small RNAs from O. sativa revealed, as expected, ∼21- and ∼24-nt RNAs. The former contained known microRNAs, and the latter largely comprised intergenic-derived sequences likely representing heterochromatin siRNAs. In contrast, sequences from Pinus contorta were dominated by 21-nt small RNAs. Using a novel sequence-based clustering algorithm, we identified sequences belonging to 18 highly conserved microRNA families in P. contorta as well as numerous clusters of conserved small RNAs of unknown function. Using multiple methods, including expressed sequence folding and machine learning algorithms, we found a further 53 candidate novel microRNA families, 51 appearing specific to the P. contorta library. In addition, alignment of small RNA sequences to the O. sativa genome revealed six perfectly conserved classes of small RNA that included chloroplast transcripts and specific types of genomic repeats. The conservation of microRNAs and other small RNAs between the conifers and the angiosperms indicates that important RNA silencing processes were highly developed in the earliest spermatophytes. Genomic mapping of all sequences to the O. sativa genome can be viewed at http://microrna.bcgsc.ca/cgi-bin/gbrowse/rice_build_3/. PMID:18323537

  20. Structural and evolutionary adaptation of rhoptry kinases and pseudokinases, a family of coccidian virulence factors

    PubMed Central

    2013-01-01

    Background The widespread protozoan parasite Toxoplasma gondii interferes with host cell functions by exporting the contents of a unique apical organelle, the rhoptry. Among the mix of secreted proteins are an expanded, lineage-specific family of protein kinases termed rhoptry kinases (ROPKs), several of which have been shown to be key virulence factors, including the pseudokinase ROP5. The extent and details of the diversification of this protein family are poorly understood. Results In this study, we comprehensively catalogued the ROPK family in the genomes of Toxoplasma gondii, Neospora caninum and Eimeria tenella, as well as portions of the unfinished genome of Sarcocystis neurona, and classified the identified genes into 42 distinct subfamilies. We systematically compared the rhoptry kinase protein sequences and structures to each other and to the broader superfamily of eukaryotic protein kinases to study the patterns of diversification and neofunctionalization in the ROPK family and its subfamilies. We identified three ROPK sub-clades of particular interest: those bearing a structurally conserved N-terminal extension to the kinase domain (NTE), an E. tenella-specific expansion, and a basal cluster including ROP35 and BPK1 that we term ROPKL. Structural analysis in light of the solved structures ROP2, ROP5, ROP8 and in comparison to typical eukaryotic protein kinases revealed ROPK-specific conservation patterns in two key regions of the kinase domain, surrounding a ROPK-conserved insert in the kinase hinge region and a disulfide bridge in the kinase substrate-binding lobe. We also examined conservation patterns specific to the NTE-bearing clade. We discuss the possible functional consequences of each. Conclusions Our work sheds light on several important but previously unrecognized features shared among rhoptry kinases, as well as the essential differences between active and degenerate protein kinases. We identify the most distinctive ROPK-specific features conserved across both active kinases and pseudokinases, and discuss these in terms of sequence motifs, evolutionary context, structural impact and potential functional relevance. By characterizing the proteins that enable these parasites to invade the host cell and co-opt its signaling mechanisms, we provide guidance on potential therapeutic targets for the diseases caused by coccidian parasites. PMID:23742205

  1. Genome sequences of a mouse-avirulent and a mouse-virulent strain of Ross River virus.

    PubMed

    Faragher, S G; Meek, A D; Rice, C M; Dalgarno, L

    1988-04-01

    The nucleotide sequence of the genomic RNA of a mouse-avirulent strain of Ross River virus, RRV NB5092 (isolated in 1969), has been determined and the corresponding sequence for the prototype mouse-virulent strain, RRV T48 (isolated in 1959), has been completed. The RRV NB5092 genome is approximately 11,674 nucleotides in length, compared with 11,853 nucleotides for RRV T48. RRV NB5092 and RRV T48 have the same genome organization. For both viruses an untranslated region of 80 nucleotides at the 5' end of the genome is followed by a 7440-nucleotide open reading frame which is interrupted after 5586 nucleotides by a single opal termination codon. By homology with other alphaviruses, the 5586-nucleotide open reading frame encodes the nonstructural proteins nsP1, nsP2, and nsP3; a fourth nonstructural protein, nsP4, is produced by read-through of the opal codon. The RRV nonstructural proteins show strong homology with the corresponding proteins of Sindbis virus and Semliki Forest virus in terms of size, net charge, and hydropathy characteristics. However, homology is not uniform between or within the proteins; nsP1, nsP2, and nsP4 contain extended domains which are highly conserved between alphaviruses, while the C-terminal region of nsP3 shows little conservation in sequence or length between alphaviruses. An untranslated "junction" region of 44 nucleotides (for RRV NB5092) or 47 nucleotides (for RRV T48) separates the nonstructural and structural protein coding regions. The structural proteins (capsid-E3-E2-6K-E1) are translated from an open reading frame of 3762 nucleotides which is followed by a 3'-untranslated region of approximately 348 nucleotides (for RRV NB5092) or 524 nucleotides (for RRV T48). Excluding deletions and insertions, the genomes of RRV NB5092 and RRV T48 differ at 284 nucleotides, representing a sequence divergence of 2.38%. Sequence deletions or insertions were found only in the noncoding regions and include a 173-nucleotide deletion in the 3'-untranslated region of RRV NB5092, compared with RRV T48. In the coding regions, most of the nucleotide differences are silent; there are 36 amino acid differences in the nonstructural proteins and 12 in the structural proteins. The distribution of amino acid differences between the two RRV strains correlates with the location of domains which are poorly conserved in sequence between alphaviruses. The possible role of amino acid differences in envelope glycoproteins E1 and E2 in determining the different antigenic and biological properties of RRV NB5092 and RRV T48 is discussed.

  2. Differences in fat and muscle mass associated with a functional human polymorphism in a post-transcriptional BMP2 gene regulatory element.

    PubMed

    Devaney, Joseph M; Tosi, Laura L; Fritz, David T; Gordish-Dressman, Heather A; Jiang, Shan; Orkunoglu-Suer, Funda E; Gordon, Andrew H; Harmon, Brennan T; Thompson, Paul D; Clarkson, Priscilla M; Angelopoulos, Theodore J; Gordon, Paul M; Moyna, Niall M; Pescatello, Linda S; Visich, Paul S; Zoeller, Robert F; Brandoli, Cinzia; Hoffman, Eric P; Rogers, Melissa B

    2009-08-15

    A classic morphogen, bone morphogenetic protein 2 (BMP2) regulates the differentiation of pluripotent mesenchymal cells. High BMP2 levels promote osteogenesis or chondrogenesis and low levels promote adipogenesis. BMP2 inhibits myogenesis. Thus, BMP2 synthesis is tightly controlled. Several hundred nucleotides within the 3' untranslated regions of BMP2 genes are conserved from mammals to fishes indicating that the region is under stringent selective pressure. Our analyses indicate that this region controls BMP2 synthesis by post-transcriptional mechanisms. A common A to C single nucleotide polymorphism (SNP) in the BMP2 gene (rs15705, +A1123C) disrupts a putative post-transcriptional regulatory motif within the human ultra-conserved sequence. In vitro studies indicate that RNAs bearing the A or C alleles have different protein binding characteristics in extracts from mesenchymal cells. Reporter genes with the C allele of the ultra-conserved sequence were differentially expressed in mesenchymal cells. Finally, we analyzed MRI data from the upper arm of 517 healthy individuals aged 18-41 years. Individuals with the C/C genotype were associated with lower baseline subcutaneous fat volumes (P = 0.0030) and an increased gain in skeletal muscle volume (P = 0.0060) following resistance training in a cohort of young males. The rs15705 SNP explained 2-4% of inter-individual variability in the measured parameters. The rs15705 variant is one of the first genetic markers that may be exploited to facilitate early diagnosis, treatment, and/or prevention of diseases associated with poor fitness. Furthermore, understanding the mechanisms by which regulatory polymorphisms influence BMP2 synthesis will reveal novel pharmaceutical targets for these disabling conditions. (c) 2009 Wiley-Liss, Inc.

  3. Differences in Fat and Muscle Mass Associated With a Functional Human Polymorphism in a Post-Transcriptional BMP2 Gene Regulatory Element

    PubMed Central

    Devaney, Joseph M.; Tosi, Laura L.; Fritz, David T.; Gordish-Dressman, Heather A.; Jiang, Shan; Orkunoglu-Suer, Funda E.; Gordon, Andrew H.; Harmon, Brennan T.; Thompson, Paul D.; Clarkson, Priscilla M.; Angelopoulos, Theodore J.; Gordon, Paul M.; Moyna, Niall M.; Pescatello, Linda S.; Visich, Paul S.; Zoeller, Robert F.; Brandoli, Cinzia; Hoffman, Eric P.; Rogers, Melissa B.

    2014-01-01

    A classic morphogen, bone morphogenetic protein 2 (BMP2) regulates the differentiation of pluripotent mesenchymal cells. High BMP2 levels promote osteogenesis or chondrogenesis and low levels promote adipogenesis. BMP2 inhibits myogenesis. Thus, BMP2 synthesis is tightly controlled. Several hundred nucleotides within the 3′ untranslated regions of BMP2 genes are conserved from mammals to fishes indicating that the region is under stringent selective pressure. Our analyses indicate that this region controls BMP2 synthesis by post-transcriptional mechanisms. A common A to C single nucleotide polymorphism (SNP) in the BMP2 gene (rs15705, +A1123C) disrupts a putative post-transcriptional regulatory motif within the human ultra-conserved sequence. In vitro studies indicate that RNAs bearing the A or C alleles have different protein binding characteristics in extracts from mesenchymal cells. Reporter genes with the C allele of the ultra-conserved sequence were differentially expressed in mesenchymal cells. Finally, we analyzed MRI data from the upper arm of 517 healthy individuals aged 18–41 years. Individuals with the C/C genotype were associated with lower baseline subcutaneous fat volumes (P = 0.0030) and an increased gain in skeletal muscle volume (P = 0.0060) following resistance training in a cohort of young males. The rs15705 SNP explained 2–4% of inter-individual variability in the measured parameters. The rs15705 variant is one of the first genetic markers that maybe exploited to facilitate early diagnosis, treatment, and/or prevention of diseases associated with poor fitness. Furthermore, understanding the mechanisms by which regulatory polymorphisms influence BMP2 synthesis will reveal novel pharmaceutical targets for these disabling conditions. PMID:19492344

  4. Computational identification of developmental enhancers:conservation and function of transcription factor binding-site clustersin drosophila melanogaster and drosophila psedoobscura

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berman, Benjamin P.; Pfeiffer, Barret D.; Laverty, Todd R.

    2004-08-06

    The identification of sequences that control transcription in metazoans is a major goal of genome analysis. In a previous study, we demonstrated that searching for clusters of predicted transcription factor binding sites could discover active regulatory sequences, and identified 37 regions of the Drosophila melanogaster genome with high densities of predicted binding sites for five transcription factors involved in anterior-posterior embryonic patterning. Nine of these clusters overlapped known enhancers. Here, we report the results of in vivo functional analysis of 27 remaining clusters. We generated transgenic flies carrying each cluster attached to a basal promoter and reporter gene, and assayedmore » embryos for reporter gene expression. Six clusters are enhancers of adjacent genes: giant, fushi tarazu, odd-skipped, nubbin, squeeze and pdm2; three drive expression in patterns unrelated to those of neighboring genes; the remaining 18 do not appear to have enhancer activity. We used the Drosophila pseudoobscura genome to compare patterns of evolution in and around the 15 positive and 18 false-positive predictions. Although conservation of primary sequence cannot distinguish true from false positives, conservation of binding-site clustering accurately discriminates functional binding-site clusters from those with no function. We incorporated conservation of binding-site clustering into a new genome-wide enhancer screen, and predict several hundred new regulatory sequences, including 85 adjacent to genes with embryonic patterns. Measuring conservation of sequence features closely linked to function--such as binding-site clustering--makes better use of comparative sequence data than commonly used methods that examine only sequence identity.« less

  5. Alternative Polyadenylation Regulates CELF1/CUGBP1 Target Transcripts Following T Cell Activation

    PubMed Central

    Beisang, Daniel; Reilly, Cavan; Bohjanen, Paul R.

    2014-01-01

    Alternative polyadenylation (APA) is an evolutionarily conserved mechanism for regulating gene expression. Transcript 3′ end shortening through changes in polyadenylation site usage occurs following T cell activation, but the consequences of APA on gene expression are poorly understood. We previously showed that GU-rich elements (GREs) found in the 3′ untranslated regions of select transcripts mediate rapid mRNA decay by recruiting the protein CELF1/CUGBP1. Using a global RNA sequencing approach, we found that a network of CELF1 target transcripts involved in cell division underwent preferential 3′ end shortening via APA following T cell activation, resulting in decreased inclusion of CELF1 binding sites and increased transcript expression. We present a model whereby CELF1 regulates APA site selection following T cell activation through reversible binding to nearby GRE sequences. These findings provide insight into the role of APA in controlling cellular proliferation during biological processes such as development, oncogenesis and T cell activation PMID:25123787

  6. Genomic analysis reveals hidden biodiversity within colugos, the sister group to primates

    PubMed Central

    Mason, Victor C.; Li, Gang; Minx, Patrick; Schmitz, Jürgen; Churakov, Gennady; Doronina, Liliya; Melin, Amanda D.; Dominy, Nathaniel J.; Lim, Norman T-L.; Springer, Mark S.; Wilson, Richard K.; Warren, Wesley C.; Helgen, Kristofer M.; Murphy, William J.

    2016-01-01

    Colugos are among the most poorly studied mammals despite their centrality to resolving supraordinal primate relationships. Two described species of these gliding mammals are the sole living members of the order Dermoptera, distributed throughout Southeast Asia. We generated a draft genome sequence for a Sunda colugo and a Philippine colugo reference alignment, and used these to identify colugo-specific genetic changes that were enriched in sensory and musculoskeletal-related genes that likely underlie their nocturnal and gliding adaptations. Phylogenomic analysis and catalogs of rare genomic changes overwhelmingly support the contested hypothesis that colugos are the sister group to primates (Primatomorpha), to the exclusion of treeshrews. We captured ~140 kb of orthologous sequence data from colugo museum specimens sampled across their range and identified large genetic differences between many geographically isolated populations that may result in a >300% increase in the number of recognized colugo species. Our results identify conservation units to mitigate future losses of this enigmatic mammalian order. PMID:27532052

  7. The Arab genome: Health and wealth.

    PubMed

    Zayed, Hatem

    2016-11-05

    The 22 Arab nations have a unique genetic structure, which reflects both conserved and diverse gene pools due to the prevalent endogamous and consanguineous marriage culture and the long history of admixture among different ethnic subcultures descended from the Asian, European, and African continents. Human genome sequencing has enabled large-scale genomic studies of different populations and has become a powerful tool for studying disease predictions and diagnosis. Despite the importance of the Arab genome for better understanding the dynamics of the human genome, discovering rare genetic variations, and studying early human migration out of Africa, it is poorly represented in human genome databases, such as HapMap and the 1000 Genomes Project. In this review, I demonstrate the significance of sequencing the Arab genome and setting an Arab genome reference(s) for better understanding the molecular pathogenesis of genetic diseases, discovering novel/rare variants, and identifying a meaningful genotype-phenotype correlation for complex diseases. Copyright © 2016. Published by Elsevier B.V.

  8. The genome sequence of the model ascomycete fungus Podospora anserina

    PubMed Central

    Espagne, Eric; Lespinet, Olivier; Malagnac, Fabienne; Da Silva, Corinne; Jaillon, Olivier; Porcel, Betina M; Couloux, Arnaud; Aury, Jean-Marc; Ségurens, Béatrice; Poulain, Julie; Anthouard, Véronique; Grossetete, Sandrine; Khalili, Hamid; Coppin, Evelyne; Déquard-Chablat, Michelle; Picard, Marguerite; Contamine, Véronique; Arnaise, Sylvie; Bourdais, Anne; Berteaux-Lecellier, Véronique; Gautheret, Daniel; de Vries, Ronald P; Battaglia, Evy; Coutinho, Pedro M; Danchin, Etienne GJ; Henrissat, Bernard; Khoury, Riyad EL; Sainsard-Chanet, Annie; Boivin, Antoine; Pinan-Lucarré, Bérangère; Sellem, Carole H; Debuchy, Robert; Wincker, Patrick; Weissenbach, Jean; Silar, Philippe

    2008-01-01

    Background The dung-inhabiting ascomycete fungus Podospora anserina is a model used to study various aspects of eukaryotic and fungal biology, such as ageing, prions and sexual development. Results We present a 10X draft sequence of P. anserina genome, linked to the sequences of a large expressed sequence tag collection. Similar to higher eukaryotes, the P. anserina transcription/splicing machinery generates numerous non-conventional transcripts. Comparison of the P. anserina genome and orthologous gene set with the one of its close relatives, Neurospora crassa, shows that synteny is poorly conserved, the main result of evolution being gene shuffling in the same chromosome. The P. anserina genome contains fewer repeated sequences and has evolved new genes by duplication since its separation from N. crassa, despite the presence of the repeat induced point mutation mechanism that mutates duplicated sequences. We also provide evidence that frequent gene loss took place in the lineages leading to P. anserina and N. crassa. P. anserina contains a large and highly specialized set of genes involved in utilization of natural carbon sources commonly found in its natural biotope. It includes genes potentially involved in lignin degradation and efficient cellulose breakdown. Conclusion The features of the P. anserina genome indicate a highly dynamic evolution since the divergence of P. anserina and N. crassa, leading to the ability of the former to use specific complex carbon sources that match its needs in its natural biotope. PMID:18460219

  9. Conserved noncoding sequences (CNSs) in higher plants.

    PubMed

    Freeling, Michael; Subramaniam, Shabarinath

    2009-04-01

    Plant conserved noncoding sequences (CNSs)--a specific category of phylogenetic footprint--have been shown experimentally to function. No plant CNS is conserved to the extent that ultraconserved noncoding sequences are conserved in vertebrates. Plant CNSs are enriched in known transcription factor or other cis-acting binding sites, and are usually clustered around genes. Genes that encode transcription factors and/or those that respond to stimuli are particularly CNS-rich. Only rarely could this function involve small RNA binding. Some transcribed CNSs encode short translation products as a form of negative control. Approximately 4% of Arabidopsis gene content is estimated to be both CNS-rich and occupies a relatively long stretch of chromosome: Bigfoot genes (long phylogenetic footprints). We discuss a 'DNA-templated protein assembly' idea that might help explain Bigfoot gene CNSs.

  10. Genetic differences between blood- and brain-derived viral sequences from human immunodeficiency virus type 1-infected patients: evidence of conserved elements in the V3 region of the envelope protein of brain-derived sequences.

    PubMed Central

    Korber, B T; Kunstman, K J; Patterson, B K; Furtado, M; McEvilly, M M; Levy, R; Wolinsky, S M

    1994-01-01

    Human immunodeficiency virus type 1 (HIV-1) sequences were generated from blood and from brain tissue obtained by stereotactic biopsy from six patients undergoing a diagnostic neurosurgical procedure. Proviral DNA was directly amplified by nested PCR, and 8 to 36 clones from each sample were sequenced. Phylogenetic analysis of intrapatient envelope V3-V5 region HIV-1 DNA sequence sets revealed that brain viral sequences were clustered relative to the blood viral sequences, suggestive of tissue-specific compartmentalization of the virus in four of the six cases. In the other two cases, the blood and brain virus sequences were intermingled in the phylogenetic analyses, suggesting trafficking of virus between the two tissues. Slide-based PCR-driven in situ hybridization of two of the patients' brain biopsy samples confirmed our interpretation of the intrapatient phylogenetic analyses. Interpatient V3 region brain-derived sequence distances were significantly less than blood-derived sequence distances. Relative to the tip of the loop, the set of brain-derived viral sequences had a tendency towards negative or neutral charge compared with the set of blood-derived viral sequences. Entropy calculations were used as a measure of the variability at each position in alignments of blood and brain viral sequences. A relatively conserved set of positions were found, with a significantly lower entropy in the brain-than in the blood-derived viral sequences. These sites constitute a brain "signature pattern," or a noncontiguous set of amino acids in the V3 region conserved in viral sequences derived from brain tissue. This brain-derived signature pattern was also well preserved among isolates previously characterized in vitro as macrophage tropic. Macrophage-monocyte tropism may be the biological constraint that results in the conservation of the viral brain signature pattern. Images PMID:7933130

  11. The kinetoplast DNA of the Australian trypanosome, Trypanosoma copemani, shares features with Trypanosoma cruzi and Trypanosoma lewisi.

    PubMed

    Botero, Adriana; Kapeller, Irit; Cooper, Crystal; Clode, Peta L; Shlomai, Joseph; Thompson, R C Andrew

    2018-05-17

    Kinetoplast DNA (kDNA) is the mitochondrial genome of trypanosomatids. It consists of a few dozen maxicircles and several thousand minicircles, all catenated topologically to form a two-dimensional DNA network. Minicircles are heterogeneous in size and sequence among species. They present one or several conserved regions that contain three highly conserved sequence blocks. CSB-1 (10 bp sequence) and CSB-2 (8 bp sequence) present lower interspecies homology, while CSB-3 (12 bp sequence) or the Universal Minicircle Sequence is conserved within most trypanosomatids. The Universal Minicircle Sequence is located at the replication origin of the minicircles, and is the binding site for the UMS binding protein, a protein involved in trypanosomatid survival and virulence. Here, we describe the structure and organisation of the kDNA of Trypanosoma copemani, a parasite that has been shown to infect mammalian cells and has been associated with the drastic decline of the endangered Australian marsupial, the woylie (Bettongia penicillata). Deep genomic sequencing showed that T. copemani presents two classes of minicircles that share sequence identity and organisation in the conserved sequence blocks with those of Trypanosoma cruzi and Trypanosoma lewisi. A 19,257 bp partial region of the maxicircle of T. copemani that contained the entire coding region was obtained. Comparative analysis of the T. copemani entire maxicircle coding region with the coding regions of T. cruzi and T. lewisi showed they share 71.05% and 71.28% identity, respectively. The shared features in the maxicircle/minicircle organisation and sequence between T. copemani and T. cruzi/T. lewisi suggest similarities in their process of kDNA replication, and are of significance in understanding the evolution of Australian trypanosomes. Copyright © 2018 The Authors. Published by Elsevier Ltd.. All rights reserved.

  12. Conserved Sequence Preferences Contribute to Substrate Recognition by the Proteasome*

    PubMed Central

    Yu, Houqing; Singh Gautam, Amit K.; Wilmington, Shameika R.; Wylie, Dennis; Martinez-Fonts, Kirby; Kago, Grace; Warburton, Marie; Chavali, Sreenivas; Inobe, Tomonao; Finkelstein, Ilya J.; Babu, M. Madan

    2016-01-01

    The proteasome has pronounced preferences for the amino acid sequence of its substrates at the site where it initiates degradation. Here, we report that modulating these sequences can tune the steady-state abundance of proteins over 2 orders of magnitude in cells. This is the same dynamic range as seen for inducing ubiquitination through a classic N-end rule degron. The stability and abundance of His3 constructs dictated by the initiation site affect survival of yeast cells and show that variation in proteasomal initiation can affect fitness. The proteasome's sequence preferences are linked directly to the affinity of the initiation sites to their receptor on the proteasome and are conserved between Saccharomyces cerevisiae, Schizosaccharomyces pombe, and human cells. These findings establish that the sequence composition of unstructured initiation sites influences protein abundance in vivo in an evolutionarily conserved manner and can affect phenotype and fitness. PMID:27226608

  13. Genome-wide discovery of novel and conserved microRNAs in white shrimp (Litopenaeus vannamei).

    PubMed

    Xi, Qian-Yun; Xiong, Yuan-Yan; Wang, Yuan-Mei; Cheng, Xiao; Qi, Qi-En; Shu, Gang; Wang, Song-Bo; Wang, Li-Na; Gao, Ping; Zhu, Xiao-Tong; Jiang, Qing-Yan; Zhang, Yong-Liang; Liu, Li

    2015-01-01

    Of late years, a large amount of conserved and species-specific microRNAs (miRNAs) have been performed on identification from species which are economically important but lack a full genome sequence. In this study, Solexa deep sequencing and cross-species miRNA microarray were used to detect miRNAs in white shrimp. We identified 239 conserved miRNAs, 14 miRNA* sequences and 20 novel miRNAs by bioinformatics analysis from 7,561,406 high-quality reads representing 325,370 distinct sequences. The all 20 novel miRNAs were species-specific in white shrimp and not homologous in other species. Using the conserved miRNAs from the miRBase database as a query set to search for homologs from shrimp expressed sequence tags (ESTs), 32 conserved computationally predicted miRNAs were discovered in shrimp. In addition, using microarray analysis in the shrimp fed with Panax ginseng polysaccharide complex, 151 conserved miRNAs were identified, 18 of which were significant up-expression, while 49 miRNAs were significant down-expression. In particular, qRT-PCR analysis was also performed for nine miRNAs in three shrimp tissues such as muscle, gill and hepatopancreas. Results showed that these miRNAs expression are tissue specific. Combining results of the three methods, we detected 20 novel and 394 conserved miRNAs. Verification with quantitative reverse transcription (qRT-PCR) and Northern blot showed a high confidentiality of data. The study provides the first comprehensive specific miRNA profile of white shrimp, which includes useful information for future investigations into the function of miRNAs in regulation of shrimp development and immunology.

  14. Substantial Loss of Conserved and Gain of Novel MicroRNA Families in Flatworms

    PubMed Central

    Fromm, Bastian; Worren, Merete Molton; Hahn, Christoph; Hovig, Eivind; Bachmann, Lutz

    2013-01-01

    Recent studies on microRNA (miRNA) evolution focused mainly on the comparison of miRNA complements between animal clades. However, evolution of miRNAs within such groups is poorly explored despite the availability of comparable data that in some cases lack only a few key taxa. For flatworms (Platyhelminthes), miRNA complements are available for some free-living flatworms and all major parasitic lineages, except for the Monogenea. We present the miRNA complement of the monogenean flatworm Gyrodactylus salaris that facilitates a comprehensive analysis of miRNA evolution in Platyhelminthes. Using the newly designed bioinformatics pipeline miRCandRef, the miRNA complement was disentangled from next-generation sequencing of small RNAs and genomic DNA without a priori genome assembly. It consists of 39 miRNA hairpin loci of conserved miRNA families, and 22 novel miRNAs. A comparison with the miRNA complements of Schmidtea mediterranea (Turbellaria), Schistosoma japonicum (Trematoda), and Echinococcus granulosus (Cestoda) reveals a substantial loss of conserved bilaterian, protostomian, and lophotrochozoan miRNAs. Eight of the 46 expected conserved miRNAs were lost in all flatworms, 16 in Neodermata and 24 conserved miRNAs could not be detected in the cestode and the trematode. Such a gradual loss of miRNAs has not been reported before for other animal phyla. Currently, little is known about miRNAs in Platyhelminthes, and for the majority of the lost miRNAs there is no prediction of function. As suggested earlier they might be related to morphological simplifications. The presence and absence of 153 conserved miRNAs was compared for platyhelminths and 32 other metazoan taxa. Phylogenetic analyses support the monophyly of Platyhelminthes (Turbellaria + Neodermata [Monogenea {Trematoda + Cestoda}]). PMID:24025793

  15. Functional Targets of the Monogenic Diabetes Transcription Factors HNF-1α and HNF-4α Are Highly Conserved Between Mice and Humans

    PubMed Central

    Boj, Sylvia F.; Servitja, Joan Marc; Martin, David; Rios, Martin; Talianidis, Iannis; Guigo, Roderic; Ferrer, Jorge

    2009-01-01

    OBJECTIVE The evolutionary conservation of transcriptional mechanisms has been widely exploited to understand human biology and disease. Recent findings, however, unexpectedly showed that the transcriptional regulators hepatocyte nuclear factor (HNF)-1α and -4α rarely bind to the same genes in mice and humans, leading to the proposal that tissue-specific transcriptional regulation has undergone extensive divergence in the two species. Such observations have major implications for the use of mouse models to understand HNF-1α– and HNF-4α–deficient diabetes. However, the significance of studies that assess binding without considering regulatory function is poorly understood. RESEARCH DESIGN AND METHODS We compared previously reported mouse and human HNF-1α and HNF-4α binding studies with independent binding experiments. We also integrated binding studies with mouse and human loss-of-function gene expression datasets. RESULTS First, we confirmed the existence of species-specific HNF-1α and -4α binding, yet observed incomplete detection of binding in the different datasets, causing an underestimation of binding conservation. Second, only a minor fraction of HNF-1α– and HNF-4α–bound genes were downregulated in the absence of these regulators. This subset of functional targets did not show evidence for evolutionary divergence of binding or binding sequence motifs. Finally, we observed differences between conserved and species-specific binding properties. For example, conserved binding was more frequently located near transcriptional start sites and was more likely to involve multiple binding events in the same gene. CONCLUSIONS Despite evolutionary changes in binding, essential direct transcriptional functions of HNF-1α and -4α are largely conserved between mice and humans. PMID:19188435

  16. Fungal Genes in Context: Genome Architecture Reflects Regulatory Complexity and Function

    PubMed Central

    Noble, Luke M.; Andrianopoulos, Alex

    2013-01-01

    Gene context determines gene expression, with local chromosomal environment most influential. Comparative genomic analysis is often limited in scope to conserved or divergent gene and protein families, and fungi are well suited to this approach with low functional redundancy and relatively streamlined genomes. We show here that one aspect of gene context, the amount of potential upstream regulatory sequence maintained through evolution, is highly predictive of both molecular function and biological process in diverse fungi. Orthologs with large upstream intergenic regions (UIRs) are strongly enriched in information processing functions, such as signal transduction and sequence-specific DNA binding, and, in the genus Aspergillus, include the majority of experimentally studied, high-level developmental and metabolic transcriptional regulators. Many uncharacterized genes are also present in this class and, by implication, may be of similar importance. Large intergenic regions also share two novel sequence characteristics, currently of unknown significance: they are enriched for plus-strand polypyrimidine tracts and an information-rich, putative regulatory motif that was present in the last common ancestor of the Pezizomycotina. Systematic consideration of gene UIR in comparative genomics, particularly for poorly characterized species, could help reveal organisms’ regulatory priorities. PMID:23699226

  17. The genome sequence of the plant pathogen Xylella fastidiosa. The Xylella fastidiosa Consortium of the Organization for Nucleotide Sequencing and Analysis.

    PubMed

    Simpson, A J; Reinach, F C; Arruda, P; Abreu, F A; Acencio, M; Alvarenga, R; Alves, L M; Araya, J E; Baia, G S; Baptista, C S; Barros, M H; Bonaccorsi, E D; Bordin, S; Bové, J M; Briones, M R; Bueno, M R; Camargo, A A; Camargo, L E; Carraro, D M; Carrer, H; Colauto, N B; Colombo, C; Costa, F F; Costa, M C; Costa-Neto, C M; Coutinho, L L; Cristofani, M; Dias-Neto, E; Docena, C; El-Dorry, H; Facincani, A P; Ferreira, A J; Ferreira, V C; Ferro, J A; Fraga, J S; França, S C; Franco, M C; Frohme, M; Furlan, L R; Garnier, M; Goldman, G H; Goldman, M H; Gomes, S L; Gruber, A; Ho, P L; Hoheisel, J D; Junqueira, M L; Kemper, E L; Kitajima, J P; Krieger, J E; Kuramae, E E; Laigret, F; Lambais, M R; Leite, L C; Lemos, E G; Lemos, M V; Lopes, S A; Lopes, C R; Machado, J A; Machado, M A; Madeira, A M; Madeira, H M; Marino, C L; Marques, M V; Martins, E A; Martins, E M; Matsukuma, A Y; Menck, C F; Miracca, E C; Miyaki, C Y; Monteriro-Vitorello, C B; Moon, D H; Nagai, M A; Nascimento, A L; Netto, L E; Nhani, A; Nobrega, F G; Nunes, L R; Oliveira, M A; de Oliveira, M C; de Oliveira, R C; Palmieri, D A; Paris, A; Peixoto, B R; Pereira, G A; Pereira, H A; Pesquero, J B; Quaggio, R B; Roberto, P G; Rodrigues, V; de M Rosa, A J; de Rosa, V E; de Sá, R G; Santelli, R V; Sawasaki, H E; da Silva, A C; da Silva, A M; da Silva, F R; da Silva, W A; da Silveira, J F; Silvestri, M L; Siqueira, W J; de Souza, A A; de Souza, A P; Terenzi, M F; Truffi, D; Tsai, S M; Tsuhako, M H; Vallada, H; Van Sluys, M A; Verjovski-Almeida, S; Vettore, A L; Zago, M A; Zatz, M; Meidanis, J; Setubal, J C

    2000-07-13

    Xylella fastidiosa is a fastidious, xylem-limited bacterium that causes a range of economically important plant diseases. Here we report the complete genome sequence of X. fastidiosa clone 9a5c, which causes citrus variegated chlorosis--a serious disease of orange trees. The genome comprises a 52.7% GC-rich 2,679,305-base-pair (bp) circular chromosome and two plasmids of 51,158 bp and 1,285 bp. We can assign putative functions to 47% of the 2,904 predicted coding regions. Efficient metabolic functions are predicted, with sugars as the principal energy and carbon source, supporting existence in the nutrient-poor xylem sap. The mechanisms associated with pathogenicity and virulence involve toxins, antibiotics and ion sequestration systems, as well as bacterium-bacterium and bacterium-host interactions mediated by a range of proteins. Orthologues of some of these proteins have only been identified in animal and human pathogens; their presence in X. fastidiosa indicates that the molecular basis for bacterial pathogenicity is both conserved and independent of host. At least 83 genes are bacteriophage-derived and include virulence-associated genes from other bacteria, providing direct evidence of phage-mediated horizontal gene transfer.

  18. Variation in the genomic locations and sequence conservation of STAR elements among staphylococcal species provides insight into DNA repeat evolution

    PubMed Central

    2012-01-01

    Background Staphylococcus aureus Repeat (STAR) elements are a type of interspersed intergenic direct repeat. In this study the conservation and variation in these elements was explored by bioinformatic analyses of published staphylococcal genome sequences and through sequencing of specific STAR element loci from a large set of S. aureus isolates. Results Using bioinformatic analyses, we found that the STAR elements were located in different genomic loci within each staphylococcal species. There was no correlation between the number of STAR elements in each genome and the evolutionary relatedness of staphylococcal species, however higher levels of repeats were observed in both S. aureus and S. lugdunensis compared to other staphylococcal species. Unexpectedly, sequencing of the internal spacer sequences of individual repeat elements from multiple isolates showed conservation at the sequence level within deep evolutionary lineages of S. aureus. Whilst individual STAR element loci were demonstrated to expand and contract, the sequences associated with each locus were stable and distinct from one another. Conclusions The high degree of lineage and locus-specific conservation of these intergenic repeat regions suggests that STAR elements are maintained due to selective or molecular forces with some of these elements having an important role in cell physiology. The high prevalence in two of the more virulent staphylococcal species is indicative of a potential role for STAR elements in pathogenesis. PMID:23020678

  19. Marker genes that are less conserved in their sequences are useful for predicting genome-wide similarity levels between closely related prokaryotic strains

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lan, Yemin; Rosen, Gail; Hershberg, Ruth

    The 16s rRNA gene is so far the most widely used marker for taxonomical classification and separation of prokaryotes. Since it is universally conserved among prokaryotes, it is possible to use this gene to classify a broad range of prokaryotic organisms. At the same time, it has often been noted that the 16s rRNA gene is too conserved to separate between prokaryotes at finer taxonomic levels. In this paper, we examine how well levels of similarity of 16s rRNA and 73 additional universal or nearly universal marker genes correlate with genome-wide levels of gene sequence similarity. We demonstrate that themore » percent identity of 16s rRNA predicts genome-wide levels of similarity very well for distantly related prokaryotes, but not for closely related ones. In closely related prokaryotes, we find that there are many other marker genes for which levels of similarity are much more predictive of genome-wide levels of gene sequence similarity. Finally, we show that the identities of the markers that are most useful for predicting genome-wide levels of similarity within closely related prokaryotic lineages vary greatly between lineages. However, the most useful markers are always those that are least conserved in their sequences within each lineage. In conclusion, our results show that by choosing markers that are less conserved in their sequences within a lineage of interest, it is possible to better predict genome-wide gene sequence similarity between closely related prokaryotes than is possible using the 16s rRNA gene. We point readers towards a database we have created (POGO-DB) that can be used to easily establish which markers show lowest levels of sequence conservation within different prokaryotic lineages.« less

  20. Marker genes that are less conserved in their sequences are useful for predicting genome-wide similarity levels between closely related prokaryotic strains

    DOE PAGES

    Lan, Yemin; Rosen, Gail; Hershberg, Ruth

    2016-05-03

    The 16s rRNA gene is so far the most widely used marker for taxonomical classification and separation of prokaryotes. Since it is universally conserved among prokaryotes, it is possible to use this gene to classify a broad range of prokaryotic organisms. At the same time, it has often been noted that the 16s rRNA gene is too conserved to separate between prokaryotes at finer taxonomic levels. In this paper, we examine how well levels of similarity of 16s rRNA and 73 additional universal or nearly universal marker genes correlate with genome-wide levels of gene sequence similarity. We demonstrate that themore » percent identity of 16s rRNA predicts genome-wide levels of similarity very well for distantly related prokaryotes, but not for closely related ones. In closely related prokaryotes, we find that there are many other marker genes for which levels of similarity are much more predictive of genome-wide levels of gene sequence similarity. Finally, we show that the identities of the markers that are most useful for predicting genome-wide levels of similarity within closely related prokaryotic lineages vary greatly between lineages. However, the most useful markers are always those that are least conserved in their sequences within each lineage. In conclusion, our results show that by choosing markers that are less conserved in their sequences within a lineage of interest, it is possible to better predict genome-wide gene sequence similarity between closely related prokaryotes than is possible using the 16s rRNA gene. We point readers towards a database we have created (POGO-DB) that can be used to easily establish which markers show lowest levels of sequence conservation within different prokaryotic lineages.« less

  1. Conservation of Three-Dimensional Helix-Loop-Helix Structure through the Vertebrate Lineage Reopens the Cold Case of Gonadotropin-Releasing Hormone-Associated Peptide.

    PubMed

    Pérez Sirkin, Daniela I; Lafont, Anne-Gaëlle; Kamech, Nédia; Somoza, Gustavo M; Vissio, Paula G; Dufour, Sylvie

    2017-01-01

    GnRH-associated peptide (GAP) is the C-terminal portion of the gonadotropin-releasing hormone (GnRH) preprohormone. Although it was reported in mammals that GAP may act as a prolactin-inhibiting factor and can be co-secreted with GnRH into the hypophyseal portal blood, GAP has been practically out of the research circuit for about 20 years. Comparative studies highlighted the low conservation of GAP primary amino acid sequences among vertebrates, contributing to consider that this peptide only participates in the folding or carrying process of GnRH. Considering that the three-dimensional (3D) structure of a protein may define its function, the aim of this study was to evaluate if GAP sequences and 3D structures are conserved in the vertebrate lineage. GAP sequences from various vertebrates were retrieved from databases. Analysis of primary amino acid sequence identity and similarity, molecular phylogeny, and prediction of 3D structures were performed. Amino acid sequence comparison and phylogeny analyses confirmed the large variation of GAP sequences throughout vertebrate radiation. In contrast, prediction of the 3D structure revealed a striking conservation of the 3D structure of GAP1 (GAP associated with the hypophysiotropic type 1 GnRH), despite low amino acid sequence conservation. This GAP1 peptide presented a typical helix-loop-helix (HLH) structure in all the vertebrate species analyzed. This HLH structure could also be predicted for GAP2 in some but not all vertebrate species and in none of the GAP3 analyzed. These results allowed us to infer that selective pressures have maintained GAP1 HLH structure throughout the vertebrate lineage. The conservation of the HLH motif, known to confer biological activity to various proteins, suggests that GAP1 peptides may exert some hypophysiotropic biological functions across vertebrate radiation.

  2. Conservation of Three-Dimensional Helix-Loop-Helix Structure through the Vertebrate Lineage Reopens the Cold Case of Gonadotropin-Releasing Hormone-Associated Peptide

    PubMed Central

    Pérez Sirkin, Daniela I.; Lafont, Anne-Gaëlle; Kamech, Nédia; Somoza, Gustavo M.; Vissio, Paula G.; Dufour, Sylvie

    2017-01-01

    GnRH-associated peptide (GAP) is the C-terminal portion of the gonadotropin-releasing hormone (GnRH) preprohormone. Although it was reported in mammals that GAP may act as a prolactin-inhibiting factor and can be co-secreted with GnRH into the hypophyseal portal blood, GAP has been practically out of the research circuit for about 20 years. Comparative studies highlighted the low conservation of GAP primary amino acid sequences among vertebrates, contributing to consider that this peptide only participates in the folding or carrying process of GnRH. Considering that the three-dimensional (3D) structure of a protein may define its function, the aim of this study was to evaluate if GAP sequences and 3D structures are conserved in the vertebrate lineage. GAP sequences from various vertebrates were retrieved from databases. Analysis of primary amino acid sequence identity and similarity, molecular phylogeny, and prediction of 3D structures were performed. Amino acid sequence comparison and phylogeny analyses confirmed the large variation of GAP sequences throughout vertebrate radiation. In contrast, prediction of the 3D structure revealed a striking conservation of the 3D structure of GAP1 (GAP associated with the hypophysiotropic type 1 GnRH), despite low amino acid sequence conservation. This GAP1 peptide presented a typical helix-loop-helix (HLH) structure in all the vertebrate species analyzed. This HLH structure could also be predicted for GAP2 in some but not all vertebrate species and in none of the GAP3 analyzed. These results allowed us to infer that selective pressures have maintained GAP1 HLH structure throughout the vertebrate lineage. The conservation of the HLH motif, known to confer biological activity to various proteins, suggests that GAP1 peptides may exert some hypophysiotropic biological functions across vertebrate radiation. PMID:28878737

  3. Transcription Factors Bind Thousands of Active and InactiveRegions in the Drosophila Blastoderm

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Xiao-Yong; MacArthur, Stewart; Bourgon, Richard

    2008-01-10

    Identifying the genomic regions bound by sequence-specific regulatory factors is central both to deciphering the complex DNA cis-regulatory code that controls transcription in metazoans and to determining the range of genes that shape animal morphogenesis. Here, we use whole-genome tiling arrays to map sequences bound in Drosophila melanogaster embryos by the six maternal and gap transcription factors that initiate anterior-posterior patterning. We find that these sequence-specific DNA binding proteins bind with quantitatively different specificities to highly overlapping sets of several thousand genomic regions in blastoderm embryos. Specific high- and moderate-affinity in vitro recognition sequences for each factor are enriched inmore » bound regions. This enrichment, however, is not sufficient to explain the pattern of binding in vivo and varies in a context-dependent manner, demonstrating that higher-order rules must govern targeting of transcription factors. The more highly bound regions include all of the over forty well-characterized enhancers known to respond to these factors as well as several hundred putative new cis-regulatory modules clustered near developmental regulators and other genes with patterned expression at this stage of embryogenesis. The new targets include most of the microRNAs (miRNAs) transcribed in the blastoderm, as well as all major zygotically transcribed dorsal-ventral patterning genes, whose expression we show to be quantitatively modulated by anterior-posterior factors. In addition to these highly bound regions, there are several thousand regions that are reproducibly bound at lower levels. However, these poorly bound regions are, collectively, far more distant from genes transcribed in the blastoderm than highly bound regions; are preferentially found in protein-coding sequences; and are less conserved than highly bound regions. Together these observations suggest that many of these poorly-bound regions are not involved in early-embryonic transcriptional regulation, and a significant proportion may be nonfunctional. Surprisingly, for five of the six factors, their recognition sites are not unambiguously more constrained evolutionarily than the immediate flanking DNA, even in more highly bound and presumably functional regions, indicating that comparative DNA sequence analysis is limited in its ability to identify functional transcription factor targets.« less

  4. The Mitochondrial Genome of Chara vulgaris: Insights into the Mitochondrial DNA Architecture of the Last Common Ancestor of Green Algae and Land PlantsW⃞

    PubMed Central

    Turmel, Monique; Otis, Christian; Lemieux, Claude

    2003-01-01

    Mitochondrial DNA (mtDNA) has undergone radical changes during the evolution of green plants, yet little is known about the dynamics of mtDNA evolution in this phylum. Land plant mtDNAs differ from the few green algal mtDNAs that have been analyzed to date by their expanded size, long spacers, and diversity of introns. We have determined the mtDNA sequence of Chara vulgaris (Charophyceae), a green alga belonging to the charophycean order (Charales) that is thought to be the most closely related alga to land plants. This 67,737-bp mtDNA sequence, displaying 68 conserved genes and 27 introns, was compared with those of three angiosperms, the bryophyte Marchantia polymorpha, the charophycean alga Chaetosphaeridium globosum (Coleochaetales), and the green alga Mesostigma viride. Despite important differences in size and intron composition, Chara mtDNA strikingly resembles Marchantia mtDNA; for instance, all except 9 of 68 conserved genes lie within blocks of colinear sequences. Overall, our genome comparisons and phylogenetic analyses provide unequivocal support for a sister-group relationship between the Charales and the land plants. Only four introns in land plant mtDNAs appear to have been inherited vertically from a charalean algar ancestor. We infer that the common ancestor of green algae and land plants harbored a tightly packed, gene-rich, and relatively intron-poor mitochondrial genome. The group II introns in this ancestral genome appear to have spread to new mtDNA sites during the evolution of bryophytes and charalean green algae, accounting for part of the intron diversity found in Chara and land plant mitochondria. PMID:12897260

  5. Evidence for the Concerted Evolution between Short Linear Protein Motifs and Their Flanking Regions

    PubMed Central

    Chica, Claudia; Diella, Francesca; Gibson, Toby J.

    2009-01-01

    Background Linear motifs are short modules of protein sequences that play a crucial role in mediating and regulating many protein–protein interactions. The function of linear motifs strongly depends on the context, e.g. functional instances mainly occur inside flexible regions that are accessible for interaction. Sometimes linear motifs appear as isolated islands of conservation in multiple sequence alignments. However, they also occur in larger blocks of sequence conservation, suggesting an active role for the neighbouring amino acids. Results The evolution of regions flanking 116 functional linear motif instances was studied. The conservation of the amino acid sequence and order/disorder tendency of those regions was related to presence/absence of the instance. For the majority of the analysed instances, the pairs of sequences conserving the linear motif were also observed to maintain a similar local structural tendency and/or to have higher local sequence conservation when compared to pairs of sequences where one is missing the linear motif. Furthermore, those instances have a higher chance to co–evolve with the neighbouring residues in comparison to the distant ones. Those findings are supported by examples where the regulation of the linear motif–mediated interaction has been shown to depend on the modifications (e.g. phosphorylation) at neighbouring positions or is thought to benefit from the binding versatility of disordered regions. Conclusion The results suggest that flanking regions are relevant for linear motif–mediated interactions, both at the structural and sequence level. More interestingly, they indicate that the prediction of linear motif instances can be enriched with contextual information by performing a sequence analysis similar to the one presented here. This can facilitate the understanding of the role of these predicted instances in determining the protein function inside the broader context of the cellular network where they arise. PMID:19584925

  6. CoSMoS: Conserved Sequence Motif Search in the proteome

    PubMed Central

    Liu, Xiao I; Korde, Neeraj; Jakob, Ursula; Leichert, Lars I

    2006-01-01

    Background With the ever-increasing number of gene sequences in the public databases, generating and analyzing multiple sequence alignments becomes increasingly time consuming. Nevertheless it is a task performed on a regular basis by researchers in many labs. Results We have now created a database called CoSMoS to find the occurrences and at the same time evaluate the significance of sequence motifs and amino acids encoded in the whole genome of the model organism Escherichia coli K12. We provide a precomputed set of multiple sequence alignments for each individual E. coli protein with all of its homologues in the RefSeq database. The alignments themselves, information about the occurrence of sequence motifs together with information on the conservation of each of the more than 1.3 million amino acids encoded in the E. coli genome can be accessed via the web interface of CoSMoS. Conclusion CoSMoS is a valuable tool to identify highly conserved sequence motifs, to find regions suitable for mutational studies in functional analyses and to predict important structural features in E. coli proteins. PMID:16433915

  7. Identification of a Conserved Non-Protein-Coding Genomic Element that Plays an Essential Role in Alphabaculovirus Pathogenesis

    PubMed Central

    Kikhno, Irina

    2014-01-01

    Highly homologous sequences 154–157 bp in length grouped under the name of “conserved non-protein-coding element” (CNE) were revealed in all of the sequenced genomes of baculoviruses belonging to the genus Alphabaculovirus. A CNE alignment led to the detection of a set of highly conserved nucleotide clusters that occupy strictly conserved positions in the CNE sequence. The significant length of the CNE and conservation of both its length and cluster architecture were identified as a combination of characteristics that make this CNE different from known viral non-coding functional sequences. The essential role of the CNE in the Alphabaculovirus life cycle was demonstrated through the use of a CNE-knockout Autographa californica multiple nucleopolyhedrovirus (AcMNPV) bacmid. It was shown that the essential function of the CNE was not mediated by the presumed expression activities of the protein- and non-protein-coding genes that overlap the AcMNPV CNE. On the basis of the presented data, the AcMNPV CNE was categorized as a complex-structured, polyfunctional genomic element involved in an essential DNA transaction that is associated with an undefined function of the baculovirus genome. PMID:24740153

  8. Comparative Sequence and X-Inactivation Analyses of a Domain of Escape in Human Xp11.2 and the Conserved Segment in Mouse

    PubMed Central

    Tsuchiya, Karen D.; Greally, John M.; Yi, Yajun; Noel, Kevin P.; Truong, Jean-Pierre; Disteche, Christine M.

    2004-01-01

    We have performed X-inactivation and sequence analyses on 350 kb of sequence from human Xp11.2, a region shown previously to contain a cluster of genes that escape X inactivation, and we compared this region with the region of conserved synteny in mouse. We identified several new transcripts from this region in human and in mouse, which defined the full extent of the domain escaping X inactivation in both species. In human, escape from X inactivation involves an uninterrupted 235-kb domain of multiple genes. Despite highly conserved gene content and order between the two species, Smcx is the only mouse gene from the conserved segment that escapes inactivation. As repetitive sequences are believed to facilitate spreading of X inactivation along the chromosome, we compared the repetitive sequence composition of this region between the two species. We found that long terminal repeats (LTRs) were decreased in the human domain of escape, but not in the majority of the conserved mouse region adjacent to Smcx in which genes were subject to X inactivation, suggesting that these repeats might be excluded from escape domains to prevent spreading of silencing. Our findings indicate that genomic context, as well as gene-specific regulatory elements, interact to determine expression of a gene from the inactive X-chromosome. PMID:15197169

  9. Effects of a Non-Conservative Sequence on the Properties of β-glucuronidase from Aspergillus terreus Li-20

    PubMed Central

    Liu, Yanli; Huangfu, Jie; Qi, Feng; Kaleem, Imdad; E, Wenwen; Li, Chun

    2012-01-01

    We cloned the β-glucuronidase gene (AtGUS) from Aspergillus terreus Li-20 encoding 657 amino acids (aa), which can transform glycyrrhizin into glycyrrhetinic acid monoglucuronide (GAMG) and glycyrrhetinic acid (GA). Based on sequence alignment, the C-terminal non-conservative sequence showed low identity with those of other species; thus, the partial sequence AtGUS(-3t) (1–592 aa) was amplified to determine the effects of the non-conservative sequence on the enzymatic properties. AtGUS and AtGUS(-3t) were expressed in E. coli BL21, producing AtGUS-E and AtGUS(-3t)-E, respectively. At the similar optimum temperature (55°C) and pH (AtGUS-E, 6.6; AtGUS(-3t)-E, 7.0) conditions, the thermal stability of AtGUS(-3t)-E was enhanced at 65°C, and the metal ions Co2+, Ca2+ and Ni2+ showed opposite effects on AtGUS-E and AtGUS(-3t)-E, respectively. Furthermore, Km of AtGUS(-3t)-E (1.95 mM) was just nearly one-seventh that of AtGUS-E (12.9 mM), whereas the catalytic efficiency of AtGUS(-3t)-E was 3.2 fold higher than that of AtGUS-E (7.16 vs. 2.24 mM s−1), revealing that the truncation of non-conservative sequence can significantly improve the catalytic efficiency of AtGUS. Conformational analysis illustrated significant difference in the secondary structure between AtGUS-E and AtGUS(-3t)-E by circular dichroism (CD). The results showed that the truncation of the non-conservative sequence could preferably alter and influence the stability and catalytic efficiency of enzyme. PMID:22347419

  10. Computational identification of developmental enhancers:conservation and function of transcription factor binding-site clustersin drosophila melanogaster and drosophila psedoobscura

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berman, Benjamin P.; Pfeiffer, Barret D.; Laverty, Todd R.

    2004-08-06

    Background The identification of sequences that control transcription in metazoans is a major goal of genome analysis. In a previous study, we demonstrated that searching for clusters of predicted transcription factor binding sites could discover active regulatory sequences, and identified 37 regions of the Drosophila melanogaster genome with high densities of predicted binding sites for five transcription factors involved in anterior-posterior embryonic patterning. Nine of these clusters overlapped known enhancers. Here, we report the results of in vivo functional analysis of 27 remaining clusters. Results We generated transgenic flies carrying each cluster attached to a basal promoter and reporter gene,more » and assayed embryos for reporter gene expression. Six clusters are enhancers of adjacent genes: giant, fushi tarazu, odd-skipped, nubbin, squeeze and pdm2; three drive expression in patterns unrelated to those of neighboring genes; the remaining 18 do not appear to have enhancer activity. We used the Drosophila pseudoobscura genome to compare patterns of evolution in and around the 15 positive and 18 false-positive predictions. Although conservation of primary sequence cannot distinguish true from false positives, conservation of binding-site clustering accurately discriminates functional binding-site clusters from those with no function. We incorporated conservation of binding-site clustering into a new genome-wide enhancer screen, and predict several hundred new regulatory sequences, including 85 adjacent to genes with embryonic patterns. Conclusions Measuring conservation of sequence features closely linked to function - such as binding-site clustering - makes better use of comparative sequence data than commonly used methods that examine only sequence identity.« less

  11. Delineating slowly and rapidly evolving fractions of the Drosophila genome.

    PubMed

    Keith, Jonathan M; Adams, Peter; Stephen, Stuart; Mattick, John S

    2008-05-01

    Evolutionary conservation is an important indicator of function and a major component of bioinformatic methods to identify non-protein-coding genes. We present a new Bayesian method for segmenting pairwise alignments of eukaryotic genomes while simultaneously classifying segments into slowly and rapidly evolving fractions. We also describe an information criterion similar to the Akaike Information Criterion (AIC) for determining the number of classes. Working with pairwise alignments enables detection of differences in conservation patterns among closely related species. We analyzed three whole-genome and three partial-genome pairwise alignments among eight Drosophila species. Three distinct classes of conservation level were detected. Sequences comprising the most slowly evolving component were consistent across a range of species pairs, and constituted approximately 62-66% of the D. melanogaster genome. Almost all (>90%) of the aligned protein-coding sequence is in this fraction, suggesting much of it (comprising the majority of the Drosophila genome, including approximately 56% of non-protein-coding sequences) is functional. The size and content of the most rapidly evolving component was species dependent, and varied from 1.6% to 4.8%. This fraction is also enriched for protein-coding sequence (while containing significant amounts of non-protein-coding sequence), suggesting it is under positive selection. We also classified segments according to conservation and GC content simultaneously. This analysis identified numerous sub-classes of those identified on the basis of conservation alone, but was nevertheless consistent with that classification. Software, data, and results available at www.maths.qut.edu.au/-keithj/. Genomic segments comprising the conservation classes available in BED format.

  12. COOLAIR Antisense RNAs Form Evolutionarily Conserved Elaborate Secondary Structures

    DOE PAGES

    Hawkes, Emily J.; Hennelly, Scott P.; Novikova, Irina V.; ...

    2016-09-20

    There is considerable debate about the functionality of long non-coding RNAs (lncRNAs). Lack of sequence conservation has been used to argue against functional relevance. Here, we investigated antisense lncRNAs, called COOLAIR, at the A. thaliana FLC locus and experimentally determined their secondary structure. The major COOLAIR variants are highly structured, organized by exon. The distally polyadenylated transcript has a complex multi-domain structure, altered by a single non-coding SNP defining a functionally distinct A. thaliana FLC haplotype. The A. thaliana COOLAIR secondary structure was used to predict COOLAIR exons in evolutionarily divergent Brassicaceae species. These predictions were validated through chemical probingmore » and cloning. Despite the relatively low nucleotide sequence identity, the structures, including multi-helix junctions, show remarkable evolutionary conservation. In a number of places, the structure is conserved through covariation of a non-contiguous DNA sequence. This structural conservation supports a functional role for COOLAIR transcripts rather than, or in addition to, antisense transcription.« less

  13. Conservation of hot regions in protein-protein interaction in evolution.

    PubMed

    Hu, Jing; Li, Jiarui; Chen, Nansheng; Zhang, Xiaolong

    2016-11-01

    The hot regions of protein-protein interactions refer to the active area which formed by those most important residues to protein combination process. With the research development on protein interactions, lots of predicted hot regions can be discovered efficiently by intelligent computing methods, while performing biology experiments to verify each every prediction is hardly to be done due to the time-cost and the complexity of the experiment. This study based on the research of hot spot residue conservations, the proposed method is used to verify authenticity of predicted hot regions that using machine learning algorithm combined with protein's biological features and sequence conservation, though multiple sequence alignment, module substitute matrix and sequence similarity to create conservation scoring algorithm, and then using threshold module to verify the conservation tendency of hot regions in evolution. This research work gives an effective method to verify predicted hot regions in protein-protein interactions, which also provides a useful way to deeply investigate the functional activities of protein hot regions. Copyright © 2016. Published by Elsevier Inc.

  14. COOLAIR Antisense RNAs Form Evolutionarily Conserved Elaborate Secondary Structures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hawkes, Emily J.; Hennelly, Scott P.; Novikova, Irina V.

    There is considerable debate about the functionality of long non-coding RNAs (lncRNAs). Lack of sequence conservation has been used to argue against functional relevance. Here, we investigated antisense lncRNAs, called COOLAIR, at the A. thaliana FLC locus and experimentally determined their secondary structure. The major COOLAIR variants are highly structured, organized by exon. The distally polyadenylated transcript has a complex multi-domain structure, altered by a single non-coding SNP defining a functionally distinct A. thaliana FLC haplotype. The A. thaliana COOLAIR secondary structure was used to predict COOLAIR exons in evolutionarily divergent Brassicaceae species. These predictions were validated through chemical probingmore » and cloning. Despite the relatively low nucleotide sequence identity, the structures, including multi-helix junctions, show remarkable evolutionary conservation. In a number of places, the structure is conserved through covariation of a non-contiguous DNA sequence. This structural conservation supports a functional role for COOLAIR transcripts rather than, or in addition to, antisense transcription.« less

  15. Principles of regulatory information conservation between mouse and human

    DOE PAGES

    Cheng, Yong; Ma, Zhihai; Kim, Bong-Hyun; ...

    2014-11-19

    To broaden our understanding of the evolution of gene regulation mechanisms, we generated occupancy profiles for 34 orthologous transcription factors (TFs) in human–mouse erythroid progenitor, lymphoblast and embryonic stem-cell lines. By combining the genome-wide transcription factor occupancy repertoires, associated epigenetic signals, and co-association patterns, here we deduce several evolutionary principles of gene regulatory features operating since the mouse and human lineages diverged. The genomic distribution profiles, primary binding motifs, chromatin states, and DNA methylation preferences are well conserved for TF-occupied sequences. However, the extent to which orthologous DNA segments are bound by orthologous TFs varies both among TFs and withmore » genomic location: binding at promoters is more highly conserved than binding at distal elements. Notably, occupancy-conserved TF-occupied sequences tend to be pleiotropic; they function in several tissues and also co-associate with many TFs. Lastly, single nucleotide variants at sites with potential regulatory functions are enriched in occupancy-conserved TF-occupied sequences.« less

  16. THE GRK4 SUBFAMILY OF G PROTEIN-COUPLED RECEPTOR KINASES: ALTERNATIVE SPLICING, GENE ORGANIZATION, AND SEQUENCE CONSERVATION

    EPA Science Inventory

    The GRK4 subfamily of G protein-coupled receptor kinases. Alternative splicing, gene organization, and sequence conservation.

    Premont RT, Macrae AD, Aparicio SA, Kendall HE, Welch JE, Lefkowitz RJ.

    Department of Medicine, Howard Hughes Medical Institute, Duke Univer...

  17. 18 CFR 401.37 - Sequence of approval.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 18 Conservation of Power and Water Resources 2 2011-04-01 2011-04-01 false Sequence of approval. 401.37 Section 401.37 Conservation of Power and Water Resources DELAWARE RIVER BASIN COMMISSION ADMINISTRATIVE MANUAL RULES OF PRACTICE AND PROCEDURE Project Review Under Section 3.8 of the Compact § 401.37...

  18. Functions of the 3′ and 5′ genome RNA regions of members of the genus Flavivirus

    PubMed Central

    Brinton, Margo A.; Basu, Mausumi

    2015-01-01

    The positive sense genomes of members of the genus Flavivirus in the family Flaviviridae are ~11 kb nts in length and have a 5′ type I cap but no 3′ poly A. The 5′ and 3′ terminal regions contain short conserved sequences that are proposed to be repeated remnants of an ancient sequence. However, the functions of most of these conserved sequences have not yet been determined. The terminal regions of the genome also contain multiple conserved RNA structures. Functional data for many of these structures has been obtained. Three sets of complementary 3′ and 5′ terminal region sequences, some of which are located in conserved RNA structures, interact to form a panhandle structure that is required for initiation of minus strand RNA synthesis with the 5′ terminal structure functioning as the promoter. How the switch from the terminal RNA structure base pairing to the long distance RNA-RNA interaction is triggered and regulated is not well understood but evidence suggests involvement of a cell protein binding to three sites on the 3′ terminal RNA structures and a cis-acting metastable 3′ RNA element in the 3′ terminal structure. Cell proteins may also be involved in facilitating exponential replication of nascent genomic RNA within replication vesicles at later times of infection cycle. Other conserved RNA structures and/or sequences in the 5′ and 3′ terminal regions have been proposed to regulate genome translation. Additional functions of the 5′ and 3′ terminal sequences have also been reported. PMID:25683510

  19. Use of a Drosophila Genome-Wide Conserved Sequence Database to Identify Functionally Related cis-Regulatory Enhancers

    PubMed Central

    Brody, Thomas; Yavatkar, Amarendra S; Kuzin, Alexander; Kundu, Mukta; Tyson, Leonard J; Ross, Jermaine; Lin, Tzu-Yang; Lee, Chi-Hon; Awasaki, Takeshi; Lee, Tzumin; Odenwald, Ward F

    2012-01-01

    Background: Phylogenetic footprinting has revealed that cis-regulatory enhancers consist of conserved DNA sequence clusters (CSCs). Currently, there is no systematic approach for enhancer discovery and analysis that takes full-advantage of the sequence information within enhancer CSCs. Results: We have generated a Drosophila genome-wide database of conserved DNA consisting of >100,000 CSCs derived from EvoPrints spanning over 90% of the genome. cis-Decoder database search and alignment algorithms enable the discovery of functionally related enhancers. The program first identifies conserved repeat elements within an input enhancer and then searches the database for CSCs that score highly against the input CSC. Scoring is based on shared repeats as well as uniquely shared matches, and includes measures of the balance of shared elements, a diagnostic that has proven to be useful in predicting cis-regulatory function. To demonstrate the utility of these tools, a temporally-restricted CNS neuroblast enhancer was used to identify other functionally related enhancers and analyze their structural organization. Conclusions: cis-Decoder reveals that co-regulating enhancers consist of combinations of overlapping shared sequence elements, providing insights into the mode of integration of multiple regulating transcription factors. The database and accompanying algorithms should prove useful in the discovery and analysis of enhancers involved in any developmental process. Developmental Dynamics 241:169–189, 2012. © 2011 Wiley Periodicals, Inc. Key findings A genome-wide catalog of Drosophila conserved DNA sequence clusters. cis-Decoder discovers functionally related enhancers. Functionally related enhancers share balanced sequence element copy numbers. Many enhancers function during multiple phases of development. PMID:22174086

  20. Comparison of ZP3 protein sequences among vertebrate species: to obtain a consensus sequence for immunocontraception.

    PubMed

    Zhu, X; Naz, R K

    1999-03-01

    The deduced ZP3 amino acid (aa) sequences of 13 vertebrate species namely mouse, hamster, rabbit, pig, porcine, cow, dog, cat, human, bonnet, marmoset, carp, and frog were compared using the PILEUP and PRETTY alignment programs (GCG, Wisconsin, USA). The published aa sequences obtained from 13 vertebrate species indicated the overall evolutionarily conservation in the N-terminus, central region, and C-terminus of the ZP3 polypeptide. More variations of ZP3 polypeptide sequences were seen in the alignments of carp and frog from the 11 mammalian species making the leader sequence more prominent. The canonical furin proteolytic processing signal at the C-terminus was found in all the ZP3 polypeptide sequences except of carp and frog. In the central region, the ZP3 deduced aa sequences of all the 13 vertebrate species aligned well, and six relatively conserved sequences were found. There are 11 conserved cysteine residues in the central region across all species including carp and frog, indicating that these residues have longer evolutionary history. The ZP3 aa sequence similarities were examined using the GAP program (GCG). The highest aa similarities are observed between the members of the same order within the class mammalia, and also (95.4%) between pig (ungulata) and rabbit (lagomorpha). The deduced ZP3 aa sequences per se may not be enough to build a phylogenetic tree.

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Takahashi, Ryo, E-mail: inter.takahashi@gmail.com; Todo, Yasuyuki, E-mail: yastodo@k.u-tokyo.ac.jp

    In recent years, shade coffee certification programs have attracted increasing attention from forest conservation and development organizations. The certification programs could be expected to promote forest conservation by providing a premium price to shade coffee producers. However, little is known about the significance of the conservation efforts generated by certification programs. In particular, the relationship between the impact of the certification and producer characteristics has yet to be examined. The purpose of this study, which was conducted in Ethiopia, was to examine the impact of a shade coffee certification program on forest conservation and its relationship with the socioeconomic characteristicsmore » of the producers. Remote sensing data of 2005 and 2010 was used to gauge the changes in forest area. Employing a probit model, we found that a forest coffee area being certified increased the probability of forest conservation by 19.3 percentage points relative to forest coffee areas lacking certification. We also found that although economically poor producers tended to engage in forest clearing, the forest coffee certification program had a significant impact on these producers. This result suggests that the certification program significantly affects the behaviors of economically poor producers and motivates these producers to conserve the forest. -- Highlights: • We employed the probit mode to evaluate the impact of the shade coffee certification on forest conservation in Ethiopia. • We estimated how the impact of the certification varied among producers with different characteristics. • The certification increased the probability of conserving forest by 19.3 percentage points. • Certification program motivated the economically poor producers to conserve the forest.« less

  2. Characterization of Rous sarcoma virus polyadenylation site use in vitro

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Maciolek, Nicole L.; McNally, Mark T.

    2008-05-10

    Polyadenylation of Rous sarcoma virus (RSV) RNA is inefficient, as approximately 15% of RSV RNAs represent read-through transcripts that use a downstream cellular polyadenylation site (poly(A) site). Read-through transcription has implications for the virus and the host since it is associated with oncogene capture and tumor induction. To explore the basis of inefficient RSV RNA 3'-end formation, we characterized RSV polyadenylation in vitro using HeLa cell nuclear extracts and HEK293 whole cell extracts. RSV polyadenylation substrates composed of the natural 3' end of viral RNA and various lengths of upstream sequence showed little or no polyadenylation, indicating that the RSVmore » poly(A) site is suboptimal. Efficiently used poly(A) sites often have identifiable upstream and downstream elements (USEs and DSEs) in close proximity to the conserved AAUAAA signal. The sequences upstream and downstream of the RSV poly(A) site deviate from those found in efficiently used poly(A) sites, which may explain inefficient RSV polyadenylation. To assess the quality of the RSV USEs and DSEs, the well-characterized SV40 late USEs and/or DSEs were substituted for the RSV elements and vice versa, which showed that the USEs and DSEs from RSV are suboptimal but functional. CstF interacted poorly with the RSV polyadenylation substrate, and the inactivity of the RSV poly(A) site was at least in part due to poor CstF binding since tethering CstF to the RSV substrate activated polyadenylation. Our data are consistent with poor polyadenylation factor binding sites in both the USE and DSE as the basis for inefficient use of the RSV poly(A) site and point to the importance of additional elements within RSV RNA in promoting 3' end formation.« less

  3. A novel pathway for the biosynthesis of heme in Archaea: genome-based bioinformatic predictions and experimental evidence.

    PubMed

    Storbeck, Sonja; Rolfes, Sarah; Raux-Deery, Evelyne; Warren, Martin J; Jahn, Dieter; Layer, Gunhild

    2010-12-13

    Heme is an essential prosthetic group for many proteins involved in fundamental biological processes in all three domains of life. In Eukaryota and Bacteria heme is formed via a conserved and well-studied biosynthetic pathway. Surprisingly, in Archaea heme biosynthesis proceeds via an alternative route which is poorly understood. In order to formulate a working hypothesis for this novel pathway, we searched 59 completely sequenced archaeal genomes for the presence of gene clusters consisting of established heme biosynthetic genes and colocalized conserved candidate genes. Within the majority of archaeal genomes it was possible to identify such heme biosynthesis gene clusters. From this analysis we have been able to identify several novel heme biosynthesis genes that are restricted to archaea. Intriguingly, several of the encoded proteins display similarity to enzymes involved in heme d(1) biosynthesis. To initiate an experimental verification of our proposals two Methanosarcina barkeri proteins predicted to catalyze the initial steps of archaeal heme biosynthesis were recombinantly produced, purified, and their predicted enzymatic functions verified.

  4. News from the protein mutability landscape.

    PubMed

    Hecht, Maximilian; Bromberg, Yana; Rost, Burkhard

    2013-11-01

    Some mutations of protein residues matter more than others, and these are often conserved evolutionarily. The explosion of deep sequencing and genotyping increasingly requires the distinction between effect and neutral variants. The simplest approach predicts all mutations of conserved residues to have an effect; however, this works poorly, at best. Many computational tools that are optimized to predict the impact of point mutations provide more detail. Here, we expand the perspective from the view of single variants to the level of sketching the entire mutability landscape. This landscape is defined by the impact of substituting every residue at each position in a protein by each of the 19 non-native amino acids. We review some of the powerful conclusions about protein function, stability and their robustness to mutation that can be drawn from such an analysis. Large-scale experimental and computational mutagenesis experiments are increasingly furthering our understanding of protein function and of the genotype-phenotype associations. We also discuss how these can be used to improve predictions of protein function and pathogenicity of missense variants. Copyright © 2013 The Authors. Published by Elsevier Ltd.. All rights reserved.

  5. A Novel Pathway for the Biosynthesis of Heme in Archaea: Genome-Based Bioinformatic Predictions and Experimental Evidence

    PubMed Central

    Storbeck, Sonja; Rolfes, Sarah; Raux-Deery, Evelyne; Warren, Martin J.; Jahn, Dieter; Layer, Gunhild

    2010-01-01

    Heme is an essential prosthetic group for many proteins involved in fundamental biological processes in all three domains of life. In Eukaryota and Bacteria heme is formed via a conserved and well-studied biosynthetic pathway. Surprisingly, in Archaea heme biosynthesis proceeds via an alternative route which is poorly understood. In order to formulate a working hypothesis for this novel pathway, we searched 59 completely sequenced archaeal genomes for the presence of gene clusters consisting of established heme biosynthetic genes and colocalized conserved candidate genes. Within the majority of archaeal genomes it was possible to identify such heme biosynthesis gene clusters. From this analysis we have been able to identify several novel heme biosynthesis genes that are restricted to archaea. Intriguingly, several of the encoded proteins display similarity to enzymes involved in heme d 1 biosynthesis. To initiate an experimental verification of our proposals two Methanosarcina barkeri proteins predicted to catalyze the initial steps of archaeal heme biosynthesis were recombinantly produced, purified, and their predicted enzymatic functions verified. PMID:21197080

  6. Conservative forgetful scholars: How people learn causal structure through sequences of interventions.

    PubMed

    Bramley, Neil R; Lagnado, David A; Speekenbrink, Maarten

    2015-05-01

    Interacting with a system is key to uncovering its causal structure. A computational framework for interventional causal learning has been developed over the last decade, but how real causal learners might achieve or approximate the computations entailed by this framework is still poorly understood. Here we describe an interactive computer task in which participants were incentivized to learn the structure of probabilistic causal systems through free selection of multiple interventions. We develop models of participants' intervention choices and online structure judgments, using expected utility gain, probability gain, and information gain and introducing plausible memory and processing constraints. We find that successful participants are best described by a model that acts to maximize information (rather than expected score or probability of being correct); that forgets much of the evidence received in earlier trials; but that mitigates this by being conservative, preferring structures consistent with earlier stated beliefs. We explore 2 heuristics that partly explain how participants might be approximating these models without explicitly representing or updating a hypothesis space. (c) 2015 APA, all rights reserved).

  7. Automated conserved non-coding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses

    PubMed Central

    Turco, Gina; Schnable, James C.; Pedersen, Brent; Freeling, Michael

    2013-01-01

    Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of arabidopsis, Japonica rice, foxtail millet, sorghum, brachypodium, and maize. PMID:23874343

  8. The wheat cytochrome oxidase subunit II gene has an intron insert and three radical amino acid changes relative to maize

    PubMed Central

    Bonen, Linda; Boer, Poppo H.; Gray, Michael W.

    1984-01-01

    We have determined the sequence of the wheat mitochondrial gene for cytochrome oxidase subunit II (COII) and find that its derived protein sequence differs from that of maize at only three amino acid positions. Unexpectedly, all three replacements are non-conservative ones. The wheat COII gene has a highly-conserved intron at the same position as in maize, but the wheat intron is 1.5 times longer because of an insert relative to its maize counterpart. Hybridization analysis of mitochondrial DNA from rye, pea, broad bean and cucumber indicates strong sequence conservation of COII coding sequences among all these higher plants. However, only rye and maize mitochondrial DNA show homology with wheat COII intron sequences and rye alone with intron-insert sequences. We find that a sequence identical to the region of the 5' exon corresponding to the transmembrane domain of the COII protein is present at a second genomic location in wheat mitochondria. These variations in COII gene structure and size, as well as the presence of repeated COII sequences, illustrate at the DNA sequence level, factors which contribute to higher plant mitochondrial DNA diversity and complexity. ImagesFig. 3.Fig. 4.Fig. 5. PMID:16453565

  9. Separation of the PROX1 gene from upstream conserved elements in a complex inversion/translocation patient with hypoplastic left heart

    PubMed Central

    Gill, Harinder K; Parsons, Sian R; Spalluto, Cosma; Davies, Angela F; Knorz, Victoria J; Burlinson, Clare EG; Ng, Bee Ling; Carter, Nigel P; Ogilvie, Caroline Mackie; Wilson, David I; Roberts, Roland G

    2009-01-01

    Hypoplastic left heart (HLH) occurs in at least 1 in 10 000 live births but may be more common in utero. Its causes are poorly understood but a number of affected cases are associated with chromosomal abnormalities. We set out to localize the breakpoints in a patient with sporadic HLH and a de novo translocation. Initial studies showed that the apparently simple 1q41;3q27.1 translocation was actually combined with a 4-Mb inversion, also de novo, of material within 1q41. We therefore localized all four breakpoints and found that no known transcription units were disrupted. However we present a case, based on functional considerations, synteny and position of highly conserved non-coding sequence elements, and the heterozygous Prox1+/− mouse phenotype (ventricular hypoplasia), for the involvement of dysregulation of the PROX1 gene in the aetiology of HLH in this case. Accordingly, we show that the spatial expression pattern of PROX1 in the developing human heart is consistent with a role in cardiac development. We suggest that dysregulation of PROX1 gene expression due to separation from its conserved upstream elements is likely to have caused the heart defects observed in this patient, and that PROX1 should be considered as a potential candidate gene for other cases of HLH. The relevance of another breakpoint separating the cardiac gene ESRRG from a conserved downstream element is also discussed. PMID:19471316

  10. Junctional Adhesion Molecule A Serves as a Receptor for Prototype and Field-Isolate Strains of Mammalian Reovirus

    PubMed Central

    Campbell, Jacquelyn A.; Schelling, Pierre; Wetzel, J. Denise; Johnson, Elizabeth M.; Forrest, J. Craig; Wilson, Greame A. R.; Aurrand-Lions, Michel; Imhof, Beat A.; Stehle, Thilo; Dermody, Terence S.

    2005-01-01

    Reovirus infections are initiated by the binding of viral attachment protein σ1 to receptors on the surface of host cells. The σ1 protein is an elongated fiber comprised of an N-terminal tail that inserts into the virion and a C-terminal head that extends from the virion surface. The prototype reovirus strains type 1 Lang/53 (T1L/53) and type 3 Dearing/55 (T3D/55) use junctional adhesion molecule A (JAM-A) as a receptor. The C-terminal half of the T3D/55 σ1 protein interacts directly with JAM-A, but the determinants of receptor-binding specificity have not been identified. In this study, we investigated whether JAM-A also mediates the attachment of the prototype reovirus strain type 2 Jones/55 (T2J/55) and a panel of field-isolate strains representing each of the three serotypes. Antibodies specific for JAM-A were capable of inhibiting infections of HeLa cells by T1L/53, T2J/55, and T3D/55, demonstrating that strains of all three serotypes use JAM-A as a receptor. To corroborate these findings, we introduced JAM-A or the structurally related JAM family members JAM-B and JAM-C into Chinese hamster ovary cells, which are poorly permissive for reovirus infection. Both prototype and field-isolate reovirus strains were capable of infecting cells transfected with JAM-A but not those transfected with JAM-B or JAM-C. A sequence analysis of the σ1-encoding S1 gene segment of the strains chosen for study revealed little conservation in the deduced σ1 amino acid sequences among the three serotypes. This contrasts markedly with the observed sequence variability within each serotype, which is confined to a small number of amino acids. Mapping of these residues onto the crystal structure of σ1 identified regions of conservation and variability, suggesting a likely mode of JAM-A binding via a conserved surface at the base of the σ1 head domain. PMID:15956543

  11. Conservation of Transcription Start Sites within Genes across a Bacterial Genus

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shao, Wenjun; Price, Morgan N.; Deutschbauer, Adam M.

    Transcription start sites (TSSs) lying inside annotated genes, on the same or opposite strand, have been observed in diverse bacteria, but the function of these unexpected transcripts is unclear. Here, we use the metal-reducing bacterium Shewanella oneidensis MR-1 and its relatives to study the evolutionary conservation of unexpected TSSs. Using high-resolution tiling microarrays and 5'-end RNA sequencing, we identified 2,531 TSSs in S. oneidensis MR-1, of which 18% were located inside coding sequences (CDSs). Comparative transcriptome analysis with seven additional Shewanella species revealed that the majority (76%) of the TSSs within the upstream regions of annotated genes (gTSSs) were conserved.more » Thirty percent of the TSSs that were inside genes and on the sense strand (iTSSs) were also conserved. Sequence analysis around these iTSSs showed conserved promoter motifs, suggesting that many iTSS are under purifying selection. Furthermore, conserved iTSSs are enriched for regulatory motifs, suggesting that they are regulated, and they tend to eliminate polar effects, which confirms that they are functional. In contrast, the transcription of antisense TSSs located inside CDSs (aTSSs) was significantly less likely to be conserved (22%). However, aTSSs whose transcription was conserved often have conserved promoter motifs and drive the expression of nearby genes. Overall, our findings demonstrate that some internal TSSs are conserved and drive protein expression despite their unusual locations, but the majority are not conserved and may reflect noisy initiation of transcription rather than a biological function.« less

  12. Domain architecture conservation in orthologs

    PubMed Central

    2011-01-01

    Background As orthologous proteins are expected to retain function more often than other homologs, they are often used for functional annotation transfer between species. However, ortholog identification methods do not take into account changes in domain architecture, which are likely to modify a protein's function. By domain architecture we refer to the sequential arrangement of domains along a protein sequence. To assess the level of domain architecture conservation among orthologs, we carried out a large-scale study of such events between human and 40 other species spanning the entire evolutionary range. We designed a score to measure domain architecture similarity and used it to analyze differences in domain architecture conservation between orthologs and paralogs relative to the conservation of primary sequence. We also statistically characterized the extents of different types of domain swapping events across pairs of orthologs and paralogs. Results The analysis shows that orthologs exhibit greater domain architecture conservation than paralogous homologs, even when differences in average sequence divergence are compensated for, for homologs that have diverged beyond a certain threshold. We interpret this as an indication of a stronger selective pressure on orthologs than paralogs to retain the domain architecture required for the proteins to perform a specific function. In general, orthologs as well as the closest paralogous homologs have very similar domain architectures, even at large evolutionary separation. The most common domain architecture changes observed in both ortholog and paralog pairs involved insertion/deletion of new domains, while domain shuffling and segment duplication/deletion were very infrequent. Conclusions On the whole, our results support the hypothesis that function conservation between orthologs demands higher domain architecture conservation than other types of homologs, relative to primary sequence conservation. This supports the notion that orthologs are functionally more similar than other types of homologs at the same evolutionary distance. PMID:21819573

  13. The first genetic map of the American cranberry: exploration of synteny conservation and quantitative trait loci.

    PubMed

    Georgi, Laura; Johnson-Cicalese, Jennifer; Honig, Josh; Das, Sushma Parankush; Rajah, Veeran D; Bhattacharya, Debashish; Bassil, Nahla; Rowland, Lisa J; Polashock, James; Vorsa, Nicholi

    2013-03-01

    The first genetic map of cranberry (Vaccinium macrocarpon) has been constructed, comprising 14 linkage groups totaling 879.9 cM with an estimated coverage of 82.2 %. This map, based on four mapping populations segregating for field fruit-rot resistance, contains 136 distinct loci. Mapped markers include blueberry-derived simple sequence repeat (SSR) and cranberry-derived sequence-characterized amplified region markers previously used for fingerprinting cranberry cultivars. In addition, SSR markers were developed near cranberry sequences resembling genes involved in flavonoid biosynthesis or defense against necrotrophic pathogens, or conserved orthologous set (COS) sequences. The cranberry SSRs were developed from next-generation cranberry genomic sequence assemblies; thus, the positions of these SSRs on the genomic map provide information about the genomic location of the sequence scaffold from which they were derived. The use of SSR markers near COS and other functional sequences, plus 33 SSR markers from blueberry, facilitates comparisons of this map with maps of other plant species. Regions of the cranberry map were identified that showed conservation of synteny with Vitis vinifera and Arabidopsis thaliana. Positioned on this map are quantitative trait loci (QTL) for field fruit-rot resistance (FFRR), fruit weight, titratable acidity, and sound fruit yield (SFY). The SFY QTL is adjacent to one of the fruit weight QTL and may reflect pleiotropy. Two of the FFRR QTL are in regions of conserved synteny with grape and span defense gene markers, and the third FFRR QTL spans a flavonoid biosynthetic gene.

  14. [Comparative analysis of clustered regularly interspaced short palindromic repeats (CRISPRs) loci in the genomes of halophilic archaea].

    PubMed

    Zhang, Fan; Zhang, Bing; Xiang, Hua; Hu, Songnian

    2009-11-01

    Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) is a widespread system that provides acquired resistance against phages in bacteria and archaea. Here we aim to genome-widely analyze the CRISPR in extreme halophilic archaea, of which the whole genome sequences are available at present time. We used bioinformatics methods including alignment, conservation analysis, GC content and RNA structure prediction to analyze the CRISPR structures of 7 haloarchaeal genomes. We identified the CRISPR structures in 5 halophilic archaea and revealed a conserved palindromic motif in the flanking regions of these CRISPR structures. In addition, we found that the repeat sequences of large CRISPR structures in halophilic archaea were greatly conserved, and two types of predicted RNA secondary structures derived from the repeat sequences were likely determined by the fourth base of the repeat sequence. Our results support the proposal that the leader sequence may function as recognition site by having palindromic structures in flanking regions, and the stem-loop secondary structure formed by repeat sequences may function in mediating the interaction between foreign genetic elements and CAS-encoded proteins.

  15. A comprehensive analysis of three Asiatic black bear mitochondrial genomes (subspecies ussuricus, formosanus and mupinensis), with emphasis on the complete mtDNA sequence of Ursus thibetanus ussuricus (Ursidae).

    PubMed

    Hwang, Dae-Sik; Ki, Jang-Seu; Jeong, Dong-Hyuk; Kim, Bo-Hyun; Lee, Bae-Keun; Han, Sang-Hoon; Lee, Jae-Seong

    2008-08-01

    In the present paper, we describe the mitochondrial genome sequence of the Asiatic black bear (Ursus thibetanus ussuricus) with particular emphasis on the control region (CR), and compared with mitochondrial genomes on molecular relationships among the bears. The mitochondrial genome sequence of U. thibetanus ussuricus was 16,700 bp in size with mostly conserved structures (e.g. 13 protein-coding, two rRNA genes, 22 tRNA genes). The CR consisted of several typical conserved domains such as F, E, D, and C boxes, and a conserved sequence block. Nucleotide sequences and the repeated motifs in the CR were different among the bear species, and their copy numbers were also variable according to populations, even within F1 generations of U. thibetanus ussuricus. Comparative analyses showed that the CR D1 region was highly informative for the discrimination of the bear family. These findings suggest that nucleotide sequences of both repeated motifs and CR D1 in the bear family are good markers for species discriminations.

  16. Sequence, structure and function relationships in flaviviruses as assessed by evolutive aspects of its conserved non-structural protein domains.

    PubMed

    da Fonseca, Néli José; Lima Afonso, Marcelo Querino; Pedersolli, Natan Gonçalves; de Oliveira, Lucas Carrijo; Andrade, Dhiego Souto; Bleicher, Lucas

    2017-10-28

    Flaviviruses are responsible for serious diseases such as dengue, yellow fever, and zika fever. Their genomes encode a polyprotein which, after cleavage, results in three structural and seven non-structural proteins. Homologous proteins can be studied by conservation and coevolution analysis as detected in multiple sequence alignments, usually reporting positions which are strictly necessary for the structure and/or function of all members in a protein family or which are involved in a specific sub-class feature requiring the coevolution of residue sets. This study provides a complete conservation and coevolution analysis on all flaviviruses non-structural proteins, with results mapped on all well-annotated available sequences. A literature review on the residues found in the analysis enabled us to compile available information on their roles and distribution among different flaviviruses. Also, we provide the mapping of conserved and coevolved residues for all sequences currently in SwissProt as a supplementary material, so that particularities in different viruses can be easily analyzed. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. Creation of a data base for sequences of ribosomal nucleic acids and detection of conserved restriction endonucleases sites through computerized processing.

    PubMed Central

    Patarca, R; Dorta, B; Ramirez, J L

    1982-01-01

    As part of a project pertaining the organization of ribosomal genes in Kinetoplastidae, we have created a data base for published sequences of ribosomal nucleic acids, with information in Spanish. As a first step in their processing, we have written a computer program which introduces the new feature of determining the length of the fragments produced after single or multiple digestion with any of the known restriction enzymes. With this information we have detected conserved SAU 3A sites: (i) at the 5' end of the 5.8S rRNA and at the 3' end of the small subunit rRNA, both included in similar larger sequences; (ii) in the 5.8S rRNA of vertebrates (a second one), which is not present in lower eukaryotes, showing a clear evolutive divergence; and, (iii) at the 5' terminal of the small subunit rRNA, included in a larger conserved sequence. The possible biological importance of these sequences is discussed. PMID:6278402

  18. Sequence- and Interactome-Based Prediction of Viral Protein Hotspots Targeting Host Proteins: A Case Study for HIV Nef

    PubMed Central

    Sarmady, Mahdi; Dampier, William; Tozeren, Aydin

    2011-01-01

    Virus proteins alter protein pathways of the host toward the synthesis of viral particles by breaking and making edges via binding to host proteins. In this study, we developed a computational approach to predict viral sequence hotspots for binding to host proteins based on sequences of viral and host proteins and literature-curated virus-host protein interactome data. We use a motif discovery algorithm repeatedly on collections of sequences of viral proteins and immediate binding partners of their host targets and choose only those motifs that are conserved on viral sequences and highly statistically enriched among binding partners of virus protein targeted host proteins. Our results match experimental data on binding sites of Nef to host proteins such as MAPK1, VAV1, LCK, HCK, HLA-A, CD4, FYN, and GNB2L1 with high statistical significance but is a poor predictor of Nef binding sites on highly flexible, hoop-like regions. Predicted hotspots recapture CD8 cell epitopes of HIV Nef highlighting their importance in modulating virus-host interactions. Host proteins potentially targeted or outcompeted by Nef appear crowding the T cell receptor, natural killer cell mediated cytotoxicity, and neurotrophin signaling pathways. Scanning of HIV Nef motifs on multiple alignments of hepatitis C protein NS5A produces results consistent with literature, indicating the potential value of the hotspot discovery in advancing our understanding of virus-host crosstalk. PMID:21738584

  19. Sequence-Level Mechanisms of Human Epigenome Evolution

    PubMed Central

    Prendergast, James G.D.; Chambers, Emily V.; Semple, Colin A.M.

    2014-01-01

    DNA methylation and chromatin states play key roles in development and disease. However, the extent of recent evolutionary divergence in the human epigenome and the influential factors that have shaped it are poorly understood. To determine the links between genome sequence and human epigenome evolution, we examined the divergence of DNA methylation and chromatin states following segmental duplication events in the human lineage. Chromatin and DNA methylation states were found to have been generally well conserved following a duplication event, with the evolution of the epigenome largely uncoupled from the total number of genetic changes in the surrounding DNA sequence. However, the epigenome at tissue-specific, distal regulatory regions was observed to be unusually prone to diverge following duplication, with particular sequence differences, altering known sequence motifs, found to be associated with divergence in patterns of DNA methylation and chromatin. Alu elements were found to have played a particularly prominent role in shaping human epigenome evolution, and we show that human-specific AluY insertion events are strongly linked to the evolution of the DNA methylation landscape and gene expression levels, including at key neurological genes in the human brain. Studying paralogous regions within the same sample enables the study of the links between genome and epigenome evolution while controlling for biological and technical variation. We show DNA methylation and chromatin divergence between duplicated regions are linked to the divergence of particular genetic motifs, with Alu elements having played a disproportionate role in the evolution of the epigenome in the human lineage. PMID:24966180

  20. Conserved antigenic sites between MERS-CoV and Bat-coronavirus are revealed through sequence analysis.

    PubMed

    Sharmin, Refat; Islam, Abul B M M K

    2016-01-01

    MERS-CoV is a newly emerged human coronavirus reported closely related with HKU4 and HKU5 Bat coronaviruses. Bat and MERS corona-viruses are structurally related. Therefore, it is of interest to estimate the degree of conserved antigenic sites among them. It is of importance to elucidate the shared antigenic-sites and extent of conservation between them to understand the evolutionary dynamics of MERS-CoV. Multiple sequence alignment of the spike (S), membrane (M), enveloped (E) and nucleocapsid (N) proteins was employed to identify the sequence conservation among MERS and Bat (HKU4, HKU5) coronaviruses. We used various in silico tools to predict the conserved antigenic sites. We found that MERS-CoV shared 30 % of its S protein antigenic sites with HKU4 and 70 % with HKU5 bat-CoV. Whereas 100 % of its E, M and N protein's antigenic sites are found to be conserved with those in HKU4 and HKU5. This sharing suggests that in case of pathogenicity MERS-CoV is more closely related to HKU5 bat-CoV than HKU4 bat-CoV. The conserved epitopes indicates their evolutionary relationship and ancestry of pathogenicity.

  1. T box transcription antitermination riboswitch: Influence of nucleotide sequence and orientation on tRNA binding by the antiterminator element

    PubMed Central

    Fauzi, Hamid; Agyeman, Akwasi; Hines, Jennifer V.

    2008-01-01

    Many bacteria utilize riboswitch transcription regulation to monitor and appropriately respond to cellular levels of important metabolites or effector molecules. The T box transcription antitermination riboswitch responds to cognate uncharged tRNA by specifically stabilizing an antiterminator element in the 5′-untranslated mRNA leader region and precluding formation of a thermodynamically more stable terminator element. Stabilization occurs when the tRNA acceptor end base pairs with the first four nucleotides in the seven nucleotide bulge of the highly conserved antiterminator element. The significance of the conservation of the antiterminator bulge nucleotides that do not base pair with the tRNA is unknown, but they are required for optimal function. In vitro selection was used to determine if the isolated antiterminator bulge context alone dictates the mode in which the tRNA acceptor end binds the bulge nucleotides. No sequence conservation beyond complementarity was observed and the location was not constrained to the first four bases of the bulge. The results indicate that formation of a structure that recognizes the tRNA acceptor end in isolation is not the determinant driving force for the high phylogenetic sequence conservation observed within the antiterminator bulge. Additional factors or T box leader features more likely influenced the phylogenetic sequence conservation. PMID:19152843

  2. Sequence conservation, HLA-E-Restricted peptide, and best-defined CTL/CD8+ epitopes in gag P24 (capsid) of HIV-1 subtype B

    NASA Astrophysics Data System (ADS)

    Prasetyo, Afiono Agung; Dharmawan, Ruben; Sari, Yulia; Sariyatun, Ratna

    2017-02-01

    Human immunodeficiency virus type 1 (HIV-1) remains a cause of global health problem. Continuous studies of HIV-1 genetic and immunological profiles are important to find strategies against the virus. This study aimed to conduct analysis of sequence conservation, HLA-E-restricted peptide, and best-defined CTL/CD8+ epitopes in p24 (capsid) of HIV-1 subtype B worldwide. The p24-coding sequences from 3,557 HIV subtype B isolates were aligned using MUSCLE and analysed. Some highly conserved regions (sequence conservation ≥95%) were observed. Two considerably long series of sequences with conservation of 100% was observed at base 349-356 and 550-557 of p24 (HXB2 numbering). The consensus from all aligned isolates was precisely the same as consensus B in the Los Alamos HIV Database. The HLA-E-restricted peptide in amino acid (aa) 14-22 of HIV-1 p24 (AISPRTLNA) was found in 55.9% (1,987/3,557) of HIV-1 subtype B worldwide. Forty-four best-defined CTL/CD8+ epitopes were observed, in which VKNWMTETL epitope (aa 181-189 of p24) restricted by B*4801 was the most frequent, as found in 94.9% of isolates. The results of this study would contribute information about HIV-1 subtype B and benefits for further works willing to develop diagnostic and therapeutic strategies against the virus.

  3. Huntingtin-interacting protein 1 (Hip1) and Hip1-related protein (Hip1R) bind the conserved sequence of clathrin light chains and thereby influence clathrin assembly in vitro and actin distribution in vivo.

    PubMed

    Chen, Chih-Ying; Brodsky, Frances M

    2005-02-18

    Clathrin heavy and light chains form triskelia, which assemble into polyhedral coats of membrane vesicles that mediate transport for endocytosis and organelle biogenesis. Light chain subunits regulate clathrin assembly in vitro by suppressing spontaneous self-assembly of the heavy chains. The residues that play this regulatory role are at the N terminus of a conserved 22-amino acid sequence that is shared by all vertebrate light chains. Here we show that these regulatory residues and others in the conserved sequence mediate light chain interaction with Hip1 and Hip1R. These related proteins were previously found to be enriched in clathrin-coated vesicles and to promote clathrin assembly in vitro. We demonstrate Hip1R binding preference for light chains associated with clathrin heavy chain and show that Hip1R stimulation of clathrin assembly in vitro is blocked by mutations in the conserved sequence of light chains that abolish interaction with Hip1 and Hip1R. In vivo overexpression of a fragment of clathrin light chain comprising the Hip1R-binding region affected cellular actin distribution. Together these results suggest that the roles of Hip1 and Hip1R in affecting clathrin assembly and actin distribution are mediated by their interaction with the conserved sequence of clathrin light chains.

  4. A conserved predicted pseudoknot in the NS2A-encoding sequence of West Nile and Japanese encephalitis flaviviruses suggests NS1' may derive from ribosomal frameshifting

    PubMed Central

    Firth, Andrew E; Atkins, John F

    2009-01-01

    Japanese encephalitis, West Nile, Usutu and Murray Valley encephalitis viruses form a tight subgroup within the larger Flavivirus genus. These viruses utilize a single-polyprotein expression strategy, resulting in ~10 mature proteins. Plotting the conservation at synonymous sites along the polyprotein coding sequence reveals strong conservation peaks at the very 5' end of the coding sequence, and also at the 5' end of the sequence encoding the NS2A protein. Such peaks are generally indicative of functionally important non-coding sequence elements. The second peak corresponds to a predicted stable pseudoknot structure whose biological importance is supported by compensatory mutations that preserve the structure. The pseudoknot is preceded by a conserved slippery heptanucleotide (Y CCU UUU), thus forming a classical stimulatory motif for -1 ribosomal frameshifting. We hypothesize, therefore, that the functional importance of the pseudoknot is to stimulate a portion of ribosomes to shift -1 nt into a short (45 codon), conserved, overlapping open reading frame, termed foo. Since cleavage at the NS1-NS2A boundary is known to require synthesis of NS2A in cis, the resulting transframe fusion protein is predicted to be NS1-NS2AN-term-FOO. We hypothesize that this may explain the origin of the previously identified NS1 'extension' protein in JEV-group flaviviruses, known as NS1'. PMID:19196463

  5. Application of cytochrome b DNA sequences for the authentication of endangered snake species.

    PubMed

    Wong, Ka-Lok; Wang, Jun; But, Paul Pui-Hay; Shaw, Pang-Chui

    2004-01-06

    In order to enforce the conservation program and curbing the illegal trading and consumption of endangered snake species, the value of cytochrome b sequence in the authentication of snake species was evaluated. As an illustration, DNA was extracted, selected cytochrome b DNA sequences amplified and sequenced from six snakes commonly consumed in Hong Kong. Cataloging with sequences available in public, a cytochrome b database containing 90 species of snakes was constructed. In this database, sequence homology between snakes ranged from 70.68 to 95.11%. On the other hand, intraspecific variation of three tested snakes was 0-0.98%. Using the database, we were able to determine the identity of six meat samples confiscated by the Agriculture, Fisheries and Conservation Department, HKSAR.

  6. Insights into the fold organization of TIM barrel from interaction energy based structure networks.

    PubMed

    Vijayabaskar, M S; Vishveshwara, Saraswathi

    2012-01-01

    There are many well-known examples of proteins with low sequence similarity, adopting the same structural fold. This aspect of sequence-structure relationship has been extensively studied both experimentally and theoretically, however with limited success. Most of the studies consider remote homology or "sequence conservation" as the basis for their understanding. Recently "interaction energy" based network formalism (Protein Energy Networks (PENs)) was developed to understand the determinants of protein structures. In this paper we have used these PENs to investigate the common non-covalent interactions and their collective features which stabilize the TIM barrel fold. We have also developed a method of aligning PENs in order to understand the spatial conservation of interactions in the fold. We have identified key common interactions responsible for the conservation of the TIM fold, despite high sequence dissimilarity. For instance, the central beta barrel of the TIM fold is stabilized by long-range high energy electrostatic interactions and low-energy contiguous vdW interactions in certain families. The other interfaces like the helix-sheet or the helix-helix seem to be devoid of any high energy conserved interactions. Conserved interactions in the loop regions around the catalytic site of the TIM fold have also been identified, pointing out their significance in both structural and functional evolution. Based on these investigations, we have developed a novel network based phylogenetic analysis for remote homologues, which can perform better than sequence based phylogeny. Such an analysis is more meaningful from both structural and functional evolutionary perspective. We believe that the information obtained through the "interaction conservation" viewpoint and the subsequently developed method of structure network alignment, can shed new light in the fields of fold organization and de novo computational protein design.

  7. Conservation of tubulin-binding sequences in TRPV1 throughout evolution.

    PubMed

    Sardar, Puspendu; Kumar, Abhishek; Bhandari, Anita; Goswami, Chandan

    2012-01-01

    Transient Receptor Potential Vanilloid sub type 1 (TRPV1), commonly known as capsaicin receptor can detect multiple stimuli ranging from noxious compounds, low pH, temperature as well as electromagnetic wave at different ranges. In addition, this receptor is involved in multiple physiological and sensory processes. Therefore, functions of TRPV1 have direct influences on adaptation and further evolution also. Availability of various eukaryotic genomic sequences in public domain facilitates us in studying the molecular evolution of TRPV1 protein and the respective conservation of certain domains, motifs and interacting regions that are functionally important. Using statistical and bioinformatics tools, our analysis reveals that TRPV1 has evolved about ∼420 million years ago (MYA). Our analysis reveals that specific regions, domains and motifs of TRPV1 has gone through different selection pressure and thus have different levels of conservation. We found that among all, TRP box is the most conserved and thus have functional significance. Our results also indicate that the tubulin binding sequences (TBS) have evolutionary significance as these stretch sequences are more conserved than many other essential regions of TRPV1. The overall distribution of positively charged residues within the TBS motifs is conserved throughout evolution. In silico analysis reveals that the TBS-1 and TBS-2 of TRPV1 can form helical structures and may play important role in TRPV1 function. Our analysis identifies the regions of TRPV1, which are important for structure-function relationship. This analysis indicates that tubulin binding sequence-1 (TBS-1) near the TRP-box forms a potential helix and the tubulin interactions with TRPV1 via TBS-1 have evolutionary significance. This interaction may be required for the proper channel function and regulation and may also have significance in the context of Taxol®-induced neuropathy.

  8. Genetic and structural analyses of cytochrome P450 hydroxylases in sex hormone biosynthesis: Sequential origin and subsequent coevolution.

    PubMed

    Goldstone, Jared V; Sundaramoorthy, Munirathinam; Zhao, Bin; Waterman, Michael R; Stegeman, John J; Lamb, David C

    2016-01-01

    Biosynthesis of steroid hormones in vertebrates involves three cytochrome P450 hydroxylases, CYP11A1, CYP17A1 and CYP19A1, which catalyze sequential steps in steroidogenesis. These enzymes are conserved in the vertebrates, but their origin and existence in other chordate subphyla (Tunicata and Cephalochordata) have not been clearly established. In this study, selected protein sequences of CYP11A1, CYP17A1 and CYP19A1 were compiled and analyzed using multiple sequence alignment and phylogenetic analysis. Our analyses show that cephalochordates have sequences orthologous to vertebrate CYP11A1, CYP17A1 or CYP19A1, and that echinoderms and hemichordates possess CYP11-like but not CYP19 genes. While the cephalochordate sequences have low identity with the vertebrate sequences, reflecting evolutionary distance, the data show apparent origin of CYP11 prior to the evolution of CYP19 and possibly CYP17, thus indicating a sequential origin of these functionally related steroidogenic CYPs. Co-occurrence of the three CYPs in early chordates suggests that the three genes may have coevolved thereafter, and that functional conservation should be reflected in functionally important residues in the proteins. CYP19A1 has the largest number of conserved residues while CYP11A1 sequences are less conserved. Structural analyses of human CYP11A1, CYP17A1 and CYP19A1 show that critical substrate binding site residues are highly conserved in each enzyme family. The results emphasize that the steroidogenic pathways producing glucocorticoids and reproductive steroids are several hundred million years old and that the catalytic structural elements of the enzymes have been conserved over the same period of time. Analysis of these elements may help to identify when precursor functions linked to these enzymes first arose. Copyright © 2015 Elsevier Inc. All rights reserved.

  9. DoOP: Databases of Orthologous Promoters, collections of clusters of orthologous upstream sequences from chordates and plants

    PubMed Central

    Barta, Endre; Sebestyén, Endre; Pálfy, Tamás B.; Tóth, Gábor; Ortutay, Csaba P.; Patthy, László

    2005-01-01

    DoOP (http://doop.abc.hu/) is a database of eukaryotic promoter sequences (upstream regions) aiming to facilitate the recognition of regulatory sites conserved between species. The annotated first exons of human and Arabidopsis thaliana genes were used as queries in BLAST searches to collect the most closely related orthologous first exon sequences from Chordata and Viridiplantae species. Up to 3000 bp DNA segments upstream from these first exons constitute the clusters in the chordate and plant sections of the Database of Orthologous Promoters. Release 1.0 of DoOP contains 21 061 chordate clusters from 284 different species and 7548 plant clusters from 269 different species. The database can be used to find and retrieve promoter sequences of a given gene from various species and it is also suitable to see the most trivial conserved sequence blocks in the orthologous upstream regions. Users can search DoOP with either sequence or text (annotation) to find promoter clusters of various genes. In addition to the sequence data, the positions of the conserved sequence blocks derived from multiple alignments, the positions of repetitive elements and the positions of transcription start sites known from the Eukaryotic Promoter Database (EPD) can be viewed graphically. PMID:15608291

  10. DoOP: Databases of Orthologous Promoters, collections of clusters of orthologous upstream sequences from chordates and plants.

    PubMed

    Barta, Endre; Sebestyén, Endre; Pálfy, Tamás B; Tóth, Gábor; Ortutay, Csaba P; Patthy, László

    2005-01-01

    DoOP (http://doop.abc.hu/) is a database of eukaryotic promoter sequences (upstream regions) aiming to facilitate the recognition of regulatory sites conserved between species. The annotated first exons of human and Arabidopsis thaliana genes were used as queries in BLAST searches to collect the most closely related orthologous first exon sequences from Chordata and Viridiplantae species. Up to 3000 bp DNA segments upstream from these first exons constitute the clusters in the chordate and plant sections of the Database of Orthologous Promoters. Release 1.0 of DoOP contains 21,061 chordate clusters from 284 different species and 7548 plant clusters from 269 different species. The database can be used to find and retrieve promoter sequences of a given gene from various species and it is also suitable to see the most trivial conserved sequence blocks in the orthologous upstream regions. Users can search DoOP with either sequence or text (annotation) to find promoter clusters of various genes. In addition to the sequence data, the positions of the conserved sequence blocks derived from multiple alignments, the positions of repetitive elements and the positions of transcription start sites known from the Eukaryotic Promoter Database (EPD) can be viewed graphically.

  11. Bioinformatics analysis of plant orthologous introns: identification of an intronic tRNA-like sequence.

    PubMed

    Akkuratov, Evgeny E; Walters, Lorraine; Saha-Mandal, Arnab; Khandekar, Sushant; Crawford, Erin; Zirbel, Craig L; Leisner, Scott; Prakash, Ashwin; Fedorova, Larisa; Fedorov, Alexei

    2014-09-10

    Orthologous introns have identical positions relative to the coding sequence in orthologous genes of different species. By analyzing the complete genomes of five plants we generated a database of 40,512 orthologous intron groups of dicotyledonous plants, 28,519 orthologous intron groups of angiosperms, and 15,726 of land plants (moss and angiosperms). Multiple sequence alignments of each orthologous intron group were obtained using the Mafft algorithm. The number of conserved regions in plant introns appeared to be hundreds of times fewer than that in mammals or vertebrates. Approximately three quarters of conserved intronic regions among angiosperms and dicots, in particular, correspond to alternatively-spliced exonic sequences. We registered only a handful of conserved intronic ncRNAs of flowering plants. However, the most evolutionarily conserved intronic region, which is ubiquitous for all plants examined in this study, including moss, possessed multiple structural features of tRNAs, which caused us to classify it as a putative tRNA-like ncRNA. Intronic sequences encoding tRNA-like structures are not unique to plants. Bioinformatics examination of the presence of tRNA inside introns revealed an unusually long-term association of four glycine tRNAs inside the Vac14 gene of fish, amniotes, and mammals. Copyright © 2014 Elsevier B.V. All rights reserved.

  12. High-Throughput Sequencing of Arabidopsis microRNAs: Evidence for Frequent Birth and Death of MIRNA Genes

    PubMed Central

    Fahlgren, Noah; Howell, Miya D.; Kasschau, Kristin D.; Chapman, Elisabeth J.; Sullivan, Christopher M.; Cumbie, Jason S.; Givan, Scott A.; Law, Theresa F.; Grant, Sarah R.; Dangl, Jeffery L.; Carrington, James C.

    2007-01-01

    In plants, microRNAs (miRNAs) comprise one of two classes of small RNAs that function primarily as negative regulators at the posttranscriptional level. Several MIRNA genes in the plant kingdom are ancient, with conservation extending between angiosperms and the mosses, whereas many others are more recently evolved. Here, we use deep sequencing and computational methods to identify, profile and analyze non-conserved MIRNA genes in Arabidopsis thaliana. 48 non-conserved MIRNA families, nearly all of which were represented by single genes, were identified. Sequence similarity analyses of miRNA precursor foldback arms revealed evidence for recent evolutionary origin of 16 MIRNA loci through inverted duplication events from protein-coding gene sequences. Interestingly, these recently evolved MIRNA genes have taken distinct paths. Whereas some non-conserved miRNAs interact with and regulate target transcripts from gene families that donated parental sequences, others have drifted to the point of non-interaction with parental gene family transcripts. Some young MIRNA loci clearly originated from one gene family but form miRNAs that target transcripts in another family. We suggest that MIRNA genes are undergoing relatively frequent birth and death, with only a subset being stabilized by integration into regulatory networks. PMID:17299599

  13. Conserved noncoding sequences conserve biological networks and influence genome evolution.

    PubMed

    Xie, Jianbo; Qian, Kecheng; Si, Jingna; Xiao, Liang; Ci, Dong; Zhang, Deqiang

    2018-05-01

    Comparative genomics approaches have identified numerous conserved cis-regulatory sequences near genes in plant genomes. Despite the identification of these conserved noncoding sequences (CNSs), our knowledge of their functional importance and selection remains limited. Here, we used a combination of DNA methylome analysis, microarray expression analyses, and functional annotation to study these sequences in the model tree Populus trichocarpa. Methylation in CG contexts and non-CG contexts was lower in CNSs, particularly CNSs in the 5'-upstream regions of genes, compared with other sites in the genome. We observed that CNSs are enriched in genes with transcription and binding functions, and this also associated with syntenic genes and those from whole-genome duplications, suggesting that cis-regulatory sequences play a key role in genome evolution. We detected a significant positive correlation between CNS number and protein interactions, suggesting that CNSs may have roles in the evolution and maintenance of biological networks. The divergence of CNSs indicates that duplication-degeneration-complementation drives the subfunctionalization of a proportion of duplicated genes from whole-genome duplication. Furthermore, population genomics confirmed that most CNSs are under strong purifying selection and only a small subset of CNSs shows evidence of adaptive evolution. These findings provide a foundation for future studies exploring these key genomic features in the maintenance of biological networks, local adaptation, and transcription.

  14. Chromosome ends: different sequences may provide conserved functions.

    PubMed

    Louis, Edward J; Vershinin, Alexander V

    2005-07-01

    The structures of specific chromosome regions, centromeres and telomeres, present a number of puzzles. As functions performed by these regions are ubiquitous and essential, their DNA, proteins and chromatin structure are expected to be conserved. Recent studies of centromeric DNA from human, Drosophila and plant species have demonstrated that a hidden universal centromere-specific sequence is highly unlikely. The DNA of telomeres is more conserved consisting of a tandemly repeated 6-8 bp Arabidopsis-like sequence in a majority of organisms as diverse as protozoan, fungi, mammals and plants. However, there are alternatives to short DNA repeats at the ends of chromosomes and for telomere elongation by telomerase. Here we focus on the similarities and diversity that exist among the structural elements, DNA sequences and proteins, that make up terminal domains (telomeres and subtelomeres), and how organisms use these in different ways to fulfil the functions of end-replication and end-protection. Copyright (c) 2005 Wiley Periodicals, Inc.

  15. Comparative genomic analysis of the false killer whale (Pseudorca crassidens) LMBR1 locus.

    PubMed

    Kim, Dae-Won; Choi, Sang-Haeng; Kim, Ryong Nam; Kim, Sun-Hong; Paik, Sang-Gi; Nam, Seong-Hyeuk; Kim, Dong-Wook; Kim, Aeri; Kang, Aram; Park, Hong-Seog

    2010-09-01

    The sequencing and comparative genomic analysis of LMBR1 loci in mammals or other species, including human, would be very important in understanding evolutionary genetic changes underlying the evolution of limb development. In this regard, comparative genomic annotation of the false killer whale LMBR1 locus could shed new light on the evolution of limb development. We sequenced two false killer whale BAC clones, corresponding to 156 kb and 144 kb, respectively, harboring the tightly linked RNF32, LMBR1, and NOM1 genes. Our annotation of the false killer whale LMBR1 gene showed that it consists of 17 exons (1473 bp), in contrast to 18 exons (1596 bp) in human, and it displays 93.1% and 95.6% nucleotide and amino acid sequence similarity, respectively, compared with the human gene. In particular, we discovered that exon 10, deleted in the false killer whale LMBR1 gene, is present only in primates, and this fact strongly implies that exon 10 might be crucial in determining primate-specific limb development. ZRS and TFBS sequences have been well conserved across 11 species, suggesting that these regions could be involved in an important function of limb development and limb patterning. The neighboring gene RNF32 showed several lineage-conserved exons, such as exons 2 through 9 conserved in eutherian mammals, exons 3 through 9 conserved in mammals, and exons 5 through 9 conserved in vertebrates. The other neighboring gene, NOM1, had undergone a substitution (ATG→GTA) at the start codon, giving rise to a 36 bp shorter N-terminal sequence compared with the human sequence. Our comparative analysis of the false killer whale LMBR1 genomic locus provides important clues regarding the genetic regions that may play crucial roles in limb development and patterning.

  16. Defining and predicting structurally conserved regions in protein superfamilies

    PubMed Central

    Huang, Ivan K.; Grishin, Nick V.

    2013-01-01

    Motivation: The structures of homologous proteins are generally better conserved than their sequences. This phenomenon is demonstrated by the prevalence of structurally conserved regions (SCRs) even in highly divergent protein families. Defining SCRs requires the comparison of two or more homologous structures and is affected by their availability and divergence, and our ability to deduce structurally equivalent positions among them. In the absence of multiple homologous structures, it is necessary to predict SCRs of a protein using information from only a set of homologous sequences and (if available) a single structure. Accurate SCR predictions can benefit homology modelling and sequence alignment. Results: Using pairwise DaliLite alignments among a set of homologous structures, we devised a simple measure of structural conservation, termed structural conservation index (SCI). SCI was used to distinguish SCRs from non-SCRs. A database of SCRs was compiled from 386 SCOP superfamilies containing 6489 protein domains. Artificial neural networks were then trained to predict SCRs with various features deduced from a single structure and homologous sequences. Assessment of the predictions via a 5-fold cross-validation method revealed that predictions based on features derived from a single structure perform similarly to ones based on homologous sequences, while combining sequence and structural features was optimal in terms of accuracy (0.755) and Matthews correlation coefficient (0.476). These results suggest that even without information from multiple structures, it is still possible to effectively predict SCRs for a protein. Finally, inspection of the structures with the worst predictions pinpoints difficulties in SCR definitions. Availability: The SCR database and the prediction server can be found at http://prodata.swmed.edu/SCR. Contact: 91huangi@gmail.com or grishin@chop.swmed.edu Supplementary information: Supplementary data are available at Bioinformatics Online PMID:23193223

  17. AlignMiner: a Web-based tool for detection of divergent regions in multiple sequence alignments of conserved sequences

    PubMed Central

    2010-01-01

    Background Multiple sequence alignments are used to study gene or protein function, phylogenetic relations, genome evolution hypotheses and even gene polymorphisms. Virtually without exception, all available tools focus on conserved segments or residues. Small divergent regions, however, are biologically important for specific quantitative polymerase chain reaction, genotyping, molecular markers and preparation of specific antibodies, and yet have received little attention. As a consequence, they must be selected empirically by the researcher. AlignMiner has been developed to fill this gap in bioinformatic analyses. Results AlignMiner is a Web-based application for detection of conserved and divergent regions in alignments of conserved sequences, focusing particularly on divergence. It accepts alignments (protein or nucleic acid) obtained using any of a variety of algorithms, which does not appear to have a significant impact on the final results. AlignMiner uses different scoring methods for assessing conserved/divergent regions, Entropy being the method that provides the highest number of regions with the greatest length, and Weighted being the most restrictive. Conserved/divergent regions can be generated either with respect to the consensus sequence or to one master sequence. The resulting data are presented in a graphical interface developed in AJAX, which provides remarkable user interaction capabilities. Users do not need to wait until execution is complete and can.even inspect their results on a different computer. Data can be downloaded onto a user disk, in standard formats. In silico and experimental proof-of-concept cases have shown that AlignMiner can be successfully used to designing specific polymerase chain reaction primers as well as potential epitopes for antibodies. Primer design is assisted by a module that deploys several oligonucleotide parameters for designing primers "on the fly". Conclusions AlignMiner can be used to reliably detect divergent regions via several scoring methods that provide different levels of selectivity. Its predictions have been verified by experimental means. Hence, it is expected that its usage will save researchers' time and ensure an objective selection of the best-possible divergent region when closely related sequences are analysed. AlignMiner is freely available at http://www.scbi.uma.es/alignminer. PMID:20525162

  18. Global priorities for conservation across multiple dimensions of mammalian diversity

    PubMed Central

    Graham, Catherine H.; Costa, Gabriel C.; Hedges, S. Blair; Penone, Caterina; Radeloff, Volker C.; Rondinini, Carlo; Davidson, Ana D.

    2017-01-01

    Conservation priorities that are based on species distribution, endemism, and vulnerability may underrepresent biologically unique species as well as their functional roles and evolutionary histories. To ensure that priorities are biologically comprehensive, multiple dimensions of diversity must be considered. Further, understanding how the different dimensions relate to one another spatially is important for conservation prioritization, but the relationship remains poorly understood. Here, we use spatial conservation planning to (i) identify and compare priority regions for global mammal conservation across three key dimensions of biodiversity—taxonomic, phylogenetic, and traits—and (ii) determine the overlap of these regions with the locations of threatened species and existing protected areas. We show that priority areas for mammal conservation exhibit low overlap across the three dimensions, highlighting the need for an integrative approach for biodiversity conservation. Additionally, currently protected areas poorly represent the three dimensions of mammalian biodiversity. We identify areas of high conservation priority among and across the dimensions that should receive special attention for expanding the global protected area network. These high-priority areas, combined with areas of high priority for other taxonomic groups and with social, economic, and political considerations, provide a biological foundation for future conservation planning efforts. PMID:28674013

  19. Global priorities for conservation across multiple dimensions of mammalian diversity.

    PubMed

    Brum, Fernanda T; Graham, Catherine H; Costa, Gabriel C; Hedges, S Blair; Penone, Caterina; Radeloff, Volker C; Rondinini, Carlo; Loyola, Rafael; Davidson, Ana D

    2017-07-18

    Conservation priorities that are based on species distribution, endemism, and vulnerability may underrepresent biologically unique species as well as their functional roles and evolutionary histories. To ensure that priorities are biologically comprehensive, multiple dimensions of diversity must be considered. Further, understanding how the different dimensions relate to one another spatially is important for conservation prioritization, but the relationship remains poorly understood. Here, we use spatial conservation planning to ( i ) identify and compare priority regions for global mammal conservation across three key dimensions of biodiversity-taxonomic, phylogenetic, and traits-and ( ii ) determine the overlap of these regions with the locations of threatened species and existing protected areas. We show that priority areas for mammal conservation exhibit low overlap across the three dimensions, highlighting the need for an integrative approach for biodiversity conservation. Additionally, currently protected areas poorly represent the three dimensions of mammalian biodiversity. We identify areas of high conservation priority among and across the dimensions that should receive special attention for expanding the global protected area network. These high-priority areas, combined with areas of high priority for other taxonomic groups and with social, economic, and political considerations, provide a biological foundation for future conservation planning efforts.

  20. Insights into the phylogenetic positions of photosynthetic bacteria obtained from 5S rRNA and 16S rRNA sequence data

    NASA Technical Reports Server (NTRS)

    Fox, G. E.

    1985-01-01

    Comparisons of complete 16S ribosomal ribonucleic acid (rRNA) sequences established that the secondary structure of these molecules is highly conserved. Earlier work with 5S rRNA secondary structure revealed that when structural conservation exists the alignment of sequences is straightforward. The constancy of structure implies minimal functional change. Under these conditions a uniform evolutionary rate can be expected so that conditions are favorable for phylogenetic tree construction.

  1. Characteristics of the Lotus japonicus gene repertoire deduced from large-scale expressed sequence tag (EST) analysis.

    PubMed

    Asamizu, Erika; Nakamura, Yasukazu; Sato, Shusei; Tabata, Satoshi

    2004-02-01

    To perform a comprehensive analysis of genes expressed in a model legume, Lotus japonicus, a total of 74472 3'-end expressed sequence tags (EST) were generated from cDNA libraries produced from six different organs. Clustering of sequences was performed with an identity criterion of 95% for 50 bases, and a total of 20457 non-redundant sequences, 8503 contigs and 11954 singletons were generated. EST sequence coverage was analyzed by using the annotated L. japonicus genomic sequence and 1093 of the 1889 predicted protein-encoding genes (57.9%) were hit by the EST sequence(s). Gene content was compared to several plant species. Among the 8503 contigs, 471 were identified as sequences conserved only in leguminous species and these included several disease resistance-related genes. This suggested that in legumes, these genes may have evolved specifically to resist pathogen attack. The rate of gene sequence divergence was assessed by comparing similarity level and functional category based on the Gene Ontology (GO) annotation of Arabidopsis genes. This revealed that genes encoding ribosomal proteins, as well as those related to translation, photosynthesis, and cellular structure were more abundantly represented in the highly conserved class, and that genes encoding transcription factors and receptor protein kinases were abundantly represented in the less conserved class. To make the sequence information and the cDNA clones available to the research community, a Web database with useful services was created at http://www.kazusa.or.jp/en/plant/lotus/EST/.

  2. The Declining Cocoa Economy and the Atlantic Forest of Southern Bahia, Brazil: Conservation Attitudes of Cocoa Planters.

    ERIC Educational Resources Information Center

    Alger, Keith; Caldas, Marcellus

    1994-01-01

    Causes of the degradation of Brazilian Atlantic Forest in the southeastern cocoa region of the State of Bahia are investigated by means of a survey on cocoa planter's forest conservation attitudes. Policies encouraging private forest conservation, and development of forest-conserving agricultural alternatives for landless poor are recommended. (LZ)

  3. Determination of the promoter region of mouse ribosomal RNA gene by an in vitro transcription system.

    PubMed Central

    Yamamoto, O; Takakusa, N; Mishima, Y; Kominami, R; Muramatsu, M

    1984-01-01

    Sequences required for a faithful and efficient transcription of a cloned mouse ribosomal RNA gene (rDNA) are determined by testing a series of deletion mutants in an in vitro transcription system utilizing two kinds of mouse cellular extract. Deletion of sequences upstream of -40 or downstream of +52 causes only slight reduction in promoter activity as compared with the "wild-type" template. For upstream deletion mutants, the removal of a sequence between -40 and -35 causes a significant decrease in the capacity to direct efficient initiation. This decrease becomes more pronounced when the deletion reaches -32 and the sequence A-T-C-T-T-T, conserved among mouse, rat, and human rDNAs, is lost. Residual template activity is further reduced as more upstream sequence is deleted and finally becomes undetectable when the deletion is extended from -22 down to -17, corresponding to the loss of the conserved sequence T-A-T-T-G. As for downstream deletion mutants, the removal of the sequence downstream of +23 causes some (and further deletions up to +11 cause a more) serious decrease in template activity in vitro. These deletions involve other conserved sequences downstream of the transcription start site. However, the removal of the original transcription start site does not abolish the transcription initiation completely, provided that the whole upstream sequence is intact. Images PMID:6320178

  4. Myxobolus cerebralis internal transcribed spacer 1 (ITS-1) sequences support recent spread of the parasite to North America and within Europe

    USGS Publications Warehouse

    Whipps, Christopher M.; El-Matbouli, M.; Hedrick, R.P.; Blazer, V.; Kent, M.L.

    2004-01-01

    Molecular approaches for resolving relationships among the Myxozoa have relied mainly on small subunit (SSU) ribosomal DNA (rDNA) sequence analysis. This region of the gene is generally used for higher phylogenetic studies, and the conservative nature of this gene may make it inadequate for intraspecific comparisons. Previous intraspecific studies of Myxobolus cerebralis based on molecular analyses reported that the sequence of SSU rDNA and the internal transcribed spacer (ITS) were highly conserved in representatives of the parasite from North America and Europe. Considering that the ITS is usually a more variable region than the SSU, we reanalyzed available sequences on GenBank and obtained sequences from other M. cerebralis representatives from the states of California and West Virginia in the USA and from Germany and Russia. With the exception of 7 base pairs, most of the sequence designated as ITS-1 in GenBank was a highly conserved portion of the rDNA near the 3-prime end of the SSU region. Nonetheless, the additional ITS-1 sequences obtained from the available geographic representatives were well conserved. It is unlikely that we would have observed virtually identical ITS-1 sequences between European and American M. cerebralis samples had it spread naturally over time, particularly when compared to the variation seen between isolates of another myxozoan (Kudoa thyrsites) that has most likely spread naturally. These data further support the hypothesis that the current distribution of M. cerebralis in North America is a result of recent introductions followed by dispersal via anthropogenic means, largely through the stocking of infected trout for sport fishing.

  5. DsaV methyltransferase and its isoschizomers contain a conserved segment that is similar to the segment in Hhai methyltransferase that is in contact with DNA bases.

    PubMed Central

    Gopal, J; Yebra, M J; Bhagwat, A S

    1994-01-01

    The methyltransferase (MTase) in the DsaV restriction--modification system methylates within 5'-CCNGG sequences. We have cloned the gene for this MTase and determined its sequence. The predicted sequence of the MTase protein contains sequence motifs conserved among all cytosine-5 MTases and is most similar to other MTases that methylate CCNGG sequences, namely M.ScrFI and M.SsoII. All three MTases methylate the internal cytosine within their recognition sequence. The 'variable' region within the three enzymes that methylate CCNGG can be aligned with the sequences of two enzymes that methylate CCWGG sequences. Remarkably, two segments within this region contain significant similarity with the region of M.HhaI that is known to contact DNA bases. These alignments suggest that many cytosine-5 MTases are likely to interact with DNA using a similar structural framework. Images PMID:7971279

  6. smRNAome profiling to identify conserved and novel microRNAs in Stevia rebaudiana Bertoni

    PubMed Central

    2012-01-01

    Background MicroRNAs (miRNAs) constitute a family of small RNA (sRNA) population that regulates the gene expression and plays an important role in plant development, metabolism, signal transduction and stress response. Extensive studies on miRNAs have been performed in different plants such as Arabidopsis thaliana, Oryza sativa etc. and volume of the miRNA database, mirBASE, has been increasing on day to day basis. Stevia rebaudiana Bertoni is an important perennial herb which accumulates high concentrations of diterpene steviol glycosides which contributes to its high indexed sweetening property with no calorific value. Several studies have been carried out for understanding molecular mechanism involved in biosynthesis of these glycosides, however, information about miRNAs has been lacking in S. rebaudiana. Deep sequencing of small RNAs combined with transcriptomic data is a powerful tool for identifying conserved and novel miRNAs irrespective of availability of genome sequence data. Results To identify miRNAs in S. rebaudiana, sRNA library was constructed and sequenced using Illumina genome analyzer II. A total of 30,472,534 reads representing 2,509,190 distinct sequences were obtained from sRNA library. Based on sequence similarity, we identified 100 miRNAs belonging to 34 highly conserved families. Also, we identified 12 novel miRNAs whose precursors were potentially generated from stevia EST and nucleotide sequences. All novel sequences have not been earlier described in other plant species. Putative target genes were predicted for most conserved and novel miRNAs. The predicted targets are mainly mRNA encoding enzymes regulating essential plant metabolic and signaling pathways. Conclusions This study led to the identification of 34 highly conserved miRNA families and 12 novel potential miRNAs indicating that specific miRNAs exist in stevia species. Our results provided information on stevia miRNAs and their targets building a foundation for future studies to understand their roles in key stevia traits. PMID:23116282

  7. smRNAome profiling to identify conserved and novel microRNAs in Stevia rebaudiana Bertoni.

    PubMed

    Mandhan, Vibha; Kaur, Jagdeep; Singh, Kashmir

    2012-11-01

    MicroRNAs (miRNAs) constitute a family of small RNA (sRNA) population that regulates the gene expression and plays an important role in plant development, metabolism, signal transduction and stress response. Extensive studies on miRNAs have been performed in different plants such as Arabidopsis thaliana, Oryza sativa etc. and volume of the miRNA database, mirBASE, has been increasing on day to day basis. Stevia rebaudiana Bertoni is an important perennial herb which accumulates high concentrations of diterpene steviol glycosides which contributes to its high indexed sweetening property with no calorific value. Several studies have been carried out for understanding molecular mechanism involved in biosynthesis of these glycosides, however, information about miRNAs has been lacking in S. rebaudiana. Deep sequencing of small RNAs combined with transcriptomic data is a powerful tool for identifying conserved and novel miRNAs irrespective of availability of genome sequence data. To identify miRNAs in S. rebaudiana, sRNA library was constructed and sequenced using Illumina genome analyzer II. A total of 30,472,534 reads representing 2,509,190 distinct sequences were obtained from sRNA library. Based on sequence similarity, we identified 100 miRNAs belonging to 34 highly conserved families. Also, we identified 12 novel miRNAs whose precursors were potentially generated from stevia EST and nucleotide sequences. All novel sequences have not been earlier described in other plant species. Putative target genes were predicted for most conserved and novel miRNAs. The predicted targets are mainly mRNA encoding enzymes regulating essential plant metabolic and signaling pathways. This study led to the identification of 34 highly conserved miRNA families and 12 novel potential miRNAs indicating that specific miRNAs exist in stevia species. Our results provided information on stevia miRNAs and their targets building a foundation for future studies to understand their roles in key stevia traits.

  8. Sequence analysis of the L protein of the Ebola 2014 outbreak: Insight into conserved regions and mutations.

    PubMed

    Ayub, Gohar; Waheed, Yasir

    2016-06-01

    The 2014 Ebola outbreak was one of the largest that have occurred; it started in Guinea and spread to Nigeria, Liberia and Sierra Leone. Phylogenetic analysis of the current virus species indicated that this outbreak is the result of a divergent lineage of the Zaire ebolavirus. The L protein of Ebola virus (EBOV) is the catalytic subunit of the RNA‑dependent RNA polymerase complex, which, with VP35, is key for the replication and transcription of viral RNA. Earlier sequence analysis demonstrated that the L protein of all non‑segmented negative‑sense (NNS) RNA viruses consists of six domains containing conserved functional motifs. The aim of the present study was to analyze the presence of these motifs in 2014 EBOV isolates, highlight their function and how they may contribute to the overall pathogenicity of the isolates. For this purpose, 81 2014 EBOV L protein sequences were aligned with 475 other NNS RNA viruses, including Paramyxoviridae and Rhabdoviridae viruses. Phylogenetic analysis of all EBOV outbreak L protein sequences was also performed. Analysis of the amino acid substitutions in the 2014 EBOV outbreak was conducted using sequence analysis. The alignment demonstrated the presence of previously conserved motifs in the 2014 EBOV isolates and novel residues. Notably, all the mutations identified in the 2014 EBOV isolates were tolerant, they were pathogenic with certain examples occurring within previously determined functional conserved motifs, possibly altering viral pathogenicity, replication and virulence. The phylogenetic analysis demonstrated that all sequences with the exception of the 2014 EBOV sequences were clustered together. The 2014 EBOV outbreak has acquired a great number of mutations, which may explain the reasons behind this unprecedented outbreak. Certain residues critical to the function of the polymerase remain conserved and may be targets for the development of antiviral therapeutic agents.

  9. Identification of Regulatory Elements That Control PPARγ Expression in Adipocyte Progenitors

    PubMed Central

    Chou, Wen-Ling; Galmozzi, Andrea; Partida, David; Kwan, Kevin; Yeung, Hui; Su, Andrew I.; Saez, Enrique

    2013-01-01

    Adipose tissue renewal and obesity-driven expansion of fat cell number are dependent on proliferation and differentiation of adipose progenitors that reside in the vasculature that develops in coordination with adipose depots. The transcriptional events that regulate commitment of progenitors to the adipose lineage are poorly understood. Because expression of the nuclear receptor PPARγ defines the adipose lineage, isolation of elements that control PPARγ expression in adipose precursors may lead to discovery of transcriptional regulators of early adipocyte determination. Here, we describe the identification and validation in transgenic mice of 5 highly conserved non-coding sequences from the PPARγ locus that can drive expression of a reporter gene in a manner that recapitulates the tissue-specific pattern of PPARγ expression. Surprisingly, these 5 elements appear to control PPARγ expression in adipocyte precursors that are associated with the vasculature of adipose depots, but not in mature adipocytes. Characterization of these five PPARγ regulatory sequences may enable isolation of the transcription factors that bind these cis elements and provide insight into the molecular regulation of adipose tissue expansion in normal and pathological states. PMID:24009687

  10. Transcriptome identification of putative genes involved in protein catabolism and innate immune response in human body louse (Pediculicidae: Pediculus humanus).

    PubMed

    Pedra, Joao H F; Brandt, Amanda; Li, Hong-Mei; Westerman, Rick; Romero-Severson, Jeanne; Pollack, Richard J; Murdock, Larry L; Pittendrigh, Barry R

    2003-11-01

    Genomics information relating to human body lice is surprisingly scarce, and this has constrained studies of their physiology, immunology and vector biology. To identify novel body louse genes, we used engorged adult lice to generate a cDNA library. Initially, 1152 clones were screened for inserts, edited for removal of vector sequences and base pairs of poor quality, and viewed for splicing variations, gene families and polymorphism. Computational methods identified 506 inferred open reading frames including the first predicted louse defensin. The inferred defensin aligns well with other insect defensins and has highly conserved cysteine residues, as are known for other defensin sequences. Two cysteine and five serine proteinases were categorized according to their inferred catalytic sites. We also discovered seven putative ubiquitin-pathway genes and four iron metabolizing deduced enzymes. Finally, glutathione-S-transferases and cytochrome P450 genes were among the detoxification enzymes found. Results from this first systematic effort to discover human body louse genes should promote further studies in Phthiraptera and lice.

  11. Detection of hyper-conserved regions in hepatitis B virus X gene potentially useful for gene therapy.

    PubMed

    González, Carolina; Tabernero, David; Cortese, Maria Francesca; Gregori, Josep; Casillas, Rosario; Riveiro-Barciela, Mar; Godoy, Cristina; Sopena, Sara; Rando, Ariadna; Yll, Marçal; Lopez-Martinez, Rosa; Quer, Josep; Esteban, Rafael; Buti, Maria; Rodríguez-Frías, Francisco

    2018-05-21

    To detect hyper-conserved regions in the hepatitis B virus (HBV) X gene ( HBX ) 5' region that could be candidates for gene therapy. The study included 27 chronic hepatitis B treatment-naive patients in various clinical stages (from chronic infection to cirrhosis and hepatocellular carcinoma, both HBeAg-negative and HBeAg-positive), and infected with HBV genotypes A-F and H. In a serum sample from each patient with viremia > 3.5 log IU/mL, the HBX 5' end region [nucleotide (nt) 1255-1611] was PCR-amplified and submitted to next-generation sequencing (NGS). We assessed genotype variants by phylogenetic analysis, and evaluated conservation of this region by calculating the information content of each nucleotide position in a multiple alignment of all unique sequences (haplotypes) obtained by NGS. Conservation at the HBx protein amino acid (aa) level was also analyzed. NGS yielded 1333069 sequences from the 27 samples, with a median of 4578 sequences/sample (2487-9279, IQR 2817). In 14/27 patients (51.8%), phylogenetic analysis of viral nucleotide haplotypes showed a complex mixture of genotypic variants. Analysis of the information content in the haplotype multiple alignments detected 2 hyper-conserved nucleotide regions, one in the HBX upstream non-coding region (nt 1255-1286) and the other in the 5' end coding region (nt 1519-1603). This last region coded for a conserved amino acid region (aa 63-76) that partially overlaps a Kunitz-like domain. Two hyper-conserved regions detected in the HBX 5' end may be of value for targeted gene therapy, regardless of the patients' clinical stage or HBV genotype.

  12. Genomic structure of the human D-site binding protein (DBP) gene

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shutler, G.; Glassco, T.; Kang, Xiaolin

    1996-06-15

    The human gene for the D-Site Binding Protein (DBP) has been sequenced and characterized. This gene is a member of the b/ZIP family of transcription factors and is one of three genes forming the PAR sub-family. DBP has been implicated in the diurnal regulation of a variety of liver-specific genes. Examination of the genomic structure of DBP reveals that the gene is divided into four exons and is contained within a relatively compact region of approximately 6 kb. These exons appear to correspond to functional divisions the DBP protein. Exon 1 contains a long 5{prime} UTR, and conservation between themore » rat and the human genes of the presence of small open reading frames within this region suggests that is may play a role in translational control. Exon 2 contains a limited region of similarity to the other PAR domain genes, which may be part of a potential activation domain. Exon 3 contains the PAR domain and differs by only 1 of 71 amino acids between rat and human. Exon 4, containing both the basic and the leucine zipper domains, is likewise highly conserved. The overall degree of homology between the rat and the human cDNA sequences is 82% for the nucleic acid sequence and 92% for the protein sequence. comparison of the rat and human proximal promoters reveals extensive sequence conservation, with two previously characterized DNA binding sites being conserved at the functional and sequence levels. 31 refs., 4 figs.« less

  13. Analysis of heterogeneity of Copia-like retrotransposons in the genome of cassava (Manihot esculenta Crantz).

    PubMed

    Gbadegesin, Micheal A; Beeching, John R

    2011-12-20

    Retrotransposons are ubiquitous in eukaryotic genomes and now proving to be useful genetic tools for genetic diversity and phylogenetic analyses, especially in plants. In order to assess the diversity of Ty1/Copia-like retrotransposons of cassava, we used PCR primers anchored on the conserved domains of reverse transcriptases (RTs) to amplify cassava Ty1/Copia-like RT. The PCR product was cloned and sequenced. Sequences analysis of the clones revealed the presence of 69 families of Ty1/Copia-like retrotransposon in the genome of cassava. Comparative analyses of the predicted amino acid sequences of these clones with those of other plants showed that retroelements of this class are very heterogeneous in cassava. Cassava is widely grown for its edible roots in the tropical and subtropical regions of the world. Cassava roots, though poor in protein, are rich in starch (makes up about 80% of the dry matter), vitamin C, carotenes, calcium and potassium. It has a great commercial importance as a source of starch and starch based products. Realizing the importance of cassava, it stands out as a crop to benefit from biotechnology development. Heterogeneity of Mecops (Manihot esculenta copia-like Retrotransposons) showed that they may be useful for genetic diversity and phylogenetic analyses of cassava germplasm.

  14. Human CD4 T cell epitopes selective for Vaccinia versus Variola virus.

    PubMed

    Probst, Alicia; Besse, Aurore; Favry, Emmanuel; Imbert, Gilles; Tanchou, Valérie; Castelli, Florence Anne; Maillere, Bernard

    2013-04-01

    Due to the high degree of sequence identity between Orthopoxvirus species, the specific B and T cell responses raised against these viruses are largely cross-reactive and poorly selective. We therefore searched for CD4 T cell epitopes present in the conserved parts of the Vaccinia genome (VACV) but absent from Variola viruses (VARV), with a view to identifying immunogenic sequences selective for VACV. We identified three long peptide fragments from the B7R, B10R and E7R proteins by in silico comparisons of the poxvirus genomes, and evaluated the recognition of these fragments by VACV-specific T cell lines derived from healthy donors. For the 12 CD4 T cell epitopes identified, we assessed their binding to common HLA-DR allotypes and their capacity to induce peptide-specific CD4 T-cell lines. Four peptides from B7R and B10R displayed a broad binding specificity for HLA-DR molecules and induced multiple T cell lines from healthy donors. Besides their absence from VARV, the two B10R peptide sequences were mutated in the Cowpox virus and completely absent from the Monkeypox genome. This work contributes to the development of differential diagnosis of poxvirus infections. Copyright © 2012 Elsevier Ltd. All rights reserved.

  15. Panax ginseng genome examination for ginsenoside biosynthesis.

    PubMed

    Xu, Jiang; Chu, Yang; Liao, Baosheng; Xiao, Shuiming; Yin, Qinggang; Bai, Rui; Su, He; Dong, Linlin; Li, Xiwen; Qian, Jun; Zhang, Jingjing; Zhang, Yujun; Zhang, Xiaoyan; Wu, Mingli; Zhang, Jie; Li, Guozheng; Zhang, Lei; Chang, Zhenzhan; Zhang, Yuebin; Jia, Zhengwei; Liu, Zhixiang; Afreh, Daniel; Nahurira, Ruth; Zhang, Lianjuan; Cheng, Ruiyang; Zhu, Yingjie; Zhu, Guangwei; Rao, Wei; Zhou, Chao; Qiao, Lirui; Huang, Zhihai; Cheng, Yung-Chi; Chen, Shilin

    2017-11-01

    Ginseng, which contains ginsenosides as bioactive compounds, has been regarded as an important traditional medicine for several millennia. However, the genetic background of ginseng remains poorly understood, partly because of the plant's large and complex genome composition. We report the entire genome sequence of Panax ginseng using next-generation sequencing. The 3.5-Gb nucleotide sequence contains more than 60% repeats and encodes 42 006 predicted genes. Twenty-two transcriptome datasets and mass spectrometry images of ginseng roots were adopted to precisely quantify the functional genes. Thirty-one genes were identified to be involved in the mevalonic acid pathway. Eight of these genes were annotated as 3-hydroxy-3-methylglutaryl-CoA reductases, which displayed diverse structures and expression characteristics. A total of 225 UDP-glycosyltransferases (UGTs) were identified, and these UGTs accounted for one of the largest gene families of ginseng. Tandem repeats contributed to the duplication and divergence of UGTs. Molecular modeling of UGTs in the 71st, 74th, and 94th families revealed a regiospecific conserved motif located at the N-terminus. Molecular docking predicted that this motif captures ginsenoside precursors. The ginseng genome represents a valuable resource for understanding and improving the breeding, cultivation, and synthesis biology of this key herb. © The Author 2017. Published by Oxford University Press.

  16. [Case control study of fractures-dislocations of ankle joint with conservative and operative treatment].

    PubMed

    Zhang, Song-Tu; Lin, Yi-Rong; Chen, Lian-Yuan

    2010-10-01

    To compare the clinical efficacy of grade III, IV supination-eversion fractures-dislocations of ankle joint between manipulative treatment and operative treatment. From September 2007 to December 2008, the clinical data of 60 patients with grade III, IV supination-eversion fractures-dislocations of ankle joint were retrospectively analyzed. There were 32 males and 28 females, ranging in age from 18 to 70 years with an average age of 38.17 years. All patients were respectively treated with manipulative treatment (conservative group, 30 cases) and operative treatment (operative group, 30 cases). The joint function was compared with Mazur standard; the reduction and shifting of fractures were observed with X-ray; the hospitalization day and the therapeutic cost were compared between two groups. All patients were followed up with an average of 15.27 months (ranged, 6 to 25 months). In conservative group, 16 cases got excellent result in joint function, 10 good, 3 fair, 1 poor; in operative group, 20 cases got excellent result, 8 good, 2 fair, 0 poor. In conservative group in the X-ray showed 25 cases obtained excellent and good reduction, 4 fair, 1 poor; and in operative group in the X-ray showed 28 cases obtained excellent and good reduction, 2 fair, 0 poor. There was no significant difference at the joint function and X-ray film after treatment between two groups (P > 0.05). The hospital day was respectively (7.87 +/- 3.34), (17.37 +/- 4.64) d in conservative group and operative group; and the therapeutic cost was respectively (2 506.67 +/- 649.10), (11 473.33 +/- 1 564.90) yuan. There was significant difference at hospital day and therapeutic cost between two groups (P < 0.05). Conservative treatment and operative treatment can both reach a very good result in treating grade III, IV supination-eversion fractures and dislocations of ankle joint. However, conservative treatment has advantage of high safety factor, low therapeutic cost, can reduce medical costs for patients.

  17. Whole-genome sequencing approaches for conservation biology: Advantages, limitations and practical recommendations.

    PubMed

    Fuentes-Pardo, Angela P; Ruzzante, Daniel E

    2017-10-01

    Whole-genome resequencing (WGR) is a powerful method for addressing fundamental evolutionary biology questions that have not been fully resolved using traditional methods. WGR includes four approaches: the sequencing of individuals to a high depth of coverage with either unresolved or resolved haplotypes, the sequencing of population genomes to a high depth by mixing equimolar amounts of unlabelled-individual DNA (Pool-seq) and the sequencing of multiple individuals from a population to a low depth (lcWGR). These techniques require the availability of a reference genome. This, along with the still high cost of shotgun sequencing and the large demand for computing resources and storage, has limited their implementation in nonmodel species with scarce genomic resources and in fields such as conservation biology. Our goal here is to describe the various WGR methods, their pros and cons and potential applications in conservation biology. WGR offers an unprecedented marker density and surveys a wide diversity of genetic variations not limited to single nucleotide polymorphisms (e.g., structural variants and mutations in regulatory elements), increasing their power for the detection of signatures of selection and local adaptation as well as for the identification of the genetic basis of phenotypic traits and diseases. Currently, though, no single WGR approach fulfils all requirements of conservation genetics, and each method has its own limitations and sources of potential bias. We discuss proposed ways to minimize such biases. We envision a not distant future where the analysis of whole genomes becomes a routine task in many nonmodel species and fields including conservation biology. © 2017 John Wiley & Sons Ltd.

  18. The complete mitochondrial genome of Lota lota (Gadiformes: Gadidae) from the Burqin River in China.

    PubMed

    Lu, Zhichuang; Zhang, Nan; Song, Na; Gao, Tianxiang

    2016-05-01

    In this study, the complete mitochondrial genome (mitogenome) sequence of Lota lota has been determined by long polymerase chain reaction and primer walking methods. The mitogenome is a circular molecule of 16,519 bp in length and contains 37 mitochondrial genes including 13 protein-coding genes, 2 ribosomal RNA (rRNA), 22 transfer RNA (tRNA) and a control region as other bony fishes. Within the control region, we identified the termination-associated sequence domain (TAS), the central conserved sequence block domains (CSB-F and CSB-D), and the conserved sequence block domains (CSB-1, CSB-2 and CSB-3).

  19. Hepatitis C virus genotypes in Singapore and Indonesia.

    PubMed

    Ng, W C; Guan, R; Tan, M F; Seet, B L; Lim, C A; Ngiam, C M; Sjaifoellah Noer, H M; Lesmana, L

    1995-01-01

    5' untranslated and partial core (C) region sequence of hepatitis C virus (HCV) in 21 Singaporean and 15 Indonesian isolates were amplified by reverse-transcription polymerase chain reaction and sequenced with the use of conserved primer sequences deduced from HCV genomes identified in other geographical regions. The HCV genotypes are predominantly that of Simmonds type 1 and less of type 2 and 3 with the latter genotype currently not detected in Indonesia. The 5' untranslated sequences are related to HCV-1. DK-7 (Denmark), US-11 (United States of America), HCV-J4, SA-10 (South Africa), T-3 (Taiwan), HCV-J6, HCV-J8, Eb-1 and Eb-8. When compared with the prototype HCV-1, insertions are found within the 5' untranslated region of Singaporean isolates and not in the Indonesians. There are Singaporean and Indonesian isolates that have sequences within the 5' untranslated region that differ slightly from each other. Microheterogeneity is observed in the core region of two Singaporeans and one Indonesian isolate. Finally, not all HCV isolates can be amplified with the conserved core sequence primers when compared with the ease with which these isolates can be amplified with 5' untranslated region conserved primers.

  20. Tissue-specific DNA methylation is conserved across human, mouse, and rat, and driven by primary sequence conservation.

    PubMed

    Zhou, Jia; Sears, Renee L; Xing, Xiaoyun; Zhang, Bo; Li, Daofeng; Rockweiler, Nicole B; Jang, Hyo Sik; Choudhary, Mayank N K; Lee, Hyung Joo; Lowdon, Rebecca F; Arand, Jason; Tabers, Brianne; Gu, C Charles; Cicero, Theodore J; Wang, Ting

    2017-09-12

    Uncovering mechanisms of epigenome evolution is an essential step towards understanding the evolution of different cellular phenotypes. While studies have confirmed DNA methylation as a conserved epigenetic mechanism in mammalian development, little is known about the conservation of tissue-specific genome-wide DNA methylation patterns. Using a comparative epigenomics approach, we identified and compared the tissue-specific DNA methylation patterns of rat against those of mouse and human across three shared tissue types. We confirmed that tissue-specific differentially methylated regions are strongly associated with tissue-specific regulatory elements. Comparisons between species revealed that at a minimum 11-37% of tissue-specific DNA methylation patterns are conserved, a phenomenon that we define as epigenetic conservation. Conserved DNA methylation is accompanied by conservation of other epigenetic marks including histone modifications. Although a significant amount of locus-specific methylation is epigenetically conserved, the majority of tissue-specific DNA methylation is not conserved across the species and tissue types that we investigated. Examination of the genetic underpinning of epigenetic conservation suggests that primary sequence conservation is a driving force behind epigenetic conservation. In contrast, evolutionary dynamics of tissue-specific DNA methylation are best explained by the maintenance or turnover of binding sites for important transcription factors. Our study extends the limited literature of comparative epigenomics and suggests a new paradigm for epigenetic conservation without genetic conservation through analysis of transcription factor binding sites.

  1. A Comparative Transcriptomic Analysis Reveals Conserved Features of Stem Cell Pluripotency in Planarians and Mammals

    PubMed Central

    Labbé, Roselyne M.; Irimia, Manuel; Currie, Ko W.; Lin, Alexander; Zhu, Shu Jun; Brown, David D.R.; Ross, Eric J.; Voisin, Veronique; Bader, Gary D.; Blencowe, Benjamin J.; Pearson, Bret J.

    2014-01-01

    Many long-lived species of animals require the function of adult stem cells throughout their lives. However, the transcriptomes of stem cells in invertebrates and vertebrates have not been compared, and consequently, ancestral regulatory circuits that control stem cell populations remain poorly defined. In this study, we have used data from high-throughput RNA sequencing to compare the transcriptomes of pluripotent adult stem cells from planarians with the transcriptomes of human and mouse pluripotent embryonic stem cells. From a stringently defined set of 4,432 orthologs shared between planarians, mice and humans, we identified 123 conserved genes that are ≥5-fold differentially expressed in stem cells from all three species. Guided by this gene set, we used RNAi screening in adult planarians to discover novel stem cell regulators, which we found to affect the stem cell-associated functions of tissue homeostasis, regeneration, and stem cell maintenance. Examples of genes that disrupted these processes included the orthologs of TBL3, PSD12, TTC27, and RACK1. From these analyses, we concluded that by comparing stem cell transcriptomes from diverse species, it is possible to uncover conserved factors that function in stem cell biology. These results provide insights into which genes comprised the ancestral circuitry underlying the control of stem cell self-renewal and pluripotency. PMID:22696458

  2. Central region component1, a novel synaptonemal complex component, is essential for meiotic recombination initiation in rice.

    PubMed

    Miao, Chunbo; Tang, Ding; Zhang, Honggen; Wang, Mo; Li, Yafei; Tang, Shuzhu; Yu, Hengxiu; Gu, Minghong; Cheng, Zhukuan

    2013-08-01

    In meiosis, homologous recombination entails programmed DNA double-strand break (DSB) formation and synaptonemal complex (SC) assembly coupled with the DSB repair. Although SCs display extensive structural conservation among species, their components identified are poorly conserved at the sequence level. Here, we identified a novel SC component, designated central region component1 (CRC1), in rice (Oryza sativa). CRC1 colocalizes with ZEP1, the rice SC transverse filament protein, to the central region of SCs in a mutually dependent fashion. Consistent with this colocalization, CRC1 interacts with ZEP1 in yeast two-hybrid assays. CRC1 is orthologous to Saccharomyces cerevisiae pachytene checkpoint2 (Pch2) and Mus musculus THYROID receptor-interacting protein13 (TRIP13) and may be a conserved SC component. Additionally, we provide evidence that CRC1 is essential for meiotic DSB formation. CRC1 interacts with homologous pairing aberration in rice meiosis1 (PAIR1) in vitro, suggesting that these proteins act as a complex to promote DSB formation. PAIR2, the rice ortholog of budding yeast homolog pairing1, is required for homologous chromosome pairing. We found that CRC1 is also essential for the recruitment of PAIR2 onto meiotic chromosomes. The roles of CRC1 identified here have not been reported for Pch2 or TRIP13.

  3. CENTRAL REGION COMPONENT1, a Novel Synaptonemal Complex Component, Is Essential for Meiotic Recombination Initiation in Rice[C][W

    PubMed Central

    Miao, Chunbo; Tang, Ding; Zhang, Honggen; Wang, Mo; Li, Yafei; Tang, Shuzhu; Yu, Hengxiu; Gu, Minghong; Cheng, Zhukuan

    2013-01-01

    In meiosis, homologous recombination entails programmed DNA double-strand break (DSB) formation and synaptonemal complex (SC) assembly coupled with the DSB repair. Although SCs display extensive structural conservation among species, their components identified are poorly conserved at the sequence level. Here, we identified a novel SC component, designated CENTRAL REGION COMPONENT1 (CRC1), in rice (Oryza sativa). CRC1 colocalizes with ZEP1, the rice SC transverse filament protein, to the central region of SCs in a mutually dependent fashion. Consistent with this colocalization, CRC1 interacts with ZEP1 in yeast two-hybrid assays. CRC1 is orthologous to Saccharomyces cerevisiae pachytene checkpoint2 (Pch2) and Mus musculus THYROID RECEPTOR-INTERACTING PROTEIN13 (TRIP13) and may be a conserved SC component. Additionally, we provide evidence that CRC1 is essential for meiotic DSB formation. CRC1 interacts with HOMOLOGOUS PAIRING ABERRATION IN RICE MEIOSIS1 (PAIR1) in vitro, suggesting that these proteins act as a complex to promote DSB formation. PAIR2, the rice ortholog of budding yeast homolog pairing1, is required for homologous chromosome pairing. We found that CRC1 is also essential for the recruitment of PAIR2 onto meiotic chromosomes. The roles of CRC1 identified here have not been reported for Pch2 or TRIP13. PMID:23943860

  4. Strong minor groove base conservation in sequence logos implies DNA distortion or base flipping during replication and transcription initiation | Center for Cancer Research

    Cancer.gov

    Dubbed "Tom's T" by Dhruba Chattoraj, the unusually conserved thymine at position +7 in bacteriophage P1 plasmid RepA DNA binding sites rises above repressor and acceptor sequence logos. The T appears to represent base flipping prior to helix opening in this DNA replication initation protein.

  5. DNA-binding proteins from marine bacteria expand the known sequence diversity of TALE-like repeats

    PubMed Central

    de Lange, Orlando; Wolf, Christina; Thiel, Philipp; Krüger, Jens; Kleusch, Christian; Kohlbacher, Oliver; Lahaye, Thomas

    2015-01-01

    Transcription Activator-Like Effectors (TALEs) of Xanthomonas bacteria are programmable DNA binding proteins with unprecedented target specificity. Comparative studies into TALE repeat structure and function are hindered by the limited sequence variation among TALE repeats. More sequence-diverse TALE-like proteins are known from Ralstonia solanacearum (RipTALs) and Burkholderia rhizoxinica (Bats), but RipTAL and Bat repeats are conserved with those of TALEs around the DNA-binding residue. We study two novel marine-organism TALE-like proteins (MOrTL1 and MOrTL2), the first to date of non-terrestrial origin. We have assessed their DNA-binding properties and modelled repeat structures. We found that repeats from these proteins mediate sequence specific DNA binding conforming to the TALE code, despite low sequence similarity to TALE repeats, and with novel residues around the BSR. However, MOrTL1 repeats show greater sequence discriminating power than MOrTL2 repeats. Sequence alignments show that there are only three residues conserved between repeats of all TALE-like proteins including the two new additions. This conserved motif could prove useful as an identifier for future TALE-likes. Additionally, comparing MOrTL repeats with those of other TALE-likes suggests a common evolutionary origin for the TALEs, RipTALs and Bats. PMID:26481363

  6. Comparative analysis on the structural features of the 5' flanking region of κ-casein genes from six different species

    PubMed Central

    Gerencsér, Ákos; Barta, Endre; Boa, Simon; Kastanis, Petros; Bösze, Zsuzsanna; Whitelaw, C Bruce A

    2002-01-01

    κ-casein plays an essential role in the formation, stabilisation and aggregation of milk micelles. Control of κ-casein expression reflects this essential role, although an understanding of the mechanisms involved lags behind that of the other milk protein genes. We determined the 5'-flanking sequences for the murine, rabbit and human κ-casein genes and compared them to the published ruminant sequences. The most conserved region was not the proximal promoter region but an approximately 400 bp long region centred 800 bp upstream of the TATA box. This region contained two highly conserved MGF/STAT5 sites with common spacing relative to each other. In this region, six conserved short stretches of similarity were also found which did not correspond to known transcription factor consensus sites. On the contrary to ruminant and human 5' regulatory sequences, the rabbit and murine 5'-flanking regions did not harbour any kind of repetitive elements. We generated a phylogenetic tree of the six species based on multiple alignment of the κ-casein sequences. This study identified conserved candidate transcriptional regulatory elements within the κ-casein gene promoter. PMID:11929628

  7. CODEHOP (COnsensus-DEgenerate Hybrid Oligonucleotide Primer) PCR primer design

    PubMed Central

    Rose, Timothy M.; Henikoff, Jorja G.; Henikoff, Steven

    2003-01-01

    We have developed a new primer design strategy for PCR amplification of distantly related gene sequences based on consensus-degenerate hybrid oligonucleotide primers (CODEHOPs). An interactive program has been written to design CODEHOP PCR primers from conserved blocks of amino acids within multiply-aligned protein sequences. Each CODEHOP consists of a pool of related primers containing all possible nucleotide sequences encoding 3–4 highly conserved amino acids within a 3′ degenerate core. A longer 5′ non-degenerate clamp region contains the most probable nucleotide predicted for each flanking codon. CODEHOPs are used in PCR amplification to isolate distantly related sequences encoding the conserved amino acid sequence. The primer design software and the CODEHOP PCR strategy have been utilized for the identification and characterization of new gene orthologs and paralogs in different plant, animal and bacterial species. In addition, this approach has been successful in identifying new pathogen species. The CODEHOP designer (http://blocks.fhcrc.org/codehop.html) is linked to BlockMaker and the Multiple Alignment Processor within the Blocks Database World Wide Web (http://blocks.fhcrc.org). PMID:12824413

  8. Sequence conservation from human to prokaryotes of Surf1, a protein involved in cytochrome c oxidase assembly, deficient in Leigh syndrome.

    PubMed

    Poyau, A; Buchet, K; Godinot, C

    1999-12-03

    The human SURF1 gene encoding a protein involved in cytochrome c oxidase (COX) assembly, is mutated in most patients presenting Leigh syndrome associated with COX deficiency. Proteins homologous to the human Surf1 have been identified in nine eukaryotes and six prokaryotes using database alignment tools, structure prediction and/or cDNA sequencing. Their sequence comparison revealed a remarkable Surf1 conservation during evolution and put forward at least four highly conserved domains that should be essential for Surf1 function. In Paracoccus denitrificans, the Surf1 homologue is found in the quinol oxidase operon, suggesting that Surf1 is associated with a primitive quinol oxidase which belongs to the same superfamily as cytochrome oxidase.

  9. Biodiversity Conservation and Conservation Biotechnology Tools

    USDA-ARS?s Scientific Manuscript database

    This special issue is dedicated to the in vitro tools and methods used to conserve the genetic diversity of rare and threatened species from around the world. Species that are on the brink of extinction, due to the rapid loss of genetic diversity and habitat, come mainly from resource poor areas the...

  10. Phylogenetic distribution of plant snoRNA families.

    PubMed

    Patra Bhattacharya, Deblina; Canzler, Sebastian; Kehr, Stephanie; Hertel, Jana; Grosse, Ivo; Stadler, Peter F

    2016-11-24

    Small nucleolar RNAs (snoRNAs) are one of the most ancient families amongst non-protein-coding RNAs. They are ubiquitous in Archaea and Eukarya but absent in bacteria. Their main function is to target chemical modifications of ribosomal RNAs. They fall into two classes, box C/D snoRNAs and box H/ACA snoRNAs, which are clearly distinguished by conserved sequence motifs and the type of chemical modification that they govern. Similarly to microRNAs, snoRNAs appear in distinct families of homologs that affect homologous targets. In animals, snoRNAs and their evolution have been studied in much detail. In plants, however, their evolution has attracted comparably little attention. In order to chart the phylogenetic distribution of individual snoRNA families in plants, we applied a sophisticated approach for identifying homologs of known plant snoRNAs across the plant kingdom. In response to the relatively fast evolution of snoRNAs, information on conserved sequence boxes, target sequences, and secondary structure is combined to identify additional snoRNAs. We identified 296 families of snoRNAs in 24 species and traced their evolution throughout the plant kingdom. Many of the plant snoRNA families comprise paralogs. We also found that targets are well-conserved for most snoRNA families. The sequence conservation of snoRNAs is sufficient to establish homologies between phyla. The degree of this conservation tapers off, however, between land plants and algae. Plant snoRNAs are frequently organized in highly conserved spatial clusters. As a resource for further investigations we provide carefully curated and annotated alignments for each snoRNA family under investigation.

  11. Complete genomic sequence of Powassan virus: evaluation of genetic elements in tick-borne versus mosquito-borne flaviviruses.

    PubMed

    Mandl, C W; Holzmann, H; Kunz, C; Heinz, F X

    1993-05-01

    The complete nucleotide sequence of the positive-stranded RNA genome of the tick-borne flavivirus Powassan (10,839 nucleotides) was elucidated and the amino acid sequence of all viral proteins was derived. Based on this sequence as well as serological data, Powassan virus represents the most divergent member of the tick-borne serocomplex within the genus flaviviruses, family Flaviviridae. The primary nucleotide sequence and potential RNA secondary structures of the Powassan virus genome as well as the protein sequences and the reactivities of the virion with a panel of monoclonal antibodies were compared to other tick-borne and mosquito-borne flaviviruses. These analyses corroborated significant differences between tick-borne and mosquito-borne flaviviruses, but also emphasized structural elements that are conserved among both vector groups. The comparisons among tick-borne flaviviruses revealed conserved sequence elements that might represent important determinants of the tick-borne flavivirus phenotype.

  12. Nucleotide sequence of a complementary DNA encoding pea cytosolic copper/zinc superoxide dismutase. [Pisum sativum L

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    White, D.A.; Zilinskas, B.A.

    1991-08-01

    The authors now report the nucleotide sequence of the cytosolic Cu/Zn SOD cloned from a {lambda}gt11 cDNA library constructed from mRNA extracted from leaves of 7- to 10-d pea seedlings (Pisum sativum L.). The clone was isolated using a 22-base synthetic oligonucleotide complementary to the amino acid sequence CGIIGLQG. This sequence, found at the protein's carboxy terminus, is highly conserved among plant cytosolic Cu/Zn SODs but not chloroplastic Cu/Zn SODs. The 738-base pair sequence contains an open reading frame specifying 152 codons and a predicted M{sub r} of 18,024 D. The deduced amino acid sequence is highly homologous (79-82% identity)more » with the sequences of other known plant cytosolic Cu/Zn SODs but less highly conserved (63-65%) when compared with several chloroplastic Cu/Zn SODs including pea (10).« less

  13. Remarkable sequence conservation of the last intron in the PKD1 gene.

    PubMed

    Rodova, Marianna; Islam, M Rafiq; Peterson, Kenneth R; Calvet, James P

    2003-10-01

    The last intron of the PKD1 gene (intron 45) was found to have exceptionally high sequence conservation across four mammalian species: human, mouse, rat, and dog. This conservation did not extend to the comparable intron in pufferfish. Pairwise comparisons for intron 45 showed 91% identity (human vs. dog) to 100% identity (mouse vs. rat) for an average for all four species of 94% identity. In contrast, introns 43 and 44 of the PKD1 gene had average pairwise identities of 57% and 54%, and exons 43, 44, and 45 and the coding region of exon 46 had average pairwise identities of 80%, 84%, 82%, and 80%. Intron 45 is 90 to 95 bp in length, with the major region of sequence divergence being in a central 4-bp to 9-bp variable region. RNA secondary structure analysis of intron 45 predicts a branching stem-loop structure in which the central variable region lies in one loop and the putative branch point sequence lies in another loop, suggesting that the intron adopts a specific stem-loop structure that may be important for its removal. Although intron 45 appears to conform to the class of small, G-triplet-containing introns that are spliced by a mechanism utilizing intron definition, its high sequence conservation may be a reflection of constraints imposed by a unique mechanism that coordinates splicing of this last PKD1 intron with polyadenylation.

  14. Close evolutionary relatedness among functionally distantly related members of the (alpha/beta)8-barrel glycosyl hydrolases suggested by the similarity of their fifth conserved sequence region.

    PubMed

    Janecek, S

    1995-12-11

    A short conserved sequence equivalent to the fifth conserved sequence region of alpha-amylases (173_LPDLD, Aspergillus oryzae alpha-amylase) comprising the calcium-ligand aspartate, Asp-175, was identified in the amino acid sequences of several members of the family of (alpha/beta)8-barrel glycosyl hydrolases. Despite the fact that the aspartate is not invariantly conserved, the stretch can be easily recognised in all sequences to be positioned 26-28 amino acid residues in front of the well-known catalytic aspartate (Asp-206, A. oryzae alpha-amylase) located in the beta 4-strand of the barrel. The identification of this region revealed remarkable similarities between some alpha-amylases (those from Bacillus megaterium, Bacillus subtilis and Dictyoglomus thermophilum) on the one hand and several different enzyme specificities (such as oligo-1,6-glucosidase, amylomaltase and neopullulanase, respectively) on the other hand. The most interesting example was offered by B. subtilis alpha-amylase and potato amylomaltase with the regions LYDWN and LYDWK, respectively. These observations support the idea that all members of the family of glycosyl hydrolases adopting the structure of the alpha-amylase-type (alpha/beta)8-barrel are mutually closely related and the strict evolutionary borders separating the individual enzyme specificities can be hardly defined.

  15. Next generation sequencing and analysis of a conserved transcriptome of New Zealand's kiwi.

    PubMed

    Subramanian, Sankar; Huynen, Leon; Millar, Craig D; Lambert, David M

    2010-12-15

    Kiwi is a highly distinctive, flightless and endangered ratite bird endemic to New Zealand. To understand the patterns of molecular evolution of the nuclear protein-coding genes in brown kiwi (Apteryx australis mantelli) and to determine the timescale of avian history we sequenced a transcriptome obtained from a kiwi embryo using next generation sequencing methods. We then assembled the conserved protein-coding regions using the chicken proteome as a scaffold. Using 1,543 conserved protein coding genes we estimated the neutral evolutionary divergence between the kiwi and chicken to be ~45%, which is approximately equal to the divergence computed for the human-mouse pair using the same set of genes. A large fraction of genes was found to be under high selective constraint, as most of the expressed genes appeared to be involved in developmental gene regulation. Our study suggests a significant relationship between gene expression levels and protein evolution. Using sequences from over 700 nuclear genes we estimated the divergence between the two basal avian groups, Palaeognathae and Neognathae to be 132 million years, which is consistent with previous studies using mitochondrial genes. The results of this investigation revealed patterns of mutation and purifying selection in conserved protein coding regions in birds. Furthermore this study suggests a relatively cost-effective way of obtaining a glimpse into the fundamental molecular evolutionary attributes of a genome, particularly when no closely related genomic sequence is available.

  16. Unusual Intron Conservation near Tissue-Regulated Exons Found by Splicing Microarrays

    PubMed Central

    Sugnet, Charles W; Srinivasan, Karpagam; Clark, Tyson A; O'Brien, Georgeann; Cline, Melissa S; Wang, Hui; Williams, Alan; Kulp, David; Blume, John E; Haussler, David; Ares, Manuel

    2006-01-01

    Alternative splicing contributes to both gene regulation and protein diversity. To discover broad relationships between regulation of alternative splicing and sequence conservation, we applied a systems approach, using oligonucleotide microarrays designed to capture splicing information across the mouse genome. In a set of 22 adult tissues, we observe differential expression of RNA containing at least two alternative splice junctions for about 40% of the 6,216 alternative events we could detect. Statistical comparisons identify 171 cassette exons whose inclusion or skipping is different in brain relative to other tissues and another 28 exons whose splicing is different in muscle. A subset of these exons is associated with unusual blocks of intron sequence whose conservation in vertebrates rivals that of protein-coding exons. By focusing on sets of exons with similar regulatory patterns, we have identified new sequence motifs implicated in brain and muscle splicing regulation. Of note is a motif that is strikingly similar to the branchpoint consensus but is located downstream of the 5′ splice site of exons included in muscle. Analysis of three paralogous membrane-associated guanylate kinase genes reveals that each contains a paralogous tissue-regulated exon with a similar tissue inclusion pattern. While the intron sequences flanking these exons remain highly conserved among mammalian orthologs, the paralogous flanking intron sequences have diverged considerably, suggesting unusually complex evolution of the regulation of alternative splicing in multigene families. PMID:16424921

  17. QTL Mapping of Sex Determination Loci Supports an Ancient Pathway in Ants and Honey Bees.

    PubMed

    Miyakawa, Misato O; Mikheyev, Alexander S

    2015-11-01

    Sex determination mechanisms play a central role in life-history characteristics, affecting mating systems, sex ratios, inbreeding tolerance, etc. Downstream components of sex determination pathways are highly conserved, but upstream components evolve rapidly. Evolutionary dynamics of sex determination remain poorly understood, particularly because mechanisms appear so diverse. Here we investigate the origins and evolution of complementary sex determination (CSD) in ants and bees. The honey bee has a well-characterized CSD locus, containing tandemly arranged homologs of the transformer gene [complementary sex determiner (csd) and feminizer (fem)]. Such tandem paralogs appear frequently in aculeate hymenopteran genomes. However, only comparative genomic, but not functional, data support a broader role for csd/fem in sex determination, and whether species other than the honey bee use this pathway remains controversial. Here we used a backcross to test whether csd/fem acts as a CSD locus in an ant (Vollenhovia emeryi). After sequencing and assembling the genome, we computed a linkage map, and conducted a quantitative trait locus (QTL) analysis of diploid male production using 68 diploid males and 171 workers. We found two QTLs on separate linkage groups (CsdQTL1 and CsdQTL2) that jointly explained 98.0% of the phenotypic variance. CsdQTL1 included two tandem transformer homologs. These data support the prediction that the same CSD mechanism has indeed been conserved for over 100 million years. CsdQTL2 had no similarity to CsdQTL1 and included a 236-kb region with no obvious CSD gene candidates, making it impossible to conclusively characterize it using our data. The sequence of this locus was conserved in at least one other ant genome that diverged >75 million years ago. By applying QTL analysis to ants for the first time, we support the hypothesis that elements of hymenopteran CSD are ancient, but also show that more remains to be learned about the diversity of CSD mechanisms.

  18. The LINE-1 DNA sequences in four mammalian orders predict proteins that conserve homologies to retrovirus proteins.

    PubMed Central

    Fanning, T; Singer, M

    1987-01-01

    Recent work suggests that one or more members of the highly repeated LINE-1 (L1) DNA family found in all mammals may encode one or more proteins. Here we report the sequence of a portion of an L1 cloned from the domestic cat (Felis catus). These data permit comparison of the L1 sequences in four mammalian orders (Carnivore, Lagomorph, Rodent and Primate) and the comparison supports the suggested coding potential. In two separate, noncontiguous regions in the carboxy terminal half of the proteins predicted from the DNA sequences, there are several strongly conserved segments. In one region, these share homology with known or suspected reverse transcriptases, as described by others in rodents and primates. In the second region, closer to the carboxy terminus, the strongly conserved segments are over 90% homologous among the four orders. One of the latter segments is cysteine rich and resembles the putative metal binding domains of nucleic acid binding proteins, including those of TFIIIA and retroviruses. PMID:3562227

  19. PknB remains an essential and a conserved target for drug development in susceptible and MDR strains of M. Tuberculosis.

    PubMed

    Gupta, Anamika; Pal, Sudhir K; Pandey, Divya; Fakir, Najneen A; Rathod, Sunita; Sinha, Dhiraj; SivaKumar, S; Sinha, Pallavi; Periera, Mycal; Balgam, Shilpa; Sekar, Gomathi; UmaDevi, K R; Anupurba, Shampa; Nema, Vijay

    2017-08-18

    The Mycobacterium tuberculosis (M.tb) protein kinase B (PknB) which is now proved to be essential for the growth and survival of M.tb, is a transmembrane protein with a potential to be a good drug target. However it is not known if this target remains conserved in otherwise resistant isolates from clinical origin. The present study describes the conservation analysis of sequences covering the inhibitor binding domain of PknB to assess if it remains conserved in susceptible and resistant clinical strains of mycobacteria picked from three different geographical areas of India. A total of 116 isolates from North, South and West India were used in the study with a variable profile of their susceptibilities towards streptomycin, isoniazid, rifampicin, ethambutol and ofloxacin. Isolates were also spoligotyped in order to find if the conservation pattern of pknB gene remain consistent or differ with different spoligotypes. The impact of variation as found in the study was analyzed using Molecular dynamics simulations. The sequencing results with 115/116 isolates revealed the conserved nature of pknB sequences irrespective of their susceptibility status and spoligotypes. The only variation found was in one strains wherein pnkB sequence had G to A mutation at 664 position translating into a change of amino acid, Valine to Isoleucine. After analyzing the impact of this sequence variation using Molecular dynamics simulations, it was observed that the variation is causing no significant change in protein structure or the inhibitor binding. Hence, the study endorses that PknB is an ideal target for drug development and there is no pre-existing or induced resistance with respect to the sequences involved in inhibitor binding. Also if the mutation that we are reporting for the first time is found again in subsequent work, it should be checked with phenotypic profile before drawing the conclusion that it would affect the activity in any way. Bioinformatics analysis in our study says that it has no significant effect on the binding and hence the activity of the protein.

  20. Analysis of evolutionary conservation patterns and their influence on identifying protein functional sites.

    PubMed

    Fang, Chun; Noguchi, Tamotsu; Yamana, Hayato

    2014-10-01

    Evolutionary conservation information included in position-specific scoring matrix (PSSM) has been widely adopted by sequence-based methods for identifying protein functional sites, because all functional sites, whether in ordered or disordered proteins, are found to be conserved at some extent. However, different functional sites have different conservation patterns, some of them are linear contextual, some of them are mingled with highly variable residues, and some others seem to be conserved independently. Every value in PSSMs is calculated independently of each other, without carrying the contextual information of residues in the sequence. Therefore, adopting the direct output of PSSM for prediction fails to consider the relationship between conservation patterns of residues and the distribution of conservation scores in PSSMs. In order to demonstrate the importance of combining PSSMs with the specific conservation patterns of functional sites for prediction, three different PSSM-based methods for identifying three kinds of functional sites have been analyzed. Results suggest that, different PSSM-based methods differ in their capability to identify different patterns of functional sites, and better combining PSSMs with the specific conservation patterns of residues would largely facilitate the prediction.

  1. Analysis of expression in the Anopheles gambiae developing testes reveals rapidly evolving lineage-specific genes in mosquitoes.

    PubMed

    Krzywinska, Elzbieta; Krzywinski, Jaroslaw

    2009-07-06

    Male mosquitoes do not feed on blood and are not involved in delivery of pathogens to humans. Consequently, they are seldom the subjects of research, which results in a very poor understanding of their biology. To gain insights into male developmental processes we sought to identify genes transcribed exclusively in the reproductive tissues of male Anopheles gambiae pupae. Using a cDNA subtraction strategy, five male-specifically or highly male-biased expressed genes were isolated, four of which remain unannotated in the An. gambiae genome. Spatial and temporal expression patterns suggest that each of these genes is involved in the mid-late stages of spermatogenesis. Their sequences are rapidly evolving; however, two genes possess clear homologs in a wide range of taxa and one of these probably acts in a sperm motility control mechanism conserved in many organisms, including humans. The other three genes have no match to sequences from non-mosquito taxa, thus can be regarded as orphans. RNA in situ hybridization demonstrated that one of the orphans is transcribed in spermatids, which suggests its involvement in sperm maturation. Two other orphans have unknown functions. Expression analysis of orthologs of all five genes indicated that male-biased transcription was not conserved in the majority of cases in Aedes and Culex. Discovery of testis-expressed orphan genes in mosquitoes opens new prospects for the development of innovative control methods. The orphan encoded proteins may represent unique targets of selective anti-mosquito sterilizing agents that will not affect non-target organisms.

  2. Positive selection and functional divergence of farnesyl pyrophosphate synthase genes in plants.

    PubMed

    Qian, Jieying; Liu, Yong; Chao, Naixia; Ma, Chengtong; Chen, Qicong; Sun, Jian; Wu, Yaosheng

    2017-02-04

    Farnesyl pyrophosphate synthase (FPS) belongs to the short-chain prenyltransferase family, and it performs a conserved and essential role in the terpenoid biosynthesis pathway. However, its classification, evolutionary history, and the forces driving the evolution of FPS genes in plants remain poorly understood. Phylogeny and positive selection analysis was used to identify the evolutionary forces that led to the functional divergence of FPS in plants, and recombinant detection was undertaken using the Genetic Algorithm for Recombination Detection (GARD) method. The dataset included 68 FPS variation pattern sequences (2 gymnosperms, 10 monocotyledons, 54 dicotyledons, and 2 outgroups). This study revealed that the FPS gene was under positive selection in plants. No recombinant within the FPS gene was found. Therefore, it was inferred that the positive selection of FPS had not been influenced by a recombinant episode. The positively selected sites were mainly located in the catalytic center and functional areas, which indicated that the 98S and 234D were important positively selected sites for plant FPS in the terpenoid biosynthesis pathway. They were located in the FPS conserved domain of the catalytic site. We inferred that the diversification of FPS genes was associated with functional divergence and could be driven by positive selection. It was clear that protein sequence evolution via positive selection was able to drive adaptive diversification in plant FPS proteins. This study provides information on the classification and positive selection of plant FPS genes, and the results could be useful for further research on the regulation of triterpenoid biosynthesis.

  3. OST-HTH: a novel predicted RNA-binding domain

    PubMed Central

    2010-01-01

    Background The mechanism by which the arthropod Oskar and vertebrate TDRD5/TDRD7 proteins nucleate or organize structurally related ribonucleoprotein (RNP) complexes, the polar granule and nuage, is poorly understood. Using sequence profile searches we identify a novel domain in these proteins that is widely conserved across eukaryotes and bacteria. Results Using contextual information from domain architectures, sequence-structure superpositions and available functional information we predict that this domain is likely to adopt the winged helix-turn-helix fold and bind RNA with a potential specificity for dsRNA. We show that in eukaryotes this domain is often combined in the same polypeptide with protein-protein- or lipid- interaction domains that might play a role in anchoring these proteins to specific cytoskeletal structures. Conclusions Thus, proteins with this domain might have a key role in the recognition and localization of dsRNA, including miRNAs, rasiRNAs and piRNAs hybridized to their targets. In other cases, this domain is fused to ubiquitin-binding, E3 ligase and ubiquitin-like domains indicating a previously under-appreciated role for ubiquitination in regulating the assembly and stability of nuage-like RNP complexes. Both bacteria and eukaryotes encode a conserved family of proteins that combines this predicted RNA-binding domain with a previously uncharacterized domain (DUF88). We present evidence that it is an RNAse belonging to the superfamily that includes the 5'->3' nucleases, PIN and NYN domains and might be recruited to degrade certain RNAs. Reviewers This article was reviewed by Sandor Pongor and Arcady Mushegian. PMID:20302647

  4. Conservation of an Intact vif Gene of Human Immunodeficiency Virus Type 1 during Maternal-Fetal Transmission

    PubMed Central

    Yedavalli, Venkat R. K.; Chappey, Colombe; Matala, Erik; Ahmad, Nafees

    1998-01-01

    The human immunodeficiency virus type 1 (HIV-1) vif gene is conserved among most lentiviruses, suggesting that vif is important for natural infection. To determine whether an intact vif gene is positively selected during mother-to-infant transmission, we analyzed vif sequences from five infected mother-infant pairs following perinatal transmission. The coding potential of the vif open reading frame directly derived from uncultured peripheral blood mononuclear cell DNA was maintained in most of the 78,912 bp sequenced. We found that 123 of the 137 clones analyzed showed an 89.8% frequency of intact vif open reading frames. There was a low degree of heterogeneity of vif genes within mothers, within infants, and between epidemiologically linked mother-infant pairs. The distances between vif sequences were greater in epidemiologically unlinked individuals than in epidemiologically linked mother-infant pairs. Furthermore, the epidemiologically linked mother-infant pair vif sequences displayed similar patterns that were not seen in vif sequences from epidemiologically unlinked individuals. The functional domains, including the two cysteines at positions 114 and 133, a serine phosphorylation site at position 144, and the C-terminal basic amino acids essential for vif protein function, were highly conserved in most of the sequences. Phylogenetic analyses of 137 mother-infant pair vif sequences and 187 other available vif sequences from HIV-1 databases revealed distinct clusters for vif sequences from each mother-infant pair and for other vif sequences. Taken together, these findings suggest that vif plays an important role in HIV-1 infection and replication in mothers and their perinatally infected infants. PMID:9445004

  5. DNA-DNA hybridization values and their relationship to whole-genome sequence similarities.

    PubMed

    Goris, Johan; Konstantinidis, Konstantinos T; Klappenbach, Joel A; Coenye, Tom; Vandamme, Peter; Tiedje, James M

    2007-01-01

    DNA-DNA hybridization (DDH) values have been used by bacterial taxonomists since the 1960s to determine relatedness between strains and are still the most important criterion in the delineation of bacterial species. Since the extent of hybridization between a pair of strains is ultimately governed by their respective genomic sequences, we examined the quantitative relationship between DDH values and genome sequence-derived parameters, such as the average nucleotide identity (ANI) of common genes and the percentage of conserved DNA. A total of 124 DDH values were determined for 28 strains for which genome sequences were available. The strains belong to six important and diverse groups of bacteria for which the intra-group 16S rRNA gene sequence identity was greater than 94 %. The results revealed a close relationship between DDH values and ANI and between DNA-DNA hybridization and the percentage of conserved DNA for each pair of strains. The recommended cut-off point of 70 % DDH for species delineation corresponded to 95 % ANI and 69 % conserved DNA. When the analysis was restricted to the protein-coding portion of the genome, 70 % DDH corresponded to 85 % conserved genes for a pair of strains. These results reveal extensive gene diversity within the current concept of "species". Examination of reciprocal values indicated that the level of experimental error associated with the DDH method is too high to reveal the subtle differences in genome size among the strains sampled. It is concluded that ANI can accurately replace DDH values for strains for which genome sequences are available.

  6. Nucleotide sequence of a cluster of early and late genes in a conserved segment of the vaccinia virus genome.

    PubMed Central

    Plucienniczak, A; Schroeder, E; Zettlmeissl, G; Streeck, R E

    1985-01-01

    The nucleotide sequence of a 7.6 kb vaccinia DNA segment from a genomic region conserved among different orthopox virus has been determined. This segment contains a tight cluster of 12 partly overlapping open reading frames most of which can be correlated with previously identified early and late proteins and mRNAs. Regulatory signals used by vaccinia virus have been studied. Presumptive promoter regions are rich in A, T and carry the consensus sequences TATA and AATAA spaced at 20-24 base pairs. Tandem repeats of a CTATTC consensus sequence are proposed to be involved in the termination of early transcription. PMID:2987815

  7. Genome-wide discovery and differential regulation of conserved and novel microRNAs in chickpea via deep sequencing.

    PubMed

    Jain, Mukesh; Chevala, V V S Narayana; Garg, Rohini

    2014-11-01

    MicroRNAs (miRNAs) are essential components of complex gene regulatory networks that orchestrate plant development. Although several genomic resources have been developed for the legume crop chickpea, miRNAs have not been discovered until now. For genome-wide discovery of miRNAs in chickpea (Cicer arietinum), we sequenced the small RNA content from seven major tissues/organs employing Illumina technology. About 154 million reads were generated, which represented more than 20 million distinct small RNA sequences. We identified a total of 440 conserved miRNAs in chickpea based on sequence similarity with known miRNAs in other plants. In addition, 178 novel miRNAs were identified using a miRDeep pipeline with plant-specific scoring. Some of the conserved and novel miRNAs with significant sequence similarity were grouped into families. The chickpea miRNAs targeted a wide range of mRNAs involved in diverse cellular processes, including transcriptional regulation (transcription factors), protein modification and turnover, signal transduction, and metabolism. Our analysis revealed several miRNAs with differential spatial expression. Many of the chickpea miRNAs were expressed in a tissue-specific manner. The conserved and differential expression of members of the same miRNA family in different tissues was also observed. Some of the same family members were predicted to target different chickpea mRNAs, which suggested the specificity and complexity of miRNA-mediated developmental regulation. This study, for the first time, reveals a comprehensive set of conserved and novel miRNAs along with their expression patterns and putative targets in chickpea, and provides a framework for understanding regulation of developmental processes in legumes. © The Author 2014. Published by Oxford University Press on behalf of the Society for Experimental Biology.

  8. Identification and characterization of microRNAs in Phaseolus vulgaris by high-throughput sequencing

    PubMed Central

    2012-01-01

    Background MicroRNAs (miRNAs) are endogenously encoded small RNAs that post-transcriptionally regulate gene expression. MiRNAs play essential roles in almost all plant biological processes. Currently, few miRNAs have been identified in the model food legume Phaseolus vulgaris (common bean). Recent advances in next generation sequencing technologies have allowed the identification of conserved and novel miRNAs in many plant species. Here, we used Illumina's sequencing by synthesis (SBS) technology to identify and characterize the miRNA population of Phaseolus vulgaris. Results Small RNA libraries were generated from roots, flowers, leaves, and seedlings of P. vulgaris. Based on similarity to previously reported plant miRNAs,114 miRNAs belonging to 33 conserved miRNA families were identified. Stem-loop precursors and target gene sequences for several conserved common bean miRNAs were determined from publicly available databases. Less conserved miRNA families and species-specific common bean miRNA isoforms were also characterized. Moreover, novel miRNAs based on the small RNAs were found and their potential precursors were predicted. In addition, new target candidates for novel and conserved miRNAs were proposed. Finally, we studied organ-specific miRNA family expression levels through miRNA read frequencies. Conclusions This work represents the first massive-scale RNA sequencing study performed in Phaseolus vulgaris to identify and characterize its miRNA population. It significantly increases the number of miRNAs, precursors, and targets identified in this agronomically important species. The miRNA expression analysis provides a foundation for understanding common bean miRNA organ-specific expression patterns. The present study offers an expanded picture of P. vulgaris miRNAs in relation to those of other legumes. PMID:22394504

  9. Genomic dissection of conserved transcriptional regulation in intestinal epithelial cells

    PubMed Central

    Camp, J. Gray; Weiser, Matthew; Cocchiaro, Jordan L.; Kingsley, David M.; Furey, Terrence S.; Sheikh, Shehzad Z.; Rawls, John F.

    2017-01-01

    The intestinal epithelium serves critical physiologic functions that are shared among all vertebrates. However, it is unknown how the transcriptional regulatory mechanisms underlying these functions have changed over the course of vertebrate evolution. We generated genome-wide mRNA and accessible chromatin data from adult intestinal epithelial cells (IECs) in zebrafish, stickleback, mouse, and human species to determine if conserved IEC functions are achieved through common transcriptional regulation. We found evidence for substantial common regulation and conservation of gene expression regionally along the length of the intestine from fish to mammals and identified a core set of genes comprising a vertebrate IEC signature. We also identified transcriptional start sites and other putative regulatory regions that are differentially accessible in IECs in all 4 species. Although these sites rarely showed sequence conservation from fish to mammals, surprisingly, they drove highly conserved IEC expression in a zebrafish reporter assay. Common putative transcription factor binding sites (TFBS) found at these sites in multiple species indicate that sequence conservation alone is insufficient to identify much of the functionally conserved IEC regulatory information. Among the rare, highly sequence-conserved, IEC-specific regulatory regions, we discovered an ancient enhancer upstream from her6/HES1 that is active in a distinct population of Notch-positive cells in the intestinal epithelium. Together, these results show how combining accessible chromatin and mRNA datasets with TFBS prediction and in vivo reporter assays can reveal tissue-specific regulatory information conserved across 420 million years of vertebrate evolution. We define an IEC transcriptional regulatory network that is shared between fish and mammals and establish an experimental platform for studying how evolutionarily distilled regulatory information commonly controls IEC development and physiology. PMID:28850571

  10. DoOPSearch: a web-based tool for finding and analysing common conserved motifs in the promoter regions of different chordate and plant genes

    PubMed Central

    Sebestyén, Endre; Nagy, Tibor; Suhai, Sándor; Barta, Endre

    2009-01-01

    Background The comparative genomic analysis of a large number of orthologous promoter regions of the chordate and plant genes from the DoOP databases shows thousands of conserved motifs. Most of these motifs differ from any known transcription factor binding site (TFBS). To identify common conserved motifs, we need a specific tool to be able to search amongst them. Since conserved motifs from the DoOP databases are linked to genes, the result of such a search can give a list of genes that are potentially regulated by the same transcription factor(s). Results We have developed a new tool called DoOPSearch for the analysis of the conserved motifs in the promoter regions of chordate or plant genes. We used the orthologous promoters of the DoOP database to extract thousands of conserved motifs from different taxonomic groups. The advantage of this approach is that different sets of conserved motifs might be found depending on how broad the taxonomic coverage of the underlying orthologous promoter sequence collection is (consider e.g. primates vs. mammals or Brassicaceae vs. Viridiplantae). The DoOPSearch tool allows the users to search these motif collections or the promoter regions of DoOP with user supplied query sequences or any of the conserved motifs from the DoOP database. To find overrepresented gene ontologies, the gene lists obtained can be analysed further using a modified version of the GeneMerge program. Conclusion We present here a comparative genomics based promoter analysis tool. Our system is based on a unique collection of conserved promoter motifs characteristic of different taxonomic groups. We offer both a command line and a web-based tool for searching in these motif collections using user specified queries. These can be either short promoter sequences or consensus sequences of known transcription factor binding sites. The GeneMerge analysis of the search results allows the user to identify statistically overrepresented Gene Ontology terms that might provide a clue on the function of the motifs and genes. PMID:19534755

  11. Synteny conservation between the Prunus genome and both the present and ancestral Arabidopsis genomes

    PubMed Central

    Jung, Sook; Main, Dorrie; Staton, Margaret; Cho, Ilhyung; Zhebentyayeva, Tatyana; Arús, Pere; Abbott, Albert

    2006-01-01

    Background Due to the lack of availability of large genomic sequences for peach or other Prunus species, the degree of synteny conservation between the Prunus species and Arabidopsis has not been systematically assessed. Using the recently available peach EST sequences that are anchored to Prunus genetic maps and to peach physical map, we analyzed the extent of conserved synteny between the Prunus and the Arabidopsis genomes. The reconstructed pseudo-ancestral Arabidopsis genome, existed prior to the proposed recent polyploidy event, was also utilized in our analysis to further elucidate the evolutionary relationship. Results We analyzed the synteny conservation between the Prunus and the Arabidopsis genomes by comparing 475 peach ESTs that are anchored to Prunus genetic maps and their Arabidopsis homologs detected by sequence similarity. Microsyntenic regions were detected between all five Arabidopsis chromosomes and seven of the eight linkage groups of the Prunus reference map. An additional 1097 peach ESTs that are anchored to 431 BAC contigs of the peach physical map and their Arabidopsis homologs were also analyzed. Microsyntenic regions were detected in 77 BAC contigs. The syntenic regions from both data sets were short and contained only a couple of conserved gene pairs. The synteny between peach and Arabidopsis was fragmentary; all the Prunus linkage groups containing syntenic regions matched to more than two different Arabidopsis chromosomes, and most BAC contigs with multiple conserved syntenic regions corresponded to multiple Arabidopsis chromosomes. Using the same peach EST datasets and their Arabidopsis homologs, we also detected conserved syntenic regions in the pseudo-ancestral Arabidopsis genome. In many cases, the gene order and content of peach regions was more conserved in the ancestral genome than in the present Arabidopsis region. Statistical significance of each syntenic group was calculated using simulated Arabidopsis genome. Conclusion We report here the result of the first extensive analysis of the conserved microsynteny using DNA sequences across the Prunus genome and their Arabidopsis homologs. Our study also illustrates that both the ancestral and present Arabidopsis genomes can provide a useful resource for marker saturation and candidate gene search, as well as elucidating evolutionary relationships between species. PMID:16615871

  12. Identification of two essential aspartates for polymerase activity in parainfluenza virus L protein by a minireplicon system expressing secretory luciferase.

    PubMed

    Matsumoto, Yusuke; Ohta, Keisuke; Yumine, Natsuko; Goto, Hideo; Nishio, Machiko

    2015-11-01

    Gene expression of nonsegmented negative-strand RNA viruses (nsNSVs) such as parainfluenza viruses requires the RNA synthesis activity of their polymerase L protein; however, the detailed mechanism of this process is poorly understood. In this study, a parainfluenza minireplicon assay expressing secretory Gaussia luciferase (Gluc) was established to analyze large protein (L) activity. Measurement of Gluc expression in the culture medium of cells transfected with the minigenome and viral polymerase components enabled quick and concise calculation of L activity. By comparing the amino acid sequences in conserved region III (CRIII), a putative polymerase-active domain of the L protein, two strictly conserved aspartates were identified in all families of nsNSV. A series of L mutants from human parainfluenza virus type 2 and parainfluenza virus type 5 showed that these aspartates are necessary for reporter gene expression. It was also confirmed that these aspartates are important for the production of viral mRNA and antigenome cRNA, but not for a polymerase-complex formation. These findings suggest that these two aspartates are key players in the nucleotidyl transfer reaction using two metal ions. © 2015 The Societies and Wiley Publishing Asia Pty Ltd.

  13. Automated hierarchical classification of protein domain subfamilies based on functionally-divergent residue signatures

    PubMed Central

    2012-01-01

    Background The NCBI Conserved Domain Database (CDD) consists of a collection of multiple sequence alignments of protein domains that are at various stages of being manually curated into evolutionary hierarchies based on conserved and divergent sequence and structural features. These domain models are annotated to provide insights into the relationships between sequence, structure and function via web-based BLAST searches. Results Here we automate the generation of conserved domain (CD) hierarchies using a combination of heuristic and Markov chain Monte Carlo (MCMC) sampling procedures and starting from a (typically very large) multiple sequence alignment. This procedure relies on statistical criteria to define each hierarchy based on the conserved and divergent sequence patterns associated with protein functional-specialization. At the same time this facilitates the sequence and structural annotation of residues that are functionally important. These statistical criteria also provide a means to objectively assess the quality of CD hierarchies, a non-trivial task considering that the protein subgroups are often very distantly related—a situation in which standard phylogenetic methods can be unreliable. Our aim here is to automatically generate (typically sub-optimal) hierarchies that, based on statistical criteria and visual comparisons, are comparable to manually curated hierarchies; this serves as the first step toward the ultimate goal of obtaining optimal hierarchical classifications. A plot of runtimes for the most time-intensive (non-parallelizable) part of the algorithm indicates a nearly linear time complexity so that, even for the extremely large Rossmann fold protein class, results were obtained in about a day. Conclusions This approach automates the rapid creation of protein domain hierarchies and thus will eliminate one of the most time consuming aspects of conserved domain database curation. At the same time, it also facilitates protein domain annotation by identifying those pattern residues that most distinguish each protein domain subgroup from other related subgroups. PMID:22726767

  14. JDet: interactive calculation and visualization of function-related conservation patterns in multiple sequence alignments and structures.

    PubMed

    Muth, Thilo; García-Martín, Juan A; Rausell, Antonio; Juan, David; Valencia, Alfonso; Pazos, Florencio

    2012-02-15

    We have implemented in a single package all the features required for extracting, visualizing and manipulating fully conserved positions as well as those with a family-dependent conservation pattern in multiple sequence alignments. The program allows, among other things, to run different methods for extracting these positions, combine the results and visualize them in protein 3D structures and sequence spaces. JDet is a multiplatform application written in Java. It is freely available, including the source code, at http://csbg.cnb.csic.es/JDet. The package includes two of our recently developed programs for detecting functional positions in protein alignments (Xdet and S3Det), and support for other methods can be added as plug-ins. A help file and a guided tutorial for JDet are also available.

  15. Rapid functional diversification in the structurally conserved ELAV family of neuronal RNA binding proteins

    PubMed Central

    Samson, Marie-Laure

    2008-01-01

    Background The Drosophila gene embryonic lethal abnormal visual system (elav) is the prototype of a gene family present in all metazoans. Its members encode structurally conserved neuronal proteins with three RNA Recognition Motifs (RRM) but they paradoxically act at diverse levels of post-transcriptional regulation. In an attempt to understand the history of this family, we searched for orthologs in eleven completely sequenced genomes, including those of humans, D. melanogaster and C. elegans, for which cDNAs are available. Results We analyzed 23 orthologs/paralogs of elav, and found evidence of gain/loss of gene copy number. For one set of genes, including elav itself, the coding sequences are free of introns and their products most resemble ELAV. The remaining genes show remarkable conservation of their exon organization, and their products most resemble FNE and RBP9, proteins encoded by the two elav paralogs of Drosophila. Remarkably, three of the conserved exon junctions are both close to structural elements, involved respectively in protein-RNA interactions and in the regulation of sub-cellular localization, and in the vicinity of diverse sequence variations. Conclusion The data indicate that the essential elav gene of Drosophila is newly emerged, restricted to dipterans and of retrotransposed origin. We propose that the conserved exon junctions constitute potential sites for sequence/function modifications, and that RRM binding proteins, whose function relies upon plastic RNA-protein interactions, may have played an important role in brain evolution. PMID:18715504

  16. Sequence analysis of serum albumins reveals the molecular evolution of ligand recognition properties.

    PubMed

    Fanali, Gabriella; Ascenzi, Paolo; Bernardi, Giorgio; Fasano, Mauro

    2012-01-01

    Serum albumin (SA) is a circulating protein providing a depot and carrier for many endogenous and exogenous compounds. At least seven major binding sites have been identified by structural and functional investigations mainly in human SA. SA is conserved in vertebrates, with at least 49 entries in protein sequence databases. The multiple sequence analysis of this set of entries leads to the definition of a cladistic tree for the molecular evolution of SA orthologs in vertebrates, thus showing the clustering of the considered species, with lamprey SAs (Lethenteron japonicum and Petromyzon marinus) in a separate outgroup. Sequence analysis aimed at searching conserved domains revealed that most SA sequences are made up by three repeated domains (about 600 residues), as extensively characterized for human SA. On the contrary, lamprey SAs are giant proteins (about 1400 residues) comprising seven repeated domains. The phylogenetic analysis of the SA family reveals a stringent correlation with the taxonomic classification of the species available in sequence databases. A focused inspection of the sequences of ligand binding sites in SA revealed that in all sites most residues involved in ligand binding are conserved, although the versatility towards different ligands could be peculiar of higher organisms. Moreover, the analysis of molecular links between the different sites suggests that allosteric modulation mechanisms could be restricted to higher vertebrates.

  17. A field ornithologist’s guide to genomics: Practical considerations for ecology and conservation

    USGS Publications Warehouse

    Oyler-McCance, Sara J.; Oh, Kevin; Langin, Kathryn; Aldridge, Cameron L.

    2016-01-01

    Vast improvements in sequencing technology have made it practical to simultaneously sequence millions of nucleotides distributed across the genome, opening the door for genomic studies in virtually any species. Ornithological research stands to benefit in three substantial ways. First, genomic methods enhance our ability to parse and simultaneously analyze both neutral and non-neutral genomic regions, thus providing insight into adaptive evolution and divergence. Second, the sheer quantity of sequence data generated by current sequencing platforms allows increased precision and resolution in analyses. Third, high-throughput sequencing can benefit applications that focus on a small number of loci that are otherwise prohibitively expensive, time-consuming, and technically difficult using traditional sequencing methods. These advances have improved our ability to understand evolutionary processes like speciation and local adaptation, but they also offer many practical applications in the fields of population ecology, migration tracking, conservation planning, diet analyses, and disease ecology. This review provides a guide for field ornithologists interested in incorporating genomic approaches into their research program, with an emphasis on techniques related to ecology and conservation. We present a general overview of contemporary genomic approaches and methods, as well as important considerations when selecting a genomic technique. We also discuss research questions that are likely to benefit from utilizing high-throughput sequencing instruments, highlighting select examples from recent avian studies.

  18. DNA-binding proteins from marine bacteria expand the known sequence diversity of TALE-like repeats.

    PubMed

    de Lange, Orlando; Wolf, Christina; Thiel, Philipp; Krüger, Jens; Kleusch, Christian; Kohlbacher, Oliver; Lahaye, Thomas

    2015-11-16

    Transcription Activator-Like Effectors (TALEs) of Xanthomonas bacteria are programmable DNA binding proteins with unprecedented target specificity. Comparative studies into TALE repeat structure and function are hindered by the limited sequence variation among TALE repeats. More sequence-diverse TALE-like proteins are known from Ralstonia solanacearum (RipTALs) and Burkholderia rhizoxinica (Bats), but RipTAL and Bat repeats are conserved with those of TALEs around the DNA-binding residue. We study two novel marine-organism TALE-like proteins (MOrTL1 and MOrTL2), the first to date of non-terrestrial origin. We have assessed their DNA-binding properties and modelled repeat structures. We found that repeats from these proteins mediate sequence specific DNA binding conforming to the TALE code, despite low sequence similarity to TALE repeats, and with novel residues around the BSR. However, MOrTL1 repeats show greater sequence discriminating power than MOrTL2 repeats. Sequence alignments show that there are only three residues conserved between repeats of all TALE-like proteins including the two new additions. This conserved motif could prove useful as an identifier for future TALE-likes. Additionally, comparing MOrTL repeats with those of other TALE-likes suggests a common evolutionary origin for the TALEs, RipTALs and Bats. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. Molecular characterisation of Atlantic salmon paramyxovirus (ASPV): A novel paramyxovirus associated with proliferative gill inflammation

    USGS Publications Warehouse

    Falk, K.; Batts, W.N.; Kvellestad, A.; Kurath, G.; Wiik-Nielsen, J.; Winton, J.R.

    2008-01-01

    Atlantic salmon paramyxovirus (ASPV) was isolated in 1995 from gills of farmed Atlantic salmon suffering from proliferative gill inflammation. The complete genome sequence of ASPV was determined, revealing a genome 16,968 nucleotides in length consisting of six non-overlapping genes coding for the nucleo- (N), phospho- (P), matrix- (M), fusion- (F), haemagglutinin-neuraminidase- (HN) and large polymerase (L) proteins in the order 3???-N-P-M-F-HN-L-5???. The various conserved features related to virus replication found in most paramyxoviruses were also found in ASPV. These include: conserved and complementary leader and trailer sequences, tri-nucleotide intergenic regions and highly conserved transcription start and stop signal sequences. The P gene expression strategy of ASPV was like that of the respiro-, morbilli- and henipaviruses, which express the P and C proteins from the primary transcript and edit a portion of the mRNA to encode V and W proteins. Sequence similarities among various features related to virus replication, pairwise comparisons of all deduced ASPV protein sequences with homologous regions from other members of the family Paramyxoviridae, and phylogenetic analyses of these amino acid sequences suggested that ASPV was a novel member of the sub-family Paramyxovirinae, most closely related to the respiroviruses. ?? 2008 Elsevier B.V. All rights reserved.

  20. A Bioinformatic Pipeline for Monitoring of the Mutational Stability of Viral Drug Targets with Deep-Sequencing Technology.

    PubMed

    Kravatsky, Yuri; Chechetkin, Vladimir; Fedoseeva, Daria; Gorbacheva, Maria; Kravatskaya, Galina; Kretova, Olga; Tchurikov, Nickolai

    2017-11-23

    The efficient development of antiviral drugs, including efficient antiviral small interfering RNAs (siRNAs), requires continuous monitoring of the strict correspondence between a drug and the related highly variable viral DNA/RNA target(s). Deep sequencing is able to provide an assessment of both the general target conservation and the frequency of particular mutations in the different target sites. The aim of this study was to develop a reliable bioinformatic pipeline for the analysis of millions of short, deep sequencing reads corresponding to selected highly variable viral sequences that are drug target(s). The suggested bioinformatic pipeline combines the available programs and the ad hoc scripts based on an original algorithm of the search for the conserved targets in the deep sequencing data. We also present the statistical criteria for the threshold of reliable mutation detection and for the assessment of variations between corresponding data sets. These criteria are robust against the possible sequencing errors in the reads. As an example, the bioinformatic pipeline is applied to the study of the conservation of RNA interference (RNAi) targets in human immunodeficiency virus 1 (HIV-1) subtype A. The developed pipeline is freely available to download at the website http://virmut.eimb.ru/. Brief comments and comparisons between VirMut and other pipelines are also presented.

  1. Genome-wide analysis of putative peroxiredoxin in unicellular and filamentous cyanobacteria.

    PubMed

    Cui, Hongli; Wang, Yipeng; Wang, Yinchu; Qin, Song

    2012-11-16

    Cyanobacteria are photoautotrophic prokaryotes with wide variations in genome sizes and ecological habitats. Peroxiredoxin (PRX) is an important protein that plays essential roles in protecting own cells against reactive oxygen species (ROS). PRXs have been identified from mammals, fungi and higher plants. However, knowledge on cyanobacterial PRXs still remains obscure. With the availability of 37 sequenced cyanobacterial genomes, we performed a comprehensive comparative analysis of PRXs and explored their diversity, distribution, domain structure and evolution. Overall 244 putative prx genes were identified, which were abundant in filamentous diazotrophic cyanobacteria, Acaryochloris marina MBIC 11017, and unicellular cyanobacteria inhabiting freshwater and hot-springs, while poor in all Prochlorococcus and marine Synechococcus strains. Among these putative genes, 25 open reading frames (ORFs) encoding hypothetical proteins were identified as prx gene family members and the others were already annotated as prx genes. All 244 putative PRXs were classified into five major subfamilies (1-Cys, 2-Cys, BCP, PRX5_like, and PRX-like) according to their domain structures. The catalytic motifs of the cyanobacterial PRXs were similar to those of eukaryotic PRXs and highly conserved in all but the PRX-like subfamily. Classical motif (CXXC) of thioredoxin was detected in protein sequences from the PRX-like subfamily. Phylogenetic tree constructed of catalytic domains coincided well with the domain structures of PRXs and the phylogenies based on 16s rRNA. The distribution of genes encoding PRXs in different unicellular and filamentous cyanobacteria especially those sub-families like PRX-like or 1-Cys PRX correlate with the genome size, eco-physiology, and physiological properties of the organisms. Cyanobacterial and eukaryotic PRXs share similar conserved motifs, indicating that cyanobacteria adopt similar catalytic mechanisms as eukaryotes. All cyanobacterial PRX proteins share highly similar structures, implying that these genes may originate from a common ancestor. In this study, a general framework of the sequence-structure-function connections of the PRXs was revealed, which may facilitate functional investigations of PRXs in various organisms.

  2. Genome-wide analysis of putative peroxiredoxin in unicellular and filamentous cyanobacteria

    PubMed Central

    2012-01-01

    Background Cyanobacteria are photoautotrophic prokaryotes with wide variations in genome sizes and ecological habitats. Peroxiredoxin (PRX) is an important protein that plays essential roles in protecting own cells against reactive oxygen species (ROS). PRXs have been identified from mammals, fungi and higher plants. However, knowledge on cyanobacterial PRXs still remains obscure. With the availability of 37 sequenced cyanobacterial genomes, we performed a comprehensive comparative analysis of PRXs and explored their diversity, distribution, domain structure and evolution. Results Overall 244 putative prx genes were identified, which were abundant in filamentous diazotrophic cyanobacteria, Acaryochloris marina MBIC 11017, and unicellular cyanobacteria inhabiting freshwater and hot-springs, while poor in all Prochlorococcus and marine Synechococcus strains. Among these putative genes, 25 open reading frames (ORFs) encoding hypothetical proteins were identified as prx gene family members and the others were already annotated as prx genes. All 244 putative PRXs were classified into five major subfamilies (1-Cys, 2-Cys, BCP, PRX5_like, and PRX-like) according to their domain structures. The catalytic motifs of the cyanobacterial PRXs were similar to those of eukaryotic PRXs and highly conserved in all but the PRX-like subfamily. Classical motif (CXXC) of thioredoxin was detected in protein sequences from the PRX-like subfamily. Phylogenetic tree constructed of catalytic domains coincided well with the domain structures of PRXs and the phylogenies based on 16s rRNA. Conclusions The distribution of genes encoding PRXs in different unicellular and filamentous cyanobacteria especially those sub-families like PRX-like or 1-Cys PRX correlate with the genome size, eco-physiology, and physiological properties of the organisms. Cyanobacterial and eukaryotic PRXs share similar conserved motifs, indicating that cyanobacteria adopt similar catalytic mechanisms as eukaryotes. All cyanobacterial PRX proteins share highly similar structures, implying that these genes may originate from a common ancestor. In this study, a general framework of the sequence-structure-function connections of the PRXs was revealed, which may facilitate functional investigations of PRXs in various organisms. PMID:23157370

  3. Deep sequencing of small RNA libraries from human prostate epithelial and stromal cells reveal distinct pattern of microRNAs primarily predicted to target growth factors.

    PubMed

    Singh, Savita; Zheng, Yun; Jagadeeswaran, Guru; Ebron, Jey Sabith; Sikand, Kavleen; Gupta, Sanjay; Sunker, Ramanjulu; Shukla, Girish C

    2016-02-28

    Complex epithelial and stromal cell interactions are required during the development and progression of prostate cancer. Regulatory small non-coding microRNAs (miRNAs) participate in the spatiotemporal regulation of messenger RNA (mRNA) and regulation of translation affecting a large number of genes involved in prostate carcinogenesis. In this study, through deep-sequencing of size fractionated small RNA libraries we profiled the miRNAs of prostate epithelial (PrEC) and stromal (PrSC) cells. Over 50 million reads were obtained for PrEC in which 860,468 were unique sequences. Similarly, nearly 76 million reads for PrSC were obtained in which over 1 million were unique reads. Expression of many miRNAs of broadly conserved and poorly conserved miRNA families were identified. Sixteen highly expressed miRNAs with significant change in expression in PrSC than PrEC were further analyzed in silico. ConsensusPathDB showed the target genes of these miRNAs were significantly involved in adherence junction, cell adhesion, EGRF, TGF-β and androgen signaling. Let-7 family of tumor-suppressor miRNAs expression was highly pervasive in both, PrEC and PrSC cells. In addition, we have also identified several miRNAs that are unique to PrEC or PrSC cells and their predicted putative targets are a group of transcription factors. This study provides perspective on the miRNA expression in PrEC and PrSC, and reveals a global trend in miRNA interactome. We conclude that the most abundant miRNAs are potential regulators of development and differentiation of the prostate gland by targeting a set of growth factors. Additionally, high level expression of the most members of let-7 family miRNAs suggests their role in the fine tuning of the growth and proliferation of prostate epithelial and stromal cells. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  4. Characterization and Evolution of Conserved MicroRNA through Duplication Events in Date Palm (Phoenix dactylifera)

    PubMed Central

    Yang, Yaodong; Mason, Annaliese S.; Lei, Xintao; Ma, Zilong

    2013-01-01

    MicroRNAs (miRNAs) are important regulators of gene expression at the post-transcriptional level in a wide range of species. Highly conserved miRNAs regulate ancestral transcription factors common to all plants, and control important basic processes such as cell division and meristem function. We selected 21 conserved miRNA families to analyze the distribution and maintenance of miRNAs. Recently, the first genome sequence in Palmaceae was released: date palm (Phoenix dactylifera). We conducted a systematic miRNA analysis in date palm, computationally identifying and characterizing the distribution and duplication of conserved miRNAs in this species compared to other published plant genomes. A total of 81 miRNAs belonging to 18 miRNA families were identified in date palm. The majority of miRNAs in date palm and seven other well-studied plant species were located in intergenic regions and located 4 to 5 kb away from the nearest protein-coding genes. Sequence comparison showed that 67% of date palm miRNA members were present in duplicated segments, and that 135 pairs of miRNA-containing segments were duplicated in Arabidopsis, tomato, orange, rice, apple, poplar and soybean with a high similarity of non coding sequences between duplicated segments, indicating genomic duplication was a major force for expansion of conserved miRNAs. Duplicated miRNA pairs in date palm showed divergence in pre-miRNA sequence and in number of promoters, implying that these duplicated pairs may have undergone divergent evolution. Comparisons between date palm and the seven other plant species for the gain/loss of miR167 loci in an ancient segment shared between monocots and dicots suggested that these conserved miRNAs were highly influenced by and diverged as a result of genomic duplication events. PMID:23951162

  5. Characterization and evolution of conserved MicroRNA through duplication events in date palm (Phoenix dactylifera).

    PubMed

    Xiao, Yong; Xia, Wei; Yang, Yaodong; Mason, Annaliese S; Lei, Xintao; Ma, Zilong

    2013-01-01

    MicroRNAs (miRNAs) are important regulators of gene expression at the post-transcriptional level in a wide range of species. Highly conserved miRNAs regulate ancestral transcription factors common to all plants, and control important basic processes such as cell division and meristem function. We selected 21 conserved miRNA families to analyze the distribution and maintenance of miRNAs. Recently, the first genome sequence in Palmaceae was released: date palm (Phoenix dactylifera). We conducted a systematic miRNA analysis in date palm, computationally identifying and characterizing the distribution and duplication of conserved miRNAs in this species compared to other published plant genomes. A total of 81 miRNAs belonging to 18 miRNA families were identified in date palm. The majority of miRNAs in date palm and seven other well-studied plant species were located in intergenic regions and located 4 to 5 kb away from the nearest protein-coding genes. Sequence comparison showed that 67% of date palm miRNA members were present in duplicated segments, and that 135 pairs of miRNA-containing segments were duplicated in Arabidopsis, tomato, orange, rice, apple, poplar and soybean with a high similarity of non coding sequences between duplicated segments, indicating genomic duplication was a major force for expansion of conserved miRNAs. Duplicated miRNA pairs in date palm showed divergence in pre-miRNA sequence and in number of promoters, implying that these duplicated pairs may have undergone divergent evolution. Comparisons between date palm and the seven other plant species for the gain/loss of miR167 loci in an ancient segment shared between monocots and dicots suggested that these conserved miRNAs were highly influenced by and diverged as a result of genomic duplication events.

  6. The Trichoplax Genome and the Nature of Placozoans

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Srivastava, Mansi; Begovic, Emina; Chapman, Jarrod

    2008-08-01

    Placozoans are arguably the simplest free-living animals, possibly evoking an early stage in metazoan evolution, yet their biology is poorly understood. Here we report the sequencing and analysis of the {approx}98 million base pair nuclear genome of the placozoan Trichoplax adhaerens. Whole genome phylogenetic analysis suggests that placozoans belong to a 'eumetazoan' clade that includes cnidarians and bilaterians, with sponges as the earliest diverging animals. The compact genome exhibits conserved gene content, gene structure, and synteny relative to the human and other complex eumetazoan genomes. Despite the apparent cellular and organismal simplicity of Trichoplax, its genome encodes a rich arraymore » of transcription factor and signaling pathway genes that are typically associated with diverse cell types and developmental processes in eumetazoans, motivating further searches for cryptic cellular complexity and/or as yet unobserved life history stages.« less

  7. The rise of the ruling reptiles and ecosystem recovery from the Permo-Triassic mass extinction.

    PubMed

    Ezcurra, Martín D; Butler, Richard J

    2018-06-13

    One of the key faunal transitions in Earth history occurred after the Permo-Triassic mass extinction ( ca 252.2 Ma), when the previously obscure archosauromorphs (which include crocodylians, dinosaurs and birds) become the dominant terrestrial vertebrates. Here, we place all known middle Permian-early Late Triassic archosauromorph species into an explicit phylogenetic context, and quantify biodiversity change through this interval. Our results indicate the following sequence of diversification: a morphologically conservative and globally distributed post-extinction 'disaster fauna'; a major but cryptic and poorly sampled phylogenetic diversification with significantly elevated evolutionary rates; and a marked increase in species counts, abundance, and disparity contemporaneous with global ecosystem stabilization some 5 million years after the extinction. This multiphase event transformed global ecosystems, with far-reaching consequences for Mesozoic and modern faunas. © 2018 The Author(s).

  8. Systematic palaeontology of the Perisphinctoidea in the Jurassic/Cretaceous boundary interval at Le Chouet (Drôme, France), and its implications for biostratigraphy

    NASA Astrophysics Data System (ADS)

    Frau, Camille; Bulot, Luc G.; Wimbledon, William A. P.; Ifrim, Christina

    2016-06-01

    This study describes ammonite taxa of the Perisphinctoidea in the Jurassic/Cretaceous boundary interval at Le Chouet (Drôme, France). Emphasis is placed on new and poorly known Himalayitidae, Neocomitidae and Olcostephanidae from the lower part of the Jacobi Zone auctorum. Significant results relate the introduction of Lopeziceras gen. nov., grouping himalayitid-like forms with two rows of tubercles, and Praedalmasiceras gen. nov., grouping the early Berriasian Dalmasiceras taxa. Study of the ontogenetic sequences of both genera show that they were derived from late Tithonian Himalayitidae. This supports the distinction between the subfamilies Himalayitinae and Dalmasiceratidae subfam. nov. Content, variation, dimorphism and vertical range of the Neocomitidae Berriasella, Pseudoneocomites, Elenaella and Delphinella are discussed. A conservative use of the Olcostephanidae Proniceras is followed herein.

  9. The La-related protein 1-specific domain repurposes HEAT-like repeats to directly bind a 5'TOP sequence.

    PubMed

    Lahr, Roni M; Mack, Seshat M; Héroux, Annie; Blagden, Sarah P; Bousquet-Antonelli, Cécile; Deragon, Jean-Marc; Berman, Andrea J

    2015-09-18

    La-related protein 1 (LARP1) regulates the stability of many mRNAs. These include 5'TOPs, mTOR-kinase responsive mRNAs with pyrimidine-rich 5' UTRs, which encode ribosomal proteins and translation factors. We determined that the highly conserved LARP1-specific C-terminal DM15 region of human LARP1 directly binds a 5'TOP sequence. The crystal structure of this DM15 region refined to 1.86 Å resolution has three structurally related and evolutionarily conserved helix-turn-helix modules within each monomer. These motifs resemble HEAT repeats, ubiquitous helical protein-binding structures, but their sequences are inconsistent with consensus sequences of known HEAT modules, suggesting this structure has been repurposed for RNA interactions. A putative mTORC1-recognition sequence sits within a flexible loop C-terminal to these repeats. We also present modelling of pyrimidine-rich single-stranded RNA onto the highly conserved surface of the DM15 region. These studies lay the foundation necessary for proceeding toward a structural mechanism by which LARP1 links mTOR signalling to ribosome biogenesis. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. Analysis of the intergenic region of tomato spotted wilt Tospovirus medium RNA segment.

    PubMed

    Bhat, A I; Pappu, S S; Pappu, H R; Deom, C M; Culbreath, A K

    1999-06-01

    The intergenic region (IGR) of the medium (M) RNA of tomato spotted wilt Tospovirus (TSWV) isolates naturally infecting peanut (groundnut), pepper, potato, stokesia, tobacco and watermelon in Georgia (GA) and a peanut isolate from Florida (FL) was cloned and sequenced. The IGR sequences were compared with one another and with respective M RNA IGRs of TSWV isolates from Brazil and Japan and other tospoviruses. The length of M IGR of GA and FL isolates varied from 271 to 277 nucleotides. The M IGRs of TSWV from potato and stokesia, and tobacco and watermelon were identical with each other in their length and sequence. IGR sequences were more conserved (95-100%) among the populations of TSWV from GA and FL, than when compared with those of TSWV isolates from other countries (83-94%). The conserved motif (CAAACTTTGG) present in the IGRs of both M and small (S) RNAs of a Brazilian isolate of TSWV was also conserved in the isolates studied. Cluster analysis of the IGR sequences showed that all GA and FL isolates are closely clustered and are distinct from the TSWV isolates from other countries as well as from other tospoviruses.

  11. The La-related protein 1-specific domain repurposes HEAT-like repeats to directly bind a 5'TOP sequence

    DOE PAGES

    Lahr, Roni M.; Mack, Seshat M.; Heroux, Annie; ...

    2015-07-22

    La-related protein 1 (LARP1) regulates the stability of many mRNAs. These include 5'TOPs, mTOR-kinase responsive mRNAs with pyrimidine-rich 5' UTRs, which encode ribosomal proteins and translation factors. We determined that the highly conserved LARP1-specific C-terminal DM15 region of human LARP1 directly binds a 5'TOP sequence. The crystal structure of this DM15 region refined to 1.86 Å resolution has three structurally related and evolutionarily conserved helix-turn-helix modules within each monomer. These motifs resemble HEAT repeats, ubiquitous helical protein-binding structures, but their sequences are inconsistent with consensus sequences of known HEAT modules, suggesting this structure has been repurposed for RNA interactions. Amore » putative mTORC1-recognition sequence sits within a flexible loop C-terminal to these repeats. We also present modelling of pyrimidine-rich single-stranded RNA onto the highly conserved surface of the DM15 region. Ultimately, these studies lay the foundation necessary for proceeding toward a structural mechanism by which LARP1 links mTOR signalling to ribosome biogenesis.« less

  12. Insilico profiling of microRNAs in Korean ginseng (Panax ginseng Meyer)

    PubMed Central

    Mathiyalagan, Ramya; Subramaniyam, Sathiyamoorthy; Natarajan, Sathishkumar; Kim, Yeon Ju; Sun, Myung Suk; Kim, Se Young; Kim, Yu-Jin; Yang, Deok Chun

    2013-01-01

    MicroRNAs (miRNAs) are a class of recently discovered non-coding small RNA molecules, on average approximately 21 nucleotides in length, which underlie numerous important biological roles in gene regulation in various organisms. The miRNA database (release 18) has 18,226 miRNAs, which have been deposited from different species. Although miRNAs have been identified and validated in many plant species, no studies have been reported on discovering miRNAs in Panax ginseng Meyer, which is a traditionally known medicinal plant in oriental medicine, also known as Korean ginseng. It has triterpene ginseng saponins called ginsenosides, which are responsible for its various pharmacological activities. Predicting conserved miRNAs by homology-based analysis with available expressed sequence tag (EST) sequences can be powerful, if the species lacks whole genome sequence information. In this study by using the EST based computational approach, 69 conserved miRNAs belonging to 44 miRNA families were identified in Korean ginseng. The digital gene expression patterns of predicted conserved miRNAs were analyzed by deep sequencing using small RNA sequences of flower buds, leaves, and lateral roots. We have found that many of the identified miRNAs showed tissue specific expressions. Using the insilico method, 346 potential targets were identified for the predicted 69 conserved miRNAs by searching the ginseng EST database, and the predicted targets were mainly involved in secondary metabolic processes, responses to biotic and abiotic stress, and transcription regulator activities, as well as a variety of other metabolic processes. PMID:23717176

  13. [Analysis of genotype and phenotype correlation of MYH7-V878A mutation among ethnic Han Chinese pedigrees affected with hypertrophic cardiomyopathy].

    PubMed

    Wang, Bo; Guo, Ruiqi; Zuo, Lei; Shao, Hong; Liu, Ying; Wang, Yu; Ju, Yan; Sun, Chao; Wang, Lifeng; Zhang, Yanmin; Liu, Liwen

    2017-08-10

    To analyze the phenotype-genotype correlation of MYH7-V878A mutation. Exonic amplification and high-throughput sequencing of 96-cardiovascular disease-related genes were carried out on probands from 210 pedigrees affected with hypertrophic cardiomyopathy (HCM). For the probands, their family members, and 300 healthy volunteers, the identified MYH7-V878A mutation was verified by Sanger sequencing. Information of the HCM patients and their family members, including clinical data, physical examination, echocardiography (UCG), electrocardiography (ECG), and conserved sequence of the mutation among various species were analyzed. A MYH7-V878A mutation was detected in five HCM pedigrees containing 31 family members. Fourteen members have carried the mutation, among whom 11 were diagnosed with HCM, while 3 did not meet the diagnostic criteria. Some of the fourteen members also carried other mutations. Family members not carrying the mutation had normal UCG and ECG. No MYH7-V878A mutation was found among the 300 healthy volunteers. Analysis of sequence conservation showed that the amino acid is located in highly conserved regions among various species. MYH7-V878A is a hot spot among ethnic Han Chinese with a high penetrance. Functional analysis of the conserved sequences suggested that the mutation may cause significant alteration of the function. MYH7-V878A has a significant value for the early diagnosis of HCM.

  14. Redescription of Marstonia comalensis (Pilsbry & Ferriss, 1906), a poorly known and possibly threatened freshwater gastropod from the Edwards Plateau region (Texas)

    PubMed Central

    Hershler, Robert; Liu, Hsiu-Ping

    2011-01-01

    Abstract Marstonia comalensis, a poorly known nymphophiline gastropod (originally described from Comal Creek, Texas) that has often been confused with Cincinnatia integra, is re-described and the generic placement of this species, which was recently allocated to Marstonia based on unpublished evidence, is confirmed by anatomical study. Marstonia comalensis is a large congener having an ovate-conic, openly umbilicate shell and penis having a short filament and oblique, squarish lobe bearing a narrow gland along its distal edge. It is well differentiated morphologically from congeners having similar shells and penes and is also genetically divergent relative to those congeners that have been sequenced (mtCOI divergence 3.0–8.5%). A Bayesian analysis of a small COI dataset resolved Marstonia comalensis in a poorly supported sub-clade together with Marstonia hershleri, Marstonia lustrica and Marstonia pachyta. The predominantly new records presented herein indicate that Marstonia comalensis was historically distributed in the upper portions of the Brazos, Colorado, Guadalupe and Nueces River basins, south-central Texas. The species has been live collected at only 12 localities and only two of these have been re-visited since 1993. These data suggest that the conservation status of this snail, which has a critically imperiled (G1) NatureServe ranking and was recently proposed for federal listing, needs to be re-assessed. PMID:21594148

  15. Dominant Sequences of Human Major Histocompatibility Complex Conserved Extended Haplotypes from HLA-DQA2 to DAXX

    PubMed Central

    Larsen, Charles E.; Alford, Dennis R.; Trautwein, Michael R.; Jalloh, Yanoh K.; Tarnacki, Jennifer L.; Kunnenkeri, Sushruta K.; Fici, Dolores A.; Yunis, Edmond J.; Awdeh, Zuheir L.; Alper, Chester A.

    2014-01-01

    We resequenced and phased 27 kb of DNA within 580 kb of the MHC class II region in 158 population chromosomes, most of which were conserved extended haplotypes (CEHs) of European descent or contained their centromeric fragments. We determined the single nucleotide polymorphism and deletion-insertion polymorphism alleles of the dominant sequences from HLA-DQA2 to DAXX for these CEHs. Nine of 13 CEHs remained sufficiently intact to possess a dominant sequence extending at least to DAXX, 230 kb centromeric to HLA-DPB1. We identified the regions centromeric to HLA-DQB1 within which single instances of eight “common” European MHC haplotypes previously sequenced by the MHC Haplotype Project (MHP) were representative of those dominant CEH sequences. Only two MHP haplotypes had a dominant CEH sequence throughout the centromeric and extended class II region and one MHP haplotype did not represent a known European CEH anywhere in the region. We identified the centromeric recombination transition points of other MHP sequences from CEH representation to non-representation. Several CEH pairs or groups shared sequence identity in small blocks but had significantly different (although still conserved for each separate CEH) sequences in surrounding regions. These patterns partly explain strong calculated linkage disequilibrium over only short (tens to hundreds of kilobases) distances in the context of a finite number of observed megabase-length CEHs comprising half a population's haplotypes. Our results provide a clearer picture of European CEH class II allelic structure and population haplotype architecture, improved regional CEH markers, and raise questions concerning regional recombination hotspots. PMID:25299700

  16. Targeted next-generation sequencing in chronic lymphocytic leukemia: a high-throughput yet tailored approach will facilitate implementation in a clinical setting.

    PubMed

    Sutton, Lesley-Ann; Ljungström, Viktor; Mansouri, Larry; Young, Emma; Cortese, Diego; Navrkalova, Veronika; Malcikova, Jitka; Muggen, Alice F; Trbusek, Martin; Panagiotidis, Panagiotis; Davi, Frederic; Belessi, Chrysoula; Langerak, Anton W; Ghia, Paolo; Pospisilova, Sarka; Stamatopoulos, Kostas; Rosenquist, Richard

    2015-03-01

    Next-generation sequencing has revealed novel recurrent mutations in chronic lymphocytic leukemia, particularly in patients with aggressive disease. Here, we explored targeted re-sequencing as a novel strategy to assess the mutation status of genes with prognostic potential. To this end, we utilized HaloPlex targeted enrichment technology and designed a panel including nine genes: ATM, BIRC3, MYD88, NOTCH1, SF3B1 and TP53, which have been linked to the prognosis of chronic lymphocytic leukemia, and KLHL6, POT1 and XPO1, which are less characterized but were found to be recurrently mutated in various sequencing studies. A total of 188 chronic lymphocytic leukemia patients with poor prognostic features (unmutated IGHV, n=137; IGHV3-21 subset #2, n=51) were sequenced on the HiSeq 2000 and data were analyzed using well-established bioinformatics tools. Using a conservative cutoff of 10% for the mutant allele, we found that 114/180 (63%) patients carried at least one mutation, with mutations in ATM, BIRC3, NOTCH1, SF3B1 and TP53 accounting for 149/177 (84%) of all mutations. We selected 155 mutations for Sanger validation (variant allele frequency, 10-99%) and 93% (144/155) of mutations were confirmed; notably, all 11 discordant variants had a variant allele frequency between 11-27%, hence at the detection limit of conventional Sanger sequencing. Technical precision was assessed by repeating the entire HaloPlex procedure for 63 patients; concordance was found for 77/82 (94%) mutations. In summary, this study demonstrates that targeted next-generation sequencing is an accurate and reproducible technique potentially suitable for routine screening, eventually as a stand-alone test without the need for confirmation by Sanger sequencing. Copyright© Ferrata Storti Foundation.

  17. Sequence Diversity Diagram for comparative analysis of multiple sequence alignments.

    PubMed

    Sakai, Ryo; Aerts, Jan

    2014-01-01

    The sequence logo is a graphical representation of a set of aligned sequences, commonly used to depict conservation of amino acid or nucleotide sequences. Although it effectively communicates the amount of information present at every position, this visual representation falls short when the domain task is to compare between two or more sets of aligned sequences. We present a new visual presentation called a Sequence Diversity Diagram and validate our design choices with a case study. Our software was developed using the open-source program called Processing. It loads multiple sequence alignment FASTA files and a configuration file, which can be modified as needed to change the visualization. The redesigned figure improves on the visual comparison of two or more sets, and it additionally encodes information on sequential position conservation. In our case study of the adenylate kinase lid domain, the Sequence Diversity Diagram reveals unexpected patterns and new insights, for example the identification of subgroups within the protein subfamily. Our future work will integrate this visual encoding into interactive visualization tools to support higher level data exploration tasks.

  18. T cell receptor repertoires of mice and humans are clustered in similarity networks around conserved public CDR3 sequences

    PubMed Central

    Madi, Asaf; Poran, Asaf; Shifrut, Eric; Reich-Zeliger, Shlomit; Greenstein, Erez; Zaretsky, Irena; Arnon, Tomer; Laethem, Francois Van; Singer, Alfred; Lu, Jinghua; Sun, Peter D; Cohen, Irun R; Friedman, Nir

    2017-01-01

    Diversity of T cell receptor (TCR) repertoires, generated by somatic DNA rearrangements, is central to immune system function. However, the level of sequence similarity of TCR repertoires within and between species has not been characterized. Using network analysis of high-throughput TCR sequencing data, we found that abundant CDR3-TCRβ sequences were clustered within networks generated by sequence similarity. We discovered a substantial number of public CDR3-TCRβ segments that were identical in mice and humans. These conserved public sequences were central within TCR sequence-similarity networks. Annotated TCR sequences, previously associated with self-specificities such as autoimmunity and cancer, were linked to network clusters. Mechanistically, CDR3 networks were promoted by MHC-mediated selection, and were reduced following immunization, immune checkpoint blockade or aging. Our findings provide a new view of T cell repertoire organization and physiology, and suggest that the immune system distributes its TCR sequences unevenly, attending to specific foci of reactivity. DOI: http://dx.doi.org/10.7554/eLife.22057.001 PMID:28731407

  19. Beta-globin locus activation regions: conservation of organization, structure, and function.

    PubMed Central

    Li, Q L; Zhou, B; Powers, P; Enver, T; Stamatoyannopoulos, G

    1990-01-01

    The human beta-globin locus activation region (LAR) comprises four erythroid-specific DNase I hypersensitive sites (I-IV) thought to be largely responsible for activating the beta-globin domain and facilitating high-level erythroid-specific globin gene expression. We identified the goat beta-globin LAR, determined 10.2 kilobases of its sequence, and demonstrated its function in transgenic mice. The human and goat LARs share 6.5 kilobases of homologous sequences that are as highly conserved as the epsilon-globin gene promoters. Furthermore, the overall spatial organization of the two LARs has been conserved. These results suggest that the functionally relevant regions of the LAR are large and that in addition to their primary structure, the spatial relationship of the conserved elements is important for LAR function. Images PMID:2236034

  20. Environmental globalization, organizational form, and expected benefits from protected areas in Central America

    Treesearch

    Max J. Pfeffer; John W. Schelhas; Catherine Meola

    2006-01-01

    Environmental globalization has led to the implementation of conservation efforts like the creation of protected areas that often promote the interests of core countries in poorer regions. The creation of protected areas in poor areas frequently creates tensions between human needs like - food and shelter and environmental conservation. Support for such conservation...

  1. Physical mapping of repetitive DNA suggests 2n reduction in Amazon turtles Podocnemis (Testudines: Podocnemididae)

    PubMed Central

    Cavalcante, Manoella Gemaque; Bastos, Carlos Eduardo Matos Carvalho; Nagamachi, Cleusa Yoshiko; Pieczarka, Julio Cesar; Vicari, Marcelo Ricardo; Noronha, Renata Coelho Rodrigues

    2018-01-01

    Cytogenetic studies show that there is great karyotypic diversity in order Testudines (2n = 26–68), and that this may be mainly attributed to the presence/absence of microchromosomes. Members of the Podocnemididae family have the smallest diploid numbers of this order (2n = 26–28), which may be a derived condition of the group. Diverse studies suggest that repetitive-DNA-rich sites generally act as hotspots for double-strand breaks and chromosomal reorganization. In this context, we used fluorescent in situ hybridization (FISH) to map telomeric sequences (TTAGGG)n, 45S rDNA, and the genes encoding histones H1 and H3 in two species of genus Podocnemis. We also observed conservation of the 45S rDNA and H1 histone sequences (probable case of conserved synteny), but multiple conserved and non-conserved clusters of H3 genes, which colocalized with the interstitial telomeric sequences in the Podocnemis genome. Our results suggest that fusions have occurred between macro and microchromosomes or between microchromosomes, leading to the observed reduction in diploid number in the family Podocnemididae. PMID:29813087

  2. Physical mapping of repetitive DNA suggests 2n reduction in Amazon turtles Podocnemis (Testudines: Podocnemididae).

    PubMed

    Cavalcante, Manoella Gemaque; Bastos, Carlos Eduardo Matos Carvalho; Nagamachi, Cleusa Yoshiko; Pieczarka, Julio Cesar; Vicari, Marcelo Ricardo; Noronha, Renata Coelho Rodrigues

    2018-01-01

    Cytogenetic studies show that there is great karyotypic diversity in order Testudines (2n = 26-68), and that this may be mainly attributed to the presence/absence of microchromosomes. Members of the Podocnemididae family have the smallest diploid numbers of this order (2n = 26-28), which may be a derived condition of the group. Diverse studies suggest that repetitive-DNA-rich sites generally act as hotspots for double-strand breaks and chromosomal reorganization. In this context, we used fluorescent in situ hybridization (FISH) to map telomeric sequences (TTAGGG)n, 45S rDNA, and the genes encoding histones H1 and H3 in two species of genus Podocnemis. We also observed conservation of the 45S rDNA and H1 histone sequences (probable case of conserved synteny), but multiple conserved and non-conserved clusters of H3 genes, which colocalized with the interstitial telomeric sequences in the Podocnemis genome. Our results suggest that fusions have occurred between macro and microchromosomes or between microchromosomes, leading to the observed reduction in diploid number in the family Podocnemididae.

  3. Conservation of the glycoprotein B homologs of the Kaposi’s sarcoma-associated herpesvirus (KSHV/HHV8) and Old World primate rhadinoviruses of chimpanzees and macaques

    PubMed Central

    Bruce, A. Gregory; Horst, Jeremy A.; Rose, Timothy M.

    2016-01-01

    The envelope-associated glycoprotein B (gB) is highly conserved within the Herpesviridae and plays a critical role in viral entry. We analyzed the evolutionary conservation of sequence and structural motifs within the Kaposi’s sarcoma-associated herpesvirus (KSHV) gB and homologs of Old World primate rhadinoviruses belonging to the distinct RV1 and RV2 rhadinovirus lineages. In addition to gB homologs of rhadinoviruses infecting the pig-tailed and rhesus macaques, we cloned and sequenced gB homologs of RV1 and RV2 rhadinoviruses infecting chimpanzees. A structural model of the KSHV gB was determined, and functional motifs and sequence variants were mapped to the model structure. Conserved domains and motifs were identified, including an “RGD” motif that plays a critical role in KSHV binding and entry through the cellular integrin αVβ3. The RGD motif was only detected in RV1 rhadinoviruses suggesting an important difference in cell tropism between the two rhadinovirus lineages. PMID:27070755

  4. Sample sequencing of vascular plants demonstrates widespread conservation and divergence of microRNAs.

    PubMed

    Chávez Montes, Ricardo A; de Fátima Rosas-Cárdenas, Flor; De Paoli, Emanuele; Accerbi, Monica; Rymarquis, Linda A; Mahalingam, Gayathri; Marsch-Martínez, Nayelli; Meyers, Blake C; Green, Pamela J; de Folter, Stefan

    2014-04-23

    Small RNAs are pivotal regulators of gene expression that guide transcriptional and post-transcriptional silencing mechanisms in eukaryotes, including plants. Here we report a comprehensive atlas of sRNA and miRNA from 3 species of algae and 31 representative species across vascular plants, including non-model plants. We sequence and quantify sRNAs from 99 different tissues or treatments across species, resulting in a data set of over 132 million distinct sequences. Using miRBase mature sequences as a reference, we identify the miRNA sequences present in these libraries. We apply diverse profiling methods to examine critical sRNA and miRNA features, such as size distribution, tissue-specific regulation and sequence conservation between species, as well as to predict putative new miRNA sequences. We also develop database resources, computational analysis tools and a dedicated website, http://smallrna.udel.edu/. This study provides new insights on plant sRNAs and miRNAs, and a foundation for future studies.

  5. Diatom centromeres suggest a mechanism for nuclear DNA acquisition

    DOE PAGES

    Diner, Rachel E.; Noddings, Chari M.; Lian, Nathan C.; ...

    2017-07-18

    Centromeres are essential for cell division and growth in all eukaryotes, and knowledge of their sequence and structure guides the development of artificial chromosomes for functional cellular biology studies. Centromeric proteins are conserved among eukaryotes; however, centromeric DNA sequences are highly variable. We combined forward and reverse genetic approaches with chromatin immunoprecipitation to identify centromeres of the model diatom Phaeodactylum tricornutum. We observed 25 unique centromere sequences typically occurring once per chromosome, a finding that helps to resolve nuclear genome organization and indicates monocentric regional centromeres. Diatom centromere sequences contain low-GC content regions but lack repeats or other conserved sequencemore » features. Native and foreign sequences with similar GC content to P. tricornutum centromeres can maintain episomes and recruit the diatom centromeric histone protein CENH3, suggesting nonnative sequences can also function as diatom centromeres. Thus, simple sequence requirements may enable DNA from foreign sources to persist in the nucleus as extrachromosomal episomes, revealing a potential mechanism for organellar and foreign DNA acquisition.« less

  6. Diatom centromeres suggest a mechanism for nuclear DNA acquisition

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Diner, Rachel E.; Noddings, Chari M.; Lian, Nathan C.

    Centromeres are essential for cell division and growth in all eukaryotes, and knowledge of their sequence and structure guides the development of artificial chromosomes for functional cellular biology studies. Centromeric proteins are conserved among eukaryotes; however, centromeric DNA sequences are highly variable. We combined forward and reverse genetic approaches with chromatin immunoprecipitation to identify centromeres of the model diatom Phaeodactylum tricornutum. We observed 25 unique centromere sequences typically occurring once per chromosome, a finding that helps to resolve nuclear genome organization and indicates monocentric regional centromeres. Diatom centromere sequences contain low-GC content regions but lack repeats or other conserved sequencemore » features. Native and foreign sequences with similar GC content to P. tricornutum centromeres can maintain episomes and recruit the diatom centromeric histone protein CENH3, suggesting nonnative sequences can also function as diatom centromeres. Thus, simple sequence requirements may enable DNA from foreign sources to persist in the nucleus as extrachromosomal episomes, revealing a potential mechanism for organellar and foreign DNA acquisition.« less

  7. Validation of Skeletal Muscle cis-Regulatory Module Predictions Reveals Nucleotide Composition Bias in Functional Enhancers

    PubMed Central

    Kwon, Andrew T.; Chou, Alice Yi; Arenillas, David J.; Wasserman, Wyeth W.

    2011-01-01

    We performed a genome-wide scan for muscle-specific cis-regulatory modules (CRMs) using three computational prediction programs. Based on the predictions, 339 candidate CRMs were tested in cell culture with NIH3T3 fibroblasts and C2C12 myoblasts for capacity to direct selective reporter gene expression to differentiated C2C12 myotubes. A subset of 19 CRMs validated as functional in the assay. The rate of predictive success reveals striking limitations of computational regulatory sequence analysis methods for CRM discovery. Motif-based methods performed no better than predictions based only on sequence conservation. Analysis of the properties of the functional sequences relative to inactive sequences identifies nucleotide sequence composition can be an important characteristic to incorporate in future methods for improved predictive specificity. Muscle-related TFBSs predicted within the functional sequences display greater sequence conservation than non-TFBS flanking regions. Comparison with recent MyoD and histone modification ChIP-Seq data supports the validity of the functional regions. PMID:22144875

  8. Hop stunt viroid: molecular cloning and nucleotide sequence of the complete cDNA copy.

    PubMed Central

    Ohno, T; Takamatsu, N; Meshi, T; Okada, Y

    1983-01-01

    The complete cDNA of hop stunt viroid (HSV) has been cloned by the method of Okayama and Berg (Mol.Cell.Biol.2,161-170. (1982] and the complete nucleotide sequence has been established. The covalently closed circular single-stranded HSV RNA consists of 297 nucleotides. The secondary structure predicted for HSV contains 67% of its residues base-paired. The native HSV can possess an extended rod-like structure characteristic of viroids previously established. The central region of the native HSV has a similar structure to the conserved region found in all viroids sequenced so far except for avocado sunblotch viroid. The sequence homologous to the 5'-end of U1a RNA is also found in the sequence of HSV but not in the central conserved region. Images PMID:6312412

  9. Genomewide Function Conservation and Phylogeny in the Herpesviridae

    PubMed Central

    Albà, M. Mar; Das, Rhiju; Orengo, Christine A.; Kellam, Paul

    2001-01-01

    The Herpesviridae are a large group of well-characterized double-stranded DNA viruses for which many complete genome sequences have been determined. We have extracted protein sequences from all predicted open reading frames of 19 herpesvirus genomes. Sequence comparison and protein sequence clustering methods have been used to construct herpesvirus protein homologous families. This resulted in 1692 proteins being clustered into 243 multiprotein families and 196 singleton proteins. Predicted functions were assigned to each homologous family based on genome annotation and published data and each family classified into seven broad functional groups. Phylogenetic profiles were constructed for each herpesvirus from the homologous protein families and used to determine conserved functions and genomewide phylogenetic trees. These trees agreed with molecular-sequence-derived trees and allowed greater insight into the phylogeny of ungulate and murine gammaherpesviruses. PMID:11156614

  10. A highly conserved N-terminal sequence for teleost vitellogenin with potential value to the biochemistry, molecular biology and pathology of vitellogenesis

    USGS Publications Warehouse

    Folmar, L.D.; Denslow, N.D.; Wallace, R.A.; LaFleur, G.; Gross, T.S.; Bonomelli, S.; Sullivan, C.V.

    1995-01-01

    N-terminal amino acid sequences for vitellogenin (Vtg) from six species of teleost fish (striped bass, mummichog, pinfish, brown bullhead, medaka, yellow perch and the sturgeon) are compared with published N-terminal Vtg sequences for the lamprey, clawed frog and domestic chicken. Striped bass and mummichog had 100% identical amino acids between positions 7 and 21, while pinfish, brown bullhead, sturgeon, lamprey, Xenopus and chicken had 87%, 93%, 60%, 47%, 47-60%) for four transcripts and had 40% identical, respectively, with striped bass for the same positions. Partial sequences obtained for medaka and yellow perch were 100% identical between positions 5 to 10. The potential utility of this conserved sequence for studies on the biochemistry, molecular biology and pathology of vitellogenesis is discussed.

  11. Analysis of Variability in HIV-1 Subtype A Strains in Russia Suggests a Combination of Deep Sequencing and Multitarget RNA Interference for Silencing of the Virus.

    PubMed

    Kretova, Olga V; Chechetkin, Vladimir R; Fedoseeva, Daria M; Kravatsky, Yuri V; Sosin, Dmitri V; Alembekov, Ildar R; Gorbacheva, Maria A; Gashnikova, Natalya M; Tchurikov, Nickolai A

    2017-02-01

    Any method for silencing the activity of the HIV-1 retrovirus should tackle the extremely high variability of HIV-1 sequences and mutational escape. We studied sequence variability in the vicinity of selected RNA interference (RNAi) targets from isolates of HIV-1 subtype A in Russia, and we propose that using artificial RNAi is a potential alternative to traditional antiretroviral therapy. We prove that using multiple RNAi targets overcomes the variability in HIV-1 isolates. The optimal number of targets critically depends on the conservation of the target sequences. The total number of targets that are conserved with a probability of 0.7-0.8 should exceed at least 2. Combining deep sequencing and multitarget RNAi may provide an efficient approach to cure HIV/AIDS.

  12. Sequence analysis of dolphin ferritin H and L subunits and possible iron-dependent translational control of dolphin ferritin gene

    PubMed Central

    Takaesu, Azusa; Watanabe, Kiyotaka; Takai, Shinji; Sasaki, Yukako; Orino, Koichi

    2008-01-01

    Background Iron-storage protein, ferritin plays a central role in iron metabolism. Ferritin has dual function to store iron and segregate iron for protection of iron-catalyzed reactive oxygen species. Tissue ferritin is composed of two kinds of subunits (H: heavy chain or heart-type subunit; L: light chain or liver-type subunit). Ferritin gene expression is controlled at translational level in iron-dependent manner or at transcriptional level in iron-independent manner. However, sequencing analysis of marine mammalian ferritin subunits has not yet been performed fully. The purpose of this study is to reveal cDNA-derived amino acid sequences of cetacean ferritin H and L subunits, and demonstrate the possibility of expression of these subunits, especially H subunit, by iron. Methods Sequence analyses of cetacean ferritin H and L subunits were performed by direct sequencing of polymerase chain reaction (PCR) fragments from cDNAs generated via reverse transcription-PCR of leukocyte total RNA prepared from blood samples of six different dolphin species (Pseudorca crassidens, Lagenorhynchus obliquidens, Grampus griseus, Globicephala macrorhynchus, Tursiops truncatus, and Delphinapterus leucas). The putative iron-responsive element sequence in the 5'-untranslated region of the six different dolphin species was revealed by direct sequencing of PCR fragments obtained using leukocyte genomic DNA. Results Dolphin H and L subunits consist of 182 and 174 amino acids, respectively, and amino acid sequence identities of ferritin subunits among these dolphins are highly conserved (H: 99–100%, (99→98) ; L: 98–100%). The conserved 28 bp IRE sequence was located -144 bp upstream from the initiation codon in the six different dolphin species. Conclusion These results indicate that six different dolphin species have conserved ferritin sequences, and suggest that these genes are iron-dependently expressed. PMID:18954429

  13. Biosystematics and Conservation: A Case Study with Two Enigmatic and Uncommon Species of Crassula from New Zealand

    PubMed Central

    De Lange, P. J.; Heenan, P. B.; Keeling, D. J.; Murray, B. G.; Smissen, R.; Sykes, W. R.

    2008-01-01

    Background and Aims Crassula hunua and C. ruamahanga have been taxonomically controversial. Here their distinctiveness is assessed so that their taxonomic and conservation status can be clarified. Methods Populations of these two species were analysed using morphological, chromosomal and DNA sequence data. Key Results It proved impossible to differentiate between these two species using 12 key morphological characters. Populations were found to be chromosomally variable with 11 different chromosome numbers ranging from 2n = 42 to 2n = 100. Meiotic behaviour and levels of pollen stainability were both variable. Phylogenetic analyses showed that differences exist in both nuclear and plastid DNA sequences between individual plants, sometimes from the same population. Conclusions The results suggest that these plants are a species complex that has evolved through interspecific hybridization and polyploidy. Their high levels of chromosomal and DNA sequence variation present a problem for their conservation. PMID:18055560

  14. Deep sequencing of foot-and-mouth disease virus reveals RNA sequences involved in genome packaging.

    PubMed

    Logan, Grace; Newman, Joseph; Wright, Caroline F; Lasecka-Dykes, Lidia; Haydon, Daniel T; Cottam, Eleanor M; Tuthill, Tobias J

    2017-10-18

    Non-enveloped viruses protect their genomes by packaging them into an outer shell or capsid of virus-encoded proteins. Packaging and capsid assembly in RNA viruses can involve interactions between capsid proteins and secondary structures in the viral genome as exemplified by the RNA bacteriophage MS2 and as proposed for other RNA viruses of plants, animals and human. In the picornavirus family of non-enveloped RNA viruses, the requirements for genome packaging remain poorly understood. Here we show a novel and simple approach to identify predicted RNA secondary structures involved in genome packaging in the picornavirus foot-and-mouth disease virus (FMDV). By interrogating deep sequencing data generated from both packaged and unpackaged populations of RNA we have determined multiple regions of the genome with constrained variation in the packaged population. Predicted secondary structures of these regions revealed stem loops with conservation of structure and a common motif at the loop. Disruption of these features resulted in attenuation of virus growth in cell culture due to a reduction in assembly of mature virions. This study provides evidence for the involvement of predicted RNA structures in picornavirus packaging and offers a readily transferable methodology for identifying packaging requirements in many other viruses. Importance In order to transmit their genetic material to a new host, non-enveloped viruses must protect their genomes by packaging them into an outer shell or capsid of virus-encoded proteins. For many non-enveloped RNA viruses the requirements for this critical part of the viral life cycle remain poorly understood. We have identified RNA sequences involved in genome packaging of the picornavirus foot-and-mouth disease virus. This virus causes an economically devastating disease of livestock affecting both the developed and developing world. The experimental methods developed to carry out this work are novel, simple and transferable to the study of packaging signals in other RNA viruses. Improved understanding of RNA packaging may lead to novel vaccine approaches or targets for antiviral drugs with broad spectrum activity. Copyright © 2017 Logan et al.

  15. Microsatellites for Lindera species

    Treesearch

    Craig S. Echt; D. Deemer; T.L. Kubisiak; C.D. Nelson

    2006-01-01

    Microsatellite markers were developed for conservation genetic studies of Lindera melissifolia (pondberry), a federally endangered shrub of southern bottomland ecosystems. Microsatellite sequences were obtained from DNA libraries that were enriched for the (AC)n simple sequence repeat motif. From 35 clone sequences, 20 primer...

  16. Conservation and diversification of the miR166 family in soybean and potential roles of newly identified miR166s.

    PubMed

    Li, Xuyan; Xie, Xin; Li, Ji; Cui, Yuhai; Hou, Yanming; Zhai, Lulu; Wang, Xiao; Fu, Yanli; Liu, Ranran; Bian, Shaomin

    2017-02-01

    microRNA166 (miR166) is a highly conserved family of miRNAs implicated in a wide range of cellular and physiological processes in plants. miR166 family generally comprises multiple miR166 members in plants, which might exhibit functional redundancy and specificity. The soybean miR166 family consists of 21 members according to the miRBase database. However, the evolutionary conservation and functional diversification of miR166 family members in soybean remain poorly understood. We identified five novel miR166s in soybean by data mining approach, thus enlarging the size of miR166 family from 21 to 26 members. Phylogenetic analyses of the 26 miR166s and their precursors indicated that soybean miR166 family exhibited both evolutionary conservation and diversification, and ten pairs of miR166 precursors with high sequence identity were individually grouped into a discrete clade in the phylogenetic tree. The analysis of genomic organization and evolution of MIR166 gene family revealed that eight segmental duplications and four tandem duplications might occur during evolution of the miR166 family in soybean. The cis-elements in promoters of MIR166 family genes and their putative targets pointed to their possible contributions to the functional conservation and diversification. The targets of soybean miR166s were predicted, and the cleavage of ATHB14-LIKE transcript was experimentally validated by RACE PCR. Further, the expression patterns of the five newly identified MIR166s and 12 target genes were examined during seed development and in response to abiotic stresses, which provided important clues for dissecting their functions and isoform specificity. This study enlarged the size of soybean miR166 family from 21 to 26 members, and the 26 soybean miR166s exhibited evolutionary conservation and diversification. These findings have laid a foundation for elucidating functional conservation and diversification of miR166 family members, especially during seed development or under abiotic stresses.

  17. Location of a major antigenic site involved in Ross River virus neutralization.

    PubMed

    Vrati, S; Fernon, C A; Dalgarno, L; Weir, R C

    1988-02-01

    The location of a major antigenic domain involved in the neutralization of an alphavirus, Ross River virus, has been defined in terms of its position in the amino acid sequence of the E2 glycoprotein. The domain encompasses three topographically close epitopes which were identified using three E2-specific neutralizing monoclonal antibodies in competitive binding assays. Nucleotide sequencing of the structural protein genes of monoclonal antibody-selected antigenic variants showed that for each variant there was a single nucleotide change in the E2 gene leading to a nonconservative amino acid substitution in E2. Changes were at positions 216, 234, and 246-251 in the amino acid sequence. The epitopes are in a region of E2 which, though not strongly conserved as to sequence among Ross River virus, Semliki Forest virus, and Sindbis virus, is conserved in its hydropathy profile among the three alphaviruses. The epitopes lie between two asparagine-linked glycosylation sites (residues 200 and 262) in E2. They are conserved as to position between the mouse virulent T48 strain and the mouse avirulent NB5092 strain.

  18. Conformational Preference of ‘CαNN’ Short Peptide Motif towards Recognition of Anions

    PubMed Central

    Banerjee, Raja

    2013-01-01

    Among several ‘anion binding motifs’, the recently described ‘CαNN’ motif occurring in the loop regions preceding a helix, is conserved through evolution both in sequence and its conformation. To establish the significance of the conserved sequence and their intrinsic affinity for anions, a series of peptides containing the naturally occurring ‘CαNN’ motif at the N-terminus of a designed helix, have been modeled and studied in a context free system using computational techniques. Appearance of a single interacting site with negative binding free-energy for both the sulfate and phosphate ions, as evidenced in docking experiments, establishes that the ‘CαNN’ segment has an intrinsic affinity for anions. Molecular Dynamics (MD) simulation studies reveal that interaction with anion triggers a conformational switch from non-helical to helical state at the ‘CαNN’ segment, which extends the length of the anchoring-helix by one turn at the N-terminus. Computational experiments substantiate the significance of sequence/structural context and justify the conserved nature of the ‘CαNN’ sequence for anion recognition through “local” interaction. PMID:23516403

  19. Cloning and characterization of a heme oxygenase-2 gene from alfalfa (Medicago sativa L.).

    PubMed

    Fu, Guang-Qing; Jin, Qi-Jiang; Lin, Yu-Ting; Feng, Jian-Fei; Nie, Li; Shen, Wen-Biao; Zheng, Tian-Qing

    2011-11-01

    Heme oxygenase (HO, EC 1.14.99.3) catalyzes the oxidation of heme and performs vital roles in plant development and stress responses. Two HO isozymes exist in plants. Between these, HO-1 is an oxidative stress-response protein, and HO-2 usually exhibited constitutive expression. Although alfalfa HO-1 gene (MsHO1) has been investigated previously, HO2 is still poorly understood. In this study, we report the cloning and characterization of HO2 gene, MsHO2, from alfalfa (Medica sativa L.). The full-length cDNA of MsHO2 contains an ORF of 870 bp and encodes for 290 amino acid residues with a predicted molecular mass of 33.3 kDa. Similar to MsHO1, MsHO2 also appears to have an N-terminal transit peptide sequence for chloroplast import. Many conserved residues in plant HO were also conserved in MsHO2. However, unlike HO-1, the conserved histidine (His) required for heme-iron binding and HO activity was replaced by tyrosine (Tyr) in MsHO2. Further biochemical activity analysis of purified mature MsHO2 showed no HO activity, suggesting that MsHO2 may not be a true HO in nature. Semi-quantitative RT-PCR confirmed its maximum expression in the germinating seeds. Importantly, the expression levels of MsHO2 were up-regulated under sodium nitroprusside (SNP) and H(2)O(2) (especially) treatment, respectively.

  20. Structure-function analysis of mouse Sry reveals dual essential roles of the C-terminal polyglutamine tract in sex determination.

    PubMed

    Zhao, Liang; Ng, Ee Ting; Davidson, Tara-Lynne; Longmuss, Enya; Urschitz, Johann; Elston, Marlee; Moisyadi, Stefan; Bowles, Josephine; Koopman, Peter

    2014-08-12

    The mammalian sex-determining factor SRY comprises a conserved high-mobility group (HMG) box DNA-binding domain and poorly conserved regions outside the HMG box. Mouse Sry is unusual in that it includes a C-terminal polyglutamine (polyQ) tract that is absent in nonrodent SRY proteins, and yet, paradoxically, is essential for male sex determination. To dissect the molecular functions of this domain, we generated a series of Sry mutants, and studied their biochemical properties in cell lines and transgenic mouse embryos. Sry protein lacking the polyQ domain was unstable, due to proteasomal degradation. Replacing this domain with irrelevant sequences stabilized the protein but failed to restore Sry's ability to up-regulate its key target gene SRY-box 9 (Sox9) and its sex-determining function in vivo. These functions were restored only when a VP16 transactivation domain was substituted. We conclude that the polyQ domain has important roles in protein stabilization and transcriptional activation, both of which are essential for male sex determination in mice. Our data disprove the hypothesis that the conserved HMG box domain is the only functional domain of Sry, and highlight an evolutionary paradox whereby mouse Sry has evolved a novel bifunctional module to activate Sox9 directly, whereas SRY proteins in other taxa, including humans, seem to lack this ability, presumably making them dependent on partner proteins(s) to provide this function.

  1. Anxa4 Genes are Expressed in Distinct Organ Systems in Xenopus laevis and tropicalis But are Functionally Conserved

    PubMed Central

    Massé, Karine L; Collins, Robert J; Bhamra, Surinder; Seville, Rachel A

    2007-01-01

    Anxa4 belongs to the multigenic annexin family of proteins which are characterized by their ability to interact with membranes in a calcium-dependent manner. Defined as a marker for polarized epithelial cells, Anxa4 is believed to be involved in many cellular processes but its functions in vivo are still poorly understood. Previously, we cloned Xanx4 in Xenopus laevis (now referred to as anxa4a) and demonstrated its role during organogenesis of the pronephros, providing the first evidence of a specific function for this protein during the development of a vertebrate. Here, we describe the strict conservation of protein sequence and functional domains of anxa4 during vertebrate evolution. We also identify the paralog of anxa4a, anxa4b and show its specific temporal and spatial expression pattern is different from anxa4a. We show that anxa4 orthologs in X. laevis and tropicalis display expression domains in different organ systems. Whilst the anxa4a gene is mainly expressed in the kidney, Xt anxa4 is expressed in the liver. Finally, we demonstrate Xt anxa4 and anxa4a can display conserved function during kidney organogenesis, despite the fact that Xt anxa4 transcripts are not expressed in this domain. This study highlights the divergence of expression of homologous genes during Xenopus evolution and raises the potential problems of using X. tropicalis promoters in X. laevis. PMID:19279706

  2. Transposon identification using profile HMMs

    PubMed Central

    2010-01-01

    Background Transposons are "jumping genes" that account for large quantities of repetitive content in genomes. They are known to affect transcriptional regulation in several different ways, and are implicated in many human diseases. Transposons are related to microRNAs and viruses, and many genes, pseudogenes, and gene promoters are derived from transposons or have origins in transposon-induced duplication. Modeling transposon-derived genomic content is difficult because they are poorly conserved. Profile hidden Markov models (profile HMMs), widely used for protein sequence family modeling, are rarely used for modeling DNA sequence families. The algorithm commonly used to estimate the parameters of profile HMMs, Baum-Welch, is prone to prematurely converge to local optima. The DNA domain is especially problematic for the Baum-Welch algorithm, since it has only four letters as opposed to the twenty residues of the amino acid alphabet. Results We demonstrate with a simulation study and with an application to modeling the MIR family of transposons that two recently introduced methods, Conditional Baum-Welch and Dynamic Model Surgery, achieve better estimates of the parameters of profile HMMs across a range of conditions. Conclusions We argue that these new algorithms expand the range of potential applications of profile HMMs to many important DNA sequence family modeling problems, including that of searching for and modeling the virus-like transposons that are found in all known genomes. PMID:20158867

  3. Exome sequencing reveals a thrombopoietin ligand mutation in a Micronesian family with autosomal recessive aplastic anemia.

    PubMed

    Dasouki, Majed J; Rafi, Syed K; Olm-Shipman, Adam J; Wilson, Nathan R; Abhyankar, Sunil; Ganter, Brigitte; Furness, L Mike; Fang, Jianwen; Calado, Rodrigo T; Saadi, Irfan

    2013-11-14

    We recently identified 2 siblings afflicted with idiopathic, autosomal recessive aplastic anemia. Whole-exome sequencing identified a novel homozygous missense mutation in thrombopoietin (THPO, c.112C>T) in both affected siblings. This mutation encodes an arginine to cysteine substitution at residue 38 or residue 17 excluding the 21-amino acid signal peptide of THPO receptor binding domain (RBD). THPO has 4 conserved cysteines in its RBD that form 2 disulfide bonds. Our in silico modeling predicts that introduction of a fifth cysteine may disrupt normal disulfide bonding to cause poor receptor binding. In functional assays, the mutant-THPO-containing media shows two- to threefold reduced ability to sustain UT7-TPO cells, which require THPO for proliferation. Both parents and a sibling with heterozygous R17C change have reduced platelet counts, whereas a sibling with wild-type sequence has normal platelet count. Thus, the R17C partial loss-of-function allele results in aplastic anemia in the homozygous state and mild thrombocytopenia in the heterozygous state in our family. Together with the recent identification of THPO receptor (MPL) mutations and the effects of THPO agonists in aplastic anemia, our results have clinical implications in the diagnosis and treatment of patients with aplastic anemia and highlight a role for the THPO-MPL pathway in hematopoiesis in vivo.

  4. Genomic Tools for Evolution and Conservation in the Chimpanzee: Pan troglodytes ellioti Is a Genetically Distinct Population

    PubMed Central

    Myers, Simon; Hellenthal, Garrett; Nerrienet, Eric; Bontrop, Ronald E.; Freeman, Colin; Donnelly, Peter; Mundy, Nicholas I.

    2012-01-01

    In spite of its evolutionary significance and conservation importance, the population structure of the common chimpanzee, Pan troglodytes, is still poorly understood. An issue of particular controversy is whether the proposed fourth subspecies of chimpanzee, Pan troglodytes ellioti, from parts of Nigeria and Cameroon, is genetically distinct. Although modern high-throughput SNP genotyping has had a major impact on our understanding of human population structure and demographic history, its application to ecological, demographic, or conservation questions in non-human species has been extremely limited. Here we apply these tools to chimpanzee population structure, using ∼700 autosomal SNPs derived from chimpanzee genomic data and a further ∼100 SNPs from targeted re-sequencing. We demonstrate conclusively the existence of P. t. ellioti as a genetically distinct subgroup. We show that there is clear differentiation between the verus, troglodytes, and ellioti populations at the SNP and haplotype level, on a scale that is greater than that separating continental human populations. Further, we show that only a small set of SNPs (10–20) is needed to successfully assign individuals to these populations. Tellingly, use of only mitochondrial DNA variation to classify individuals is erroneous in 4 of 54 cases, reinforcing the dangers of basing demographic inference on a single locus and implying that the demographic history of the species is more complicated than that suggested analyses based solely on mtDNA. In this study we demonstrate the feasibility of developing economical and robust tests of individual chimpanzee origin as well as in-depth studies of population structure. These findings have important implications for conservation strategies and our understanding of the evolution of chimpanzees. They also act as a proof-of-principle for the use of cheap high-throughput genomic methods for ecological questions. PMID:22396655

  5. Genomic tools for evolution and conservation in the chimpanzee: Pan troglodytes ellioti is a genetically distinct population.

    PubMed

    Bowden, Rory; MacFie, Tammie S; Myers, Simon; Hellenthal, Garrett; Nerrienet, Eric; Bontrop, Ronald E; Freeman, Colin; Donnelly, Peter; Mundy, Nicholas I

    2012-01-01

    In spite of its evolutionary significance and conservation importance, the population structure of the common chimpanzee, Pan troglodytes, is still poorly understood. An issue of particular controversy is whether the proposed fourth subspecies of chimpanzee, Pan troglodytes ellioti, from parts of Nigeria and Cameroon, is genetically distinct. Although modern high-throughput SNP genotyping has had a major impact on our understanding of human population structure and demographic history, its application to ecological, demographic, or conservation questions in non-human species has been extremely limited. Here we apply these tools to chimpanzee population structure, using ∼700 autosomal SNPs derived from chimpanzee genomic data and a further ∼100 SNPs from targeted re-sequencing. We demonstrate conclusively the existence of P. t. ellioti as a genetically distinct subgroup. We show that there is clear differentiation between the verus, troglodytes, and ellioti populations at the SNP and haplotype level, on a scale that is greater than that separating continental human populations. Further, we show that only a small set of SNPs (10-20) is needed to successfully assign individuals to these populations. Tellingly, use of only mitochondrial DNA variation to classify individuals is erroneous in 4 of 54 cases, reinforcing the dangers of basing demographic inference on a single locus and implying that the demographic history of the species is more complicated than that suggested analyses based solely on mtDNA. In this study we demonstrate the feasibility of developing economical and robust tests of individual chimpanzee origin as well as in-depth studies of population structure. These findings have important implications for conservation strategies and our understanding of the evolution of chimpanzees. They also act as a proof-of-principle for the use of cheap high-throughput genomic methods for ecological questions.

  6. Identical phosphatase mechanisms achieved through distinct modes of binding phosphoprotein substrate

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pazy, Y.; Motaleb, M.A.; Guarnieri, M.T.

    2010-04-05

    Two-component signal transduction systems are widespread in prokaryotes and control numerous cellular processes. Extensive investigation of sensor kinase and response regulator proteins from many two-component systems has established conserved sequence, structural, and mechanistic features within each family. In contrast, the phosphatases which catalyze hydrolysis of the response regulator phosphoryl group to terminate signal transduction are poorly understood. Here we present structural and functional characterization of a representative of the CheC/CheX/FliY phosphatase family. The X-ray crystal structure of Borrelia burgdorferi CheX complexed with its CheY3 substrate and the phosphoryl analogue BeF{sub 3}{sup -} reveals a binding orientation between a response regulatormore » and an auxiliary protein different from that shared by every previously characterized example. The surface of CheY3 containing the phosphoryl group interacts directly with a long helix of CheX which bears the conserved (E - X{sub 2} - N) motif. Conserved CheX residues Glu96 and Asn99, separated by a single helical turn, insert into the CheY3 active site. Structural and functional data indicate that CheX Asn99 and CheY3 Thr81 orient a water molecule for hydrolytic attack. The catalytic residues of the CheX-CheY3 complex are virtually superimposable on those of the Escherichia coli CheZ phosphatase complexed with CheY, even though the active site helices of CheX and CheZ are oriented nearly perpendicular to one other. Thus, evolution has found two structural solutions to achieve the same catalytic mechanism through different helical spacing and side chain lengths of the conserved acid/amide residues in CheX and CheZ.« less

  7. The most conserved genome segments for life detection on Earth and other planets.

    PubMed

    Isenbarger, Thomas A; Carr, Christopher E; Johnson, Sarah Stewart; Finney, Michael; Church, George M; Gilbert, Walter; Zuber, Maria T; Ruvkun, Gary

    2008-12-01

    On Earth, very simple but powerful methods to detect and classify broad taxa of life by the polymerase chain reaction (PCR) are now standard practice. Using DNA primers corresponding to the 16S ribosomal RNA gene, one can survey a sample from any environment for its microbial inhabitants. Due to massive meteoritic exchange between Earth and Mars (as well as other planets), a reasonable case can be made for life on Mars or other planets to be related to life on Earth. In this case, the supremely sensitive technologies used to study life on Earth, including in extreme environments, can be applied to the search for life on other planets. Though the 16S gene has become the standard for life detection on Earth, no genome comparisons have established that the ribosomal genes are, in fact, the most conserved DNA segments across the kingdoms of life. We present here a computational comparison of full genomes from 13 diverse organisms from the Archaea, Bacteria, and Eucarya to identify genetic sequences conserved across the widest divisions of life. Our results identify the 16S and 23S ribosomal RNA genes as well as other universally conserved nucleotide sequences in genes encoding particular classes of transfer RNAs and within the nucleotide binding domains of ABC transporters as the most conserved DNA sequence segments across phylogeny. This set of sequences defines a core set of DNA regions that have changed the least over billions of years of evolution and provides a means to identify and classify divergent life, including ancestrally related life on other planets.

  8. Multiplex sequencing of plant chloroplast genomes using Solexa sequencing-by-synthesis technology

    Treesearch

    Richard Cronn; Aaron Liston; Matthew Parks; David S. Gernandt; Rongkun Shen; Todd Mockler

    2008-01-01

    Organellar DNA sequences are widely used in evolutionary and population genetic studies; however, the conservative nature of chloroplast gene and genome evolution often limits phylogenetic resolution and statistical power. To gain maximal access to the historical record contained within chloroplast genomes, we have adapted multiplex sequencing-by-synthesis (MSBS) to...

  9. In-silico and in-vivo analyses of EST databases unveil conserved miRNAs from Carthamus tinctorius and Cynara cardunculus

    PubMed Central

    2012-01-01

    Background MicroRNAs (miRNAs) are small RNAs (21-24 bp) providing an RNA-based system of gene regulation highly conserved in plants and animals. In plants, miRNAs control mRNA degradation or restrain translation, affecting development and responses to stresses. Plant miRNAs show imperfect but extensive complementarity to mRNA targets, making their computational prediction possible, useful when data mining is applied on different species. In this study we used a comparative approach to identify both miRNAs and their targets, in artichoke and safflower. Results Two complete expressed sequence tags (ESTs) datasets from artichoke (3.6·104 entries) and safflower (4.2·104), were analysed with a bioinformatic pipeline and in vitro experiments, identifying 17 potential miRNAs. For each EST, using RNAhybrid program and 953 non redundant miRNA mature sequences, available in mirBase as reference, we searched matching putative targets. 8730 out of 42011 ESTs from safflower and 7145 of 36323 ESTs from artichoke showed at least one predicted miRNA target. BLAST analysis showed that 75% of all ESTs shared at least a common homologous region (E-value < 10-4) and about 50% of these displayed 400 bp or longer aligned sequences as conserved homologous/orthologous (COS) regions. 960 and 890 ESTs of safflower and artichoke organized in COS shared 79 different miRNA targets, considered functionally conserved, and statistically significant when compared with random sequences (signal to noise ratio > 2 and specificity ≥ 0.85). Four highly significant miRNAs selected from in silico data were experimentally validated in globe artichoke leaves. Conclusions Mature miRNAs and targets were predicted within EST sequences of safflower and artichoke. Most of the miRNA targets appeared highly/moderately conserved, highlighting an important and conserved function. In this study we introduce a stringent parameter for the comparative sequence analysis, represented by the identification of the same target in the COS region. After statistical analysis 79 targets, found on the COS regions and belonging to 60 miRNA families, have a signal to noise ratio > 2, with ≥ 0.85 specificity. The putative miRNAs identified belong to 55 dicotyledon plants and to 24 families only in monocotyledon. PMID:22536958

  10. Evolutionary conservation analysis increases the colocalization of predicted exonic splicing enhancers in the BRCA1 gene with missense sequence changes and in-frame deletions, but not polymorphisms

    PubMed Central

    Pettigrew, Christopher; Wayte, Nicola; Lovelock, Paul K; Tavtigian, Sean V; Chenevix-Trench, Georgia; Spurdle, Amanda B; Brown, Melissa A

    2005-01-01

    Introduction Aberrant pre-mRNA splicing can be more detrimental to the function of a gene than changes in the length or nature of the encoded amino acid sequence. Although predicting the effects of changes in consensus 5' and 3' splice sites near intron:exon boundaries is relatively straightforward, predicting the possible effects of changes in exonic splicing enhancers (ESEs) remains a challenge. Methods As an initial step toward determining which ESEs predicted by the web-based tool ESEfinder in the breast cancer susceptibility gene BRCA1 are likely to be functional, we have determined their evolutionary conservation and compared their location with known BRCA1 sequence variants. Results Using the default settings of ESEfinder, we initially detected 669 potential ESEs in the coding region of the BRCA1 gene. Increasing the threshold score reduced the total number to 464, while taking into consideration the proximity to splice donor and acceptor sites reduced the number to 211. Approximately 11% of these ESEs (23/211) either are identical at the nucleotide level in human, primates, mouse, cow, dog and opossum Brca1 (conserved) or are detectable by ESEfinder in the same position in the Brca1 sequence (shared). The frequency of conserved and shared predicted ESEs between human and mouse is higher in BRCA1 exons (2.8 per 100 nucleotides) than in introns (0.6 per 100 nucleotides). Of conserved or shared putative ESEs, 61% (14/23) were predicted to be affected by sequence variants reported in the Breast Cancer Information Core database. Applying the filters described above increased the colocalization of predicted ESEs with missense changes, in-frame deletions and unclassified variants predicted to be deleterious to protein function, whereas they decreased the colocalization with known polymorphisms or unclassified variants predicted to be neutral. Conclusion In this report we show that evolutionary conservation analysis may be used to improve the specificity of an ESE prediction tool. This is the first report on the prediction of the frequency and distribution of ESEs in the BRCA1 gene, and it is the first reported attempt to predict which ESEs are most likely to be functional and therefore which sequence variants in ESEs are most likely to be pathogenic. PMID:16280041

  11. Analysis of drug binding pockets and repurposing opportunities for twelve essential enzymes of ESKAPE pathogens

    PubMed Central

    Naz, Sadia; Ngo, Tony; Farooq, Umar

    2017-01-01

    Background The rapid increase in antibiotic resistance by various bacterial pathogens underlies the significance of developing new therapies and exploring different drug targets. A fraction of bacterial pathogens abbreviated as ESKAPE by the European Center for Disease Prevention and Control have been considered a major threat due to the rise in nosocomial infections. Here, we compared putative drug binding pockets of twelve essential and mostly conserved metabolic enzymes in numerous bacterial pathogens including those of the ESKAPE group and Mycobacterium tuberculosis. The comparative analysis will provide guidelines for the likelihood of transferability of the inhibitors from one species to another. Methods Nine bacterial species including six ESKAPE pathogens, Mycobacterium tuberculosis along with Mycobacterium smegmatis and Eschershia coli, two non-pathogenic bacteria, have been selected for drug binding pocket analysis of twelve essential enzymes. The amino acid sequences were obtained from Uniprot, aligned using ICM v3.8-4a and matched against the Pocketome encyclopedia. We used known co-crystal structures of selected target enzyme orthologs to evaluate the location of their active sites and binding pockets and to calculate a matrix of pairwise sequence identities across each target enzyme across the different species. This was used to generate sequence maps. Results High sequence identity of enzyme binding pockets, derived from experimentally determined co-crystallized structures, was observed among various species. Comparison at both full sequence level and for drug binding pockets of key metabolic enzymes showed that binding pockets are highly conserved (sequence similarity up to 100%) among various ESKAPE pathogens as well as Mycobacterium tuberculosis. Enzymes orthologs having conserved binding sites may have potential to interact with inhibitors in similar way and might be helpful for design of similar class of inhibitors for a particular species. The derived pocket alignments and distance-based maps provide guidelines for drug discovery and repurposing. In addition they also provide recommendations for the relevant model bacteria that may be used for initial drug testing. Discussion Comparing ligand binding sites through sequence identity calculation could be an effective approach to identify conserved orthologs as drug binding pockets have shown higher level of conservation among various species. By using this approach we could avoid the problems associated with full sequence comparison. We identified essential metabolic enzymes among ESKAPE pathogens that share high sequence identity in their putative drug binding pockets (up to 100%), of which known inhibitors can potentially antagonize these identical pockets in the various species in a similar manner. PMID:28948099

  12. Analysis of drug binding pockets and repurposing opportunities for twelve essential enzymes of ESKAPE pathogens.

    PubMed

    Naz, Sadia; Ngo, Tony; Farooq, Umar; Abagyan, Ruben

    2017-01-01

    The rapid increase in antibiotic resistance by various bacterial pathogens underlies the significance of developing new therapies and exploring different drug targets. A fraction of bacterial pathogens abbreviated as ESKAPE by the European Center for Disease Prevention and Control have been considered a major threat due to the rise in nosocomial infections. Here, we compared putative drug binding pockets of twelve essential and mostly conserved metabolic enzymes in numerous bacterial pathogens including those of the ESKAPE group and Mycobacterium tuberculosis . The comparative analysis will provide guidelines for the likelihood of transferability of the inhibitors from one species to another. Nine bacterial species including six ESKAPE pathogens, Mycobacterium tuberculosis along with Mycobacterium smegmatis and Eschershia coli , two non-pathogenic bacteria, have been selected for drug binding pocket analysis of twelve essential enzymes. The amino acid sequences were obtained from Uniprot, aligned using ICM v3.8-4a and matched against the Pocketome encyclopedia. We used known co-crystal structures of selected target enzyme orthologs to evaluate the location of their active sites and binding pockets and to calculate a matrix of pairwise sequence identities across each target enzyme across the different species. This was used to generate sequence maps. High sequence identity of enzyme binding pockets, derived from experimentally determined co-crystallized structures, was observed among various species. Comparison at both full sequence level and for drug binding pockets of key metabolic enzymes showed that binding pockets are highly conserved (sequence similarity up to 100%) among various ESKAPE pathogens as well as Mycobacterium tuberculosis . Enzymes orthologs having conserved binding sites may have potential to interact with inhibitors in similar way and might be helpful for design of similar class of inhibitors for a particular species. The derived pocket alignments and distance-based maps provide guidelines for drug discovery and repurposing. In addition they also provide recommendations for the relevant model bacteria that may be used for initial drug testing. Comparing ligand binding sites through sequence identity calculation could be an effective approach to identify conserved orthologs as drug binding pockets have shown higher level of conservation among various species. By using this approach we could avoid the problems associated with full sequence comparison. We identified essential metabolic enzymes among ESKAPE pathogens that share high sequence identity in their putative drug binding pockets (up to 100%), of which known inhibitors can potentially antagonize these identical pockets in the various species in a similar manner.

  13. Global Network Alignment in the Context of Aging.

    PubMed

    Faisal, Fazle Elahi; Zhao, Han; Milenkovic, Tijana

    2015-01-01

    Analogous to sequence alignment, network alignment (NA) can be used to transfer biological knowledge across species between conserved network regions. NA faces two algorithmic challenges: 1) Which cost function to use to capture "similarities" between nodes in different networks? 2) Which alignment strategy to use to rapidly identify "high-scoring" alignments from all possible alignments? We "break down" existing state-of-the-art methods that use both different cost functions and different alignment strategies to evaluate each combination of their cost functions and alignment strategies. We find that a combination of the cost function of one method and the alignment strategy of another method beats the existing methods. Hence, we propose this combination as a novel superior NA method. Then, since human aging is hard to study experimentally due to long lifespan, we use NA to transfer aging-related knowledge from well annotated model species to poorly annotated human. By doing so, we produce novel human aging-related knowledge, which complements currently available knowledge about aging that has been obtained mainly by sequence alignment. We demonstrate significant similarity between topological and functional properties of our novel predictions and those of known aging-related genes. We are the first to use NA to learn more about aging.

  14. Transimulation - protein biosynthesis web service.

    PubMed

    Siwiak, Marlena; Zielenkiewicz, Piotr

    2013-01-01

    Although translation is the key step during gene expression, it remains poorly characterized at the level of individual genes. For this reason, we developed Transimulation - a web service measuring translational activity of genes in three model organisms: Escherichia coli, Saccharomyces cerevisiae and Homo sapiens. The calculations are based on our previous computational model of translation and experimental data sets. Transimulation quantifies mean translation initiation and elongation time (expressed in SI units), and the number of proteins produced per transcript. It also approximates the number of ribosomes that typically occupy a transcript during translation, and simulates their propagation. The simulation of ribosomes' movement is interactive and allows modifying the coding sequence on the fly. It also enables uploading any coding sequence and simulating its translation in one of three model organisms. In such a case, ribosomes propagate according to mean codon elongation times of the host organism, which may prove useful for heterologous expression. Transimulation was used to examine evolutionary conservation of translational parameters of orthologous genes. Transimulation may be accessed at http://nexus.ibb.waw.pl/Transimulation (requires Java version 1.7 or higher). Its manual and source code, distributed under the GPL-2.0 license, is freely available at the website.

  15. Restriction-modification mediated barriers to exogenous DNA uptake and incorporation employed by Prevotella intermedia.

    PubMed

    Johnston, Christopher D; Skeete, Chelsey A; Fomenkov, Alexey; Roberts, Richard J; Rittling, Susan R

    2017-01-01

    Prevotella intermedia, a major periodontal pathogen, is increasingly implicated in human respiratory tract and cystic fibrosis lung infections. Nevertheless, the specific mechanisms employed by this pathogen remain only partially characterized and poorly understood, largely due to its total lack of genetic accessibility. Here, using Single Molecule, Real-Time (SMRT) genome and methylome sequencing, bisulfite sequencing, in addition to cloning and restriction analysis, we define the specific genetic barriers to exogenous DNA present in two of the most widespread laboratory strains, P. intermedia ATCC 25611 and P. intermedia Strain 17. We identified and characterized multiple restriction-modification (R-M) systems, some of which are considerably divergent between the two strains. We propose that these R-M systems are the root cause of the P. intermedia transformation barrier. Additionally, we note the presence of conserved Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) systems in both strains, which could provide a further barrier to exogenous DNA uptake and incorporation. This work will provide a valuable resource during the development of a genetic system for P. intermedia, which will be required for fundamental investigation of this organism's physiology, metabolism, and pathogenesis in human disease.

  16. Restriction-modification mediated barriers to exogenous DNA uptake and incorporation employed by Prevotella intermedia

    PubMed Central

    Skeete, Chelsey A.; Fomenkov, Alexey; Roberts, Richard J.; Rittling, Susan R.

    2017-01-01

    Prevotella intermedia, a major periodontal pathogen, is increasingly implicated in human respiratory tract and cystic fibrosis lung infections. Nevertheless, the specific mechanisms employed by this pathogen remain only partially characterized and poorly understood, largely due to its total lack of genetic accessibility. Here, using Single Molecule, Real-Time (SMRT) genome and methylome sequencing, bisulfite sequencing, in addition to cloning and restriction analysis, we define the specific genetic barriers to exogenous DNA present in two of the most widespread laboratory strains, P. intermedia ATCC 25611 and P. intermedia Strain 17. We identified and characterized multiple restriction-modification (R-M) systems, some of which are considerably divergent between the two strains. We propose that these R-M systems are the root cause of the P. intermedia transformation barrier. Additionally, we note the presence of conserved Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) systems in both strains, which could provide a further barrier to exogenous DNA uptake and incorporation. This work will provide a valuable resource during the development of a genetic system for P. intermedia, which will be required for fundamental investigation of this organism’s physiology, metabolism, and pathogenesis in human disease. PMID:28934361

  17. Identification of Cis-Acting Promoter Elements in Cold- and Dehydration-Induced Transcriptional Pathways in Arabidopsis, Rice, and Soybean

    PubMed Central

    Maruyama, Kyonoshin; Todaka, Daisuke; Mizoi, Junya; Yoshida, Takuya; Kidokoro, Satoshi; Matsukura, Satoko; Takasaki, Hironori; Sakurai, Tetsuya; Yamamoto, Yoshiharu Y.; Yoshiwara, Kyouko; Kojima, Mikiko; Sakakibara, Hitoshi; Shinozaki, Kazuo; Yamaguchi-Shinozaki, Kazuko

    2012-01-01

    The genomes of three plants, Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), and soybean (Glycine max), have been sequenced, and their many genes and promoters have been predicted. In Arabidopsis, cis-acting promoter elements involved in cold- and dehydration-responsive gene expression have been extensively analysed; however, the characteristics of such cis-acting promoter sequences in cold- and dehydration-inducible genes of rice and soybean remain to be clarified. In this study, we performed microarray analyses using the three species, and compared characteristics of identified cold- and dehydration-inducible genes. Transcription profiles of the cold- and dehydration-responsive genes were similar among these three species, showing representative upregulated (dehydrin/LEA) and downregulated (photosynthesis-related) genes. All (46 = 4096) hexamer sequences in the promoters of the three species were investigated, revealing the frequency of conserved sequences in cold- and dehydration-inducible promoters. A core sequence of the abscisic acid-responsive element (ABRE) was the most conserved in dehydration-inducible promoters of all three species, suggesting that transcriptional regulation for dehydration-inducible genes is similar among these three species, with the ABRE-dependent transcriptional pathway. In contrast, for cold-inducible promoters, the conserved hexamer sequences were diversified among these three species, suggesting the existence of diverse transcriptional regulatory pathways for cold-inducible genes among the species. PMID:22184637

  18. Skylign: a tool for creating informative, interactive logos representing sequence alignments and profile hidden Markov models

    PubMed Central

    2014-01-01

    Background Logos are commonly used in molecular biology to provide a compact graphical representation of the conservation pattern of a set of sequences. They render the information contained in sequence alignments or profile hidden Markov models by drawing a stack of letters for each position, where the height of the stack corresponds to the conservation at that position, and the height of each letter within a stack depends on the frequency of that letter at that position. Results We present a new tool and web server, called Skylign, which provides a unified framework for creating logos for both sequence alignments and profile hidden Markov models. In addition to static image files, Skylign creates a novel interactive logo plot for inclusion in web pages. These interactive logos enable scrolling, zooming, and inspection of underlying values. Skylign can avoid sampling bias in sequence alignments by down-weighting redundant sequences and by combining observed counts with informed priors. It also simplifies the representation of gap parameters, and can optionally scale letter heights based on alternate calculations of the conservation of a position. Conclusion Skylign is available as a website, a scriptable web service with a RESTful interface, and as a software package for download. Skylign’s interactive logos are easily incorporated into a web page with just a few lines of HTML markup. Skylign may be found at http://skylign.org. PMID:24410852

  19. The lytic origin of herpesvirus papio is highly homologous to Epstein-Barr virus ori-Lyt: evolutionary conservation of transcriptional activation and replication signals.

    PubMed Central

    Ryon, J J; Fixman, E D; Houchens, C; Zong, J; Lieberman, P M; Chang, Y N; Hayward, G S; Hayward, S D

    1993-01-01

    Herpesvirus papio (HVP) is a B-lymphotropic baboon virus with an estimated 40% homology to Epstein-Barr virus (EBV). We have cloned and sequenced ori-Lyt of herpesvirus papio and found a striking degree of nucleotide homology (89%) with ori-Lyt of EBV. Transcriptional elements form an integral part of EBV ori-Lyt. The promoter and enhancer domains of EBV ori-Lyt are conserved in herpesvirus papio. The EBV ori-Lyt promoter contains four binding sites for the EBV lytic cycle transactivator Zta, and the enhancer includes one Zta and two Rta response elements. All five of the Zta response elements and one of the Rta motifs are conserved in HVP ori-Lyt, and the HVP DS-L leftward promoter and the enhancer were activated in transient transfection assays by the EBV Zta and Rta transactivators. The EBV ori-Lyt enhancer contains a palindromic sequence, GGTCAGCTGACC, centered on a PvuII restriction site. This sequence, with a single base change, is also present in the HVP ori-Lyt enhancer. DNase I footprinting demonstrated that the PvuII sequence was bound by a protein present in a Raji nuclear extract. Mobility shift and competition assays using oligonucleotide probes identified this sequence as a binding site for the cellular transcription factor MLTF. Mutagenesis of the binding site indicated that MLTF contributes significantly to the constitutive activity of the ori-Lyt enhancer. The high degree of conservation of cis-acting signal sequences in HVP ori-Lyt was further emphasized by the finding that an HVP ori-Lyt-containing plasmid was replicated in Vero cells by a set of cotransfected EBV replication genes. The central domain of EBV ori-Lyt contains two related AT-rich palindromes, one of which is partially duplicated in the HVP sequence. The AT-rich palindromes are functionally important cis-acting motifs. Deletion of these palindromes severely diminished replication of an ori-Lyt target plasmid. Images PMID:8389916

  20. BayesMotif: de novo protein sorting motif discovery from impure datasets.

    PubMed

    Hu, Jianjun; Zhang, Fan

    2010-01-18

    Protein sorting is the process that newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals are amino acid sub-sequences usually located at the N-terminals or C-terminals of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals is needed to improve the understanding of protein sorting mechanisms. We formulated the protein sorting motif discovery problem as a classification problem and proposed a Bayesian classifier based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motifs in which a highly conserved anchor is present along with a less conserved motif regions. A false positive removal procedure is developed to iteratively remove sequences that are unlikely to contain true motifs so that the algorithm can identify motifs from impure input sequences. Experiments on both implanted motif datasets and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence dataset. It also shows that the false positive removal procedure can help to identify true motifs even when there is only 20% of the input sequences containing true motif instances. We proposed BayesMotif, a novel Bayesian classification based algorithm for de novo discovery of a special category of anchored protein sorting motifs from impure datasets. Compared to conventional motif discovery algorithms such as MEME, our algorithm can find less-conserved motifs with short highly conserved anchors. Our algorithm also has the advantage of easy incorporation of additional meta-sequence features such as hydrophobicity or charge of the motifs which may help to overcome the limitations of PWM (position weight matrix) motif model.

  1. An RNAi in silico approach to find an optimal shRNA cocktail against HIV-1

    PubMed Central

    2010-01-01

    Background HIV-1 can be inhibited by RNA interference in vitro through the expression of short hairpin RNAs (shRNAs) that target conserved genome sequences. In silico shRNA design for HIV has lacked a detailed study of virus variability constituting a possible breaking point in a clinical setting. We designed shRNAs against HIV-1 considering the variability observed in naïve and drug-resistant isolates available at public databases. Methods A Bioperl-based algorithm was developed to automatically scan multiple sequence alignments of HIV, while evaluating the possibility of identifying dominant and subdominant viral variants that could be used as efficient silencing molecules. Student t-test and Bonferroni Dunn correction test were used to assess statistical significance of our findings. Results Our in silico approach identified the most common viral variants within highly conserved genome regions, with a calculated free energy of ≥ -6.6 kcal/mol. This is crucial for strand loading to RISC complex and for a predicted silencing efficiency score, which could be used in combination for achieving over 90% silencing. Resistant and naïve isolate variability revealed that the most frequent shRNA per region targets a maximum of 85% of viral sequences. Adding more divergent sequences maintained this percentage. Specific sequence features that have been found to be related with higher silencing efficiency were hardly accomplished in conserved regions, even when lower entropy values correlated with better scores. We identified a conserved region among most HIV-1 genomes, which meets as many sequence features for efficient silencing. Conclusions HIV-1 variability is an obstacle to achieving absolute silencing using shRNAs designed against a consensus sequence, mainly because there are many functional viral variants. Our shRNA cocktail could be truly effective at silencing dominant and subdominant naïve viral variants. Additionally, resistant isolates might be targeted under specific antiretroviral selective pressure, but in both cases these should be tested exhaustively prior to clinical use. PMID:21172023

  2. Conservation of streptococcal CRISPRs on human skin and saliva.

    PubMed

    Robles-Sikisaka, Refugio; Naidu, Mayuri; Ly, Melissa; Salzman, Julia; Abeles, Shira R; Boehm, Tobias K; Pride, David T

    2014-06-06

    Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) are utilized by bacteria to resist encounters with their viruses. Human body surfaces have numerous bacteria that harbor CRISPRs, and their content can provide clues as to the types and features of viruses they may have encountered. We investigated the conservation of CRISPR content from streptococci on skin and saliva of human subjects over 8-weeks to determine whether similarities existed in the CRISPR spacer profiles and whether CRISPR spacers were a stable component of each biogeographic site. Most of the CRISPR sequences identified were unique, but a small proportion of spacers from the skin and saliva of each subject matched spacers derived from previously sequenced loci of S. thermophilus and other streptococci. There were significant proportions of CRISPR spacers conserved over the entire 8-week study period for all subjects, and salivary CRISPR spacers sampled in the mornings showed significantly higher levels of conservation than any other time of day. We also found substantial similarities in the spacer repertoires of the skin and saliva of each subject. Many skin-derived spacers matched salivary viruses, supporting that bacteria of the skin may encounter viruses with similar sequences to those found in the mouth. Despite the similarities between skin and salivary spacer repertoires, the variation present was distinct based on each subject and body site. The conservation of CRISPR spacers in the saliva and the skin of human subjects over the time period studied suggests a relative conservation of the bacteria harboring them.

  3. Conservation of streptococcal CRISPRs on human skin and saliva

    PubMed Central

    2014-01-01

    Background Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) are utilized by bacteria to resist encounters with their viruses. Human body surfaces have numerous bacteria that harbor CRISPRs, and their content can provide clues as to the types and features of viruses they may have encountered. Results We investigated the conservation of CRISPR content from streptococci on skin and saliva of human subjects over 8-weeks to determine whether similarities existed in the CRISPR spacer profiles and whether CRISPR spacers were a stable component of each biogeographic site. Most of the CRISPR sequences identified were unique, but a small proportion of spacers from the skin and saliva of each subject matched spacers derived from previously sequenced loci of S. thermophilus and other streptococci. There were significant proportions of CRISPR spacers conserved over the entire 8-week study period for all subjects, and salivary CRISPR spacers sampled in the mornings showed significantly higher levels of conservation than any other time of day. We also found substantial similarities in the spacer repertoires of the skin and saliva of each subject. Many skin-derived spacers matched salivary viruses, supporting that bacteria of the skin may encounter viruses with similar sequences to those found in the mouth. Despite the similarities between skin and salivary spacer repertoires, the variation present was distinct based on each subject and body site. Conclusions The conservation of CRISPR spacers in the saliva and the skin of human subjects over the time period studied suggests a relative conservation of the bacteria harboring them. PMID:24903519

  4. SRD: a Staphylococcus regulatory RNA database.

    PubMed

    Sassi, Mohamed; Augagneur, Yoann; Mauro, Tony; Ivain, Lorraine; Chabelskaya, Svetlana; Hallier, Marc; Sallou, Olivier; Felden, Brice

    2015-05-01

    An overflow of regulatory RNAs (sRNAs) was identified in a wide range of bacteria. We designed and implemented a new resource for the hundreds of sRNAs identified in Staphylococci, with primary focus on the human pathogen Staphylococcus aureus. The "Staphylococcal Regulatory RNA Database" (SRD, http://srd.genouest.org/) compiled all published data in a single interface including genetic locations, sequences and other features. SRD proposes novel and simplified identifiers for Staphylococcal regulatory RNAs (srn) based on the sRNA's genetic location in S. aureus strain N315 which served as a reference. From a set of 894 sequences and after an in-depth cleaning, SRD provides a list of 575 srn exempt of redundant sequences. For each sRNA, their experimental support(s) is provided, allowing the user to individually assess their validity and significance. RNA-seq analysis performed on strains N315, NCTC8325, and Newman allowed us to provide further details, upgrade the initial annotation, and identified 159 RNA-seq independent transcribed sRNAs. The lists of 575 and 159 sRNAs sequences were used to predict the number and location of srns in 18 S. aureus strains and 10 other Staphylococci. A comparison of the srn contents within 32 Staphylococcal genomes revealed a poor conservation between species. In addition, sRNA structure predictions obtained with MFold are accessible. A BLAST server and the intaRNA program, which is dedicated to target prediction, were implemented. SRD is the first sRNA database centered on a genus; it is a user-friendly and scalable device with the possibility to submit new sequences that should spread in the literature. © 2015 Sassi et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  5. Conservation of a pH-sensitive structure in the C-terminal region of spider silk extends across the entire silk gene family.

    PubMed

    Strickland, Michelle; Tudorica, Victor; Řezáč, Milan; Thomas, Neil R; Goodacre, Sara L

    2018-06-01

    Spiders produce multiple silks with different physical properties that allow them to occupy a diverse range of ecological niches, including the underwater environment. Despite this functional diversity, past molecular analyses show a high degree of amino acid sequence similarity between C-terminal regions of silk genes that appear to be independent of the physical properties of the resulting silks; instead, this domain is crucial to the formation of silk fibers. Here, we present an analysis of the C-terminal domain of all known types of spider silk and include silk sequences from the spider Argyroneta aquatica, which spins the majority of its silk underwater. Our work indicates that spiders have retained a highly conserved mechanism of silk assembly, despite the extraordinary diversification of species, silk types and applications of silk over 350 million years. Sequence analysis of the silk C-terminal domain across the entire gene family shows the conservation of two uncommon amino acids that are implicated in the formation of a salt bridge, a functional bond essential to protein assembly. This conservation extends to the novel sequences isolated from A. aquatica. This finding is relevant to research regarding the artificial synthesis of spider silk, suggesting that synthesis of all silk types will be possible using a single process.

  6. The Complete Mitochondrial Genome of Gossypium hirsutum and Evolutionary Analysis of Higher Plant Mitochondrial Genomes

    PubMed Central

    Su, Aiguo; Geng, Jianing; Grover, Corrinne E.; Hu, Songnian; Hua, Jinping

    2013-01-01

    Background Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains large number of foreign DNA and repeated sequences undergone frequently intramolecular recombination. Upland Cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant in the world. Sequencing of the cotton mitochondrial (mt) genome could be helpful for the evolution research of plant mt genomes. Methodology/Principal Findings We utilized 454 technology for sequencing and combined with Fosmid library of the Gossypium hirsutum mt genome screening and positive clones sequencing and conducted a series of evolutionary analysis on Cycas taitungensis and 24 angiosperms mt genomes. After data assembling and contigs joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G.hirsutum mt genome is 621,884 bp in length, and contained 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are found conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes and species closely related share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes. Conclusion The G. hirsutum mt genome possesses most of the common characters of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes suggest that evolution of mt genomes is consistent with plant taxonomy but independent among different species. PMID:23940520

  7. The complete mitochondrial genome of Gossypium hirsutum and evolutionary analysis of higher plant mitochondrial genomes.

    PubMed

    Liu, Guozheng; Cao, Dandan; Li, Shuangshuang; Su, Aiguo; Geng, Jianing; Grover, Corrinne E; Hu, Songnian; Hua, Jinping

    2013-01-01

    Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains large number of foreign DNA and repeated sequences undergone frequently intramolecular recombination. Upland Cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant in the world. Sequencing of the cotton mitochondrial (mt) genome could be helpful for the evolution research of plant mt genomes. We utilized 454 technology for sequencing and combined with Fosmid library of the Gossypium hirsutum mt genome screening and positive clones sequencing and conducted a series of evolutionary analysis on Cycas taitungensis and 24 angiosperms mt genomes. After data assembling and contigs joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G.hirsutum mt genome is 621,884 bp in length, and contained 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are found conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes and species closely related share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes. The G. hirsutum mt genome possesses most of the common characters of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes suggest that evolution of mt genomes is consistent with plant taxonomy but independent among different species.

  8. Nucleotide sequence of the gene for the Mr 32,000 thylakoid membrane protein from Spinacia oleracea and Nicotiana debneyi predicts a totally conserved primary translation product of Mr 38,950

    PubMed Central

    Zurawski, Gerard; Bohnert, Hans J.; Whitfeld, Paul R.; Bottomley, Warwick

    1982-01-01

    The gene for the so-called Mr 32,000 rapidly labeled photosystem II thylakoid membrane protein (here designated psbA) of spinach (Spinacia oleracea) chloroplasts is located on the chloroplast DNA in the large single-copy region immediately adjacent to one of the inverted repeat sequences. In this paper we show that the size of the mRNA for this protein is ≈ 1.25 kilobases and that the direction of transcription is towards the inverted repeat unit. The nucleotide sequence of the gene and its flanking regions is presented. The only large open reading frame in the sequence codes for a protein of Mr 38,950. The nucleotide sequence of psbA from Nicotiana debneyi also has been determined, and comparison of the sequences from the two species shows them to be highly conserved (>95% homology) throughout the entire reading frame. Conservation of the amino acid sequence is absolute, there being no changes in a total of 353 residues. This leads us to conclude that the primary translation product of psbA must be a protein of Mr 38,950. The protein is characterized by the complete absence of lysine residues and is relatively rich in hydrophobic amino acids, which tend to be clustered. Transcription of spinach psbA starts about 86 base pairs before the first ATG codon. Immediately upstream from this point there is a sequence typical of that found in E. coli promoters. An almost identical sequence occurs in the equivalent region of N. debneyi DNA. Images PMID:16593262

  9. Optimizing multiple sequence alignments using a genetic algorithm based on three objectives: structural information, non-gaps percentage and totally conserved columns.

    PubMed

    Ortuño, Francisco M; Valenzuela, Olga; Rojas, Fernando; Pomares, Hector; Florido, Javier P; Urquiza, Jose M; Rojas, Ignacio

    2013-09-01

    Multiple sequence alignments (MSAs) are widely used approaches in bioinformatics to carry out other tasks such as structure predictions, biological function analyses or phylogenetic modeling. However, current tools usually provide partially optimal alignments, as each one is focused on specific biological features. Thus, the same set of sequences can produce different alignments, above all when sequences are less similar. Consequently, researchers and biologists do not agree about which is the most suitable way to evaluate MSAs. Recent evaluations tend to use more complex scores including further biological features. Among them, 3D structures are increasingly being used to evaluate alignments. Because structures are more conserved in proteins than sequences, scores with structural information are better suited to evaluate more distant relationships between sequences. The proposed multiobjective algorithm, based on the non-dominated sorting genetic algorithm, aims to jointly optimize three objectives: STRIKE score, non-gaps percentage and totally conserved columns. It was significantly assessed on the BAliBASE benchmark according to the Kruskal-Wallis test (P < 0.01). This algorithm also outperforms other aligners, such as ClustalW, Multiple Sequence Alignment Genetic Algorithm (MSA-GA), PRRP, DIALIGN, Hidden Markov Model Training (HMMT), Pattern-Induced Multi-sequence Alignment (PIMA), MULTIALIGN, Sequence Alignment Genetic Algorithm (SAGA), PILEUP, Rubber Band Technique Genetic Algorithm (RBT-GA) and Vertical Decomposition Genetic Algorithm (VDGA), according to the Wilcoxon signed-rank test (P < 0.05), whereas it shows results not significantly different to 3D-COFFEE (P > 0.05) with the advantage of being able to use less structures. Structural information is included within the objective function to evaluate more accurately the obtained alignments. The source code is available at http://www.ugr.es/~fortuno/MOSAStrE/MO-SAStrE.zip.

  10. Sequence diversity within the reovirus S2 gene: reovirus genes reassort in nature, and their termini are predicted to form a panhandle motif.

    PubMed Central

    Chapell, J D; Goral, M I; Rodgers, S E; dePamphilis, C W; Dermody, T S

    1994-01-01

    To better understand genetic diversity within mammalian reoviruses, we determined S2 nucleotide and deduced sigma 2 amino acid sequences of nine reovirus strains and compared these sequences with those of prototype strains of the three reovirus serotypes. The S2 gene and sigma 2 protein are highly conserved among the four type 1, one type 2, and seven type 3 strains studied. Phylogenetic analyses based on S2 nucleotide sequences of the 12 reovirus strains indicate that diversity within the S2 gene is independent of viral serotype. Additionally, we found marked topological differences between phylogenetic trees generated from S1 and S2 gene nucleotide sequences of the seven type 3 strains. These results demonstrate that reovirus S1 and S2 genes have distinct evolutionary histories, thus providing phylogenetic evidence for lateral transfer of reovirus genes in nature. When variability among the 12 sigma 2-encoding S2 nucleotide sequences was analyzed at synonymous positions, we found that approximately 60 nucleotides at the 5' terminus and 30 nucleotides at the 3' terminus were markedly conserved in comparison with other sigma 2-encoding regions of S2. Predictions of RNA secondary structures indicate that the more conserved S2 sequences participate in the formation of an extended region of duplex RNA interrupted by a pair of stem-loops. Among the 12 deduced sigma 2 amino acid sequences examined, substitutions were observed at only 11% of amino acid positions. This finding suggests that constraints on the structure or function of sigma 2, perhaps in part because of its location in the virion core, have limited sequence diversity within this protein. PMID:8289378

  11. On the Role of Aggregation Prone Regions in Protein Evolution, Stability, and Enzymatic Catalysis: Insights from Diverse Analyses

    PubMed Central

    Buck, Patrick M.; Kumar, Sandeep; Singh, Satish K.

    2013-01-01

    The various roles that aggregation prone regions (APRs) are capable of playing in proteins are investigated here via comprehensive analyses of multiple non-redundant datasets containing randomly generated amino acid sequences, monomeric proteins, intrinsically disordered proteins (IDPs) and catalytic residues. Results from this study indicate that the aggregation propensities of monomeric protein sequences have been minimized compared to random sequences with uniform and natural amino acid compositions, as observed by a lower average aggregation propensity and fewer APRs that are shorter in length and more often punctuated by gate-keeper residues. However, evidence for evolutionary selective pressure to disrupt these sequence regions among homologous proteins is inconsistent. APRs are less conserved than average sequence identity among closely related homologues (≥80% sequence identity with a parent) but APRs are more conserved than average sequence identity among homologues that have at least 50% sequence identity with a parent. Structural analyses of APRs indicate that APRs are three times more likely to contain ordered versus disordered residues and that APRs frequently contribute more towards stabilizing proteins than equal length segments from the same protein. Catalytic residues and APRs were also found to be in structural contact significantly more often than expected by random chance. Our findings suggest that proteins have evolved by optimizing their risk of aggregation for cellular environments by both minimizing aggregation prone regions and by conserving those that are important for folding and function. In many cases, these sequence optimizations are insufficient to develop recombinant proteins into commercial products. Rational design strategies aimed at improving protein solubility for biotechnological purposes should carefully evaluate the contributions made by candidate APRs, targeted for disruption, towards protein structure and activity. PMID:24146608

  12. Mitochondrial genome sequences illuminate maternal lineages of conservation concern in a rare carnivore

    Treesearch

    Brian J. Knaus; Richard Cronn; Aaron Liston; Kristine Pilgrim; Michael K. Schwartz

    2011-01-01

    Science-based wildlife management relies on genetic information to infer population connectivity and identify conservation units. The most commonly used genetic marker for characterizing animal biodiversity and identifying maternal lineages is the mitochondrial genome. Mitochondrial genotyping figures prominently in conservation and management plans, with much of the...

  13. A configuration space of homologous proteins conserving mutual information and allowing a phylogeny inference based on pair-wise Z-score probabilities.

    PubMed

    Bastien, Olivier; Ortet, Philippe; Roy, Sylvaine; Maréchal, Eric

    2005-03-10

    Popular methods to reconstruct molecular phylogenies are based on multiple sequence alignments, in which addition or removal of data may change the resulting tree topology. We have sought a representation of homologous proteins that would conserve the information of pair-wise sequence alignments, respect probabilistic properties of Z-scores (Monte Carlo methods applied to pair-wise comparisons) and be the basis for a novel method of consistent and stable phylogenetic reconstruction. We have built up a spatial representation of protein sequences using concepts from particle physics (configuration space) and respecting a frame of constraints deduced from pair-wise alignment score properties in information theory. The obtained configuration space of homologous proteins (CSHP) allows the representation of real and shuffled sequences, and thereupon an expression of the TULIP theorem for Z-score probabilities. Based on the CSHP, we propose a phylogeny reconstruction using Z-scores. Deduced trees, called TULIP trees, are consistent with multiple-alignment based trees. Furthermore, the TULIP tree reconstruction method provides a solution for some previously reported incongruent results, such as the apicomplexan enolase phylogeny. The CSHP is a unified model that conserves mutual information between proteins in the way physical models conserve energy. Applications include the reconstruction of evolutionary consistent and robust trees, the topology of which is based on a spatial representation that is not reordered after addition or removal of sequences. The CSHP and its assigned phylogenetic topology, provide a powerful and easily updated representation for massive pair-wise genome comparisons based on Z-score computations.

  14. Characterization of microRNAs Expressed during Secondary Wall Biosynthesis in Acacia mangium

    PubMed Central

    Ong, Seong Siang; Wickneswari, Ratnam

    2012-01-01

    MicroRNAs (miRNAs) play critical regulatory roles by acting as sequence specific guide during secondary wall formation in woody and non-woody species. Although thousands of plant miRNAs have been sequenced, there is no comprehensive view of miRNA mediated gene regulatory network to provide profound biological insights into the regulation of xylem development. Herein, we report the involvement of six highly conserved amg-miRNA families (amg-miR166, amg-miR172, amg-miR168, amg-miR159, amg-miR394, and amg-miR156) as the potential regulatory sequences of secondary cell wall biosynthesis. Within this highly conserved amg-miRNA family, only amg-miR166 exhibited strong differences in expression between phloem and xylem tissue. The functional characterization of amg-miR166 targets in various tissues revealed three groups of HD-ZIP III: ATHB8, ATHB15, and REVOLUTA which play pivotal roles in xylem development. Although these three groups vary in their functions, -psRNA target analysis indicated that miRNA target sequences of the nine different members of HD-ZIP III are always conserved. We found that precursor structures of amg-miR166 undergo exhaustive sequence variation even within members of the same family. Gene expression analysis showed three key lignin pathway genes: C4H, CAD, and CCoAOMT were upregulated in compression wood where a cascade of miRNAs was downregulated. This study offers a comprehensive analysis on the involvement of highly conserved miRNAs implicated in the secondary wall formation of woody plants. PMID:23251324

  15. Synchronous detection of ebolavirus conserved RNA sequences and ebolavirus-encoded miRNA-like fragment based on a zwitterionic copper (II) metal-organic framework.

    PubMed

    Qiu, Gui-Hua; Weng, Zi-Hua; Hu, Pei-Pei; Duan, Wen-Jun; Xie, Bao-Ping; Sun, Bin; Tang, Xiao-Yan; Chen, Jin-Xiang

    2018-04-01

    From a three-dimensional (3D) metal-organic framework (MOF) of {[Cu(Cmdcp)(phen)(H 2 O)] 2 ·9H 2 O} n (1, H 3 CmdcpBr = N-carboxymethyl-(3,5-dicarboxyl)pyridinium bromide, phen = phenanthroline), a sensitive and selective fluorescence sensor has been developed for the simultaneous detection of ebolavirus conserved RNA sequences and ebolavirus-encoded microRNA-like (miRNA-like) fragment. The results from molecular dynamics simulation confirmed that MOF 1 absorbs carboxyfluorescein (FAM)-tagged and 5(6)-carboxyrhodamine, triethylammonium salt (ROX)-tagged probe ss-DNA (probe DNA, P-DNA) by π … π stacking and hydrogen bonding, as well as additional electrostatic interactions to form a sensing platform of P-DNAs@1 with quenched FAM and ROX fluorescence. In the presence of targeted ebolavirus conserved RNA sequences or ebolavirus-encoded miRNA-like fragment, the fluorophore-labeled P-DNA hybridizes with the analyte to give a P-DNA@RNA duplex and released from MOF 1, triggering a fluorescence recovery. Simultaneous detection of two target RNAs has also been realized by single and synchronous fluorescence analysis. The formed sensing platform shows high sensitivity for ebolavirus conserved RNA sequences and ebolavirus-encoded miRNA-like fragment with detection limits at the picomolar level and high selectivity without cross-reaction between the two probes. MOF 1 thus shows the potential as an effective fluorescent sensing platform for the synchronous detection of two ebolavirus-related sequences, and offer improved diagnostic accuracy of Ebola virus disease. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. Reptiles and mammals have differentially retained long conserved noncoding sequences from the amniote ancestor.

    PubMed

    Janes, D E; Chapus, C; Gondo, Y; Clayton, D F; Sinha, S; Blatti, C A; Organ, C L; Fujita, M K; Balakrishnan, C N; Edwards, S V

    2011-01-01

    Many noncoding regions of genomes appear to be essential to genome function. Conservation of large numbers of noncoding sequences has been reported repeatedly among mammals but not thus far among birds and reptiles. By searching genomes of chicken (Gallus gallus), zebra finch (Taeniopygia guttata), and green anole (Anolis carolinensis), we quantified the conservation among birds and reptiles and across amniotes of long, conserved noncoding sequences (LCNS), which we define as sequences ≥500 bp in length and exhibiting ≥95% similarity between species. We found 4,294 LCNS shared between chicken and zebra finch and 574 LCNS shared by the two birds and Anolis. The percent of genomes comprised by LCNS in the two birds (0.0024%) is notably higher than the percent in mammals (<0.0003% to <0.001%), differences that we show may be explained in part by differences in genome-wide substitution rates. We reconstruct a large number of LCNS for the amniote ancestor (ca. 8,630) and hypothesize differential loss and substantial turnover of these sites in descendent lineages. By contrast, we estimated a small role for recruitment of LCNS via acquisition of novel functions over time. Across amniotes, LCNS are significantly enriched with transcription factor binding sites for many developmental genes, and 2.9% of LCNS shared between the two birds show evidence of expression in brain expressed sequence tag databases. These results show that the rate of retention of LCNS from the amniote ancestor differs between mammals and Reptilia (including birds) and that this may reflect differing roles and constraints in gene regulation.

  17. Reptiles and Mammals Have Differentially Retained Long Conserved Noncoding Sequences from the Amniote Ancestor

    PubMed Central

    Janes, D.E.; Chapus, C.; Gondo, Y.; Clayton, D.F.; Sinha, S.; Blatti, C.A.; Organ, C.L.; Fujita, M.K.; Balakrishnan, C.N.; Edwards, S.V.

    2010-01-01

    Many noncoding regions of genomes appear to be essential to genome function. Conservation of large numbers of noncoding sequences has been reported repeatedly among mammals but not thus far among birds and reptiles. By searching genomes of chicken (Gallus gallus), zebra finch (Taeniopygia guttata), and green anole (Anolis carolinensis), we quantified the conservation among birds and reptiles and across amniotes of long, conserved noncoding sequences (LCNS), which we define as sequences ≥500 bp in length and exhibiting ≥95% similarity between species. We found 4,294 LCNS shared between chicken and zebra finch and 574 LCNS shared by the two birds and Anolis. The percent of genomes comprised by LCNS in the two birds (0.0024%) is notably higher than the percent in mammals (<0.0003% to <0.001%), differences that we show may be explained in part by differences in genome-wide substitution rates. We reconstruct a large number of LCNS for the amniote ancestor (ca. 8,630) and hypothesize differential loss and substantial turnover of these sites in descendent lineages. By contrast, we estimated a small role for recruitment of LCNS via acquisition of novel functions over time. Across amniotes, LCNS are significantly enriched with transcription factor binding sites for many developmental genes, and 2.9% of LCNS shared between the two birds show evidence of expression in brain expressed sequence tag databases. These results show that the rate of retention of LCNS from the amniote ancestor differs between mammals and Reptilia (including birds) and that this may reflect differing roles and constraints in gene regulation. PMID:21183607

  18. Distantly related lipocalins share two conserved clusters of hydrophobic residues: use in homology modeling

    PubMed Central

    Adam, Benoit; Charloteaux, Benoit; Beaufays, Jerome; Vanhamme, Luc; Godfroid, Edmond; Brasseur, Robert; Lins, Laurence

    2008-01-01

    Background Lipocalins are widely distributed in nature and are found in bacteria, plants, arthropoda and vertebra. In hematophagous arthropods, they are implicated in the successful accomplishment of the blood meal, interfering with platelet aggregation, blood coagulation and inflammation and in the transmission of disease parasites such as Trypanosoma cruzi and Borrelia burgdorferi. The pairwise sequence identity is low among this family, often below 30%, despite a well conserved tertiary structure. Under the 30% identity threshold, alignment methods do not correctly assign and align proteins. The only safe way to assign a sequence to that family is by experimental determination. However, these procedures are long and costly and cannot always be applied. A way to circumvent the experimental approach is sequence and structure analyze. To further help in that task, the residues implicated in the stabilisation of the lipocalin fold were determined. This was done by analyzing the conserved interactions for ten lipocalins having a maximum pairwise identity of 28% and various functions. Results It was determined that two hydrophobic clusters of residues are conserved by analysing the ten lipocalin structures and sequences. One cluster is internal to the barrel, involving all strands and the 310 helix. The other is external, involving four strands and the helix lying parallel to the barrel surface. These clusters are also present in RaHBP2, a unusual "outlier" lipocalin from tick Rhipicephalus appendiculatus. This information was used to assess assignment of LIR2 a protein from Ixodes ricinus and to build a 3D model that helps to predict function. FTIR data support the lipocalin fold for this protein. Conclusion By sequence and structural analyzes, two conserved clusters of hydrophobic residues in interactions have been identified in lipocalins. Since the residues implicated are not conserved for function, they should provide the minimal subset necessary to confer the lipocalin fold. This information has been used to assign LIR2 to lipocalins and to investigate its structure/function relationship. This study could be applied to other protein families with low pairwise similarity, such as the structurally related fatty acid binding proteins or avidins. PMID:18190694

  19. Identification of a new Apscaviroid from Japanese persimmon.

    PubMed

    Nakaune, Ryoji; Nakano, Masaaki

    2008-01-01

    Three viroid-like sequences were detected from Japanese persimmon (Diospyrus kaki Thunb.) by RT-PCR using primers specific for members of the genus Apscaviroid. Based on the sequences, we determined the complete genomic sequences. Two had 92.1-94.3% sequence identity with citrus viroid OS (CVd-OS) and 91.4-96.3% identity with apple fruit crinkle viroid (AFCVd), respectively. Another one, tentatively named persimmon viroid (PVd), had 396 nucleotides and less than 70% sequence identity with known viroids. The secondary structure of PVd is proposed to be rod-like with extensive base pairing and contains the terminal conserved region and the central conserved region characteristic of the genus Apscaviroid. Moreover, we confirmed that the viroids, including PVd, are graft transmissible from persimmon to persimmon and that persimmon is a natural host of these viroids. According to its molecular and biological properties, PVd should be considered a member of a new species in the genus Apscaviroid.

  20. Characterization, genetic diversity, and evolutionary link of Cucumber mosaic virus strain New Delhi from India.

    PubMed

    Koundal, Vikas; Haq, Qazi Mohd Rizwanul; Praveen, Shelly

    2011-02-01

    The genome of Cucumber mosaic virus New Delhi strain (CMV-ND) from India, obtained from tomato, was completely sequenced and compared with full genome sequences of 14 known CMV strains from subgroups I and II, for their genetic diversity. Sequence analysis suggests CMV-ND shares maximum sequence identity at the nucleotide level with a CMV strain from Taiwan. Among all 15 strains of CMV, the encoded protein 2b is least conserved, whereas the coat protein (CP) is most conserved. Sequence identity values and phylogram results indicate that CMV-ND belongs to subgroup I. Based on the recombination detection program result, it appears that CMV is prone to recombination, and different RNA components of CMV-ND have evolved differently. Recombinational analysis of all 15 CMV strains detected maximum recombination breakpoints in RNA2; CP showed the least recombination sites.

  1. Global sequence diversity of the lactate dehydrogenase gene in Plasmodium falciparum.

    PubMed

    Simpalipan, Phumin; Pattaradilokrat, Sittiporn; Harnyuttanakorn, Pongchai

    2018-01-09

    Antigen-detecting rapid diagnostic tests (RDTs) have been recommended by the World Health Organization for use in remote areas to improve malaria case management. Lactate dehydrogenase (LDH) of Plasmodium falciparum is one of the main parasite antigens employed by various commercial RDTs. It has been hypothesized that the poor detection of LDH-based RDTs is attributed in part to the sequence diversity of the gene. To test this, the present study aimed to investigate the genetic diversity of the P. falciparum ldh gene in Thailand and to construct the map of LDH sequence diversity in P. falciparum populations worldwide. The ldh gene was sequenced for 50 P. falciparum isolates in Thailand and compared with hundreds of sequences from P. falciparum populations worldwide. Several indices of molecular variation were calculated, including the proportion of polymorphic sites, the average nucleotide diversity index (π), and the haplotype diversity index (H). Tests of positive selection and neutrality tests were performed to determine signatures of natural selection on the gene. Mean genetic distance within and between species of Plasmodium ldh was analysed to infer evolutionary relationships. Nucleotide sequences of P. falciparum ldh could be classified into 9 alleles, encoding 5 isoforms of LDH. L1a was the most common allelic type and was distributed in P. falciparum populations worldwide. Plasmodium falciparum ldh sequences were highly conserved, with haplotype and nucleotide diversity values of 0.203 and 0.0004, respectively. The extremely low genetic diversity was maintained by purifying selection, likely due to functional constraints. Phylogenetic analysis inferred the close genetic relationship of P. falciparum to malaria parasites of great apes, rather than to other human malaria parasites. This study revealed the global genetic variation of the ldh gene in P. falciparum, providing knowledge for improving detection of LDH-based RDTs and supporting the candidacy of LDH as a therapeutic drug target.

  2. Exploring the genome of the salt-marsh Spartina maritima (Poaceae, Chloridoideae) through BAC end sequence analysis.

    PubMed

    Ferreira de Carvalho, J; Chelaifa, H; Boutte, J; Poulain, J; Couloux, A; Wincker, P; Bellec, A; Fourment, J; Bergès, H; Salmon, A; Ainouche, M

    2013-12-01

    Spartina species play an important ecological role on salt marshes. Spartina maritima is an Old-World species distributed along the European and North-African Atlantic coasts. This hexaploid species (2n = 6x = 60, 2C = 3,700 Mb) hybridized with different Spartina species introduced from the American coasts, which resulted in the formation of new invasive hybrids and allopolyploids. Thus, S. maritima raises evolutionary and ecological interests. However, genomic information is dramatically lacking in this genus. In an effort to develop genomic resources, we analysed 40,641 high-quality bacterial artificial chromosome-end sequences (BESs), representing 26.7 Mb of the S. maritima genome. BESs were searched for sequence homology against known databases. A fraction of 16.91% of the BESs represents known repeats including a majority of long terminal repeat (LTR) retrotransposons (13.67%). Non-LTR retrotransposons represent 0.75%, DNA transposons 0.99%, whereas small RNA, simple repeats and low-complexity sequences account for 1.38% of the analysed BESs. In addition, 4,285 simple sequence repeats were detected. Using the coding sequence database of Sorghum bicolor, 6,809 BESs found homology accounting for 17.1% of all BESs. Comparative genomics with related genera reveals that the microsynteny is better conserved with S. bicolor compared to other sequenced Poaceae, where 37.6% of the paired matching BESs are correctly orientated on the chromosomes. We did not observe large macrosyntenic rearrangements using the mapping strategy employed. However, some regions appeared to have experienced rearrangements when comparing Spartina to Sorghum and to Oryza. This work represents the first overview of S. maritima genome regarding the respective coding and repetitive components. The syntenic relationships with other grass genomes examined here help clarifying evolution in Poaceae, S. maritima being a part of the poorly-known Chloridoideae sub-family.

  3. The ConSurf-DB: pre-calculated evolutionary conservation profiles of protein structures.

    PubMed

    Goldenberg, Ofir; Erez, Elana; Nimrod, Guy; Ben-Tal, Nir

    2009-01-01

    ConSurf-DB is a repository for evolutionary conservation analysis of the proteins of known structures in the Protein Data Bank (PDB). Sequence homologues of each of the PDB entries were collected and aligned using standard methods. The evolutionary conservation of each amino acid position in the alignment was calculated using the Rate4Site algorithm, implemented in the ConSurf web server. The algorithm takes into account the phylogenetic relations between the aligned proteins and the stochastic nature of the evolutionary process explicitly. Rate4Site assigns a conservation level for each position in the multiple sequence alignment using an empirical Bayesian inference. Visual inspection of the conservation patterns on the 3D structure often enables the identification of key residues that comprise the functionally important regions of the protein. The repository is updated with the latest PDB entries on a monthly basis and will be rebuilt annually. ConSurf-DB is available online at http://consurfdb.tau.ac.il/

  4. The ConSurf-DB: pre-calculated evolutionary conservation profiles of protein structures

    PubMed Central

    Goldenberg, Ofir; Erez, Elana; Nimrod, Guy; Ben-Tal, Nir

    2009-01-01

    ConSurf-DB is a repository for evolutionary conservation analysis of the proteins of known structures in the Protein Data Bank (PDB). Sequence homologues of each of the PDB entries were collected and aligned using standard methods. The evolutionary conservation of each amino acid position in the alignment was calculated using the Rate4Site algorithm, implemented in the ConSurf web server. The algorithm takes into account the phylogenetic relations between the aligned proteins and the stochastic nature of the evolutionary process explicitly. Rate4Site assigns a conservation level for each position in the multiple sequence alignment using an empirical Bayesian inference. Visual inspection of the conservation patterns on the 3D structure often enables the identification of key residues that comprise the functionally important regions of the protein. The repository is updated with the latest PDB entries on a monthly basis and will be rebuilt annually. ConSurf-DB is available online at http://consurfdb.tau.ac.il/ PMID:18971256

  5. Comparative sequence analysis of a region on human chromosome 13q14, frequently deleted in B-cell chronic lymphocytic leukemia, and its homologous region on mouse chromosome 14.

    PubMed

    Kapanadze, B; Makeeva, N; Corcoran, M; Jareborg, N; Hammarsund, M; Baranova, A; Zabarovsky, E; Vorontsova, O; Merup, M; Gahrton, G; Jansson, M; Yankovsky, N; Einhorn, S; Oscier, D; Grandér, D; Sangfelt, O

    2000-12-15

    Previous studies have indicated the presence of a putative tumor suppressor gene on human chromosome 13q14, commonly deleted in patients with B-cell chronic lymphocytic leukemia (B-CLL). We have recently identified a minimally deleted region encompassing parts of two adjacent genes, termed LEU1 and LEU2 (leukemia-associated genes 1 and 2), and several additional transcripts. In addition, 50 kb centromeric to this region we have identified another gene, LEU5/RFP2. To elucidate further the complex genomic organization of this region, we have identified, mapped, and sequenced the homologous region in the mouse. Fluorescence in situ hybridization analysis demonstrated that the region maps to mouse chromosome 14. The overall organization and gene order in this region were found to be highly conserved in the mouse. Sequence comparison between the human deletion hotspot region and its homologous mouse region revealed a high degree of sequence conservation with an overall score of 74%. However, our data also show that in terms of transcribed sequences, only two of those, human LEU2 and LEU5/RFP2, are clearly conserved, strengthening the case for these genes as putative candidate B-CLL tumor suppressor genes.

  6. Energy Conservation Curriculum for Secondary and Post-Secondary Students. Module 9: Human Comfort and Energy Conservation.

    ERIC Educational Resources Information Center

    Navarro Coll., Corsicana, TX.

    This module is the ninth in a series of eleven modules in an energy conservation curriculum for secondary and postsecondary vocational students. It is designed for use by itself or as part of a sequence of four modules on energy conservation in building construction and operation (see also modules 8, 10, and 11). The objective of this module is to…

  7. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution.

    PubMed

    2004-12-09

    We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.

  8. Conservation and Sex-Specific Splicing of the transformer Gene in the Calliphorids Cochliomyia hominivorax, Cochliomyia macellaria and Lucilia sericata

    PubMed Central

    Li, Fang; Vensko, Steven P.; Belikoff, Esther J.; Scott, Maxwell J.

    2013-01-01

    Transformer (TRA) promotes female development in several dipteran species including the Australian sheep blowfly Lucilia cuprina, the Mediterranean fruit fly, housefly and Drosophila melanogaster. tra transcripts are sex-specifically spliced such that only the female form encodes full length functional protein. The presence of six predicted TRA/TRA2 binding sites in the sex-specific female intron of the L. cuprina gene suggested that tra splicing is auto-regulated as in medfly and housefly. With the aim of identifying conserved motifs that may play a role in tra sex-specific splicing, here we have isolated and characterized the tra gene from three additional blowfly species, L. sericata, Cochliomyia hominivorax and C. macellaria. The blowfly adult male and female transcripts differ in the choice of splice donor site in the first intron, with males using a site downstream of the site used in females. The tra genes all contain a single TRA/TRA2 site in the male exon and a cluster of four to five sites in the male intron. However, overall the sex-specific intron sequences are poorly conserved in closely related blowflies. The most conserved regions are around the exon/intron junctions, the 3′ end of the intron and near the cluster of TRA/TRA2 sites. We propose a model for sex specific regulation of tra splicing that incorporates the conserved features identified in this study. In L. sericata embryos, the male tra transcript was first detected at around the time of cellular blastoderm formation. RNAi experiments showed that tra is required for female development in L. sericata and C. macellaria. The isolation of the tra gene from the New World screwworm fly C. hominivorax, a major livestock pest, will facilitate the development of a “male-only” strain for genetic control programs. PMID:23409170

  9. Relevance of the Axis Spermidine/eIF5A for Plant Growth and Development

    PubMed Central

    Belda-Palazón, Borja; Almendáriz, Carla; Martí, Esmeralda; Carbonell, Juan; Ferrando, Alejandro

    2016-01-01

    One key role of the essential polyamine spermidine in eukaryotes is to provide the 4-aminobutyl moiety group destined to the post-translational modification of a lysine in the highly conserved translation factor eIF5A. This modification is catalyzed by two sequential enzymatic steps leading to the activation of eIF5A by the conversion of one conserved lysine to the unusual amino acid hypusine. The active translation factor facilitates the sequence-specific translation of polyproline sequences that otherwise cause ribosome stalling. In spite of the well-characterized involvement of active eIF5A in the translation of proline repeat-rich proteins, its biological role has been recently elucidated only in mammals, and it is poorly described at the functional level in plants. Here we describe the alterations in plant growth and development caused by RNAi-mediated conditional genetic inactivation of the hypusination pathway in Arabidopsis thaliana by knocking-down the enzyme deoxyhypusine synthase. We have uncovered that spermidine-mediated activation of eIF5A by hypusination is involved in several aspects of plant biology such as the control of flowering time, the aerial and root architecture, and root hair growth. In addition this pathway is required for adaptation to challenging growth conditions such as high salt and high glucose medium and to elevated concentrations of the plant hormone ABA. We have also performed a bioinformatic analysis of polyproline-rich containing proteins as putative eIF5A targets to uncover their organization in clusters of protein networks to find molecular culprits for the disclosed phenotypes. This study represents a first attempt to provide a holistic view of the biological relevance of the spermidine-dependent hypusination pathway for plant growth and development. PMID:26973686

  10. The human phospholamban gene: structure and expression.

    PubMed

    McTiernan, C F; Frye, C S; Lemster, B H; Kinder, E A; Ogletree-Hughes, M L; Moravec, C S; Feldman, A M

    1999-03-01

    Phospholamban, through modulation of sarcoplasmic reticulum calcium-ATPase activity, is a key regulator of cardiac diastolic function. Alterations in phospholamban expression may define parameters of muscle relaxation. In experimental animals, phospholamban is differentially expressed in various striated and smooth muscles, and within the four chambers of the heart. Decreased phospholamban expression within the heart during heart failure has also been observed. Furthermore, regulatory elements of mammalian phospholamban genes remain poorly defined. To extend these studies to humans, we (1) characterized phospholamban expression in various human organs, (2) isolated genomic clones encoding the human phospholamban gene, and (3) prepared human phospholamban promoter/luciferase reporter constructs and performed transient transfection assays to begin identification of regulatory elements. We observed that human ventricle and quadriceps displayed high levels of phospholamban transcripts and proteins, with markedly lower expression observed in smooth muscles, while the right atria also expressed low levels of phospholamban. The human phospholamban gene structure closely resembles that reported for chicken, rabbit, rat, and mouse. Comparison of the human to other mammalian phospholamban genes indicates a marked conservation of sequence for at least 217 bp upstream of the transcription start site, which contains conserved motifs for GATA, CP1/NFY, M-CAT-like, and E-box elements. Transient transfection assays with a series of plasmids containing deleted 5' flanking regions (between -2530 and -66 through +85) showed that sequences between -169 and the CP1-box at -93 were required for maximal promoter activity in neonatal rat cardiomyocytes. Activity of these reporters in HeLa cells was markedly lower than that observed in rat cardiomyocytes, suggesting at least a partial tissue selectivity of these reporter constructs.

  11. The Replacement of 10 Non-Conserved Residues in the Core Protein of JFH-1 Hepatitis C Virus Improves Its Assembly and Secretion

    PubMed Central

    Etienne, Loïc; Blanchard, Emmanuelle; Boyer, Audrey; Desvignes, Virginie; Gaillard, Julien; Meunier, Jean-Christophe; Roingeard, Philippe; Hourioux, Christophe

    2015-01-01

    Hepatitis C virus (HCV) assembly is still poorly understood. It is thought that trafficking of the HCV core protein to the lipid droplet (LD) surface is essential for its multimerization and association with newly synthesized HCV RNA to form the viral nucleocapsid. We carried out a mapping analysis of several complete HCV genomes of all genotypes, and found that the genotype 2 JFH-1 core protein contained 10 residues different from those of other genotypes. The replacement of these 10 residues of the JFH-1 strain sequence with the most conserved residues deduced from sequence alignments greatly increased virus production. Confocal microscopy of the modified JFH-1 strain in cell culture showed that the mutated JFH-1 core protein, C10M, was present mostly at the endoplasmic reticulum (ER) membrane, but not at the surface of the LDs, even though its trafficking to these organelles was possible. The non-structural 5A protein of HCV was also redirected to ER membranes and colocalized with the C10M core protein. Using a Semliki forest virus vector to overproduce core protein, we demonstrated that the C10M core protein was able to form HCV-like particles, unlike the native JFH-1 core protein. Thus, the substitution of a few selected residues in the JFH-1 core protein modified the subcellular distribution and assembly properties of the protein. These findings suggest that the early steps of HCV assembly occur at the ER membrane rather than at the LD surface. The C10M-JFH-1 strain will be a valuable tool for further studies of HCV morphogenesis. PMID:26339783

  12. The Silkworm (Bombyx mori) microRNAs and Their Expressions in Multiple Developmental Stages

    PubMed Central

    Luo, Qibin; Cai, Yimei; Lin, Wen-chang; Chen, Huan; Yang, Yue; Hu, Songnian; Yu, Jun

    2008-01-01

    Background MicroRNAs (miRNAs) play crucial roles in various physiological processes through post-transcriptional regulation of gene expressions and are involved in development, metabolism, and many other important molecular mechanisms and cellular processes. The Bombyx mori genome sequence provides opportunities for a thorough survey for miRNAs as well as comparative analyses with other sequenced insect species. Methodology/Principal Findings We identified 114 non-redundant conserved miRNAs and 148 novel putative miRNAs from the B. mori genome with an elaborate computational protocol. We also sequenced 6,720 clones from 14 developmental stage-specific small RNA libraries in which we identified 35 unique miRNAs containing 21 conserved miRNAs (including 17 predicted miRNAs) and 14 novel miRNAs (including 11 predicted novel miRNAs). Among the 114 conserved miRNAs, we found six pairs of clusters evolutionarily conserved cross insect lineages. Our observations on length heterogeneity at 5′ and/or 3′ ends of nine miRNAs between cloned and predicted sequences, and three mature forms deriving from the same arm of putative pre-miRNAs suggest a mechanism by which miRNAs gain new functions. Analyzing development-related miRNAs expression at 14 developmental stages based on clone-sampling and stem-loop RT PCR, we discovered an unusual abundance of 33 sequences representing 12 different miRNAs and sharply fluctuated expression of miRNAs at larva-molting stage. The potential functions of several stage-biased miRNAs were also analyzed in combination with predicted target genes and silkworm's phenotypic traits; our results indicated that miRNAs may play key regulatory roles in specific developmental stages in the silkworm, such as ecdysis. Conclusions/Significance Taking a combined approach, we identified 118 conserved miRNAs and 151 novel miRNA candidates from the B. mori genome sequence. Our expression analyses by sampling miRNAs and real-time PCR over multiple developmental stages allowed us to pinpoint molting stages as hotspots of miRNA expression both in sorts and quantities. Based on the analysis of target genes, we hypothesized that miRNAs regulate development through a particular emphasis on complex stages rather than general regulatory mechanisms. PMID:18714353

  13. A Presumptive Developmental Role for a Sea Urchin Cyclin B Splice Variant

    PubMed Central

    Lozano, Jean-Claude; Schatt, Philippe; Marquès, François; Peaucellier, Gérard; Fort, Philippe; Féral, Jean-Pierre; Genevière, Anne-Marie; Picard, André

    1998-01-01

    We show that a splice variant–derived cyclin B is produced in sea urchin oocytes and embryos. This splice variant protein lacks highly conserved sequences in the COOH terminus of the protein. It is found strikingly abundant in growing oocytes and cells committed to differentiation during embryogenesis. Cyclin B splice variant (CBsv) protein associates weakly in the cell with Xenopus cdc2 and with budding yeast CDC28p. In contrast to classical cyclin B, CBsv very poorly complements a triple CLN deletion in budding yeast, and its microinjection prevents an initial step in MPF activation, leading to an important delay in oocyte meiosis reinitiation. CBsv microinjection in fertilized eggs induces cell cycle delay and abnormal development. We assume that CBsv is produced in growing oocytes to keep them in prophase, and during embryogenesis to slow down cell cycle in cells that will be committed to differentiation. PMID:9442104

  14. A genomic lifespan program that reorganises the young adult brain is targeted in schizophrenia.

    PubMed

    Skene, Nathan G; Roy, Marcia; Grant, Seth Gn

    2017-09-12

    The genetic mechanisms regulating the brain and behaviour across the lifespan are poorly understood. We found that lifespan transcriptome trajectories describe a calendar of gene regulatory events in the brain of humans and mice. Transcriptome trajectories defined a sequence of gene expression changes in neuronal, glial and endothelial cell-types, which enabled prediction of age from tissue samples. A major lifespan landmark was the peak change in trajectories occurring in humans at 26 years and in mice at 5 months of age. This species-conserved peak was delayed in females and marked a reorganization of expression of synaptic and schizophrenia-susceptibility genes. The lifespan calendar predicted the characteristic age of onset in young adults and sex differences in schizophrenia. We propose a genomic program generates a lifespan calendar of gene regulation that times age-dependent molecular organization of the brain and mutations that interrupt the program in young adults cause schizophrenia.

  15. Assessing the Functional Role of Leptin in Energy Homeostasis and the Stress Response in Vertebrates

    PubMed Central

    Deck, Courtney A.; Honeycutt, Jamie L.; Cheung, Eugene; Reynolds, Hannah M.; Borski, Russell J.

    2017-01-01

    Leptin is a pleiotropic hormone that plays a critical role in regulating appetite, energy metabolism, growth, stress, and immune function across vertebrate groups. In mammals, it has been classically described as an adipostat, relaying information regarding energy status to the brain. While retaining poor sequence conservation with mammalian leptins, teleostean leptins elicit a number of similar regulatory properties, although current evidence suggests that it does not function as an adipostat in this group of vertebrates. Teleostean leptin also exhibits functionally divergent properties, however, possibly playing a role in glucoregulation similar to what is observed in lizards. Further, leptin has been recently implicated as a mediator of immune function and the endocrine stress response in teleosts. Here, we provide a review of leptin physiology in vertebrates, with a particular focus on its actions and regulatory properties in the context of stress and the regulation of energy homeostasis. PMID:28439255

  16. 24-Hour colonic manometry in pediatric slow transit constipation shows significant reductions in antegrade propagation.

    PubMed

    King, Sebastian K; Catto-Smith, Anthony G; Stanton, Michael P; Sutcliffe, Jonathan R; Simpson, Dianne; Cook, Ian; Dinning, Phil; Hutson, John M; Southwell, Bridget R

    2008-08-01

    The physiological basis of slow transit constipation (STC) in children remains poorly understood. We wished to examine pan-colonic motility in a group of children with severe chronic constipation refractory to conservative therapy. We performed 24 h pan-colonic manometry in 18 children (13 boys, 11.6 +/- 0.9 yr, range 6.6-18.7 yr) with scintigraphically proven STC. A water-perfused, balloon tipped, 8-channel, silicone catheter with a 7.5 cm intersidehole distance was introduced through a previously formed appendicostomy. Comparison data were obtained from nasocolonic motility studies in 16 healthy young adult controls and per-appendicostomy motility studies in eight constipated children with anorectal retention and/or normal transit on scintigraphy (non-STC). Antegrade propagating sequences (PS) were significantly less frequent (P < 0.01) in subjects with STC (29 +/- 4 per 24 h) compared to adult (53 +/- 4 per 24 h) and non-STC (70 +/- 14 per 24 h) subjects. High amplitude propagating sequences (HAPS) were of a normal frequency in STC subjects. Retrograde propagating sequences were significantly more frequent (P < 0.05) in non-STC subjects compared to STC and adult subjects. High amplitude retrograde propagating sequences were only identified in the STC and non-STC pediatric groups. The normal increase in motility index associated with waking and ingestion of a meal was absent in STC subjects. Prolonged pancolonic manometry in children with STC showed significant impairment in antegrade propagating motor activity and failure to respond to normal physiological stimuli. Despite this, HAPS occurred with normal frequency. These findings suggest significant clinical differences between STC in children and adults.

  17. In Silico Comparative Transcriptome Analysis of Two Color Morphs of the Common Coral Trout (Plectropomus Leopardus)

    PubMed Central

    Wang, Le; Yu, Cuiping; Guo, Liang; Lin, Haoran; Meng, Zining

    2015-01-01

    The common coral trout is one species of major importance in commercial fisheries and aquaculture. Recently, two different color morphs of Plectropomus leopardus were discovered and the biological importance of the color difference is unknown. Since coral trout species are poorly characterized at the molecular level, we undertook the transcriptomic characterization of the two color morphs, one black and one red coral trout, using Illumina next generation sequencing technologies. The study produced 55162966 and 54588952 paired-end reads, for black and red trout, respectively. De novo transcriptome assembly generated 95367 and 99424 unique sequences in black and red trout, respectively, with 88813 sequences shared between them. Approximately 50% of both trancriptomes were functionally annotated by BLAST searches against protein databases. The two trancriptomes were enriched into 25 functional categories and showed similar profiles of Gene Ontology category compositions. 34110 unigenes were grouped into 259 KEGG pathways. Moreover, we identified 14649 simple sequence repeats (SSRs) and designed primers for potential application. We also discovered 130524 putative single nucleotide polymorphisms (SNPs) in the two transcriptomes, supplying potential genomic resources for the coral trout species. In addition, we identified 936 fast-evolving genes and 165 candidate genes under positive selection between the two color morphs. Finally, 38 candidate genes underlying the mechanism of color and pigmentation were also isolated. This study presents the first transcriptome resources for the common coral trout and provides basic information for the development of genomic tools for the identification, conservation, and understanding of the speciation and local adaptation of coral reef fish species. PMID:26713756

  18. Incentives for energy conservation

    NASA Astrophysics Data System (ADS)

    Ways to make homes and workplaces more energy efficient were discussed. Due to rise in the average price of world crude oil, the need for across the board energy savings was stressed. Tax credit for conservation measures and solar installations was considered to favor the affluent and to discriminate against the poor. New York's program to achieve energy conservation was described. Conservation is considered to be a cheaper way of meeting the energy needs than building new generating plants or producing synthetic gas from coal.

  19. Lariat sequencing in a unicellular yeast identifies regulated alternative splicing of exons that are evolutionarily conserved with humans.

    PubMed

    Awan, Ali R; Manfredo, Amanda; Pleiss, Jeffrey A

    2013-07-30

    Alternative splicing is a potent regulator of gene expression that vastly increases proteomic diversity in multicellular eukaryotes and is associated with organismal complexity. Although alternative splicing is widespread in vertebrates, little is known about the evolutionary origins of this process, in part because of the absence of phylogenetically conserved events that cross major eukaryotic clades. Here we describe a lariat-sequencing approach, which offers high sensitivity for detecting splicing events, and its application to the unicellular fungus, Schizosaccharomyces pombe, an organism that shares many of the hallmarks of alternative splicing in mammalian systems but for which no previous examples of exon-skipping had been demonstrated. Over 200 previously unannotated splicing events were identified, including examples of regulated alternative splicing. Remarkably, an evolutionary analysis of four of the exons identified here as subject to skipping in S. pombe reveals high sequence conservation and perfect length conservation with their homologs in scores of plants, animals, and fungi. Moreover, alternative splicing of two of these exons have been documented in multiple vertebrate organisms, making these the first demonstrations of identical alternative-splicing patterns in species that are separated by over 1 billion y of evolution.

  20. Evolution of the cytoskeleton

    PubMed Central

    Erickson, Harold P.

    2009-01-01

    Summary The eukaryotic cytoskeleton appears to have evolved from ancestral precursors related to prokaryotic FtsZ and MreB. FtsZ and MreB show 40−50% sequence identity across different bacterial and archaeal species. Here I suggest that this represents the limit of divergence that is consistent with maintaining their functions for cytokinesis and cell shape. Previous analyses have noted that tubulin and actin are highly conserved across eukaryotic species, but so divergent from their prokaryotic relatives as to be hardly recognizable from sequence comparisons. One suggestion for this extreme divergence of tubulin and actin is that it occurred as they evolved very different functions from FtsZ and MreB. I will present new arguments favoring this suggestion, and speculate on pathways. Moreover, the extreme conservation of tubulin and actin across eukaryotic species is not due to an intrinsic lack of variability, but is attributed to their acquisition of elaborate mechanisms for assembly dynamics and their interactions with multiple motor and binding proteins. A new structure-based sequence alignment identifies amino acids that are conserved from FtsZ to tubulins. The highly conserved amino acids are not those forming the subunit core or protofilament interface, but those involved in binding and hydrolysis of GTP. PMID:17563102

  1. Governance and sustainability at a municipal scale: the challenge of water conservation.

    PubMed

    Furlong, Kathryn; Bakker, Karen

    2011-01-01

    Municipal water conservation is increasingly promoted as a key dimension of environmental sustainability at the municipal scale. Progress toward municipal water conservation in Canada has, however, been poor. This paper examines the governance dimension of water conservation, and presents evidence in support of the argument that conservation efforts on the part of water utilities (and sometimes municipalities) are often constrained by factors external to their jurisdiction. To explore these issues, this paper presents a case study of municipal water conservation in Canada. The analysis identifies governance-related barriers to water conservation and explores the relationship between these barriers and broader issues stemming from the multi-scalar, fragmented nature of environmental governance in Canada.

  2. P53 Gene Mutagenesis in Breast Cancer

    DTIC Science & Technology

    2005-03-01

    the wild type T peak. 12 Table 1. Sonic ntations dected by SINtA Individual Cell Sequence Amino Acid Species Conservation 3 ID’ ID Change2 Change... differences in the content of toxic substances in the diet (Biggs et al., 1993; Blaszyk et al., 1996). The development of this p53 mutation load...Changes in the P53 Gene in Single Cells Individual Sequence Amino acid Species conservation ’ ID’ Cell ID change’ change Monkey Mouse Rat Chicken

  3. Analysis of SSR information in EST resources of sugarcane

    USDA-ARS?s Scientific Manuscript database

    Expressed sequence tags ( ESTs) offer the opportunity to exploit single, low -copy, conserved sequence motifs for the development of simple sequence repeats ( SSRs). The total of 262 113 ESTs of sugarcane (Saccharum officinarum) in the database of NCBI were downloaded and analyzed, which resulted in...

  4. FA-SAT Is an Old Satellite DNA Frozen in Several Bilateria Genomes

    PubMed Central

    Chaves, Raquel; Ferreira, Daniela; Mendes-da-Silva, Ana; Meles, Susana; Adega, Filomena

    2017-01-01

    Abstract In recent years, a growing body of evidence has recognized the tandem repeat sequences, and specifically satellite DNA, as a functional class of sequences in the genomic “dark matter.” Using an original, complementary, and thus an eclectic experimental design, we show that the cat archetypal satellite DNA sequence, FA-SAT, is “frozen” conservatively in several Bilateria genomes. We found different genomic FA-SAT architectures, and the interspersion pattern was conserved. In Carnivora genomes, the FA-SAT-related sequences are also amplified, with the predominance of a specific FA-SAT variant, at the heterochromatic regions. We inspected the cat genome project to locate FA-SAT array flanking regions and revealed an intensive intermingling with transposable elements. Our results also show that FA-SAT-related sequences are transcribed and that the most abundant FA-SAT variant is not always the most transcribed. We thus conclude that the DNA sequences of FA-SAT and their transcripts are “frozen” in these genomes. Future work is needed to disclose any putative function that these sequences may play in these genomes. PMID:29608678

  5. Mutations in the newly identified RAX regulatory sequence are not a frequent cause of micro/anophthalmia.

    PubMed

    Chassaing, Nicolas; Vigouroux, Adeline; Calvas, Patrick

    2009-06-01

    Microphthalmia and anophthalmia are at the severe end of the spectrum of abnormalities in ocular development. A few genes (SOX2, OTX2, RAX, and CHX10) have been implicated in isolated micro/anophthalmia, but causative mutations of these genes explain less than a quarter of these developmental defects. A specifically conserved SOX2/OTX2-mediated RAX expression regulatory sequence has recently been identified. We postulated that mutations in this sequence could lead to micro/anophthalmia, and thus we performed molecular screening of this regulatory element in patients suffering from micro/anophthalmia. Fifty-one patients suffering from nonsyndromic microphthalmia (n = 40) or anophthalmia (n = 11) were included in this study after negative molecular screening for SOX2, OTX2, RAX, and CHX10 mutations. Mutation screening of the RAX regulatory sequence was performed by direct sequencing for these patients. No mutations were identified in the highly conserved RAX regulatory sequence in any of the 51 patients. Mutations in the newly identified RAX regulatory sequence do not represent a frequent cause of nonsyndromic micro/anophthalmia.

  6. Highly conserved D-loop-like nuclear mitochondrial sequences (Numts) in tiger (Panthera tigris).

    PubMed

    Zhang, Wenping; Zhang, Zhihe; Shen, Fujun; Hou, Rong; Lv, Xiaoping; Yue, Bisong

    2006-08-01

    Using oligonucleotide primers designed to match hypervariable segments I (HVS-1) of Panthera tigris mitochondrial DNA (mtDNA), we amplified two different PCR products (500 bp and 287 bp) in the tiger (Panthera tigris), but got only one PCR product (287 bp) in the leopard (Panthera pardus). Sequence analyses indicated that the sequence of 287 bp was a D-loop-like nuclear mitochondrial sequence (Numts), indicating a nuclear transfer that occurred approximately 4.8-17 million years ago in the tiger and 4.6-16 million years ago in the leopard. Although the mtDNA D-loop sequence has a rapid rate of evolution, the 287-bp Numts are highly conserved; they are nearly identical in tiger subspecies and only 1.742% different between tiger and leopard. Thus, such sequences represent molecular 'fossils' that can shed light on evolution of the mitochondrial genome and may be the most appropriate outgroup for phylogenetic analysis. This is also proved by comparing the phylogenetic trees reconstructed using the D-loop sequence of snow leopard and the 287-bp Numts as outgroup.

  7. A Partial Least Squares Based Procedure for Upstream Sequence Classification in Prokaryotes.

    PubMed

    Mehmood, Tahir; Bohlin, Jon; Snipen, Lars

    2015-01-01

    The upstream region of coding genes is important for several reasons, for instance locating transcription factor, binding sites, and start site initiation in genomic DNA. Motivated by a recently conducted study, where multivariate approach was successfully applied to coding sequence modeling, we have introduced a partial least squares (PLS) based procedure for the classification of true upstream prokaryotic sequence from background upstream sequence. The upstream sequences of conserved coding genes over genomes were considered in analysis, where conserved coding genes were found by using pan-genomics concept for each considered prokaryotic species. PLS uses position specific scoring matrix (PSSM) to study the characteristics of upstream region. Results obtained by PLS based method were compared with Gini importance of random forest (RF) and support vector machine (SVM), which is much used method for sequence classification. The upstream sequence classification performance was evaluated by using cross validation, and suggested approach identifies prokaryotic upstream region significantly better to RF (p-value < 0.01) and SVM (p-value < 0.01). Further, the proposed method also produced results that concurred with known biological characteristics of the upstream region.

  8. Parallel tagged next-generation sequencing on pooled samples - a new approach for population genetics in ecology and conservation.

    PubMed

    Zavodna, Monika; Grueber, Catherine E; Gemmell, Neil J

    2013-01-01

    Next-generation sequencing (NGS) on pooled samples has already been broadly applied in human medical diagnostics and plant and animal breeding. However, thus far it has been only sparingly employed in ecology and conservation, where it may serve as a useful diagnostic tool for rapid assessment of species genetic diversity and structure at the population level. Here we undertake a comprehensive evaluation of the accuracy, practicality and limitations of parallel tagged amplicon NGS on pooled population samples for estimating species population diversity and structure. We obtained 16S and Cyt b data from 20 populations of Leiopelma hochstetteri, a frog species of conservation concern in New Zealand, using two approaches - parallel tagged NGS on pooled population samples and individual Sanger sequenced samples. Data from each approach were then used to estimate two standard population genetic parameters, nucleotide diversity (π) and population differentiation (FST), that enable population genetic inference in a species conservation context. We found a positive correlation between our two approaches for population genetic estimates, showing that the pooled population NGS approach is a reliable, rapid and appropriate method for population genetic inference in an ecological and conservation context. Our experimental design also allowed us to identify both the strengths and weaknesses of the pooled population NGS approach and outline some guidelines and suggestions that might be considered when planning future projects.

  9. Conservation for Children [Levels 1-6 and an All-Levels Supplement].

    ERIC Educational Resources Information Center

    Cupertino Union School District, CA.

    Developed to promote conservation awareness in elementary students, each of the six grade-level-sequenced activity guides provides: (1) a list of conservation concepts; (2) a criterion-referenced test; (3) a class record sheet; (4) a content guide; and (5) 90 student worksheets (40 for language arts, 20 for mathematics, 20 for social studies and…

  10. Structure-Based Sequence Alignment of the Transmembrane Domains of All Human GPCRs: Phylogenetic, Structural and Functional Implications

    PubMed Central

    Cvicek, Vaclav; Goddard, William A.; Abrol, Ravinder

    2016-01-01

    The understanding of G-protein coupled receptors (GPCRs) is undergoing a revolution due to increased information about their signaling and the experimental determination of structures for more than 25 receptors. The availability of at least one receptor structure for each of the GPCR classes, well separated in sequence space, enables an integrated superfamily-wide analysis to identify signatures involving the role of conserved residues, conserved contacts, and downstream signaling in the context of receptor structures. In this study, we align the transmembrane (TM) domains of all experimental GPCR structures to maximize the conserved inter-helical contacts. The resulting superfamily-wide GpcR Sequence-Structure (GRoSS) alignment of the TM domains for all human GPCR sequences is sufficient to generate a phylogenetic tree that correctly distinguishes all different GPCR classes, suggesting that the class-level differences in the GPCR superfamily are encoded at least partly in the TM domains. The inter-helical contacts conserved across all GPCR classes describe the evolutionarily conserved GPCR structural fold. The corresponding structural alignment of the inactive and active conformations, available for a few GPCRs, identifies activation hot-spot residues in the TM domains that get rewired upon activation. Many GPCR mutations, known to alter receptor signaling and cause disease, are located at these conserved contact and activation hot-spot residue positions. The GRoSS alignment places the chemosensory receptor subfamilies for bitter taste (TAS2R) and pheromones (Vomeronasal, VN1R) in the rhodopsin family, known to contain the chemosensory olfactory receptor subfamily. The GRoSS alignment also enables the quantification of the structural variability in the TM regions of experimental structures, useful for homology modeling and structure prediction of receptors. Furthermore, this alignment identifies structurally and functionally important residues in all human GPCRs. These residues can be used to make testable hypotheses about the structural basis of receptor function and about the molecular basis of disease-associated single nucleotide polymorphisms. PMID:27028541

  11. Comparative transgenic analysis of enhancers from the human SHOX and mouse Shox2 genomic regions.

    PubMed

    Rosin, Jessica M; Abassah-Oppong, Samuel; Cobb, John

    2013-08-01

    Disruption of presumptive enhancers downstream of the human SHOX gene (hSHOX) is a frequent cause of the zeugopodal limb defects characteristic of Léri-Weill dyschondrosteosis (LWD). The closely related mouse Shox2 gene (mShox2) is also required for limb development, but in the more proximal stylopodium. In this study, we used transgenic mice in a comparative approach to characterize enhancer sequences in the hSHOX and mShox2 genomic regions. Among conserved noncoding elements (CNEs) that function as enhancers in vertebrate genomes, those that are maintained near paralogous genes are of particular interest given their ancient origins. Therefore, we first analyzed the regulatory potential of a genomic region containing one such duplicated CNE (dCNE) downstream of mShox2 and hSHOX. We identified a strong limb enhancer directly adjacent to the mShox2 dCNE that recapitulates the expression pattern of the endogenous gene. Interestingly, this enhancer requires sequences only conserved in the mammalian lineage in order to drive strong limb expression, whereas the more deeply conserved sequences of the dCNE function as a neural enhancer. Similarly, we found that a conserved element downstream of hSHOX (CNE9) also functions as a neural enhancer in transgenic mice. However, when the CNE9 transgenic construct was enlarged to include adjacent, non-conserved sequences frequently deleted in LWD patients, the transgene drove expression in the zeugopodium of the limbs. Therefore, both hSHOX and mShox2 limb enhancers are coupled to distinct neural enhancers. This is the first report demonstrating the activity of cis-regulatory elements from the hSHOX and mShox2 genomic regions in mammalian embryos.

  12. Annotating Protein Functional Residues by Coupling High-Throughput Fitness Profile and Homologous-Structure Analysis.

    PubMed

    Du, Yushen; Wu, Nicholas C; Jiang, Lin; Zhang, Tianhao; Gong, Danyang; Shu, Sara; Wu, Ting-Ting; Sun, Ren

    2016-11-01

    Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins about which homologous-structure information is available. To fully comprehend the diverse functions of a protein, it is essential to understand the functionality of individual residues. Current methods are highly dependent on evolutionary sequence conservation, which is usually limited by sampling size. Sequence conservation-based methods are further confounded by structural constraints and multifunctionality of proteins. Here we present a method that can systematically identify and annotate functional residues of a given protein. We used a high-throughput functional profiling platform to identify essential residues. Coupling it with homologous-structure comparison, we were able to annotate multiple functions of proteins. We demonstrated the method with the PB1 protein of influenza A virus and identified novel functional residues in addition to its canonical function as an RNA-dependent RNA polymerase. Not limited to virology, this method is generally applicable to other proteins that can be functionally selected and about which homologous-structure information is available. Copyright © 2016 Du et al.

  13. Kinetoplast DNA minicircles of phloem-restricted Phytomonas associated with wilt diseases of coconut and oil palms have a two-domain structure.

    PubMed

    Dollet, M; Sturm, N R; Ahomadegbe, J C; Campbell, D A

    2001-11-27

    We report the cloning and sequencing of the first minicircle from a phloem-restricted, pathogenic Phytomonas sp. (Hart 1) isolated from a coconut palm with hartrot disease. The minicircle possessed a two-domain structure of two conserved regions, each containing three conserved sequence blocks (CSB). Based on the sequence around CSB 3 from Hart 1, PCR primers were designed to allow specific amplification of Phytomonas minicircles. This primer pair demonstrated specificity for at least six groups of plant trypanosomatids and did not amplify from insect trypanosomatids. The PCR results were consistent with a two-domain structure for other plant trypanosomatids.

  14. Similarity-based gene detection: using COGs to find evolutionarily-conserved ORFs.

    PubMed

    Powell, Bradford C; Hutchison, Clyde A

    2006-01-19

    Experimental verification of gene products has not kept pace with the rapid growth of microbial sequence information. However, existing annotations of gene locations contain sufficient information to screen for probable errors. Furthermore, comparisons among genomes become more informative as more genomes are examined. We studied all open reading frames (ORFs) of at least 30 codons from the genomes of 27 sequenced bacterial strains. We grouped the potential peptide sequences encoded from the ORFs by forming Clusters of Orthologous Groups (COGs). We used this grouping in order to find homologous relationships that would not be distinguishable from noise when using simple BLAST searches. Although COG analysis was initially developed to group annotated genes, we applied it to the task of grouping anonymous DNA sequences that may encode proteins. "Mixed COGs" of ORFs (clusters in which some sequences correspond to annotated genes and some do not) are attractive targets when seeking errors of gene prediction. Examination of mixed COGs reveals some situations in which genes appear to have been missed in current annotations and a smaller number of regions that appear to have been annotated as gene loci erroneously. This technique can also be used to detect potential pseudogenes or sequencing errors. Our method uses an adjustable parameter for degree of conservation among the studied genomes (stringency). We detail results for one level of stringency at which we found 83 potential genes which had not previously been identified, 60 potential pseudogenes, and 7 sequences with existing gene annotations that are probably incorrect. Systematic study of sequence conservation offers a way to improve existing annotations by identifying potentially homologous regions where the annotation of the presence or absence of a gene is inconsistent among genomes.

  15. Similarity-based gene detection: using COGs to find evolutionarily-conserved ORFs

    PubMed Central

    Powell, Bradford C; Hutchison, Clyde A

    2006-01-01

    Background Experimental verification of gene products has not kept pace with the rapid growth of microbial sequence information. However, existing annotations of gene locations contain sufficient information to screen for probable errors. Furthermore, comparisons among genomes become more informative as more genomes are examined. We studied all open reading frames (ORFs) of at least 30 codons from the genomes of 27 sequenced bacterial strains. We grouped the potential peptide sequences encoded from the ORFs by forming Clusters of Orthologous Groups (COGs). We used this grouping in order to find homologous relationships that would not be distinguishable from noise when using simple BLAST searches. Although COG analysis was initially developed to group annotated genes, we applied it to the task of grouping anonymous DNA sequences that may encode proteins. Results "Mixed COGs" of ORFs (clusters in which some sequences correspond to annotated genes and some do not) are attractive targets when seeking errors of gene predicion. Examination of mixed COGs reveals some situations in which genes appear to have been missed in current annotations and a smaller number of regions that appear to have been annotated as gene loci erroneously. This technique can also be used to detect potential pseudogenes or sequencing errors. Our method uses an adjustable parameter for degree of conservation among the studied genomes (stringency). We detail results for one level of stringency at which we found 83 potential genes which had not previously been identified, 60 potential pseudogenes, and 7 sequences with existing gene annotations that are probably incorrect. Conclusion Systematic study of sequence conservation offers a way to improve existing annotations by identifying potentially homologous regions where the annotation of the presence or absence of a gene is inconsistent among genomes. PMID:16423288

  16. Mutational analysis of the human nucleotide excision repair gene ERCC1.

    PubMed Central

    Sijbers, A M; van der Spek, P J; Odijk, H; van den Berg, J; van Duin, M; Westerveld, A; Jaspers, N G; Bootsma, D; Hoeijmakers, J H

    1996-01-01

    The human DNA repair protein ERCC1 resides in a complex together with the ERCC4, ERCC11 and XP-F correcting activities, thought to perform the 5' strand incision during nucleotide excision repair (NER). Its yeast counterpart, RAD1-RAD10, has an additional engagement in a mitotic recombination pathway, probably required for repair of DNA cross-links. Mutational analysis revealed that the poorly conserved N-terminal 91 amino acids of ERCC1 are dispensable for both repair functions, in contrast to a deletion of only four residues from the C-terminus. A database search revealed a strongly conserved motif in this C-terminus sharing sequence homology with many DNA break processing proteins, indicating that this part is primarily required for the presumed structure-specific endonuclease activity of ERCC1. Most missense mutations in the central region give rise to an unstable protein (complex). Accordingly, we found that free ERCC1 is very rapidly degraded, suggesting that protein-protein interactions provide stability. Survival experiments show that the removal of cross-links requires less ERCC1 than UV repair. This suggests that the ERCC1-dependent step in cross-link repair occurs outside the context of NER and provides an explanation for the phenotype of the human repair syndrome xeroderma pigmentosum group F. PMID:8811092

  17. Peptide Synthetase Gene in Trichoderma virens

    PubMed Central

    Wilhite, S. E.; Lumsden, R. D.; Straney, D. C.

    2001-01-01

    Trichoderma virens (synonym, Gliocladium virens), a deuteromycete fungus, suppresses soilborne plant diseases caused by a number of fungi and is used as a biocontrol agent. Several traits that may contribute to the antagonistic interactions of T. virens with disease-causing fungi involve the production of peptide metabolites (e.g., the antibiotic gliotoxin and siderophores used for iron acquisition). We cloned a 5,056-bp partial cDNA encoding a putative peptide synthetase (Psy1) from T. virens using conserved motifs found within the adenylate domain of peptide synthetases. Sequence similarities with conserved motifs of the adenylation domain, acyl transfer, and two condensation domains support identification of the Psy1 gene as a gene that encodes a peptide synthetase. Disruption of the native Psy1 gene through gene replacement was used to identify the function of this gene. Psy1 disruptants produced normal amounts of gliotoxin but grew poorly under low-iron conditions, suggesting that Psy1 plays a role in siderophore production. Psy1 disruptants cannot produce the major T. virens siderophore dimerum acid, a dipetide of acylated Nδ-hydroxyornithine. Biocontrol activity against damping-off diseases caused by Pythium ultimum and Rhizoctonia solani was not reduced by the Psy1 disruption, suggesting that iron competition through dimerum acid production does not contribute significantly to disease suppression activity under the conditions used. PMID:11679326

  18. Structural study of Bombyx mori silk fibroin during processing for regeneration

    NASA Astrophysics Data System (ADS)

    Ha, Sung-Won

    Bombyx mori silk fibroin has excellent mechanical properties combined with flexibility, tissue compatibility, and high oxygen permeability in the wet condition. This important material should be dissolved and regenerated to be utilized as useful forms such as gel, film, fiber, powder, or non-woven. However, it has long been a problem that the regenerated fibroin materials show poor mechanical properties and brittleness. These problems were technically solved by improving a fiber processing method reported here. The regenerated fibroin fibers showed much better mechanical properties compared to the original silk fibers. This improved technique for the fiber processing of Bombyx mori silk fibroin may be used as a model system for other semi-crystalline fiber forming proteins, becoming available through biotechnology. The physical and chemical properties of the regenerated fibers were characterized by SinTechRTM tensile testing, X-ray diffraction, solid state 13C NMR spectroscopy, and SEM. Unlike synthetic polymers, the molecular weight distribution of Bombyx mori silk fibroin is mono-disperse because silk fibroin is synthesized from DNA template. Genetic studies have revealed the entire amino acid sequence of Bombyx mori silk fibroin. It is known that the crystalline silk II structure is composed of hexa-amino acid sequences, GAGAGS. However, in the amino acid sequence of Bombyx mori silk fibroin heavy chain, there are present 11 chemically irregular but evolutionarily conserved sequences with about 31 amino acid residues (irregular GT˜GT sequences). The structure and role of these irregular sequences have remained unknown. One of the most frequently appearing irregular sequences was synthesized by a peptide synthesizer. The three-dimensional structure of this irregular silk peptide was studied by the high resolution two-dimensional NMR technique. The three-dimensional structure of this peptide shows that it makes a turn or loop structure (distorted O shape), which means the proceeding backbone direction is changed 180° by this sequence. This may facilitate the beta-sheet formation of the crystal forming building blocks, GAGAGS/GY˜GY sequences, in fibroin heavy chain. It may also facilitate the solubilization of the fibroin heavy chain within the silk gland.

  19. Genomewide analysis of Drosophila circular RNAs reveals their structural and sequence properties and age-dependent neural accumulation

    PubMed Central

    Westholm, Jakub O.; Miura, Pedro; Olson, Sara; Shenker, Sol; Joseph, Brian; Sanfilippo, Piero; Celniker, Susan E.; Graveley, Brenton R.; Lai, Eric C.

    2014-01-01

    Circularization was recently recognized to broadly expand transcriptome complexity. Here, we exploit massive Drosophila total RNA-sequencing data, >5 billion paired-end reads from >100 libraries covering diverse developmental stages, tissues and cultured cells, to rigorously annotate >2500 fruitfly circular RNAs. These mostly derive from back-splicing of protein-coding genes and lack poly(A) tails, and circularization of hundreds of genes is conserved across multiple Drosophila species. We elucidate structural and sequence properties of Drosophila circular RNAs, which exhibit commonalities and distinctions from mammalian circles. Notably, Drosophila circular RNAs harbor >1000 well-conserved canonical miRNA seed matches, especially within coding regions, and coding conserved miRNA sites reside preferentially within circularized exons. Finally, we analyze the developmental and tissue specificity of circular RNAs, and note their preferred derivation from neural genes and enhanced accumulation in neural tissues. Interestingly, circular isoforms increase dramatically relative to linear isoforms during CNS aging, and constitute a novel aging biomarker. PMID:25544350

  20. The spotted gar genome illuminates vertebrate evolution and facilitates human-teleost comparisons.

    PubMed

    Braasch, Ingo; Gehrke, Andrew R; Smith, Jeramiah J; Kawasaki, Kazuhiko; Manousaki, Tereza; Pasquier, Jeremy; Amores, Angel; Desvignes, Thomas; Batzel, Peter; Catchen, Julian; Berlin, Aaron M; Campbell, Michael S; Barrell, Daniel; Martin, Kyle J; Mulley, John F; Ravi, Vydianathan; Lee, Alison P; Nakamura, Tetsuya; Chalopin, Domitille; Fan, Shaohua; Wcisel, Dustin; Cañestro, Cristian; Sydes, Jason; Beaudry, Felix E G; Sun, Yi; Hertel, Jana; Beam, Michael J; Fasold, Mario; Ishiyama, Mikio; Johnson, Jeremy; Kehr, Steffi; Lara, Marcia; Letaw, John H; Litman, Gary W; Litman, Ronda T; Mikami, Masato; Ota, Tatsuya; Saha, Nil Ratan; Williams, Louise; Stadler, Peter F; Wang, Han; Taylor, John S; Fontenot, Quenton; Ferrara, Allyse; Searle, Stephen M J; Aken, Bronwen; Yandell, Mark; Schneider, Igor; Yoder, Jeffrey A; Volff, Jean-Nicolas; Meyer, Axel; Amemiya, Chris T; Venkatesh, Byrappa; Holland, Peter W H; Guiguen, Yann; Bobe, Julien; Shubin, Neil H; Di Palma, Federica; Alföldi, Jessica; Lindblad-Toh, Kerstin; Postlethwait, John H

    2016-04-01

    To connect human biology to fish biomedical models, we sequenced the genome of spotted gar (Lepisosteus oculatus), whose lineage diverged from teleosts before teleost genome duplication (TGD). The slowly evolving gar genome has conserved in content and size many entire chromosomes from bony vertebrate ancestors. Gar bridges teleosts to tetrapods by illuminating the evolution of immunity, mineralization and development (mediated, for example, by Hox, ParaHox and microRNA genes). Numerous conserved noncoding elements (CNEs; often cis regulatory) undetectable in direct human-teleost comparisons become apparent using gar: functional studies uncovered conserved roles for such cryptic CNEs, facilitating annotation of sequences identified in human genome-wide association studies. Transcriptomic analyses showed that the sums of expression domains and expression levels for duplicated teleost genes often approximate the patterns and levels of expression for gar genes, consistent with subfunctionalization. The gar genome provides a resource for understanding evolution after genome duplication, the origin of vertebrate genomes and the function of human regulatory sequences.

  1. Genome-wide Analysis of Drosophila Circular RNAs Reveals Their Structural and Sequence Properties and Age-Dependent Neural Accumulation

    DOE PAGES

    Westholm, Jakub  O.; Miura, Pedro; Olson, Sara; ...

    2014-11-26

    Circularization was recently recognized to broadly expand transcriptome complexity. Here, we exploit massive Drosophila total RNA-sequencing data, >5 billion paired-end reads from >100 libraries covering diverse developmental stages, tissues, and cultured cells, to rigorously annotate >2,500 fruit fly circular RNAs. These mostly derive from back-splicing of protein-coding genes and lack poly(A) tails, and the circularization of hundreds of genes is conserved across multiple Drosophila species. We elucidate structural and sequence properties of Drosophila circular RNAs, which exhibit commonalities and distinctions from mammalian circles. Notably, Drosophila circular RNAs harbor >1,000 well-conserved canonical miRNA seed matches, especially within coding regions, and codingmore » conserved miRNA sites reside preferentially within circularized exons. Finally, we analyze the developmental and tissue specificity of circular RNAs and note their preferred derivation from neural genes and enhanced accumulation in neural tissues. Interestingly, circular isoforms increase substantially relative to linear isoforms during CNS aging and constitute an aging biomarker.« less

  2. Genome-wide Analysis of Drosophila Circular RNAs Reveals Their Structural and Sequence Properties and Age-Dependent Neural Accumulation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Westholm, Jakub  O.; Miura, Pedro; Olson, Sara

    Circularization was recently recognized to broadly expand transcriptome complexity. Here, we exploit massive Drosophila total RNA-sequencing data, >5 billion paired-end reads from >100 libraries covering diverse developmental stages, tissues, and cultured cells, to rigorously annotate >2,500 fruit fly circular RNAs. These mostly derive from back-splicing of protein-coding genes and lack poly(A) tails, and the circularization of hundreds of genes is conserved across multiple Drosophila species. We elucidate structural and sequence properties of Drosophila circular RNAs, which exhibit commonalities and distinctions from mammalian circles. Notably, Drosophila circular RNAs harbor >1,000 well-conserved canonical miRNA seed matches, especially within coding regions, and codingmore » conserved miRNA sites reside preferentially within circularized exons. Finally, we analyze the developmental and tissue specificity of circular RNAs and note their preferred derivation from neural genes and enhanced accumulation in neural tissues. Interestingly, circular isoforms increase substantially relative to linear isoforms during CNS aging and constitute an aging biomarker.« less

  3. Hairpin structures with conserved sequence motifs determine the 3' ends of non-polyadenylated invertebrate iridovirus transcripts.

    PubMed

    İnce, İkbal Agah; Pijlman, Gorben P; Vlak, Just M; van Oers, Monique M

    2017-11-01

    Previously, we observed that the transcripts of Invertebrate iridescent virus 6 (IIV6) are not polyadenylated, in line with the absence of canonical poly(A) motifs (AATAAA) downstream of the open reading frames (ORFs) in the genome. Here, we determined the 3' ends of the transcripts of fifty-four IIV6 virion protein genes in infected Drosophila Schneider 2 (S2) cells. By using ligation-based amplification of cDNA ends (LACE) it was shown that the IIV6 mRNAs often ended with a CAUUA motif. In silico analysis showed that the 3'-untranslated regions of IIV6 genes have the ability to form hairpin structures (22-56 nt in length) and that for about half of all IIV6 genes these 3' sequences contained complementary TAATG and CATTA motifs. We also show that a hairpin in the 3' flanking region with conserved sequence motifs is a conserved feature in invertebrate-infecting iridoviruses (genus Iridovirus and Chloriridovirus). Copyright © 2017 Elsevier Inc. All rights reserved.

  4. The spotted gar genome illuminates vertebrate evolution and facilitates human-to-teleost comparisons

    PubMed Central

    Braasch, Ingo; Gehrke, Andrew R.; Smith, Jeramiah J.; Kawasaki, Kazuhiko; Manousaki, Tereza; Pasquier, Jeremy; Amores, Angel; Desvignes, Thomas; Batzel, Peter; Catchen, Julian; Berlin, Aaron M.; Campbell, Michael S.; Barrell, Daniel; Martin, Kyle J.; Mulley, John F.; Ravi, Vydianathan; Lee, Alison P.; Nakamura, Tetsuya; Chalopin, Domitille; Fan, Shaohua; Wcisel, Dustin; Cañestro, Cristian; Sydes, Jason; Beaudry, Felix E. G.; Sun, Yi; Hertel, Jana; Beam, Michael J.; Fasold, Mario; Ishiyama, Mikio; Johnson, Jeremy; Kehr, Steffi; Lara, Marcia; Letaw, John H.; Litman, Gary W.; Litman, Ronda T.; Mikami, Masato; Ota, Tatsuya; Saha, Nil Ratan; Williams, Louise; Stadler, Peter F.; Wang, Han; Taylor, John S.; Fontenot, Quenton; Ferrara, Allyse; Searle, Stephen M. J.; Aken, Bronwen; Yandell, Mark; Schneider, Igor; Yoder, Jeffrey A.; Volff, Jean-Nicolas; Meyer, Axel; Amemiya, Chris T.; Venkatesh, Byrappa; Holland, Peter W. H.; Guiguen, Yann; Bobe, Julien; Shubin, Neil H.; Di Palma, Federica; Alföldi, Jessica; Lindblad-Toh, Kerstin; Postlethwait, John H.

    2016-01-01

    To connect human biology to fish biomedical models, we sequenced the genome of spotted gar (Lepisosteus oculatus), whose lineage diverged from teleosts before the teleost genome duplication (TGD). The slowly evolving gar genome conserved in content and size many entire chromosomes from bony vertebrate ancestors. Gar bridges teleosts to tetrapods by illuminating the evolution of immunity, mineralization, and development (e.g., Hox, ParaHox, and miRNA genes). Numerous conserved non-coding elements (CNEs, often cis-regulatory) undetectable in direct human-teleost comparisons become apparent using gar: functional studies uncovered conserved roles of such cryptic CNEs, facilitating annotation of sequences identified in human genome-wide association studies. Transcriptomic analyses revealed that the sum of expression domains and levels from duplicated teleost genes often approximate patterns and levels of gar genes, consistent with subfunctionalization. The gar genome provides a resource for understanding evolution after genome duplication, the origin of vertebrate genomes, and the function of human regulatory sequences. PMID:26950095

  5. Determinism and randomness in the evolution of introns and sine inserts in mouse and human mitochondrial solute carrier and cytokine receptor genes.

    PubMed

    Cianciulli, Antonia; Calvello, Rosa; Panaro, Maria A

    2015-04-01

    In the homologous genes studied, the exons and introns alternated in the same order in mouse and human. We studied, in both species: corresponding short segments of introns, whole corresponding introns and complete homologous genes. We considered the total number of nucleotides and the number and orientation of the SINE inserts. Comparisons of mouse and human data series showed that at the level of individual relatively short segments of intronic sequences the stochastic variability prevails in the local structuring, but at higher levels of organization a deterministic component emerges, conserved in mouse and human during the divergent evolution, despite the ample re-editing of the intronic sequences and the fact that processes such as SINE spread had taken place in an independent way in the two species. Intron conservation is negatively correlated with the SINE occupancy, suggesting that virus inserts interfere with the conservation of the sequences inherited from the common ancestor. Copyright © 2015 Elsevier Ltd. All rights reserved.

  6. A +1 ribosomal frameshifting motif prevalent among plant amalgaviruses.

    PubMed

    Nibert, Max L; Pyle, Jesse D; Firth, Andrew E

    2016-11-01

    Sequence accessions attributable to novel plant amalgaviruses have been found in the Transcriptome Shotgun Assembly database. Sixteen accessions, derived from 12 different plant species, appear to encompass the complete protein-coding regions of the proposed amalgaviruses, which would substantially expand the size of genus Amalgavirus from 4 current species. Other findings include evidence for UUU_CGN as a +1 ribosomal frameshifting motif prevalent among plant amalgaviruses; for a variant version of this motif found thus far in only two amalgaviruses from solanaceous plants; for a region of α-helical coiled coil propensity conserved in a central region of the ORF1 translation product of plant amalgaviruses; and for conserved sequences in a C-terminal region of the ORF2 translation product (RNA-dependent RNA polymerase) of plant amalgaviruses, seemingly beyond the region of conserved polymerase motifs. These results additionally illustrate the value of mining the TSA database and others for novel viral sequences for comparative analyses. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  7. The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans

    PubMed Central

    Shoguchi, Eiichi; Shinzato, Chuya; Hisata, Kanako; Satoh, Nori; Mungpakdee, Sutada

    2015-01-01

    Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures. PMID:26199191

  8. Conservation of CD44 exon v3 functional elements in mammals

    PubMed Central

    Vela, Elena; Hilari, Josep M; Delclaux, María; Fernández-Bellon, Hugo; Isamat, Marcos

    2008-01-01

    Background The human CD44 gene contains 10 variable exons (v1 to v10) that can be alternatively spliced to generate hundreds of different CD44 protein isoforms. Human CD44 variable exon v3 inclusion in the final mRNA depends on a multisite bipartite splicing enhancer located within the exon itself, which we have recently described, and provides the protein domain responsible for growth factor binding to CD44. Findings We have analyzed the sequence of CD44v3 in 95 mammalian species to report high conservation levels for both its splicing regulatory elements (the 3' splice site and the exonic splicing enhancer), and the functional glycosaminglycan binding site coded by v3. We also report the functional expression of CD44v3 isoforms in peripheral blood cells of different mammalian taxa with both consensus and variant v3 sequences. Conclusion CD44v3 mammalian sequences maintain all functional splicing regulatory elements as well as the GAG binding site with the same relative positions and sequence identity previously described during alternative splicing of human CD44. The sequence within the GAG attachment site, which in turn contains the Y motif of the exonic splicing enhancer, is more conserved relative to the rest of exon. Amplification of CD44v3 sequence from mammalian species but not from birds, fish or reptiles, may lead to classify CD44v3 as an exclusive mammalian gene trait. PMID:18710510

  9. [Identification of new conserved and variable regions in the 16S rRNA gene of acetic acid bacteria and acetobacteraceae family].

    PubMed

    Chakravorty, S; Sarkar, S; Gachhui, R

    2015-01-01

    The Acetobacteraceae family of the class Alpha Proteobacteria is comprised of high sugar and acid tolerant bacteria. The Acetic Acid Bacteria are the economically most significant group of this family because of its association with food products like vinegar, wine etc. Acetobacteraceae are often hard to culture in laboratory conditions and they also maintain very low abundances in their natural habitats. Thus identification of the organisms in such environments is greatly dependent on modern tools of molecular biology which require a thorough knowledge of specific conserved gene sequences that may act as primers and or probes. Moreover unconserved domains in genes also become markers for differentiating closely related genera. In bacteria, the 16S rRNA gene is an ideal candidate for such conserved and variable domains. In order to study the conserved and variable domains of the 16S rRNA gene of Acetic Acid Bacteria and the Acetobacteraceae family, sequences from publicly available databases were aligned and compared. Near complete sequences of the gene were also obtained from Kombucha tea biofilm, a known Acetobacteraceae family habitat, in order to corroborate the domains obtained from the alignment studies. The study indicated that the degree of conservation in the gene is significantly higher among the Acetic Acid Bacteria than the whole Acetobacteraceae family. Moreover it was also observed that the previously described hypervariable regions V1, V3, V5, V6 and V7 were more or less conserved in the family and the spans of the variable regions are quite distinct as well.

  10. Identification of novel microRNAs in Hevea brasiliensis and computational prediction of their targets

    PubMed Central

    2012-01-01

    Background Plants respond to external stimuli through fine regulation of gene expression partially ensured by small RNAs. Of these, microRNAs (miRNAs) play a crucial role. They negatively regulate gene expression by targeting the cleavage or translational inhibition of target messenger RNAs (mRNAs). In Hevea brasiliensis, environmental and harvesting stresses are known to affect natural rubber production. This study set out to identify abiotic stress-related miRNAs in Hevea using next-generation sequencing and bioinformatic analysis. Results Deep sequencing of small RNAs was carried out on plantlets subjected to severe abiotic stress using the Solexa technique. By combining the LeARN pipeline, data from the Plant microRNA database (PMRD) and Hevea EST sequences, we identified 48 conserved miRNA families already characterized in other plant species, and 10 putatively novel miRNA families. The results showed the most abundant size for miRNAs to be 24 nucleotides, except for seven families. Several MIR genes produced both 20-22 nucleotides and 23-27 nucleotides. The two miRNA class sizes were detected for both conserved and putative novel miRNA families, suggesting their functional duality. The EST databases were scanned with conserved and novel miRNA sequences. MiRNA targets were computationally predicted and analysed. The predicted targets involved in "responses to stimuli" and to "antioxidant" and "transcription activities" are presented. Conclusions Deep sequencing of small RNAs combined with transcriptomic data is a powerful tool for identifying conserved and novel miRNAs when the complete genome is not yet available. Our study provided additional information for evolutionary studies and revealed potentially specific regulation of the control of redox status in Hevea. PMID:22330773

  11. Analysis of hepatitis B virus preS1 variability and prevalence of the rs2296651 polymorphism in a Spanish population

    PubMed Central

    Casillas, Rosario; Tabernero, David; Gregori, Josep; Belmonte, Irene; Cortese, Maria Francesca; González, Carolina; Riveiro-Barciela, Mar; López, Rosa Maria; Quer, Josep; Esteban, Rafael; Buti, Maria; Rodríguez-Frías, Francisco

    2018-01-01

    AIM To determine the variability/conservation of the domain of hepatitis B virus (HBV) preS1 region that interacts with sodium-taurocholate cotransporting polypeptide (hereafter, NTCP-interacting domain) and the prevalence of the rs2296651 polymorphism (S267F, NTCP variant) in a Spanish population. METHODS Serum samples from 246 individuals were included and divided into 3 groups: patients with chronic HBV infection (CHB) (n = 41, 73% Caucasians), patients with resolved HBV infection (n = 100, 100% Caucasians) and an HBV-uninfected control group (n = 105, 100% Caucasians). Variability/conservation of the amino acid (aa) sequences of the NTCP-interacting domain, (aa 2-48 in viral genotype D) and a highly conserved preS1 domain associated with virion morphogenesis (aa 92-103 in viral genotype D) were analyzed by next-generation sequencing and compared in 18 CHB patients with viremia > 4 log IU/mL. The rs2296651 polymorphism was determined in all individuals in all 3 groups using an in-house real-time PCR melting curve analysis. RESULTS The HBV preS1 NTCP-interacting domain showed a high degree of conservation among the examined viral genomes especially between aa 9 and 21 (in the genotype D consensus sequence). As compared with the virion morphogenesis domain, the NTCP-interacting domain had a smaller proportion of HBV genotype-unrelated changes comprising > 1% of the quasispecies (25.5% vs 31.8%), but a larger proportion of genotype-associated viral polymorphisms (34% vs 27.3%), according to consensus sequences from GenBank patterns of HBV genotypes A to H. Variation/conservation in both domains depended on viral genotype, with genotype C being the most highly conserved and genotype E the most variable (limited finding, only 2 genotype E included). Of note, proline residues were highly conserved in both domains, and serine residues showed changes only to threonine or tyrosine in the virion morphogenesis domain. The rs2296651 polymorphism was not detected in any participant. CONCLUSION In our CHB population, the NTCP-interacting domain was highly conserved, particularly the proline residues and essential amino acids related with the NTCP interaction, and the prevalence of rs2296651 was low/null. PMID:29456407

  12. Nicotiana alata Defensin Chimeras Reveal Differences in the Mechanism of Fungal and Tumor Cell Killing and an Enhanced Antifungal Variant

    PubMed Central

    Payne, Jennifer A. E.; Hayes, Brigitte M. E.; Durek, Thomas; Craik, David J.; Shafee, Thomas M. A.; Poon, Ivan K. H.; Hulett, Mark D.; van der Weerden, Nicole L.

    2016-01-01

    The plant defensin NaD1 is a potent antifungal molecule that also targets tumor cells with a high efficiency. We examined the features of NaD1 that contribute to these two activities by producing a series of chimeras with NaD2, a defensin that has relatively poor activity against fungi and no activity against tumor cells. All plant defensins have a common tertiary structure known as a cysteine-stabilized α-β motif which consists of an α helix and a triple-stranded β-sheet stabilized by four disulfide bonds. The chimeras were produced by replacing loops 1 to 7, the sequences between each of the conserved cysteine residues on NaD1, with the corresponding loops from NaD2. The loop 5 swap replaced the sequence motif (SKILRR) that mediates tight binding with phosphatidylinositol 4,5-bisphosphate [PI(4,5)P2] and is essential for the potent cytotoxic effect of NaD1 on tumor cells. Consistent with previous reports, there was a strong correlation between PI(4,5)P2 binding and the tumor cell killing activity of all of the chimeras. However, this correlation did not extend to antifungal activity. Some of the loop swap chimeras were efficient antifungal molecules, even though they bound poorly to PI(4,5)P2, suggesting that additional mechanisms operate against fungal cells. Unexpectedly, the loop 1B swap chimera was 10 times more active than NaD1 against filamentous fungi. This led to the conclusion that defensin loops have evolved as modular components that combine to make antifungal molecules with variable mechanisms of action and that artificial combinations of loops can increase antifungal activity compared to that of the natural variants. PMID:27503651

  13. The central globular domain of the nucleocapsid protein of human immunodeficiency virus type 1 is critical for virion structure and infectivity.

    PubMed

    Ottmann, M; Gabus, C; Darlix, J L

    1995-03-01

    The nucleocapsid protein NCp7 of human immunodeficiency virus type 1 (HIV-1) is a 72-amino-acid peptide containing two CCHC-type zinc fingers linked by a short basic sequence, 29RAPRKKG35, which is conserved in HIV-1 and simian immunodeficiency virus. The complete three-dimensional structure of NCp7 has been determined by 1H-nuclear magnetic resonance spectroscopy (N. Morellet, H. de Rocquigny, Y. Mely, N. Jullian, H. Demene, M. Ottmann, D. Gerard, J. L. Darlix, M. C. Fournié-Zaluski, and B. P. Roques, J. Mol. Biol. 235:287-301, 1994) and revealed a central globular domain where the two zinc fingers are brought in close proximity by the RAPRKKG linker. To examine the role of this globular structure and more precisely of the RAPRKKG linker in virion structure and infectivity, we generated HIV-1 DNA mutants in the RAPRKK sequence of NCp7 and analyzed the mutant virions produced by transfected cells. Mutations that probably alter the structure of NCp7 structure led to the formation of very poorly infectious virus (A30P) or noninfectious virus (P31L and R32G). In addition, the P31L mutant did not contain detectable amounts of reverse transcriptase and had an immature core morphology, as determined by electron microscopy. On the other hand, mutations changing the basic nature of NCp7 had poor effect. R29S had a wild-type phenotype, and the replacement of 32RKK34 by SSS (S3 mutant) resulted in a decrease by no more than 100-fold of the virus titer. These results clearly show that the RAPRKKG linker contains residues that are critical for virion structure and infectivity.

  14. The central globular domain of the nucleocapsid protein of human immunodeficiency virus type 1 is critical for virion structure and infectivity.

    PubMed Central

    Ottmann, M; Gabus, C; Darlix, J L

    1995-01-01

    The nucleocapsid protein NCp7 of human immunodeficiency virus type 1 (HIV-1) is a 72-amino-acid peptide containing two CCHC-type zinc fingers linked by a short basic sequence, 29RAPRKKG35, which is conserved in HIV-1 and simian immunodeficiency virus. The complete three-dimensional structure of NCp7 has been determined by 1H-nuclear magnetic resonance spectroscopy (N. Morellet, H. de Rocquigny, Y. Mely, N. Jullian, H. Demene, M. Ottmann, D. Gerard, J. L. Darlix, M. C. Fournié-Zaluski, and B. P. Roques, J. Mol. Biol. 235:287-301, 1994) and revealed a central globular domain where the two zinc fingers are brought in close proximity by the RAPRKKG linker. To examine the role of this globular structure and more precisely of the RAPRKKG linker in virion structure and infectivity, we generated HIV-1 DNA mutants in the RAPRKK sequence of NCp7 and analyzed the mutant virions produced by transfected cells. Mutations that probably alter the structure of NCp7 structure led to the formation of very poorly infectious virus (A30P) or noninfectious virus (P31L and R32G). In addition, the P31L mutant did not contain detectable amounts of reverse transcriptase and had an immature core morphology, as determined by electron microscopy. On the other hand, mutations changing the basic nature of NCp7 had poor effect. R29S had a wild-type phenotype, and the replacement of 32RKK34 by SSS (S3 mutant) resulted in a decrease by no more than 100-fold of the virus titer. These results clearly show that the RAPRKKG linker contains residues that are critical for virion structure and infectivity. PMID:7853517

  15. Conserved and species-specific transcription factor co-binding patterns drive divergent gene regulation in human and mouse

    PubMed Central

    Diehl, Adam G

    2018-01-01

    Abstract The mouse is widely used as system to study human genetic mechanisms. However, extensive rewiring of transcriptional regulatory networks often confounds translation of findings between human and mouse. Site-specific gain and loss of individual transcription factor binding sites (TFBS) has caused functional divergence of orthologous regulatory loci, and so we must look beyond this positional conservation to understand common themes of regulatory control. Fortunately, transcription factor co-binding patterns shared across species often perform conserved regulatory functions. These can be compared to ‘regulatory sentences’ that retain the same meanings regardless of sequence and species context. By analyzing TFBS co-occupancy patterns observed in four human and mouse cell types, we learned a regulatory grammar: the rules by which TFBS are combined into meaningful regulatory sentences. Different parts of this grammar associate with specific sets of functional annotations regardless of sequence conservation and predict functional signatures more accurately than positional conservation. We further show that both species-specific and conserved portions of this grammar are involved in gene expression divergence and human disease risk. These findings expand our understanding of transcriptional regulatory mechanisms, suggesting that phenotypic divergence and disease risk are driven by a complex interplay between deeply conserved and species-specific transcriptional regulatory pathways. PMID:29361190

  16. Typical Window, Interior Wall Paint Sequence, Wall Section, and Foundation ...

    Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

    Typical Window, Interior Wall Paint Sequence, Wall Section, and Foundation Sections - Civilian Conservation Corps (CCC) Camp NP-5-C, Barracks No. 5, CCC Camp Historic District at Chapin Mesa, Cortez, Montezuma County, CO

  17. Conservation of the sequence of the Alzheimer's disease amyloid peptide in dog, polar bear and five other mammals by cross-species polymerase chain reaction analysis.

    PubMed

    Johnstone, E M; Chaney, M O; Norris, F H; Pascual, R; Little, S P

    1991-07-01

    Neuritic plaque and cerebrovascular amyloid deposits have been detected in the aged monkey, dog, and polar bear and have rarely been found in aged rodents (Biochem. Biophy. Res. Commun., 12 (1984) 885-890; Proc. Natl. Acad. Sci. U.S.A., 82 (1985) 4245-4249). To determine if the primary structure of the 42-43 residue amyloid peptide is conserved in species that accumulate plaques, the region of the amyloid precursor protein (APP) cDNA that encodes the peptide region was amplified by the polymerase chain reaction and sequenced. The deduced amino acid sequence was compared to those species where amyloid accumulation has not been detected. The DNA sequences of dog, polar bear, rabbit, cow, sheep, pig and guinea pig were compared and a phylogenetic tree was generated. We conclude that the amino acid sequence of dog and polar bear and other mammals which may form amyloid plaques is conserved and the species where amyloid has not been detected (mouse, rat) may be evolutionarily a distinct group. In addition, the predicted secondary structure of mouse and rat amyloid that differs from that of amyloid bearing species is its lack of propensity to form a beta sheeted structure. Thus, a cross-species examination of the amyloid peptide may suggest what is essential for amyloid deposition.

  18. A dehydrin cognate protein from pea (Pisum sativum L.) with an atypical pattern of expression.

    PubMed

    Robertson, M; Chandler, P M

    1994-11-01

    Dehydrins are a family of proteins characterised by conserved amino acid motifs, and induced in plants by dehydration or treatment with ABA. An antiserum was raised against a synthetic oligopeptide based on the most highly conserved dehydrin amino acid motif, the lysine-rich (core sequence KIKEK-LPG). This antiserum detected a novel M(r) 40,000 polypeptide and enabled isolation of a corresponding cDNA clone, pPsB61 (B61). The deduced amino acid sequence contained two lysine-rich blocks, however the remainder of the sequenced differed markedly from other pea dehydrins. Surprisingly, the sequence contained a stretch of serine residues, a characteristic common to dehydrins from many plant species but which is missing in pea dehydrin. The expression patterns of B61 mRNA and polypeptide were distinctively different from those of the pea dehydrins during seed development, germination and in young seedlings exposed to dehydration stress or treated with ABA. In particular, dehydration stress led to slightly reduced levels of B61 RNA, and ABA application to young seedlings had no marked effect on its abundance. The M(r) 40,000 polypeptide is thus related to pea dehydrin by the presence of the most highly conserved amino acid sequence motifs, but lacks the characteristic expression pattern of dehydrin. By analogy with heat shock cognate proteins we refer to this protein as a dehydrin cognate.

  19. Synteny of Prunus and other model plant species

    PubMed Central

    Jung, Sook; Jiwan, Derick; Cho, Ilhyung; Lee, Taein; Abbott, Albert; Sosinski, Bryon; Main, Dorrie

    2009-01-01

    Background Fragmentary conservation of synteny has been reported between map-anchored Prunus sequences and Arabidopsis. With the availability of genome sequence for fellow rosid I members Populus and Medicago, we analyzed the synteny between Prunus and the three model genomes. Eight Prunus BAC sequences and map-anchored Prunus sequences were used in the comparison. Results We found a well conserved synteny across the Prunus species – peach, plum, and apricot – and Populus using a set of homologous Prunus BACs. Conversely, we could not detect any synteny with Arabidopsis in this region. Other peach BACs also showed extensive synteny with Populus. The syntenic regions detected were up to 477 kb in Populus. Two syntenic regions between Arabidopsis and these BACs were much shorter, around 10 kb. We also found syntenic regions that are conserved between the Prunus BACs and Medicago. The array of synteny corresponded with the proposed whole genome duplication events in Populus and Medicago. Using map-anchored Prunus sequences, we detected many syntenic blocks with several gene pairs between Prunus and Populus or Arabidopsis. We observed a more complex network of synteny between Prunus-Arabidopsis, indicative of multiple genome duplication and subsequence gene loss in Arabidopsis. Conclusion Our result shows the striking microsynteny between the Prunus BACs and the genome of Populus and Medicago. In macrosynteny analysis, more distinct Prunus regions were syntenic to Populus than to Arabidopsis. PMID:19208249

  20. Local Renyi entropic profiles of DNA sequences.

    PubMed

    Vinga, Susana; Almeida, Jonas S

    2007-10-16

    In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition therein explored was based on the Rényi entropy of probability density estimation (pdf) using the Parzen's window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles using the new pdf estimations to refine the density estimation of motifs. The new methodology enables two results. On the one hand it shows that the entropic profiles are directly related with the statistical significance of motifs, allowing the study of under and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables and additional material and examples are made publicly available at http://kdbio.inesc-id.pt/~svinga/ep/. The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and inference of motif structures.

  1. Local Renyi entropic profiles of DNA sequences

    PubMed Central

    Vinga, Susana; Almeida, Jonas S

    2007-01-01

    Background In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition therein explored was based on the Rényi entropy of probability density estimation (pdf) using the Parzen's window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles using the new pdf estimations to refine the density estimation of motifs. Results The new methodology enables two results. On the one hand it shows that the entropic profiles are directly related with the statistical significance of motifs, allowing the study of under and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables and additional material and examples are made publicly available at . Conclusion The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and inference of motif structures. PMID:17939871

  2. Characterization of the complete genome segments from BmCPV-SZ, a novel Bombyx mori cypovirus 1 isolate.

    PubMed

    Cao, Guangli; Meng, Xiangkun; Xue, Renyu; Zhu, Yuexiong; Zhang, Xiaorong; Pan, Zhonghua; Zheng, Xiaojian; Gong, Chengliang

    2012-07-01

    A novel Bombyx mori cypovirus 1 isolated from infected silkworm larvae and tentatively assigned as Bombyx mori cypovirus 1 isolate Suzhou (BmCPV-SZ). The complete nucleotide sequences of genomic segments S1-S10 from BmCPV-SZ were determined. All segments possessed a single open reading frame; however, bioinformatic evidence suggested a short overlapping coding sequence in S1. Each BmCPV-SZ segment possessed the conserved terminal sequences AGUAA and GUUAGCC at the 5' and 3' ends, respectively. The conserved A/G at the -3 position in relation to the AUG codon could be found in the BmCPV-SZ genome, and it was postulated that this conserved A/G may be the most important nucleotide for efficient translation initiation in cypoviruses (CPVs). Examination of the putative amino acid sequences encoded by BmCPV-SZ revealed some characteristic motifs. Homology searches showed that viral structural proteins VP1, VP3, and VP4 had localized homologies with proteins of Rice ragged stunt virus , a member of the genus Oryzavirus within the family Reoviridae. A phylogenetic tree based on RNA-dependent RNA polymerase sequences demonstrated that CPV is more closely related to Rice ragged stunt virus and Aedes pseudoscutellaris reovirus than to other members of Reoviridae, suggesting that they may have originated from common ancestors.

  3. Cloning and sequence analysis of complementary DNA encoding an aberrantly rearranged human T-cell gamma chain.

    PubMed Central

    Dialynas, D P; Murre, C; Quertermous, T; Boss, J M; Leiden, J M; Seidman, J G; Strominger, J L

    1986-01-01

    Complementary DNA (cDNA) encoding a human T-cell gamma chain has been cloned and sequenced. At the junction of the variable and joining regions, there is an apparent deletion of two nucleotides in the human cDNA sequence relative to the murine gamma-chain cDNA sequence, resulting simultaneously in the generation of an in-frame stop codon and in a translational frameshift. For this reason, the sequence presented here encodes an aberrantly rearranged human T-cell gamma chain. There are several surprising differences between the deduced human and murine gamma-chain amino acid sequences. These include poor homology in the variable region, poor homology in a discrete segment of the constant region precisely bounded by the expected junctions of exon CII, and the presence in the human sequence of five potential sites for N-linked glycosylation. Images PMID:3458221

  4. Emotional Disturbance and Chronic Low Back Pain.

    ERIC Educational Resources Information Center

    McCreary, Charles P.; And Others

    1980-01-01

    Patients high in alientation and distrust may be poor compliers. Because only the somatic concern dimension predicted outcome, a single scale that measures this characteristic may be sufficient for effective identification of the potential good v poor responders to conservative treatment of low back pain. (Author)

  5. Phylogenetic and expression analysis of the NPR1-like gene family from Persea americana (Mill.).

    PubMed

    Backer, Robert; Mahomed, Waheed; Reeksting, Bianca J; Engelbrecht, Juanita; Ibarra-Laclette, Enrique; van den Berg, Noëlani

    2015-01-01

    The NONEXPRESSOR OF PATHOGENESIS-RELATED GENES1 (NPR1) forms an integral part of the salicylic acid (SA) pathway in plants and is involved in cross-talk between the SA and jasmonic acid/ethylene (JA/ET) pathways. Therefore, NPR1 is essential to the effective response of plants to pathogens. Avocado (Persea americana) is a commercially important crop worldwide. Significant losses in production result from Phytophthora root rot, caused by the hemibiotroph, Phytophthora cinnamomi. This oomycete infects the feeder roots of avocado trees leading to an overall decline in health and eventual death. The interaction between avocado and P. cinnamomi is poorly understood and as such limited control strategies exist. Thus uncovering the role of NPR1 in avocado could provide novel insights into the avocado - P. cinnamomi interaction. A total of five NPR1-like sequences were identified. These sequences were annotated using FGENESH and a maximum-likelihood tree was constructed using 34 NPR1-like protein sequences from other plant species. The conserved protein domains and functional motifs of these sequences were predicted. Reverse transcription quantitative PCR was used to analyze the expression of the five NPR1-like sequences in the roots of avocado after treatment with salicylic and jasmonic acid, P. cinnamomi infection, across different tissues and in P. cinnamomi infected tolerant and susceptible rootstocks. Of the five NPR1-like sequences three have strong support for a defensive role while two are most likely involved in development. Significant differences in the expression profiles of these five NPR1-like genes were observed, assisting in functional classification. Understanding the interaction of avocado and P. cinnamomi is essential to developing new control strategies. This work enables further classification of these genes by means of functional annotation and is a crucial step in understanding the role of NPR1 during P. cinnamomi infection.

  6. Phylogenetic and expression analysis of the NPR1-like gene family from Persea americana (Mill.)

    PubMed Central

    Backer, Robert; Mahomed, Waheed; Reeksting, Bianca J.; Engelbrecht, Juanita; Ibarra-Laclette, Enrique; van den Berg, Noëlani

    2015-01-01

    The NONEXPRESSOR OF PATHOGENESIS-RELATED GENES1 (NPR1) forms an integral part of the salicylic acid (SA) pathway in plants and is involved in cross-talk between the SA and jasmonic acid/ethylene (JA/ET) pathways. Therefore, NPR1 is essential to the effective response of plants to pathogens. Avocado (Persea americana) is a commercially important crop worldwide. Significant losses in production result from Phytophthora root rot, caused by the hemibiotroph, Phytophthora cinnamomi. This oomycete infects the feeder roots of avocado trees leading to an overall decline in health and eventual death. The interaction between avocado and P. cinnamomi is poorly understood and as such limited control strategies exist. Thus uncovering the role of NPR1 in avocado could provide novel insights into the avocado – P. cinnamomi interaction. A total of five NPR1-like sequences were identified. These sequences were annotated using FGENESH and a maximum-likelihood tree was constructed using 34 NPR1-like protein sequences from other plant species. The conserved protein domains and functional motifs of these sequences were predicted. Reverse transcription quantitative PCR was used to analyze the expression of the five NPR1-like sequences in the roots of avocado after treatment with salicylic and jasmonic acid, P. cinnamomi infection, across different tissues and in P. cinnamomi infected tolerant and susceptible rootstocks. Of the five NPR1-like sequences three have strong support for a defensive role while two are most likely involved in development. Significant differences in the expression profiles of these five NPR1-like genes were observed, assisting in functional classification. Understanding the interaction of avocado and P. cinnamomi is essential to developing new control strategies. This work enables further classification of these genes by means of functional annotation and is a crucial step in understanding the role of NPR1 during P. cinnamomi infection. PMID:25972890

  7. Sequence Bundles: a novel method for visualising, discovering and exploring sequence motifs

    PubMed Central

    2014-01-01

    Background We introduce Sequence Bundles--a novel data visualisation method for representing multiple sequence alignments (MSAs). We identify and address key limitations of the existing bioinformatics data visualisation methods (i.e. the Sequence Logo) by enabling Sequence Bundles to give salient visual expression to sequence motifs and other data features, which would otherwise remain hidden. Methods For the development of Sequence Bundles we employed research-led information design methodologies. Sequences are encoded as uninterrupted, semi-opaque lines plotted on a 2-dimensional reconfigurable grid. Each line represents a single sequence. The thickness and opacity of the stack at each residue in each position indicates the level of conservation and the lines' curved paths expose patterns in correlation and functionality. Several MSAs can be visualised in a composite image. The Sequence Bundles method is designed to favour a tangible, continuous and intuitive display of information. Results We have developed a software demonstration application for generating a Sequence Bundles visualisation of MSAs provided for the BioVis 2013 redesign contest. A subsequent exploration of the visualised line patterns allowed for the discovery of a number of interesting features in the dataset. Reported features include the extreme conservation of sequences displaying a specific residue and bifurcations of the consensus sequence. Conclusions Sequence Bundles is a novel method for visualisation of MSAs and the discovery of sequence motifs. It can aid in generating new insight and hypothesis making. Sequence Bundles is well disposed for future implementation as an interactive visual analytics software, which can complement existing visualisation tools. PMID:25237395

  8. Selection of Optimal Polypurine Tract Region Sequences during Moloney Murine Leukemia Virus Replication

    PubMed Central

    Robson, Nicole D.; Telesnitsky, Alice

    2000-01-01

    Retrovirus plus-strand synthesis is primed by a cleavage remnant of the polypurine tract (PPT) region of viral RNA. In this study, we tested replication properties for Moloney murine leukemia viruses with targeted mutations in the PPT and in conserved sequences upstream, as well as for pools of mutants with randomized sequences in these regions. The importance of maintaining some purine residues within the PPT was indicated both by examining the evolution of random PPT pools and from the replication properties of targeted mutants. Although many different PPT sequences could support efficient replication and one mutant that contained two differences in the core PPT was found to replicate as well as the wild type, some sequences in the core PPT clearly conferred advantages over others. Contributions of sequences upstream of the core PPT were examined with deletion mutants. A conserved T-stretch within the upstream sequence was examined in detail and found to be unimportant to helper functions. Evolution of virus pools containing randomized T-stretch sequences demonstrated marked preference for the wild-type sequence in six of its eight positions. These findings demonstrate that maintenance of the T-rich element is more important to viral replication than is maintenance of the core PPT. PMID:11044073

  9. The Physics of Toys.

    ERIC Educational Resources Information Center

    Levinstein, Henry

    1982-01-01

    Outlines a course in which toys are used to demonstrate physics concepts, stressing available or easily constructed toys. A recent lecture sequence included toys demonstrating equilibrium, force/torque, linear/angular momentum conservation, energy conservation/storage, flying, vibrations, programed music, and others. Illustrations of selected toys…

  10. Evaluating, Comparing, and Interpreting Protein Domain Hierarchies

    PubMed Central

    2014-01-01

    Abstract Arranging protein domain sequences hierarchically into evolutionarily divergent subgroups is important for investigating evolutionary history, for speeding up web-based similarity searches, for identifying sequence determinants of protein function, and for genome annotation. However, whether or not a particular hierarchy is optimal is often unclear, and independently constructed hierarchies for the same domain can often differ significantly. This article describes methods for statistically evaluating specific aspects of a hierarchy, for probing the criteria underlying its construction and for direct comparisons between hierarchies. Information theoretical notions are used to quantify the contributions of specific hierarchical features to the underlying statistical model. Such features include subhierarchies, sequence subgroups, individual sequences, and subgroup-associated signature patterns. Underlying properties are graphically displayed in plots of each specific feature's contributions, in heat maps of pattern residue conservation, in “contrast alignments,” and through cross-mapping of subgroups between hierarchies. Together, these approaches provide a deeper understanding of protein domain functional divergence, reveal uncertainties caused by inconsistent patterns of sequence conservation, and help resolve conflicts between competing hierarchies. PMID:24559108

  11. Cube - an online tool for comparison and contrasting of protein sequences.

    PubMed

    Zhang, Zong Hong; Khoo, Aik Aun; Mihalek, Ivana

    2013-01-01

    When comparing sequences of similar proteins, two kinds of questions can be asked, and the related two kinds of inference made. First, one may ask to what degree they are similar, and then, how they differ. In the first case one may tentatively conclude that the conserved elements common to all sequences are of central and common importance to the protein's function. In the latter case the regions of specialization may be discriminative of the function or binding partners across subfamilies of related proteins. Experimental efforts - mutagenesis or pharmacological intervention - can then be pointed in either direction, depending on the context of the study. Cube simplifies this process for users that already have their favorite sets of sequences, and helps them collate the information by visualization of the conservation and specialization scores on the sequence and on the structure, and by spreadsheet tabulation. All information can be visualized on the spot, or downloaded for reference and later inspection. http://eopsf.org/cube.

  12. Genetic diversity of the captive Asian tapir population in Thailand, based on mitochondrial control region sequence data and the comparison of its nucleotide structure with Brazilian tapir.

    PubMed

    Muangkram, Yuttamol; Amano, Akira; Wajjwalku, Worawidh; Pinyopummintr, Tanu; Thongtip, Nikorn; Kaolim, Nongnid; Sukmak, Manakorn; Kamolnorranath, Sumate; Siriaroonrat, Boripat; Tipkantha, Wanlaya; Maikaew, Umaporn; Thomas, Warisara; Polsrila, Kanda; Dongsaard, Kwanreaun; Sanannu, Saowaphang; Wattananorrasate, Anuwat

    2017-07-01

    The Asian tapir (Tapirus indicus) has been classified as Endangered on the IUCN Red List of Threatened Species (2008). Genetic diversity data provide important information for the management of captive breeding and conservation of this species. We analyzed mitochondrial control region (CR) sequences from 37 captive Asian tapirs in Thailand. Multiple alignments of the full-length CR sequences sized 1268 bp comprised three domains as described in other mammal species. Analysis of 16 parsimony-informative variable sites revealed 11 haplotypes. Furthermore, the phylogenetic analysis using median-joining network clearly showed three clades correlated with our earlier cytochrome b gene study in this endangered species. The repetitive motif is located between first and second conserved sequence blocks, similar to the Brazilian tapir. The highest polymorphic site was located in the extended termination associated sequences domain. The results could be applied for future genetic management based in captivity and wild that shows stable populations.

  13. Centromere-Like Regions in the Budding Yeast Genome

    PubMed Central

    Lefrançois, Philippe; Auerbach, Raymond K.; Yellman, Christopher M.; Roeder, G. Shirleen; Snyder, Michael

    2013-01-01

    Accurate chromosome segregation requires centromeres (CENs), the DNA sequences where kinetochores form, to attach chromosomes to microtubules. In contrast to most eukaryotes, which have broad centromeres, Saccharomyces cerevisiae possesses sequence-defined point CENs. Chromatin immunoprecipitation followed by sequencing (ChIP–Seq) reveals colocalization of four kinetochore proteins at novel, discrete, non-centromeric regions, especially when levels of the centromeric histone H3 variant, Cse4 (a.k.a. CENP-A or CenH3), are elevated. These regions of overlapping protein binding enhance the segregation of plasmids and chromosomes and have thus been termed Centromere-Like Regions (CLRs). CLRs form in close proximity to S. cerevisiae CENs and share characteristics typical of both point and regional CENs. CLR sequences are conserved among related budding yeasts. Many genomic features characteristic of CLRs are also associated with these conserved homologous sequences from closely related budding yeasts. These studies provide general and important insights into the origin and evolution of centromeres. PMID:23349633

  14. Gene context conservation of a higher order than operons.

    PubMed

    Lathe, W C; Snel, B; Bork, P

    2000-10-01

    Operons, co-transcribed and co-regulated contiguous sets of genes, are poorly conserved over short periods of evolutionary time. The gene order, gene content and regulatory mechanisms of operons can be very different, even in closely related species. Here, we present several lines of evidence which suggest that, although an operon and its individual genes and regulatory structures are rearranged when comparing the genomes of different species, this rearrangement is a conservative process. Genomic rearrangements invariably maintain individual genes in very specific functional and regulatory contexts. We call this conserved context an uber-operon.

  15. Distribution, Diversity, and Long-Term Retention of Grass Short Interspersed Nuclear Elements (SINEs).

    PubMed

    Mao, Hongliang; Wang, Hao

    2017-08-01

    Instances of highly conserved plant short interspersed nuclear element (SINE) families and their enrichment near genes have been well documented, but little is known about the general patterns of such conservation and enrichment and underlying mechanisms. Here, we perform a comprehensive investigation of the structure, distribution, and evolution of SINEs in the grass family by analyzing 14 grass and 5 other flowering plant genomes using comparative genomics methods. We identify 61 SINE families composed of 29,572 copies, in which 46 families are first described. We find that comparing with other grass TEs, grass SINEs show much higher level of conservation in terms of genomic retention: The origin of at least 26% families can be traced to early grass diversification and these families are among most abundant SINE families in 86% species. We find that these families show much higher level of enrichment near protein coding genes than families of relatively recent origin (51%:28%), and that 40% of all grass SINEs are near gene and the percentage is higher than other types of grass TEs. The pattern of enrichment suggests that differential removal of SINE copies in gene-poor regions plays an important role in shaping the genomic distribution of these elements. We also identify a sequence motif located at 3' SINE end which is shared in 17 families. In short, this study provides insights into structure and evolution of SINEs in the grass family. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  16. Transcriptomic analysis of the autophagy machinery in crustaceans.

    PubMed

    Suwansa-Ard, Saowaros; Kankuan, Wilairat; Thongbuakaew, Tipsuda; Saetan, Jirawat; Kornthong, Napamanee; Kruangkum, Thanapong; Khornchatri, Kanjana; Cummins, Scott F; Isidoro, Ciro; Sobhon, Prasert

    2016-08-09

    The giant freshwater prawn, Macrobrachium rosenbergii, is a decapod crustacean that is commercially important as a food source. Farming of commercial crustaceans requires an efficient management strategy because the animals are easily subjected to stress and diseases during the culture. Autophagy, a stress response process, is well-documented and conserved in most animals, yet it is poorly studied in crustaceans. In this study, we have performed an in silico search for transcripts encoding autophagy-related (Atg) proteins within various tissue transcriptomes of M. rosenbergii. Basic Local Alignment Search Tool (BLAST) search using previously known Atg proteins as queries revealed 41 transcripts encoding homologous M. rosenbergii Atg proteins. Among these Atg proteins, we selected commonly used autophagy markers, including Beclin 1, vacuolar protein sorting (Vps) 34, microtubule-associated proteins 1A/1B light chain 3B (MAP1LC3B), p62/sequestosome 1 (SQSTM1), and lysosomal-associated membrane protein 1 (Lamp-1) for further sequence analyses using comparative alignment and protein structural prediction. We found that crustacean autophagy marker proteins contain conserved motifs typical of other animal Atg proteins. Western blotting using commercial antibodies raised against human Atg marker proteins indicated their presence in various M. rosenbergii tissues, while immunohistochemistry localized Atg marker proteins within ovarian tissue, specifically late stage oocytes. This study demonstrates that the molecular components of autophagic process are conserved in crustaceans, which is comparable to autophagic process in mammals. Furthermore, it provides a foundation for further studies of autophagy in crustaceans that may lead to more understanding of the reproduction- and stress-related autophagy, which will enable the efficient aquaculture practices.

  17. Conserved thioredoxin fold is present in Pisum sativum L. sieve element occlusion-1 protein

    PubMed Central

    Umate, Pavan; Tuteja, Renu

    2010-01-01

    Homology-based three-dimensional model for Pisum sativum sieve element occlusion 1 (Ps.SEO1) (forisomes) protein was constructed. A stretch of amino acids (residues 320 to 456) which is well conserved in all known members of forisomes proteins was used to model the 3D structure of Ps.SEO1. The structural prediction was done using Protein Homology/analogY Recognition Engine (PHYRE) web server. Based on studies of local sequence alignment, the thioredoxin-fold containing protein [Structural Classification of Proteins (SCOP) code d1o73a_], a member of the glutathione peroxidase family was selected as a template for modeling the spatial structure of Ps.SEO1. Selection was based on comparison of primary sequence, higher match quality and alignment accuracy. Motif 1 (EVF) is conserved in Ps.SEO1, Vicia faba (Vf.For1) and Medicago truncatula (MT.SEO3); motif 2 (KKED) is well conserved across all forisomes proteins and motif 3 (IGYIGNP) is conserved in Ps.SEO1 and Vf.For1. PMID:20404566

  18. Development and application of a PCR assay to detect chicken and turkey parvoviruses in commercial poultry flocks in the United States.

    USDA-ARS?s Scientific Manuscript database

    Comparative sequence analysis of six independent chicken and turkey parvovirus nonstructural (NS) genes revealed specific genomic regions with 100% nucleotide sequence identity. A PCR assay with primers targeting these conserved genome sequences proved to be highly specific and sensitive to detect p...

  19. Plant centromeres: structure and control.

    PubMed

    Richards, E J; Dawe, R K

    1998-04-01

    Recent work has led to a better understanding of the molecular components of plant centromeres. Conservation of at least some centromere protein constituents between plant and non-plant systems has been demonstrated. The identity and organization of plant centromeric DNA sequences are also beginning to yield to analysis. While there is little primary DNA sequence conservation among the characterized plant centromeres and their non-plant counterparts, some parallels in centromere genomic organisation can be seen across species. Finally, the emerging idea that centromere activity is controlled epigenetically finds support in an examination of the plant centromere literature.

  20. Optimal packaging of FIV genomic RNA depends upon a conserved long-range interaction and a palindromic sequence within gag.

    PubMed

    Rizvi, Tahir A; Kenyon, Julia C; Ali, Jahabar; Aktar, Suriya J; Phillip, Pretty S; Ghazawi, Akela; Mustafa, Farah; Lever, Andrew M L

    2010-10-15

    The feline immunodeficiency virus (FIV) is a lentivirus that is related to human immunodeficiency virus (HIV), causing a similar pathology in cats. It is a potential small animal model for AIDS and the FIV-based vectors are also being pursued for human gene therapy. Previous studies have mapped the FIV packaging signal (ψ) to two or more discontinuous regions within the 5' 511 nt of the genomic RNA and structural analyses have determined its secondary structure. The 5' and 3' sequences within ψ region interact through extensive long-range interactions (LRIs), including a conserved heptanucleotide interaction between R/U5 and gag. Other secondary structural elements identified include a conserved 150 nt stem-loop (SL2) and a small palindromic stem-loop within gag open reading frame that might act as a viral dimerization initiation site. We have performed extensive mutational analysis of these sequences and structures and ascertained their importance in FIV packaging using a trans-complementation assay. Disrupting the conserved heptanucleotide LRI to prevent base pairing between R/U5 and gag reduced packaging by 2.8-5.5 fold. Restoration of pairing using an alternative, non-wild type (wt) LRI sequence restored RNA packaging and propagation to wt levels, suggesting that it is the structure of the LRI, rather than its sequence, that is important for FIV packaging. Disrupting the palindrome within gag reduced packaging by 1.5-3-fold, but substitution with a different palindromic sequence did not restore packaging completely, suggesting that the sequence of this region as well as its palindromic nature is important. Mutation of individual regions of SL2 did not have a pronounced effect on FIV packaging, suggesting that either it is the structure of SL2 as a whole that is necessary for optimal packaging, or that there is redundancy within this structure. The mutational analysis presented here has further validated the previously predicted RNA secondary structure of FIV ψ. Copyright © 2010 Elsevier Ltd. All rights reserved.

  1. On a new class of completely integrable nonlinear wave equations. II. Multi-Hamiltonian structure

    NASA Astrophysics Data System (ADS)

    Nutku, Y.

    1987-11-01

    The multi-Hamiltonian structure of a class of nonlinear wave equations governing the propagation of finite amplitude waves is discussed. Infinitely many conservation laws had earlier been obtained for these equations. Starting from a (primary) Hamiltonian formulation of these equations the necessary and sufficient conditions for the existence of bi-Hamiltonian structure are obtained and it is shown that the second Hamiltonian operator can be constructed solely through a knowledge of the first Hamiltonian function. The recursion operator which first appears at the level of bi-Hamiltonian structure gives rise to an infinite sequence of conserved Hamiltonians. It is found that in general there exist two different infinite sequences of conserved quantities for these equations. The recursion relation defining higher Hamiltonian structures enables one to obtain the necessary and sufficient conditions for the existence of the (k+1)st Hamiltonian operator which depends on the kth Hamiltonian function. The infinite sequence of conserved Hamiltonians are common to all the higher Hamiltonian structures. The equations of gas dynamics are discussed as an illustration of this formalism and it is shown that in general they admit tri-Hamiltonian structure with two distinct infinite sets of conserved quantities. The isothermal case of γ=1 is an exceptional one that requires separate treatment. This corresponds to a specialization of the equations governing the expansion of plasma into vacuum which will be shown to be equivalent to Poisson's equation in nonlinear acoustics.

  2. Microfluidic affinity and ChIP-seq analyses converge on a conserved FOXP2-binding motif in chimp and human, which enables the detection of evolutionarily novel targets.

    PubMed

    Nelson, Christopher S; Fuller, Chris K; Fordyce, Polly M; Greninger, Alexander L; Li, Hao; DeRisi, Joseph L

    2013-07-01

    The transcription factor forkhead box P2 (FOXP2) is believed to be important in the evolution of human speech. A mutation in its DNA-binding domain causes severe speech impairment. Humans have acquired two coding changes relative to the conserved mammalian sequence. Despite intense interest in FOXP2, it has remained an open question whether the human protein's DNA-binding specificity and chromatin localization are conserved. Previous in vitro and ChIP-chip studies have provided conflicting consensus sequences for the FOXP2-binding site. Using MITOMI 2.0 microfluidic affinity assays, we describe the binding site of FOXP2 and its affinity profile in base-specific detail for all substitutions of the strongest binding site. We find that human and chimp FOXP2 have similar binding sites that are distinct from previously suggested consensus binding sites. Additionally, through analysis of FOXP2 ChIP-seq data from cultured neurons, we find strong overrepresentation of a motif that matches our in vitro results and identifies a set of genes with FOXP2 binding sites. The FOXP2-binding sites tend to be conserved, yet we identified 38 instances of evolutionarily novel sites in humans. Combined, these data present a comprehensive portrait of FOXP2's-binding properties and imply that although its sequence specificity has been conserved, some of its genomic binding sites are newly evolved.

  3. Microfluidic affinity and ChIP-seq analyses converge on a conserved FOXP2-binding motif in chimp and human, which enables the detection of evolutionarily novel targets

    PubMed Central

    Nelson, Christopher S.; Fuller, Chris K.; Fordyce, Polly M.; Greninger, Alexander L.; Li, Hao; DeRisi, Joseph L.

    2013-01-01

    The transcription factor forkhead box P2 (FOXP2) is believed to be important in the evolution of human speech. A mutation in its DNA-binding domain causes severe speech impairment. Humans have acquired two coding changes relative to the conserved mammalian sequence. Despite intense interest in FOXP2, it has remained an open question whether the human protein’s DNA-binding specificity and chromatin localization are conserved. Previous in vitro and ChIP-chip studies have provided conflicting consensus sequences for the FOXP2-binding site. Using MITOMI 2.0 microfluidic affinity assays, we describe the binding site of FOXP2 and its affinity profile in base-specific detail for all substitutions of the strongest binding site. We find that human and chimp FOXP2 have similar binding sites that are distinct from previously suggested consensus binding sites. Additionally, through analysis of FOXP2 ChIP-seq data from cultured neurons, we find strong overrepresentation of a motif that matches our in vitro results and identifies a set of genes with FOXP2 binding sites. The FOXP2-binding sites tend to be conserved, yet we identified 38 instances of evolutionarily novel sites in humans. Combined, these data present a comprehensive portrait of FOXP2’s-binding properties and imply that although its sequence specificity has been conserved, some of its genomic binding sites are newly evolved. PMID:23625967

  4. Conservation and divergence of plant LHP1 protein sequences and expression patterns in angiosperms and gymnosperms.

    PubMed

    Guan, Hexin; Zheng, Zhengui; Grey, Paris H; Li, Yuhua; Oppenheimer, David G

    2011-05-01

    Floral transition is a critical and strictly regulated developmental process in plants. Mutations in Arabidopsis LIKE HETEROCHROMATIN PROTEIN 1 (AtLHP1)/TERMINAL FLOWER 2 (TFL2) result in early and terminal flowers. Little is known about the gene expression, function and evolution of plant LHP1 homologs, except for Arabidopsis LHP1. In this study, the conservation and divergence of plant LHP1 protein sequences was analyzed by sequence alignments and phylogeny. LHP1 expression patterns were compared among taxa that occupy pivotal phylogenetic positions. Several relatively conserved new motifs/regions were identified among LHP1 homologs. Phylogeny of plant LHP1 proteins agreed with established angiosperm relationships. In situ hybridization unveiled conserved expression of plant LHP1 in the axillary bud/tiller, vascular bundles, developing stamens, and carpels. Unlike AtLHP1, cucumber CsLHP1-2, sugarcane SoLHP1 and maize ZmLHP1, rice OsLHP1 is not expressed in the shoot apical meristem (SAM) and the OsLHP1 transcript level is consistently low in shoots. "Unequal crossover" might have contributed to the divergence in the N-terminal and hinge region lengths of LHP1 homologs. We propose an "insertion-deletion" model for soybean (Glycine max L.) GmLHP1s evolution. Plant LHP1 homologs are more conserved than previously expected, and may favor vegetative meristem identity and primordia formation. OsLHP1 may not function in rice SAM during floral induction.

  5. The impact of age, biogenesis, and genomic clustering on Drosophila microRNA evolution

    PubMed Central

    Mohammed, Jaaved; Flynt, Alex S.; Siepel, Adam; Lai, Eric C.

    2013-01-01

    The molecular evolutionary signatures of miRNAs inform our understanding of their emergence, biogenesis, and function. The known signatures of miRNA evolution have derived mostly from the analysis of deeply conserved, canonical loci. In this study, we examine the impact of age, biogenesis pathway, and genomic arrangement on the evolutionary properties of Drosophila miRNAs. Crucial to the accuracy of our results was our curation of high-quality miRNA alignments, which included nearly 150 corrections to ortholog calls and nucleotide sequences of the global 12-way Drosophilid alignments currently available. Using these data, we studied primary sequence conservation, normalized free-energy values, and types of structure-preserving substitutions. We expand upon common miRNA evolutionary patterns that reflect fundamental features of miRNAs that are under functional selection. We observe that melanogaster-subgroup-specific miRNAs, although recently emerged and rapidly evolving, nonetheless exhibit evolutionary signatures that are similar to well-conserved miRNAs and distinct from other structured noncoding RNAs and bulk conserved non-miRNA hairpins. This provides evidence that even young miRNAs may be selected for regulatory activities. More strikingly, we observe that mirtrons and clustered miRNAs both exhibit distinct evolutionary properties relative to solo, well-conserved miRNAs, even after controlling for sequence depth. These studies highlight the previously unappreciated impact of biogenesis strategy and genomic location on the evolutionary dynamics of miRNAs, and affirm that miRNAs do not evolve as a unitary class. PMID:23882112

  6. Assessment of phylogenetic relationship of rare plant species collected from Saudi Arabia using internal transcribed spacer sequences of nuclear ribosomal DNA.

    PubMed

    Al-Qurainy, F; Khan, S; Nadeem, M; Tarroum, M; Alaklabi, A

    2013-03-11

    The rare and endangered plants of any country are important genetic resources that often require urgent conservation measures. Assessment of phylogenetic relationships and evaluation of genetic diversity is very important prior to implementation of conservation strategies for saving rare and endangered plant species. We used internal transcribed spacer sequences of nuclear ribosomal DNA for the evaluation of sequence identity from the available taxa in the GenBank database by using the Basic Local Alignment Search Tool (BLAST). Two rare plant species viz, Heliotropium strigosum claded with H. pilosum (98% branch support) and Pancratium tortuosum claded with P. tenuifolium (61% branch support) clearly. However, some species, viz Scadoxus multiflorus, Commiphora myrrha and Senecio hadiensis showed close relationships with more than one species. We conclude that nuclear ribosomal internal transcribed spacer sequences are useful markers for phylogenetic study of these rare plant species in Saudi Arabia.

  7. Biocuration in the structure-function linkage database: the anatomy of a superfamily.

    PubMed

    Holliday, Gemma L; Brown, Shoshana D; Akiva, Eyal; Mischel, David; Hicks, Michael A; Morris, John H; Huang, Conrad C; Meng, Elaine C; Pegg, Scott C-H; Ferrin, Thomas E; Babbitt, Patricia C

    2017-01-01

    With ever-increasing amounts of sequence data available in both the primary literature and sequence repositories, there is a bottleneck in annotating molecular function to a sequence. This article describes the biocuration process and methods used in the structure-function linkage database (SFLD) to help address some of the challenges. We discuss how the hierarchy within the SFLD allows us to infer detailed functional properties for functionally diverse enzyme superfamilies in which all members are homologous, conserve an aspect of their chemical function and have associated conserved structural features that enable the chemistry. Also presented is the Enzyme Structure-Function Ontology (ESFO), which has been designed to capture the relationships between enzyme sequence, structure and function that underlie the SFLD and is used to guide the biocuration processes within the SFLD. http://sfld.rbvi.ucsf.edu/. © The Author 2017. Published by Oxford University Press.

  8. An alternative nested-PCR assay for the detection of Toxoplasma gondii strains based on GRA7 gene sequences.

    PubMed

    Costa, Maria Eduarda S M; Oliveira, Claudio Bruno S; Andrade, Joelma Maria de A; Medeiros, Thatiany A; Neto, Valter F Andrade; Lanza, Daniel C F

    2016-07-01

    Toxoplasma gondii is a widespread parasite able to infect virtually any nucleated cells of warm-blooded hosts. In some cases, T. gondii detection using already developed PCR primers can be inefficient in routine laboratory tests, especially to detect atypical strains. Here we report a new nested-PCR protocol able to detect virtually all T. gondii isolates. Analyzing 685 sequences available in GenBank, we determine that GRA7 is one of the most conserved genes of T. gondii genome. Based on an alignment of 85 GRA7 sequences new primer sets that anneal in the highly conserved regions of this gene were designed. The new GRA7 nested-PCR assay providing sensitivity and specificity equal to or greater than the gold standard PCR assays for T. gondii detection, that amplify the B1 sequence or the repetitive 529bp element. Copyright © 2016 Elsevier B.V. All rights reserved.

  9. A TALE-inspired computational screen for proteins that contain approximate tandem repeats.

    PubMed

    Perycz, Malgorzata; Krwawicz, Joanna; Bochtler, Matthias

    2017-01-01

    TAL (transcription activator-like) effectors (TALEs) are bacterial proteins that are secreted from bacteria to plant cells to act as transcriptional activators. TALEs and related proteins (RipTALs, BurrH, MOrTL1 and MOrTL2) contain approximate tandem repeats that differ in conserved positions that define specificity. Using PERL, we screened ~47 million protein sequences for TALE-like architecture characterized by approximate tandem repeats (between 30 and 43 amino acids in length) and sequence variability in conserved positions, without requiring sequence similarity to TALEs. Candidate proteins were scored according to their propensity for nuclear localization, secondary structure, repeat sequence complexity, as well as covariation and predicted structural proximity of variable residues. Biological context was tentatively inferred from co-occurrence of other domains and interactome predictions. Approximate repeats with TALE-like features that merit experimental characterization were found in a protein of chestnut blight fungus, a eukaryotic plant pathogen.

  10. A TALE-inspired computational screen for proteins that contain approximate tandem repeats

    PubMed Central

    Krwawicz, Joanna

    2017-01-01

    TAL (transcription activator-like) effectors (TALEs) are bacterial proteins that are secreted from bacteria to plant cells to act as transcriptional activators. TALEs and related proteins (RipTALs, BurrH, MOrTL1 and MOrTL2) contain approximate tandem repeats that differ in conserved positions that define specificity. Using PERL, we screened ~47 million protein sequences for TALE-like architecture characterized by approximate tandem repeats (between 30 and 43 amino acids in length) and sequence variability in conserved positions, without requiring sequence similarity to TALEs. Candidate proteins were scored according to their propensity for nuclear localization, secondary structure, repeat sequence complexity, as well as covariation and predicted structural proximity of variable residues. Biological context was tentatively inferred from co-occurrence of other domains and interactome predictions. Approximate repeats with TALE-like features that merit experimental characterization were found in a protein of chestnut blight fungus, a eukaryotic plant pathogen. PMID:28617832

  11. Identification of evolutionarily conserved Momordica charantia microRNAs using computational approach and its utility in phylogeny analysis.

    PubMed

    Thirugnanasambantham, Krishnaraj; Saravanan, Subramanian; Karikalan, Kulandaivelu; Bharanidharan, Rajaraman; Lalitha, Perumal; Ilango, S; HairulIslam, Villianur Ibrahim

    2015-10-01

    Momordica charantia (bitter gourd, bitter melon) is a monoecious Cucurbitaceae with anti-oxidant, anti-microbial, anti-viral and anti-diabetic potential. Molecular studies on this economically valuable plant are very essential to understand its phylogeny and evolution. MicroRNAs (miRNAs) are conserved, small, non-coding RNA with ability to regulate gene expression by bind the 3' UTR region of target mRNA and are evolved at different rates in different plant species. In this study we have utilized homology based computational approach and identified 27 mature miRNAs for the first time from this bio-medically important plant. The phylogenetic tree developed from binary data derived from the data on presence/absence of the identified miRNAs were noticed to be uncertain and biased. Most of the identified miRNAs were highly conserved among the plant species and sequence based phylogeny analysis of miRNAs resolved the above difficulties in phylogeny approach using miRNA. Predicted gene targets of the identified miRNAs revealed their importance in regulation of plant developmental process. Reported miRNAs held sequence conservation in mature miRNAs and the detailed phylogeny analysis of pre-miRNA sequences revealed genus specific segregation of clusters. Copyright © 2015 Elsevier Ltd. All rights reserved.

  12. Placental Hypomethylation Is More Pronounced in Genomic Loci Devoid of Retroelements

    PubMed Central

    Chatterjee, Aniruddha; Macaulay, Erin C.; Rodger, Euan J.; Stockwell, Peter A.; Parry, Matthew F.; Roberts, Hester E.; Slatter, Tania L.; Hung, Noelyn A.; Devenish, Celia J.; Morison, Ian M.

    2016-01-01

    The human placenta is hypomethylated compared to somatic tissues. However, the degree and specificity of placental hypomethylation across the genome is unclear. We assessed genome-wide methylation of the human placenta and compared it to that of the neutrophil, a representative homogeneous somatic cell. We observed global hypomethylation in placenta (relative reduction of 22%) compared to neutrophils. Placental hypomethylation was pronounced in intergenic regions and gene bodies, while the unmethylated state of the promoter remained conserved in both tissues. For every class of repeat elements, the placenta showed lower methylation but the degree of hypomethylation differed substantially between these classes. However, some retroelements, especially the evolutionarily younger Alu elements, retained high levels of placental methylation. Surprisingly, nonretrotransposon-containing sequences showed a greater degree of placental hypomethylation than retrotransposons in every genomic element (intergenic, introns, and exons) except promoters. The differentially methylated fragments (DMFs) in placenta and neutrophils were enriched in gene-poor and CpG-poor regions. The placentally hypomethylated DMFs were enriched in genomic regions that are usually inactive, whereas hypermethylated DMFs were enriched in active regions. Hypomethylation of the human placenta is not specific to retroelements, indicating that the evolutionary advantages of placental hypomethylation go beyond those provided by expression of retrotransposons and retrogenes. PMID:27172225

  13. Conserved Sequences at the Origin of Adenovirus DNA Replication

    PubMed Central

    Stillman, Bruce W.; Topp, William C.; Engler, Jeffrey A.

    1982-01-01

    The origin of adenovirus DNA replication lies within an inverted sequence repetition at either end of the linear, double-stranded viral DNA. Initiation of DNA replication is primed by a deoxynucleoside that is covalently linked to a protein, which remains bound to the newly synthesized DNA. We demonstrate that virion-derived DNA-protein complexes from five human adenovirus serological subgroups (A to E) can act as a template for both the initiation and the elongation of DNA replication in vitro, using nuclear extracts from adenovirus type 2 (Ad2)-infected HeLa cells. The heterologous template DNA-protein complexes were not as active as the homologous Ad2 DNA, most probably due to inefficient initiation by Ad2 replication factors. In an attempt to identify common features which may permit this replication, we have also sequenced the inverted terminal repeated DNA from human adenovirus serotypes Ad4 (group E), Ad9 and Ad10 (group D), and Ad31 (group A), and we have compared these to previously determined sequences from Ad2 and Ad5 (group C), Ad7 (group B), and Ad12 and Ad18 (group A) DNA. In all cases, the sequence around the origin of DNA replication can be divided into two structural domains: a proximal A · T-rich region which is partially conserved among these serotypes, and a distal G · C-rich region which is less well conserved. The G · C-rich region contains sequences similar to sequences present in papovavirus replication origins. The two domains may reflect a dual mechanism for initiation of DNA replication: adenovirus-specific protein priming of replication, and subsequent utilization of this primer by host replication factors for completion of DNA synthesis. Images PMID:7143575

  14. Genomic Sequence around Butterfly Wing Development Genes: Annotation and Comparative Analysis

    PubMed Central

    Conceição, Inês C.; Long, Anthony D.; Gruber, Jonathan D.; Beldade, Patrícia

    2011-01-01

    Background Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. Methodology/Principal Findings We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). Conclusions The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high conservation of non-coding sequence around the genes wingless and Ecdysone receptor, both involved in multiple developmental processes including wing pattern formation. PMID:21909358

  15. Antimicrobial activity of Brassica nectar lipid transfer protein

    USDA-ARS?s Scientific Manuscript database

    Antimicrobial peptides (AMPs) provide an ancient, innate immunity conserved in all multicellular organisms. In plants, there are several large families of AMPs defined by sequence similarity. The nonspecific lipid transfer protein (LTP) family is defined by a conserved signature of eight cysteines a...

  16. Information analysis of sequences that bind the replication initiator RepA | Center for Cancer Research

    Cancer.gov

    The tall letters represent the highly conserved bases in DNA binding sites of several prokaryotic repressors and activators. Conservation is strongest where major grooves of the double helical DNA (represented by crests of a cosine wave) face the protein. This shows that conservation analysis alone can be used to predict the face of DNA that contacts the proteins.

  17. Whole-Genome Sequencing and Variant Analysis of Human Papillomavirus 16 Infections.

    PubMed

    van der Weele, Pascal; Meijer, Chris J L M; King, Audrey J

    2017-10-01

    Human papillomavirus (HPV) is a strongly conserved DNA virus, high-risk types of which can cause cervical cancer in persistent infections. The most common type found in HPV-attributable cancer is HPV16, which can be subdivided into four lineages (A to D) with different carcinogenic properties. Studies have shown HPV16 sequence diversity in different geographical areas, but only limited information is available regarding HPV16 diversity within a population, especially at the whole-genome level. We analyzed HPV16 major variant diversity and conservation in persistent infections and performed a single nucleotide polymorphism (SNP) comparison between persistent and clearing infections. Materials were obtained in the Netherlands from a cohort study with longitudinal follow-up for up to 3 years. Our analysis shows a remarkably large variant diversity in the population. Whole-genome sequences were obtained for 57 persistent and 59 clearing HPV16 infections, resulting in 109 unique variants. Interestingly, persistent infections were completely conserved through time. One reinfection event was identified where the initial and follow-up samples clustered differently. Non-A1/A2 variants seemed to clear preferentially ( P = 0.02). Our analysis shows that population-wide HPV16 sequence diversity is very large. In persistent infections, the HPV16 sequence was fully conserved. Sequencing can identify HPV16 reinfections, although occurrence is rare. SNP comparison identified no strongly acting effect of the viral genome affecting HPV16 infection clearance or persistence in up to 3 years of follow-up. These findings suggest the progression of an early HPV16 infection could be host related. IMPORTANCE Human papillomavirus 16 (HPV16) is the predominant type found in cervical cancer. Progression of initial infection to cervical cancer has been linked to sequence properties; however, knowledge of variants circulating in European populations, especially with longitudinal follow-up, is limited. By sequencing a number of infections with known follow-up for up to 3 years, we gained initial insights into the genetic diversity of HPV16 and the effects of the viral genome on the persistence of infections. A SNP comparison between sequences obtained from clearing and persistent infections did not identify strongly acting DNA variations responsible for these infection outcomes. In addition, we identified an HPV16 reinfection event where sequencing of initial and follow-up samples showed different HPV16 variants. Based on conventional genotyping, this infection would incorrectly be considered a persistent HPV16 infection. In the context of vaccine efficacy and monitoring studies, such infections could potentially cause reduced reported efficacy or efficiency. Copyright © 2017 van der Weele et al.

  18. Molecular signatures for the phylum Synergistetes and some of its subclades.

    PubMed

    Bhandari, Vaibhav; Gupta, Radhey S

    2012-11-01

    Species belonging to the phylum Synergistetes are poorly characterized. Though the known species display Gram-negative characteristics and the ability to ferment amino acids, no single characteristic is known which can define this group. For eight Synergistetes species, complete genome sequences or draft genomes have become available. We have used these genomes to construct detailed phylogenetic trees for the Synergistetes species and carried out comprehensive analysis to identify molecular markers consisting of conserved signature indels (CSIs) in protein sequences that are specific for either all Synergistetes or some of their sub-groups. We report here identification of 32 CSIs in widely distributed proteins such as RpoB, RpoC, UvrD, GyrA, PolA, PolC, MraW, NadD, PyrE, RpsA, RpsH, FtsA, RadA, etc., including a large >300 aa insert within the RpoC protein, that are present in various Synergistetes species, but except for isolated bacteria, these CSIs are not found in the protein homologues from any other organisms. These CSIs provide novel molecular markers that distinguish the species of the phylum Synergistetes from all other bacteria. The large numbers of other CSIs discovered in this work provide valuable information that supports and consolidates evolutionary relationships amongst the sequenced Synergistetes species. Of these CSIs, seven are specifically present in Jonquetella, Pyramidobacter and Dethiosulfovibrio species indicating a cladal relationship among them, which is also strongly supported by phylogenetic trees. A further 15 CSIs that are only present in Jonquetella and Pyramidobacter indicate a close association between these two species. Additionally, a previously described phylogenetic relationship between the Aminomonas and Thermanaerovibrio species was also supported by 9 CSIs. The strong relationships indicated by the indel analysis provide incentives for the grouping of species from these clades into higher taxonomic groups such as families or orders. The identified molecular markers, due to their specificity for Synergistetes and presence in highly conserved regions of important proteins suggest novel targets for evolutionary, genetic and biochemical studies on these bacteria as well as for the identification of additional species belonging to this phylum in different environments.

  19. Screening of broad spectrum natural pesticides against conserved target arginine kinase in cotton pests by molecular modeling.

    PubMed

    Sakthivel, Seethalakshmi; Habeeb, S K M; Raman, Chandrasekar

    2018-03-12

    Cotton is an economically important crop and its production is challenged by the diversity of pests and related insecticide resistance. Identification of the conserved target across the cotton pest will help to design broad spectrum insecticide. In this study, we have identified conserved sequences by Expressed Sequence Tag profiling from three cotton pests namely Aphis gossypii, Helicoverpa armigera, and Spodoptera exigua. One target protein arginine kinase having a key role in insect physiology and energy metabolism was studied further using homology modeling, virtual screening, molecular docking, and molecular dynamics simulation to identify potential biopesticide compounds from the Zinc natural database. We have identified four compounds having excellent inhibitor potential against the identified broad spectrum target which are highly specific to invertebrates.

  20. The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans.

    PubMed

    Shoguchi, Eiichi; Shinzato, Chuya; Hisata, Kanako; Satoh, Nori; Mungpakdee, Sutada

    2015-07-20

    Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  1. Identification of miRNAs and their targets in wild tomato at moderately and acutely elevated temperatures by high-throughput sequencing and degradome analysis

    PubMed Central

    Zhou, Rong; Wang, Qian; Jiang, Fangling; Cao, Xue; Sun, Mintao; Liu, Min; Wu, Zhen

    2016-01-01

    MicroRNAs (miRNAs) are 19–24 nucleotide (nt) noncoding RNAs that play important roles in abiotic stress responses in plants. High temperatures have been the subject of considerable attention due to their negative effects on plant growth and development. Heat-responsive miRNAs have been identified in some plants. However, there have been no reports on the global identification of miRNAs and their targets in tomato at high temperatures, especially at different elevated temperatures. Here, three small-RNA libraries and three degradome libraries were constructed from the leaves of the heat-tolerant tomato at normal, moderately and acutely elevated temperatures (26/18 °C, 33/33 °C and 40/40 °C, respectively). Following high-throughput sequencing, 662 conserved and 97 novel miRNAs were identified in total with 469 conserved and 91 novel miRNAs shared in the three small-RNA libraries. Of these miRNAs, 96 and 150 miRNAs were responsive to the moderately and acutely elevated temperature, respectively. Following degradome sequencing, 349 sequences were identified as targets of 138 conserved miRNAs, and 13 sequences were identified as targets of eight novel miRNAs. The expression levels of seven miRNAs and six target genes obtained by quantitative real-time PCR (qRT-PCR) were largely consistent with the sequencing results. This study enriches the number of heat-responsive miRNAs and lays a foundation for the elucidation of the miRNA-mediated regulatory mechanism in tomatoes at elevated temperatures. PMID:27653374

  2. Unravelling the complexity of microRNA-mediated gene regulation in black pepper (Piper nigrum L.) using high-throughput small RNA profiling.

    PubMed

    Asha, Srinivasan; Sreekumar, Sweda; Soniya, E V

    2016-01-01

    Analysis of high-throughput small RNA deep sequencing data, in combination with black pepper transcriptome sequences revealed microRNA-mediated gene regulation in black pepper ( Piper nigrum L.). Black pepper is an important spice crop and its berries are used worldwide as a natural food additive that contributes unique flavour to foods. In the present study to characterize microRNAs from black pepper, we generated a small RNA library from black pepper leaf and sequenced it by Illumina high-throughput sequencing technology. MicroRNAs belonging to a total of 303 conserved miRNA families were identified from the sRNAome data. Subsequent analysis from recently sequenced black pepper transcriptome confirmed precursor sequences of 50 conserved miRNAs and four potential novel miRNA candidates. Stem-loop qRT-PCR experiments demonstrated differential expression of eight conserved miRNAs in black pepper. Computational analysis of targets of the miRNAs showed 223 potential black pepper unigene targets that encode diverse transcription factors and enzymes involved in plant development, disease resistance, metabolic and signalling pathways. RLM-RACE experiments further mapped miRNA-mediated cleavage at five of the mRNA targets. In addition, miRNA isoforms corresponding to 18 miRNA families were also identified from black pepper. This study presents the first large-scale identification of microRNAs from black pepper and provides the foundation for the future studies of miRNA-mediated gene regulation of stress responses and diverse metabolic processes in black pepper.

  3. Multiple sequence alignment using multi-objective based bacterial foraging optimization algorithm.

    PubMed

    Rani, R Ranjani; Ramyachitra, D

    2016-12-01

    Multiple sequence alignment (MSA) is a widespread approach in computational biology and bioinformatics. MSA deals with how the sequences of nucleotides and amino acids are sequenced with possible alignment and minimum number of gaps between them, which directs to the functional, evolutionary and structural relationships among the sequences. Still the computation of MSA is a challenging task to provide an efficient accuracy and statistically significant results of alignments. In this work, the Bacterial Foraging Optimization Algorithm was employed to align the biological sequences which resulted in a non-dominated optimal solution. It employs Multi-objective, such as: Maximization of Similarity, Non-gap percentage, Conserved blocks and Minimization of gap penalty. BAliBASE 3.0 benchmark database was utilized to examine the proposed algorithm against other methods In this paper, two algorithms have been proposed: Hybrid Genetic Algorithm with Artificial Bee Colony (GA-ABC) and Bacterial Foraging Optimization Algorithm. It was found that Hybrid Genetic Algorithm with Artificial Bee Colony performed better than the existing optimization algorithms. But still the conserved blocks were not obtained using GA-ABC. Then BFO was used for the alignment and the conserved blocks were obtained. The proposed Multi-Objective Bacterial Foraging Optimization Algorithm (MO-BFO) was compared with widely used MSA methods Clustal Omega, Kalign, MUSCLE, MAFFT, Genetic Algorithm (GA), Ant Colony Optimization (ACO), Artificial Bee Colony (ABC), Particle Swarm Optimization (PSO) and Hybrid Genetic Algorithm with Artificial Bee Colony (GA-ABC). The final results show that the proposed MO-BFO algorithm yields better alignment than most widely used methods. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  4. Annotating Protein Functional Residues by Coupling High-Throughput Fitness Profile and Homologous-Structure Analysis

    PubMed Central

    Du, Yushen; Wu, Nicholas C.; Jiang, Lin; Zhang, Tianhao; Gong, Danyang; Shu, Sara; Wu, Ting-Ting

    2016-01-01

    ABSTRACT Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins about which homologous-structure information is available. PMID:27803181

  5. In silico Analysis of 3′-End-Processing Signals in Aspergillus oryzae Using Expressed Sequence Tags and Genomic Sequencing Data

    PubMed Central

    Tanaka, Mizuki; Sakai, Yoshifumi; Yamada, Osamu; Shintani, Takahiro; Gomi, Katsuya

    2011-01-01

    To investigate 3′-end-processing signals in Aspergillus oryzae, we created a nucleotide sequence data set of the 3′-untranslated region (3′ UTR) plus 100 nucleotides (nt) sequence downstream of the poly(A) site using A. oryzae expressed sequence tags and genomic sequencing data. This data set comprised 1065 sequences derived from 1042 unique genes. The average 3′ UTR length in A. oryzae was 241 nt, which is greater than that in yeast but similar to that in plants. The 3′ UTR and 100 nt sequence downstream of the poly(A) site is notably U-rich, while the region located 15–30 nt upstream of the poly(A) site is markedly A-rich. The most frequently found hexanucleotide in this A-rich region is AAUGAA, although this sequence accounts for only 6% of all transcripts. These data suggested that A. oryzae has no highly conserved sequence element equivalent to AAUAAA, a mammalian polyadenylation signal. We identified that putative 3′-end-processing signals in A. oryzae, while less well conserved than those in mammals, comprised four sequence elements: the furthest upstream U-rich element, A-rich sequence, cleavage site, and downstream U-rich element flanking the cleavage site. Although these putative 3′-end-processing signals are similar to those in yeast and plants, some notable differences exist between them. PMID:21586533

  6. Incorporating evolution of transcription factor binding sites into annotated alignments.

    PubMed

    Bais, Abha S; Grossmann, Stefen; Vingron, Martin

    2007-08-01

    Identifying transcription factor binding sites (TFBSs) is essential to elucidate putative regulatory mechanisms. A common strategy is to combine cross-species conservation with single sequence TFBS annotation to yield "conserved TFBSs". Most current methods in this field adopt a multi-step approach that segregates the two aspects. Again, it is widely accepted that the evolutionary dynamics of binding sites differ from those of the surrounding sequence. Hence, it is desirable to have an approach that explicitly takes this factor into account. Although a plethora of approaches have been proposed for the prediction of conserved TFBSs, very few explicitly model TFBS evolutionary properties, while additionally being multi-step. Recently, we introduced a novel approach to simultaneously align and annotate conserved TFBSs in a pair of sequences. Building upon the standard Smith-Waterman algorithm for local alignments, SimAnn introduces additional states for profiles to output extended alignments or annotated alignments. That is, alignments with parts annotated as gaplessly aligned TFBSs (pair-profile hits)are generated. Moreover,the pair- profile related parameters are derived in a sound statistical framework. In this article, we extend this approach to explicitly incorporate evolution of binding sites in the SimAnn framework. We demonstrate the extension in the theoretical derivations through two position-specific evolutionary models, previously used for modelling TFBS evolution. In a simulated setting, we provide a proof of concept that the approach works given the underlying assumptions,as compared to the original work. Finally, using a real dataset of experimentally verified binding sites in human-mouse sequence pairs,we compare the new approach (eSimAnn) to an existing multi-step tool that also considers TFBS evolution. Although it is widely accepted that binding sites evolve differently from the surrounding sequences, most comparative TFBS identification methods do not explicitly consider this.Additionally, prediction of conserved binding sites is carried out in a multi-step approach that segregates alignment from TFBS annotation. In this paper, we demonstrate how the simultaneous alignment and annotation approach of SimAnn can be further extended to incorporate TFBS evolutionary relationships. We study how alignments and binding site predictions interplay at varying evolutionary distances and for various profile qualities.

  7. Combined sequence and structure analysis of the fungal laccase family.

    PubMed

    Kumar, S V Suresh; Phale, Prashant S; Durani, S; Wangikar, Pramod P

    2003-08-20

    Plant and fungal laccases belong to the family of multi-copper oxidases and show much broader substrate specificity than other members of the family. Laccases have consequently been of interest for potential industrial applications. We have analyzed the essential sequence features of fungal laccases based on multiple sequence alignments of more than 100 laccases. This has resulted in identification of a set of four ungapped sequence regions, L1-L4, as the overall signature sequences that can be used to identify the laccases, distinguishing them within the broader class of multi-copper oxidases. The 12 amino acid residues in the enzymes serving as the copper ligands are housed within these four identified conserved regions, of which L2 and L4 conform to the earlier reported copper signature sequences of multi-copper oxidases while L1 and L3 are distinctive to the laccases. The mapping of regions L1-L4 on to the three-dimensional structure of the Coprinus cinerius laccase indicates that many of the non-copper-ligating residues of the conserved regions could be critical in maintaining a specific, more or less C-2 symmetric, protein conformational motif characterizing the active site apparatus of the enzymes. The observed intraprotein homologies between L1 and L3 and between L2 and L4 at both the structure and the sequence levels suggest that the quasi C-2 symmetric active site conformational motif may have arisen from a structural duplication event that neither the sequence homology analysis nor the structure homology analysis alone would have unraveled. Although the sequence and structure homology is not detectable in the rest of the protein, the relative orientation of region L1 with L2 is similar to that of L3 with L4. The structure duplication of first-shell and second-shell residues has become cryptic because the intraprotein sequence homology noticeable for a given laccase becomes significant only after comparing the conservation pattern in several fungal laccases. The identified motifs, L1-L4, can be useful in searching the newly sequenced genomes for putative laccase enzymes. Copyright 2003 Wiley Periodicals, Inc. Biotechnol Bioeng 83: 386-394, 2003.

  8. Finding functional features in Saccharomyces genomes by phylogenetic footprinting.

    PubMed

    Cliften, Paul; Sudarsanam, Priya; Desikan, Ashwin; Fulton, Lucinda; Fulton, Bob; Majors, John; Waterston, Robert; Cohen, Barak A; Johnston, Mark

    2003-07-04

    The sifting and winnowing of DNA sequence that occur during evolution cause nonfunctional sequences to diverge, leaving phylogenetic footprints of functional sequence elements in comparisons of genome sequences. We searched for such footprints among the genome sequences of six Saccharomyces species and identified potentially functional sequences. Comparison of these sequences allowed us to revise the catalog of yeast genes and identify sequence motifs that may be targets of transcriptional regulatory proteins. Some of these conserved sequence motifs reside upstream of genes with similar functional annotations or similar expression patterns or those bound by the same transcription factor and are thus good candidates for functional regulatory sequences.

  9. Plastome Sequence Determination and Comparative Analysis for Members of the Lolium-Festuca Grass Species Complex

    PubMed Central

    Hand, Melanie L.; Spangenberg, German C.; Forster, John W.; Cogan, Noel O. I.

    2013-01-01

    Chloroplast genome sequences are of broad significance in plant biology, due to frequent use in molecular phylogenetics, comparative genomics, population genetics, and genetic modification studies. The present study used a second-generation sequencing approach to determine and assemble the plastid genomes (plastomes) of four representatives from the agriculturally important Lolium-Festuca species complex of pasture grasses (Lolium multiflorum, Festuca pratensis, Festuca altissima, and Festuca ovina). Total cellular DNA was extracted from either roots or leaves, was sequenced, and the output was filtered for plastome-related reads. A comparison between sources revealed fewer plastome-related reads from root-derived template but an increase in incidental bacterium-derived sequences. Plastome assembly and annotation indicated high levels of sequence identity and a conserved organization and gene content between species. However, frequent deletions within the F. ovina plastome appeared to contribute to a smaller plastid genome size. Comparative analysis with complete plastome sequences from other members of the Poaceae confirmed conservation of most grass-specific features. Detailed analysis of the rbcL–psaI intergenic region, however, revealed a “hot-spot” of variation characterized by independent deletion events. The evolutionary implications of this observation are discussed. The complete plastome sequences are anticipated to provide the basis for potential organelle-specific genetic modification of pasture grasses. PMID:23550121

  10. rpoB Gene Sequence-Based Identification of Aerobic Gram-Positive Cocci of the Genera Streptococcus, Enterococcus, Gemella, Abiotrophia, and Granulicatella

    PubMed Central

    Drancourt, Michel; Roux, Véronique; Fournier, Pierre-Edouard; Raoult, Didier

    2004-01-01

    We developed a new molecular tool based on rpoB gene (encoding the beta subunit of RNA polymerase) sequencing to identify streptococci. We first sequenced the complete rpoB gene for Streptococcus anginosus, S. equinus, and Abiotrophia defectiva. Sequences were aligned with these of S. pyogenes, S. agalactiae, and S. pneumoniae available in GenBank. Using an in-house analysis program (SVARAP), we identified a 740-bp variable region surrounded by conserved, 20-bp zones and, by using these conserved zones as PCR primer targets, we amplified and sequenced this variable region in an additional 30 Streptococcus, Enterococcus, Gemella, Granulicatella, and Abiotrophia species. This region exhibited 71.2 to 99.3% interspecies homology. We therefore applied our identification system by PCR amplification and sequencing to a collection of 102 streptococci and 60 bacterial isolates belonging to other genera. Amplicons were obtained in streptococci and Bacillus cereus, and sequencing allowed us to make a correct identification of streptococci. Molecular signatures were determined for the discrimination of closely related species within the S. pneumoniae-S. oralis-S. mitis group and the S. agalactiae-S. difficile group. These signatures allowed us to design a S. pneumoniae-specific PCR and sequencing primer pair. PMID:14766807

  11. Maintenance of an Intact Human Immunodeficiency Virus Type 1 vpr Gene following Mother-to-Infant Transmission

    PubMed Central

    Yedavalli, Venkat R. K.; Chappey, Colombe; Ahmad, Nafees

    1998-01-01

    The vpr sequences from six human immunodeficiency virus type 1 (HIV-1)-infected mother-infant pairs following perinatal transmission were analyzed. We found that 153 of the 166 clones analyzed from uncultured peripheral blood mononuclear cell DNA samples showed a 92.17% frequency of intact vpr open reading frames. There was a low degree of heterogeneity of vpr genes within mothers, within infants, and between epidemiologically linked mother-infant pairs. The distances between vpr sequences were greater in epidemiologically unlinked individuals than in epidemiologically linked mother-infant pairs. Moreover, the infants’ sequences displayed patterns similar to those seen in their mothers. The functional domains essential for Vpr activity, including virion incorporation, nuclear import, and cell cycle arrest and differentiation were highly conserved in most of the sequences. Phylogenetic analyses of 166 mother-infant pairs and 195 other available vpr sequences from HIV databases formed distinct clusters for each mother-infant pair and for other vpr sequences and grouped the six mother-infant pairs’ sequences with subtype B sequences. A high degree of conservation of intact and functional vpr supports the notion that vpr plays an important role in HIV-1 infection and replication in mother-infant isolates that are involved in perinatal transmission. PMID:9658150

  12. Sequence analysis of Jembrana disease virus strains reveals a genetically stable lentivirus.

    PubMed

    Desport, Moira; Stewart, Meredith E; Mikosza, Andrew S; Sheridan, Carol A; Peterson, Shane E; Chavand, Olivier; Hartaningsih, Nining; Wilcox, Graham E

    2007-06-01

    Jembrana disease virus (JDV) is a lentivirus associated with an acute disease syndrome with a 20% case fatality rate in Bos javanicus (Bali cattle) in Indonesia, occurring after a short incubation period and with no recurrence of the disease after recovery. Partial regions of gag and pol and the entire env were examined for sequence variation in DNA samples from cases of Jembrana disease obtained from Bali, Sumatra and South Kalimantan in Indonesian Borneo. A high level of nucleotide conservation (97-100%) was observed in gag sequences from samples taken in Bali and Sumatra, indicating that the source of JDV in Sumatra was most likely to have originated from Bali. The pol sequences and, unexpectedly, the env sequences from Bali samples were also well conserved with low nucleotide (96-99%) and amino acid substitutions (95-99%). However, the sample from South Kalimantan (JDV(KAL/01)) contained more divergent sequences, particularly in env (88% identity). Phylogenetic analysis revealed that the JDV(KAL/01)env sequences clustered with the sequence from the Pulukan sample (Bali) from 2001. JDV appears to be remarkably stable genetically and has undergone minor genetic changes over a period of nearly 20 years in Bali despite becoming endemic in the cattle population of the island.

  13. Community structure and distribution of benthic cyanobacteria in Antarctic lacustrine microbial mats.

    PubMed

    Pessi, Igor S; Lara, Yannick; Durieu, Benoit; Maalouf, Pedro de C; Verleyen, Elie; Wilmotte, Annick

    2018-05-01

    The terrestrial Antarctic Realm has recently been divided into 16 Antarctic Conservation Biogeographic Regions (ACBRs) based on environmental properties and the distribution of biota. Despite their prominent role in the primary production and nutrient cycling in Antarctic lakes, cyanobacteria were only poorly represented in the biological dataset used to delineate these ACBRs. Here, we provide a first high-throughput sequencing insight into the spatial distribution of benthic cyanobacterial communities in Antarctic lakes located in four distinct, geographically distant ACBRs and covering a range of limnological conditions. Cyanobacterial community structure differed between saline and freshwater lakes. No clear bioregionalization was observed, as clusters of community similarity encompassed lakes from distinct ACBRs. Most phylotypes (77.0%) were related to cyanobacterial lineages (defined at ≥99.0% 16S rRNA gene sequence similarity) restricted to the cold biosphere, including lineages potentially endemic to Antarctica (55.4%). The latter were generally rare and restricted to a small number of lakes, while more ubiquitous phylotypes were generally abundant and present in different ACBRs. These results point to a widespread distribution of some cosmopolitan cyanobacterial phylotypes across the different Antarctic ice-free regions, but also suggest the existence of dispersal barriers both within and between Antarctica and the other continents.

  14. RNA editing in nascent RNA affects pre-mRNA splicing

    PubMed Central

    Hsiao, Yun-Hua Esther; Bahn, Jae Hoon; Yang, Yun; Lin, Xianzhi; Tran, Stephen; Yang, Ei-Wen; Quinones-Valdez, Giovanni

    2018-01-01

    In eukaryotes, nascent RNA transcripts undergo an intricate series of RNA processing steps to achieve mRNA maturation. RNA editing and alternative splicing are two major RNA processing steps that can introduce significant modifications to the final gene products. By tackling these processes in isolation, recent studies have enabled substantial progress in understanding their global RNA targets and regulatory pathways. However, the interplay between individual steps of RNA processing, an essential aspect of gene regulation, remains poorly understood. By sequencing the RNA of different subcellular fractions, we examined the timing of adenosine-to-inosine (A-to-I) RNA editing and its impact on alternative splicing. We observed that >95% A-to-I RNA editing events occurred in the chromatin-associated RNA prior to polyadenylation. We report about 500 editing sites in the 3′ acceptor sequences that can alter splicing of the associated exons. These exons are highly conserved during evolution and reside in genes with important cellular function. Furthermore, we identified a second class of exons whose splicing is likely modulated by RNA secondary structures that are recognized by the RNA editing machinery. The genome-wide analyses, supported by experimental validations, revealed remarkable interplay between RNA editing and splicing and expanded the repertoire of functional RNA editing sites. PMID:29724793

  15. Nuclear accumulation of the Arabidopsis immune receptor RPS4 is necessary for triggering EDS1-dependent defense.

    PubMed

    Wirthmueller, Lennart; Zhang, Yan; Jones, Jonathan D G; Parker, Jane E

    2007-12-04

    Recognition of specific pathogen molecules inside the cell by nucleotide-binding domain and leucine-rich repeat (NB-LRR) receptors constitutes an important layer of innate immunity in plants. Receptor activation triggers host cellular reprogramming involving transcriptional potentiation of basal defenses and localized programmed cell death. The sites and modes of action of NB-LRR receptors are, however, poorly understood. Arabidopsis Toll/Interleukin-1 (TIR) type NB-LRR receptor RPS4 recognizes the bacterial type III effector AvrRps4. We show that epitope-tagged RPS4 expressed under its native regulatory sequences distributes between endomembranes and nuclei in healthy and AvrRps4-triggered tissues. RPS4 accumulation in the nucleus, mediated by a bipartite nuclear localization sequence (NLS) at its C terminus, is necessary for triggering immunity through authentic activation by AvrRps4 in Arabidopsis or as an effector-independent "deregulated" receptor in tobacco. A strikingly conserved feature of TIR-NB-LRR receptors is their recruitment of the nucleocytoplasmic basal-defense regulator EDS1 in resistance to diverse pathogens. We find that EDS1 is an indispensable component of RPS4 signaling and that it functions downstream of RPS4 activation but upstream of RPS4-mediated transcriptional reprogramming in the nucleus.

  16. A cost-effective high-throughput metabarcoding approach powerful enough to genotype ~44 000 year-old rodent remains from Northern Africa.

    PubMed

    Guimaraes, S; Pruvost, M; Daligault, J; Stoetzel, E; Bennett, E A; Côté, N M-L; Nicolas, V; Lalis, A; Denys, C; Geigl, E-M; Grange, T

    2017-05-01

    We present a cost-effective metabarcoding approach, aMPlex Torrent, which relies on an improved multiplex PCR adapted to highly degraded DNA, combining barcoding and next-generation sequencing to simultaneously analyse many heterogeneous samples. We demonstrate the strength of these improvements by generating a phylochronology through the genotyping of ancient rodent remains from a Moroccan cave whose stratigraphy covers the last 120 000 years. Rodents are important for epidemiology, agronomy and ecological investigations and can act as bioindicators for human- and/or climate-induced environmental changes. Efficient and reliable genotyping of ancient rodent remains has the potential to deliver valuable phylogenetic and paleoecological information. The analysis of multiple ancient skeletal remains of very small size with poor DNA preservation, however, requires a sensitive high-throughput method to generate sufficient data. We show this approach to be particularly adapted at accessing this otherwise difficult taxonomic and genetic resource. As a highly scalable, lower cost and less labour-intensive alternative to targeted sequence capture approaches, we propose the aMPlex Torrent strategy to be a useful tool for the genetic analysis of multiple degraded samples in studies involving ecology, archaeology, conservation and evolutionary biology. © 2016 John Wiley & Sons Ltd.

  17. A cluster of novel serotonin receptor 3-like genes on human chromosome 3.

    PubMed

    Karnovsky, Alla M; Gotow, Lisa F; McKinley, Denise D; Piechan, Julie L; Ruble, Cara L; Mills, Cynthia J; Schellin, Kathleen A B; Slightom, Jerry L; Fitzgerald, Laura R; Benjamin, Christopher W; Roberds, Steven L

    2003-11-13

    The ligand-gated ion channel family includes receptors for serotonin (5-hydroxytryptamine, 5-HT), acetylcholine, GABA, and glutamate. Drugs targeting subtypes of these receptors have proven useful for the treatment of various neuropsychiatric and neurological disorders. To identify new ligand-gated ion channels as potential therapeutic targets, drafts of human genome sequence were interrogated. Portions of four novel genes homologous to 5-HT(3A) and 5-HT(3B) receptors were identified within human sequence databases. We named the genes 5-HT(3C1)-5-HT(3C4). Radiation hybrid (RH) mapping localized these genes to chromosome 3q27-28. All four genes shared similar intron-exon organizations and predicted protein secondary structure with 5-HT(3A) and 5-HT(3B). Orthologous genes were detected by Southern blotting in several species including dog, cow, and chicken, but not in rodents, suggesting that these novel genes are not present in rodents or are very poorly conserved. Two of the novel genes are predicted to be pseudogenes, but two other genes are transcribed and spliced to form appropriate open reading frames. The 5-HT(3C1) transcript is expressed almost exclusively in small intestine and colon, suggesting a possible role in the serotonin-responsiveness of the gut.

  18. Implications of Hybridization, NUMTs, and Overlooked Diversity for DNA Barcoding of Eurasian Ground Squirrels

    PubMed Central

    Ermakov, Oleg A.; Simonov, Evgeniy; Surin, Vadim L.; Titov, Sergey V.; Brandler, Oleg V.; Ivanova, Natalia V.; Borisenko, Alex V.

    2015-01-01

    The utility of DNA Barcoding for species identification and discovery has catalyzed a concerted effort to build the global reference library; however, many animal groups of economical or conservational importance remain poorly represented. This study aims to contribute DNA barcode records for all ground squirrel species (Xerinae, Sciuridae, Rodentia) inhabiting Eurasia and to test efficiency of this approach for species discrimination. Cytochrome c oxidase subunit 1 (COI) gene sequences were obtained for 97 individuals representing 16 ground squirrel species of which 12 were correctly identified. Taxonomic allocation of some specimens within four species was complicated by geographically restricted mtDNA introgression. Exclusion of individuals with introgressed mtDNA allowed reaching a 91.6% identification success rate. Significant COI divergence (3.5–4.4%) was observed within the most widespread ground squirrel species (Spermophilus erythrogenys, S. pygmaeus, S. suslicus, Urocitellus undulatus), suggesting the presence of cryptic species. A single putative NUMT (nuclear mitochondrial pseudogene) sequence was recovered during molecular analysis; mitochondrial COI from this sample was amplified following re-extraction of DNA. Our data show high discrimination ability of 100 bp COI fragments for Eurasian ground squirrels (84.3%) with no incorrect assessments, underscoring the potential utility of the existing reference librariy for the development of diagnostic ‘mini-barcodes’. PMID:25617768

  19. RNA editing in nascent RNA affects pre-mRNA splicing.

    PubMed

    Hsiao, Yun-Hua Esther; Bahn, Jae Hoon; Yang, Yun; Lin, Xianzhi; Tran, Stephen; Yang, Ei-Wen; Quinones-Valdez, Giovanni; Xiao, Xinshu

    2018-06-01

    In eukaryotes, nascent RNA transcripts undergo an intricate series of RNA processing steps to achieve mRNA maturation. RNA editing and alternative splicing are two major RNA processing steps that can introduce significant modifications to the final gene products. By tackling these processes in isolation, recent studies have enabled substantial progress in understanding their global RNA targets and regulatory pathways. However, the interplay between individual steps of RNA processing, an essential aspect of gene regulation, remains poorly understood. By sequencing the RNA of different subcellular fractions, we examined the timing of adenosine-to-inosine (A-to-I) RNA editing and its impact on alternative splicing. We observed that >95% A-to-I RNA editing events occurred in the chromatin-associated RNA prior to polyadenylation. We report about 500 editing sites in the 3' acceptor sequences that can alter splicing of the associated exons. These exons are highly conserved during evolution and reside in genes with important cellular function. Furthermore, we identified a second class of exons whose splicing is likely modulated by RNA secondary structures that are recognized by the RNA editing machinery. The genome-wide analyses, supported by experimental validations, revealed remarkable interplay between RNA editing and splicing and expanded the repertoire of functional RNA editing sites. © 2018 Hsiao et al.; Published by Cold Spring Harbor Laboratory Press.

  20. Homology to peptide pattern for annotation of carbohydrate-active enzymes and prediction of function.

    PubMed

    Busk, P K; Pilgaard, B; Lezyk, M J; Meyer, A S; Lange, L

    2017-04-12

    Carbohydrate-active enzymes are found in all organisms and participate in key biological processes. These enzymes are classified in 274 families in the CAZy database but the sequence diversity within each family makes it a major task to identify new family members and to provide basis for prediction of enzyme function. A fast and reliable method for de novo annotation of genes encoding carbohydrate-active enzymes is to identify conserved peptides in the curated enzyme families followed by matching of the conserved peptides to the sequence of interest as demonstrated for the glycosyl hydrolase and the lytic polysaccharide monooxygenase families. This approach not only assigns the enzymes to families but also provides functional prediction of the enzymes with high accuracy. We identified conserved peptides for all enzyme families in the CAZy database with Peptide Pattern Recognition. The conserved peptides were matched to protein sequence for de novo annotation and functional prediction of carbohydrate-active enzymes with the Hotpep method. Annotation of protein sequences from 12 bacterial and 16 fungal genomes to families with Hotpep had an accuracy of 0.84 (measured as F1-score) compared to semiautomatic annotation by the CAZy database whereas the dbCAN HMM-based method had an accuracy of 0.77 with optimized parameters. Furthermore, Hotpep provided a functional prediction with 86% accuracy for the annotated genes. Hotpep is available as a stand-alone application for MS Windows. Hotpep is a state-of-the-art method for automatic annotation and functional prediction of carbohydrate-active enzymes.

  1. CRISPR Diversity and Microevolution in Clostridium difficile

    PubMed Central

    Andersen, Joakim M.; Shoup, Madelyn; Robinson, Cathy; Britton, Robert; Olsen, Katharina E.P.; Barrangou, Rodolphe

    2016-01-01

    Abstract Virulent strains of Clostridium difficile have become a global health problem associated with morbidity and mortality. Traditional typing methods do not provide ideal resolution to track outbreak strains, ascertain genetic diversity between isolates, or monitor the phylogeny of this species on a global basis. Here, we investigate the occurrence and diversity of clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated genes (cas) in C. difficile to assess the potential of CRISPR-based phylogeny and high-resolution genotyping. A single Type-IB CRISPR-Cas system was identified in 217 analyzed genomes with cas gene clusters present at conserved chromosomal locations, suggesting vertical evolution of the system, assessing a total of 1,865 CRISPR arrays. The CRISPR arrays, markedly enriched (8.5 arrays/genome) compared with other species, occur both at conserved and variable locations across strains, and thus provide a basis for typing based on locus occurrence and spacer polymorphism. Clustering of strains by array composition correlated with sequence type (ST) analysis. Spacer content and polymorphism within conserved CRISPR arrays revealed phylogenetic relationship across clades and within ST. Spacer polymorphisms of conserved arrays were instrumental for differentiating closely related strains, e.g., ST1/RT027/B1 strains and pathogenicity locus encoding ST3/RT001 strains. CRISPR spacers showed sequence similarity to phage sequences, which is consistent with the native role of CRISPR-Cas as adaptive immune systems in bacteria. Overall, CRISPR-Cas sequences constitute a valuable basis for genotyping of C. difficile isolates, provide insights into the micro-evolutionary events that occur between closely related strains, and reflect the evolutionary trajectory of these genomes. PMID:27576538

  2. Computational approaches for identification of conserved/unique binding pockets in the A chain of ricin

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ecale Zhou, C L; Zemla, A T; Roe, D

    2005-01-29

    Specific and sensitive ligand-based protein detection assays that employ antibodies or small molecules such as peptides, aptamers, or other small molecules require that the corresponding surface region of the protein be accessible and that there be minimal cross-reactivity with non-target proteins. To reduce the time and cost of laboratory screening efforts for diagnostic reagents, we developed new methods for evaluating and selecting protein surface regions for ligand targeting. We devised combined structure- and sequence-based methods for identifying 3D epitopes and binding pockets on the surface of the A chain of ricin that are conserved with respect to a set ofmore » ricin A chains and unique with respect to other proteins. We (1) used structure alignment software to detect structural deviations and extracted from this analysis the residue-residue correspondence, (2) devised a method to compare corresponding residues across sets of ricin structures and structures of closely related proteins, (3) devised a sequence-based approach to determine residue infrequency in local sequence context, and (4) modified a pocket-finding algorithm to identify surface crevices in close proximity to residues determined to be conserved/unique based on our structure- and sequence-based methods. In applying this combined informatics approach to ricin A we identified a conserved/unique pocket in close proximity (but not overlapping) the active site that is suitable for bi-dentate ligand development. These methods are generally applicable to identification of surface epitopes and binding pockets for development of diagnostic reagents, therapeutics, and vaccines.« less

  3. Taxonomic uncertainty and the loss of biodiversity on Christmas Island, Indian Ocean.

    PubMed

    Eldridge, Mark D B; Meek, Paul D; Johnson, Rebecca N

    2014-04-01

    The taxonomic uniqueness of island populations is often uncertain which hinders effective prioritization for conservation. The Christmas Island shrew (Crocidura attenuata trichura) is the only member of the highly speciose eutherian family Soricidae recorded from Australia. It is currently classified as a subspecies of the Asian gray or long-tailed shrew (C. attenuata), although it was originally described as a subspecies of the southeast Asian white-toothed shrew (C. fuliginosa). The Christmas Island shrew is currently listed as endangered and has not been recorded in the wild since 1984-1985, when 2 specimens were collected after an 80-year absence. We aimed to obtain DNA sequence data for cytochrome b (cytb) from Christmas Island shrew museum specimens to determine their taxonomic affinities and to confirm the identity of the 1980s specimens. The Cytb sequences from 5, 1898 specimens and a 1985 specimen were identical. In addition, the Christmas Island shrew cytb sequence was divergent at the species level from all available Crocidura cytb sequences. Rather than a population of a widespread species, current evidence suggests the Christmas Island shrew is a critically endangered endemic species, C. trichura, and a high priority for conservation. As the decisions typically required to save declining species can be delayed or deferred if the taxonomic status of the population in question is uncertain, it is hoped that the history of the Christmas Island shrew will encourage the clarification of taxonomy to be seen as an important first step in initiating informed and effective conservation action. © 2013 Society for Conservation Biology.

  4. Novel candidate genes may be possible predisposing factors revealed by whole exome sequencing in familial esophageal squamous cell carcinoma.

    PubMed

    Forouzanfar, Narjes; Baranova, Ancha; Milanizadeh, Saman; Heravi-Moussavi, Alireza; Jebelli, Amir; Abbaszadegan, Mohammad Reza

    2017-05-01

    Esophageal squamous cell carcinoma is one of the deadliest of all the cancers. Its metastatic properties portend poor prognosis and high rate of recurrence. A more advanced method to identify new molecular biomarkers predicting disease prognosis can be whole exome sequencing. Here, we report the most effective genetic variants of the Notch signaling pathway in esophageal squamous cell carcinoma susceptibility by whole exome sequencing. We analyzed nine probands in unrelated familial esophageal squamous cell carcinoma pedigrees to identify candidate genes. Genomic DNA was extracted and whole exome sequencing performed to generate information about genetic variants in the coding regions. Bioinformatics software applications were utilized to exploit statistical algorithms to demonstrate protein structure and variants conservation. Polymorphic regions were excluded by false-positive investigations. Gene-gene interactions were analyzed for Notch signaling pathway candidates. We identified novel and damaging variants of the Notch signaling pathway through extensive pathway-oriented filtering and functional predictions, which led to the study of 27 candidate novel mutations in all nine patients. Detection of the trinucleotide repeat containing 6B gene mutation (a slice site alteration) in five of the nine probands, but not in any of the healthy samples, suggested that it may be a susceptibility factor for familial esophageal squamous cell carcinoma. Noticeably, 8 of 27 novel candidate gene mutations (e.g. epidermal growth factor, signal transducer and activator of transcription 3, MET) act in a cascade leading to cell survival and proliferation. Our results suggest that the trinucleotide repeat containing 6B mutation may be a candidate predisposing gene in esophageal squamous cell carcinoma. In addition, some of the Notch signaling pathway genetic mutations may act as key contributors to esophageal squamous cell carcinoma.

  5. A Novel Mutation in ERCC8 Gene Causing Cockayne Syndrome

    PubMed Central

    Taghdiri, Maryam; Dastsooz, Hassan; Fardaei, Majid; Mohammadi, Sanaz; Farazi Fard, Mohammad Ali; Faghihi, Mohammad Ali

    2017-01-01

    Cockayne syndrome (CS) is a rare autosomal recessive multisystem disorder characterized by impaired neurological and sensory functions, cachectic dwarfism, microcephaly, and photosensitivity. This syndrome shows a variable age of onset and rate of progression, and its phenotypic spectrum include a wide range of severity. Due to the progressive nature of this disorder, diagnosis can be more important when additional signs and symptoms appear gradually and become steadily worse over time. Therefore, mutation analysis of genes involved in CS pathogenesis can be helpful to confirm the suspected clinical diagnosis. Here, we report a novel mutation in ERCC8 gene in a 16-year-old boy who suffers from poor weight gain, short stature, microcephaly, intellectual disability, and photosensitivity. The patient was born to consanguineous family with no previous documented disease in his parents. To identify disease-causing mutation in the patient, whole exome sequencing utilizing next-generation sequencing on an Illumina HiSeq 2000 platform was performed. Results revealed a novel homozygote mutation in ERCC8 gene (NM_000082: exon 11, c.1122G>C) in our patient. Another gene (ERCC6), which is also involved in CS did not have any disease-causing mutations in the proband. The new identified mutation was then confirmed by Sanger sequencing in the proband, his parents, and extended family members, confirming co-segregation with the disease. In addition, different bioinformatics programs which included MutationTaster, I-Mutant v2.0, NNSplice, Combined Annotation Dependent Depletion, The PhastCons, Genomic Evolutationary Rate Profiling conservation score, and T-Coffee Multiple Sequence Alignment predicted the pathogenicity of the mutation. Our study identified a rare novel mutation in ERCC8 gene and help to provide accurate genetic counseling and prenatal diagnosis to minimize new affected individuals in this family. PMID:28848724

  6. A Novel Mutation in ERCC8 Gene Causing Cockayne Syndrome.

    PubMed

    Taghdiri, Maryam; Dastsooz, Hassan; Fardaei, Majid; Mohammadi, Sanaz; Farazi Fard, Mohammad Ali; Faghihi, Mohammad Ali

    2017-01-01

    Cockayne syndrome (CS) is a rare autosomal recessive multisystem disorder characterized by impaired neurological and sensory functions, cachectic dwarfism, microcephaly, and photosensitivity. This syndrome shows a variable age of onset and rate of progression, and its phenotypic spectrum include a wide range of severity. Due to the progressive nature of this disorder, diagnosis can be more important when additional signs and symptoms appear gradually and become steadily worse over time. Therefore, mutation analysis of genes involved in CS pathogenesis can be helpful to confirm the suspected clinical diagnosis. Here, we report a novel mutation in ERCC8 gene in a 16-year-old boy who suffers from poor weight gain, short stature, microcephaly, intellectual disability, and photosensitivity. The patient was born to consanguineous family with no previous documented disease in his parents. To identify disease-causing mutation in the patient, whole exome sequencing utilizing next-generation sequencing on an Illumina HiSeq 2000 platform was performed. Results revealed a novel homozygote mutation in ERCC8 gene (NM_000082: exon 11, c.1122G>C) in our patient. Another gene ( ERCC6 ), which is also involved in CS did not have any disease-causing mutations in the proband. The new identified mutation was then confirmed by Sanger sequencing in the proband, his parents, and extended family members, confirming co-segregation with the disease. In addition, different bioinformatics programs which included MutationTaster, I-Mutant v2.0, NNSplice, Combined Annotation Dependent Depletion, The PhastCons, Genomic Evolutationary Rate Profiling conservation score, and T-Coffee Multiple Sequence Alignment predicted the pathogenicity of the mutation. Our study identified a rare novel mutation in ERCC8 gene and help to provide accurate genetic counseling and prenatal diagnosis to minimize new affected individuals in this family.

  7. Regulatory versus coding signatures of natural selection in a candidate gene involved in the adaptive divergence of whitefish species pairs (Coregonus spp.)

    PubMed Central

    Jeukens, Julie; Bernatchez, Louis

    2012-01-01

    While gene expression divergence is known to be involved in adaptive phenotypic divergence and speciation, the relative importance of regulatory and structural evolution of genes is poorly understood. A recent next-generation sequencing experiment allowed identifying candidate genes potentially involved in the ongoing speciation of sympatric dwarf and normal lake whitefish (Coregonus clupeaformis), such as cytosolic malate dehydrogenase (MDH1), which showed both significant expression and sequence divergence. The main goal of this study was to investigate into more details the signatures of natural selection in the regulatory and coding sequences of MDH1 in lake whitefish and test for parallelism of these signatures with other coregonine species. Sequencing of the two regions in 118 fish from four sympatric pairs of whitefish and two cisco species revealed a total of 35 single nucleotide polymorphisms (SNPs), with more genetic diversity in European compared to North American coregonine species. While the coding region was found to be under purifying selection, an SNP in the proximal promoter exhibited significant allele frequency divergence in a parallel manner among independent sympatric pairs of North American lake whitefish and European whitefish (C. lavaretus). According to transcription factor binding simulation for 22 regulatory haplotypes of MDH1, putative binding profiles were fairly conserved among species, except for the region around this SNP. Moreover, we found evidence for the role of this SNP in the regulation of MDH1 expression level. Overall, these results provide further evidence for the role of natural selection in gene regulation evolution among whitefish species pairs and suggest its possible link with patterns of phenotypic diversity observed in coregonine species. PMID:22408741

  8. Regulatory versus coding signatures of natural selection in a candidate gene involved in the adaptive divergence of whitefish species pairs (Coregonus spp.).

    PubMed

    Jeukens, Julie; Bernatchez, Louis

    2012-01-01

    While gene expression divergence is known to be involved in adaptive phenotypic divergence and speciation, the relative importance of regulatory and structural evolution of genes is poorly understood. A recent next-generation sequencing experiment allowed identifying candidate genes potentially involved in the ongoing speciation of sympatric dwarf and normal lake whitefish (Coregonus clupeaformis), such as cytosolic malate dehydrogenase (MDH1), which showed both significant expression and sequence divergence. The main goal of this study was to investigate into more details the signatures of natural selection in the regulatory and coding sequences of MDH1 in lake whitefish and test for parallelism of these signatures with other coregonine species. Sequencing of the two regions in 118 fish from four sympatric pairs of whitefish and two cisco species revealed a total of 35 single nucleotide polymorphisms (SNPs), with more genetic diversity in European compared to North American coregonine species. While the coding region was found to be under purifying selection, an SNP in the proximal promoter exhibited significant allele frequency divergence in a parallel manner among independent sympatric pairs of North American lake whitefish and European whitefish (C. lavaretus). According to transcription factor binding simulation for 22 regulatory haplotypes of MDH1, putative binding profiles were fairly conserved among species, except for the region around this SNP. Moreover, we found evidence for the role of this SNP in the regulation of MDH1 expression level. Overall, these results provide further evidence for the role of natural selection in gene regulation evolution among whitefish species pairs and suggest its possible link with patterns of phenotypic diversity observed in coregonine species.

  9. Ermelin, an endoplasmic reticulum transmembrane protein, contains the novel HELP domain conserved in eukaryotes.

    PubMed

    Suzuki, Akiko; Endo, Takeshi

    2002-02-06

    We have cloned a cDNA encoding a novel protein referred to as ermelin from mouse C2 skeletal muscle cells. This protein contained six hydrophobic amino acid stretches corresponding to transmembrane domains, two histidine-rich sequences, and a sequence homologous to the fusion peptides of certain fusion proteins. Ermelin also contained a novel modular sequence, designated as HELP domain, which was highly conserved among eukaryotes, from yeast to higher plants and animals. All these HELP domain-containing proteins, including mouse KE4, Drosophila Catsup, and Arabidopsis IAR1, possessed multipass transmembrane domains and histidine-rich sequences. Ermelin was predominantly expressed in brain and testis, and induced during neuronal differentiation of N1E-115 neuroblastoma cells but downregulated during myogenic differentiation of C2 cells. The mRNA was accumulated in hippocampus and cerebellum of brain and central areas of seminiferous tubules in testis. Epitope-tagging experiments located ermelin and KE4 to a network structure throughout the cytoplasm. Staining with the fluorescent dye DiOC(6)(3) identified this structure as the endoplasmic reticulum. These results suggest that at least some, if not all, of the HELP domain-containing proteins are multipass endoplasmic reticulum membrane proteins with functions conserved among eukaryotes.

  10. Long-range comparison of human and mouse Sprr loci to identify conserved noncoding sequences involved in coordinate regulation

    PubMed Central

    Martin, Natalia; Patel, Satyakam; Segre, Julia A.

    2004-01-01

    Mammalian epidermis provides a permeability barrier between an organism and its environment. Under homeostatic conditions, epidermal cells produce structural proteins, which are cross-linked in an orderly fashion to form a cornified envelope (CE). However, under genetic or environmental stress, specific genes are induced to rapidly build a temporary barrier. Small proline-rich (SPRR) proteins are the primary constituents of the CE. Under stress the entire family of 14 Sprr genes is upregulated. The Sprr genes are clustered within the larger epidermal differentiation complex on mouse chromosome 3, human chromosome 1q21. The clustering of the Sprr genes and their upregulation under stress suggest that these genes may be coordinately regulated. To identify enhancer elements that regulate this stress response activation of the Sprr locus, we utilized bioinformatic tools and classical biochemical dissection. Long-range comparative sequence analysis identified conserved noncoding sequences (CNSs). Clusters of epidermal-specific DNaseI-hypersensitive sites (HSs) mapped to specific CNSs. Increased prevalence of these HSs in barrier-deficient epidermis provides in vivo evidence of the regulation of the Sprr locus by these conserved sequences. Individual components of these HSs were cloned, and one was shown to have strong enhancer activity specific to conditions when the Sprr genes are coordinately upregulated. PMID:15574822

  11. ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules

    PubMed Central

    Ashkenazy, Haim; Abadi, Shiran; Martz, Eric; Chay, Ofer; Mayrose, Itay; Pupko, Tal; Ben-Tal, Nir

    2016-01-01

    The degree of evolutionary conservation of an amino acid in a protein or a nucleic acid in DNA/RNA reflects a balance between its natural tendency to mutate and the overall need to retain the structural integrity and function of the macromolecule. The ConSurf web server (http://consurf.tau.ac.il), established over 15 years ago, analyses the evolutionary pattern of the amino/nucleic acids of the macromolecule to reveal regions that are important for structure and/or function. Starting from a query sequence or structure, the server automatically collects homologues, infers their multiple sequence alignment and reconstructs a phylogenetic tree that reflects their evolutionary relations. These data are then used, within a probabilistic framework, to estimate the evolutionary rates of each sequence position. Here we introduce several new features into ConSurf, including automatic selection of the best evolutionary model used to infer the rates, the ability to homology-model query proteins, prediction of the secondary structure of query RNA molecules from sequence, the ability to view the biological assembly of a query (in addition to the single chain), mapping of the conservation grades onto 2D RNA models and an advanced view of the phylogenetic tree that enables interactively rerunning ConSurf with the taxa of a sub-tree. PMID:27166375

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bahl, C.; Morisseau, C; Bomberger, J

    Cystic fibrosis transmembrane conductance regulator (CFTR) inhibitory factor (Cif) is a virulence factor secreted by Pseudomonas aeruginosa that reduces the quantity of CFTR in the apical membrane of human airway epithelial cells. Initial sequence analysis suggested that Cif is an epoxide hydrolase (EH), but its sequence violates two strictly conserved EH motifs and also is compatible with other {alpha}/{beta} hydrolase family members with diverse substrate specificities. To investigate the mechanistic basis of Cif activity, we have determined its structure at 1.8-{angstrom} resolution by X-ray crystallography. The catalytic triad consists of residues Asp129, His297, and Glu153, which are conserved across themore » family of EHs. At other positions, sequence deviations from canonical EH active-site motifs are stereochemically conservative. Furthermore, detailed enzymatic analysis confirms that Cif catalyzes the hydrolysis of epoxide compounds, with specific activity against both epibromohydrin and cis-stilbene oxide, but with a relatively narrow range of substrate selectivity. Although closely related to two other classes of {alpha}/{beta} hydrolase in both sequence and structure, Cif does not exhibit activity as either a haloacetate dehalogenase or a haloalkane dehalogenase. A reassessment of the structural and functional consequences of the H269A mutation suggests that Cif's effect on host-cell CFTR expression requires the hydrolysis of an extended endogenous epoxide substrate.« less

  13. A Bioinformatic Strategy for the Detection, Classification and Analysis of Bacterial Autotransporters

    PubMed Central

    Celik, Nermin; Webb, Chaille T.; Leyton, Denisse L.; Holt, Kathryn E.; Heinz, Eva; Gorrell, Rebecca; Kwok, Terry; Naderer, Thomas; Strugnell, Richard A.; Speed, Terence P.; Teasdale, Rohan D.; Likić, Vladimir A.; Lithgow, Trevor

    2012-01-01

    Autotransporters are secreted proteins that are assembled into the outer membrane of bacterial cells. The passenger domains of autotransporters are crucial for bacterial pathogenesis, with some remaining attached to the bacterial surface while others are released by proteolysis. An enigma remains as to whether autotransporters should be considered a class of secretion system, or simply a class of substrate with peculiar requirements for their secretion. We sought to establish a sensitive search protocol that could identify and characterize diverse autotransporters from bacterial genome sequence data. The new sequence analysis pipeline identified more than 1500 autotransporter sequences from diverse bacteria, including numerous species of Chlamydiales and Fusobacteria as well as all classes of Proteobacteria. Interrogation of the proteins revealed that there are numerous classes of passenger domains beyond the known proteases, adhesins and esterases. In addition the barrel-domain-a characteristic feature of autotransporters-was found to be composed from seven conserved sequence segments that can be arranged in multiple ways in the tertiary structure of the assembled autotransporter. One of these conserved motifs overlays the targeting information required for autotransporters to reach the outer membrane. Another conserved and diagnostic motif maps to the linker region between the passenger domain and barrel-domain, indicating it as an important feature in the assembly of autotransporters. PMID:22905239

  14. Antisense Transcription Is Pervasive but Rarely Conserved in Enteric Bacteria

    PubMed Central

    Raghavan, Rahul; Sloan, Daniel B.; Ochman, Howard

    2012-01-01

    ABSTRACT Noncoding RNAs, including antisense RNAs (asRNAs) that originate from the complementary strand of protein-coding genes, are involved in the regulation of gene expression in all domains of life. Recent application of deep-sequencing technologies has revealed that the transcription of asRNAs occurs genome-wide in bacteria. Although the role of the vast majority of asRNAs remains unknown, it is often assumed that their presence implies important regulatory functions, similar to those of other noncoding RNAs. Alternatively, many antisense transcripts may be produced by chance transcription events from promoter-like sequences that result from the degenerate nature of bacterial transcription factor binding sites. To investigate the biological relevance of antisense transcripts, we compared genome-wide patterns of asRNA expression in closely related enteric bacteria, Escherichia coli and Salmonella enterica serovar Typhimurium, by performing strand-specific transcriptome sequencing. Although antisense transcripts are abundant in both species, less than 3% of asRNAs are expressed at high levels in both species, and only about 14% appear to be conserved among species. And unlike the promoters of protein-coding genes, asRNA promoters show no evidence of sequence conservation between, or even within, species. Our findings suggest that many or even most bacterial asRNAs are nonadaptive by-products of the cell’s transcription machinery. PMID:22872780

  15. Environmental Globalization, Organizational Form, and Expected Benefits from Protected Areas in Central America

    ERIC Educational Resources Information Center

    Pfeffer, Max J.; Schelhas, John W.; Meola, Catherine

    2006-01-01

    Environmental globalization has led to the implementation of conservation efforts like the creation of protected areas that often promote the interests of core countries in poorer regions. The creation of protected areas in poor areas frequently creates tensions between human needs like food and shelter and environmental conservation. Support for…

  16. Regina rigida (glossy crayfish snake)

    Treesearch

    David A. Steen; James A. Stiles; Sierra H. Stiles; Craig Guyer; Josh B. Pierce; D. Craig Rudolph; Lora L. Smith

    2011-01-01

    The overland movements and upland habitat use of wetland-associated reptiles has important conservation implications (Semlitsch and Bodie 2003. Conserv. BioI. 17:1219-1228). However, for many species, particularly snakes, we lack a basic understanding of spatial ecology and habitat use. Regina rigida is a poorly known species for which "observations of any kind...

  17. Public Schools and the Common Good.

    ERIC Educational Resources Information Center

    Reese, William J.

    1988-01-01

    Improving public school education, especially for the poor, requires defining and articulating some vision of the common good. This article reviews key positions taken by liberals and conservatives regarding educational reform during the 19th and 20th centuries and critiques these positions with regard to their disservice to the poor. (IAH)

  18. Finding a (pine) needle in a haystack: chloroplast genome sequence divergence in rare and widespread pines

    Treesearch

    J.B. Whittall; J. Syring; M. Parks; J. Buenrostro; C. Dick; A. Liston; R. Cronn

    2010-01-01

    Critical to conservation efforts and other investigations at low taxonomic levels, DNA sequence data offer important insights into the distinctiveness, biogeographic partitioning, and evolutionary histories of species. The resolving power of DNA sequences is often limited by insufficient variability at the intraspecific level. This is particularly true of studies...

  19. Genome Sequence of the Yeast Clavispora lusitaniae Type Strain CBS 6936.

    PubMed

    Durrens, Pascal; Klopp, Christophe; Biteau, Nicolas; Fitton-Ouhabi, Valérie; Dementhon, Karine; Accoceberry, Isabelle; Sherman, David J; Noël, Thierry

    2017-08-03

    Clavispora lusitaniae , an environmental saprophytic yeast belonging to the CTG clade of Candida , can behave occasionally as an opportunistic pathogen in humans. We report here the genome sequence of the type strain CBS 6936. Comparison with sequences of strain ATCC 42720 indicates conservation of chromosomal structure but significant nucleotide divergence. Copyright © 2017 Durrens et al.

  20. Genome Sequence of the Yeast Clavispora lusitaniae Type Strain CBS 6936

    PubMed Central

    Klopp, Christophe; Biteau, Nicolas; Fitton-Ouhabi, Valérie; Dementhon, Karine; Accoceberry, Isabelle; Sherman, David J.; Noël, Thierry

    2017-01-01

    ABSTRACT Clavispora lusitaniae, an environmental saprophytic yeast belonging to the CTG clade of Candida, can behave occasionally as an opportunistic pathogen in humans. We report here the genome sequence of the type strain CBS 6936. Comparison with sequences of strain ATCC 42720 indicates conservation of chromosomal structure but significant nucleotide divergence. PMID:28774979

Top