Gene order in rosid phylogeny, inferred from pairwise syntenies among extant genomes
2012-01-01
Background Ancestral gene order reconstruction for flowering plants has lagged behind developments in yeasts, insects and higher animals, because of the recency of widespread plant genome sequencing, sequencers' embargoes on public data use, paralogies due to whole genome duplication (WGD) and fractionation of undeleted duplicates, extensive paralogy from other sources, and the computational cost of existing methods. Results We address these problems, using the gene order of four core eudicot genomes (cacao, castor bean, papaya and grapevine) that have escaped any recent WGD events, and two others (poplar and cucumber) that descend from independent WGDs, in inferring the ancestral gene order of the rosid clade and those of its main subgroups, the fabids and malvids. We improve and adapt techniques including the OMG method for extracting large, paralogy-free, multiple orthologies from conflated pairwise synteny data among the six genomes and the PATHGROUPS approach for ancestral gene order reconstruction in a given phylogeny, where some genomes may be descendants of WGD events. We use the gene order evidence to evaluate the hypothesis that the order Malpighiales belongs to the malvids rather than as traditionally assigned to the fabids. Conclusions Gene orders of ancestral eudicot species, involving 10,000 or more genes can be reconstructed in an efficient, parsimonious and consistent way, despite paralogies due to WGD and other processes. Pairwise genomic syntenies provide appropriate input to a parameter-free procedure of multiple ortholog identification followed by gene-order reconstruction in solving instances of the "small phylogeny" problem. PMID:22759433
OrthoMCL: Identification of Ortholog Groups for Eukaryotic Genomes
Li, Li; Stoeckert, Christian J.; Roos, David S.
2003-01-01
The identification of orthologous groups is useful for genome annotation, studies on gene/protein evolution, comparative genomics, and the identification of taxonomically restricted sequences. Methods successfully exploited for prokaryotic genome analysis have proved difficult to apply to eukaryotes, however, as larger genomes may contain multiple paralogous genes, and sequence information is often incomplete. OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs. This method performs similarly to the INPARANOID algorithm when applied to two genomes, but can be extended to cluster orthologs from multiple species. OrthoMCL clusters are coherent with groups identified by EGO, but improved recognition of “recent” paralogs permits overlapping EGO groups representing the same gene to be merged. Comparison with previously assigned EC annotations suggests a high degree of reliability, implying utility for automated eukaryotic genome annotation. OrthoMCL has been applied to the proteome data set from seven publicly available genomes (human, fly, worm, yeast, Arabidopsis, the malaria parasite Plasmodium falciparum, and Escherichia coli). A Web interface allows queries based on individual genes or user-defined phylogenetic patterns (http://www.cbil.upenn.edu/gene-family). Analysis of clusters incorporating P. falciparum genes identifies numerous enzymes that were incompletely annotated in first-pass annotation of the parasite genome. PMID:12952885
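The abstract above names a Markov Cluster algorithm without describing it. As a rough illustration only, the sketch below runs generic Markov clustering (alternating expansion and inflation of a column-stochastic matrix) on a small, hypothetical gene-similarity matrix. It is not OrthoMCL's pipeline: the matrix values, the inflation value of 2.0, the iteration count and the readout threshold are all assumptions chosen for the example.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square-matrix product, used for the expansion step."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(similarity, inflation=2.0, iterations=50, threshold=1e-6):
    """Toy Markov clustering; returns clusters as sorted lists of indices."""
    n = len(similarity)
    # Add self-loops so that every column has a non-zero sum.
    m = [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)                                  # expansion
        m = [[x ** inflation for x in row] for row in m]  # inflation
        m = normalize_columns(m)
    # Each row that still carries flow is an attractor; its non-zero
    # columns are the members of one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > threshold)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical similarities: genes 0-2 are mutually similar, genes 3-4 are
# similar to each other, and a weak 0.1 link joins gene 2 to gene 3.
toy = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0.1, 0],
    [0, 0, 0.1, 0, 1],
    [0, 0, 0, 1, 0],
]
print(mcl(toy))
```

Raising the inflation value generally yields finer clusters and lowering it yields coarser ones; real tools operate on sparse matrices rather than the dense lists used here.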
Extensive Concerted Evolution of Rice Paralogs and the Road to Regaining Independence
Wang, Xiyin; Tang, Haibao; Bowers, John E.; Feltus, Frank A.; Paterson, Andrew H.
2007-11-01
Many genes duplicated by whole-genome duplications (WGDs) are more similar to one another than expected. We investigated whether concerted evolution through conversion and crossing over, well-known to affect tandem gene clusters, also affects dispersed paralogs. Genome sequences for two Oryza subspecies reveal appreciable gene conversion in the ∼0.4 MY since their divergence, with a gradual progression toward independent evolution of older paralogs. Since divergence from subspecies indica, ∼8% of japonica paralogs produced 5–7 MYA on chromosomes 11 and 12 have been affected by gene conversion and several reciprocal exchanges of chromosomal segments, while ∼70-MY-old “paleologs” resulting from a genome duplication (GD) show much less conversion. Sequence similarity analysis in proximal gene clusters also suggests more conversion between younger paralogs. About 8% of paleologs may have been converted since rice–sorghum divergence ∼41 MYA. Domain-encoding sequences are more frequently converted than nondomain sequences, suggesting a sort of circularity: sequences conserved by selection may be further conserved by relatively frequent conversion. The higher level of concerted evolution in the 5–7 MY-old segmental duplication may reflect the behavior of many genomes within the first few million years after duplication or polyploidization. PMID:18039882
Method of identity analyte-binding peptides
Kauvar, Lawrence M.
1990-10-16
A method for affinity chromatography or adsorption of a designated analyte utilizes a paralog as the affinity partner. The immobilized paralog can be used in purification or analysis of the analyte; the paralog can also be used as a substitute for antibody in an immunoassay. The paralog is identified by screening candidate peptide sequences of 4-20 amino acids for specific affinity to the analyte. 5 figs.
Sequencing and analysis of 10967 full-length cDNA clones from Xenopus laevis and Xenopus tropicalis
Morin, R D; Chang, E; Petrescu, A
2005-10-31
Sequencing of full-insert clones from full-length cDNA libraries from both Xenopus laevis and Xenopus tropicalis has been ongoing as part of the Xenopus Gene Collection initiative. Here we present an analysis of 10967 clones (8049 from X. laevis and 2918 from X. tropicalis). The clone set contains 2013 orthologs between X. laevis and X. tropicalis as well as 1795 paralog pairs within X. laevis. 1199 are in-paralogs, believed to have resulted from an allotetraploidization event approximately 30 million years ago, and the remaining 546 are likely out-paralogs that have resulted from more ancient gene duplications, prior to the divergence between the two species. We do not detect any evidence for positive selection by the Yang and Nielsen maximum likelihood method of approximating dN/dS. However, dN/dS for X. laevis in-paralogs is elevated relative to X. tropicalis orthologs. This difference is highly significant, and indicates an overall relaxation of selective pressures on duplicated gene pairs. Within both groups of paralogs, we found evidence of subfunctionalization, manifested as differential expression of paralogous genes among tissues, as measured by EST information from public resources. We have observed, as expected, a higher instance of subfunctionalization in out-paralogs relative to in-paralogs.
Carrigan, Matthew A.; Uryasev, Oleg; Davis, Ross P.; Zhai, LanMin; Hurley, Thomas D.; Benner, Steven A.
2012-01-01
Background Gene duplication is a source of molecular innovation throughout evolution. However, even with massive amounts of genome sequence data, correlating gene duplication with speciation and other events in natural history can be difficult. This is especially true in its most interesting cases, where rapid and multiple duplications are likely to reflect adaptation to rapidly changing environments and life styles. This may be so for Class I of alcohol dehydrogenases (ADH1s), where multiple duplications occurred in primate lineages in Old and New World monkeys (OWMs and NWMs) and hominoids. Methodology/Principal Findings To build a preferred model for the natural history of ADH1s, we determined the sequences of nine new ADH1 genes, finding for the first time multiple paralogs in various prosimians (lemurs, strepsirhines). Database mining then identified novel ADH1 paralogs in both macaque (an OWM) and marmoset (a NWM). These were used with the previously identified human paralogs to resolve controversies relating to dates of duplication and gene conversion in the ADH1 family. Central to these controversies are differences in the topologies of trees generated from exonic (coding) sequences and intronic sequences. Conclusions/Significance We provide evidence that gene conversions are the primary source of difference, using molecular clock dating of duplications and analyses of microinsertions and deletions (micro-indels). The tree topology inferred from intron sequences appears to represent the natural history of ADH1s more correctly, with the ADH1 paralogs in platyrrhines (NWMs) and catarrhines (OWMs and hominoids) having arisen by duplications shortly predating the divergence of OWMs and NWMs. We also conclude that paralogs in lemurs arose independently. Finally, we identify errors in database interpretation as the source of controversies concerning gene conversion.
These analyses provide a model for the natural history of ADH1s that posits four ADH1 paralogs in the ancestor of Catarrhine and Platyrrhine primates, followed by the loss of an ADH1 paralog in the human lineage. PMID:22859968
Two Paralogous Families of a Two-Gene Subtilisin Operon Are Widely Distributed in Oral Treponemes
Correia, Frederick F.; Plummer, Alvin R.; Ellen, Richard P.; Wyss, Chris; Boches, Susan K.; Galvin, Jamie L.; Paster, Bruce J.; Dewhirst, Floyd E.
2003-01-01
Certain oral treponemes express a highly proteolytic phenotype and have been associated with periodontal diseases. The periodontal pathogen Treponema denticola produces dentilisin, a serine protease of the subtilisin family. The two-gene operon prcA-prtP is required for expression of active dentilisin (PrtP), a putative lipoprotein attached to the treponeme's outer membrane or sheath. The purpose of this study was to examine the diversity and structure of treponemal subtilisin-like proteases in order to better understand their distribution and function. The complete sequences of five prcA-prtP operons were determined for Treponema lecithinolyticum, “Treponema vincentii,” and two canine species. Partial operon sequences were obtained for T. socranskii subsp. 04 as well as 450- to 1,000-base fragments of prtP genes from four additional treponeme strains. Phylogenetic analysis demonstrated that the sequences fall into two paralogous families. The first family includes the sequence from T. denticola. Treponemes possessing this operon family express chymotrypsin-like protease activity and can cleave the substrate N-succinyl-alanyl-alanyl-prolyl-phenylalanine-p-nitroanilide (SAAPFNA). Treponemes possessing the second paralog family do not possess chymotrypsin-like activity or cleave SAAPFNA. Despite examination of a range of protein and peptide substrates, the specificity of the second protease family remains unknown. Each of the fully sequenced prcA and prtP genes contains a 5′ hydrophobic leader sequence with a treponeme lipobox. The two paralogous families of treponeme subtilisins represent a new subgroup within the subtilisin family of proteases and are the only subtilisin lipoprotein family. The present study demonstrated that the subtilisin paralogs comprising a two-gene operon are widely distributed among treponemes. PMID:14617650
Domain architecture conservation in orthologs
2011-01-01
Background As orthologous proteins are expected to retain function more often than other homologs, they are often used for functional annotation transfer between species. However, ortholog identification methods do not take into account changes in domain architecture, which are likely to modify a protein's function. By domain architecture we refer to the sequential arrangement of domains along a protein sequence. To assess the level of domain architecture conservation among orthologs, we carried out a large-scale study of such events between human and 40 other species spanning the entire evolutionary range. We designed a score to measure domain architecture similarity and used it to analyze differences in domain architecture conservation between orthologs and paralogs relative to the conservation of primary sequence. We also statistically characterized the extents of different types of domain swapping events across pairs of orthologs and paralogs. Results The analysis shows that orthologs exhibit greater domain architecture conservation than paralogous homologs, even when differences in average sequence divergence are compensated for, for homologs that have diverged beyond a certain threshold. We interpret this as an indication of a stronger selective pressure on orthologs than paralogs to retain the domain architecture required for the proteins to perform a specific function. In general, orthologs as well as the closest paralogous homologs have very similar domain architectures, even at large evolutionary separation. The most common domain architecture changes observed in both ortholog and paralog pairs involved insertion/deletion of new domains, while domain shuffling and segment duplication/deletion were very infrequent. Conclusions On the whole, our results support the hypothesis that function conservation between orthologs demands higher domain architecture conservation than other types of homologs, relative to primary sequence conservation. 
This supports the notion that orthologs are functionally more similar than other types of homologs at the same evolutionary distance. PMID:21819573
Divergence of Gene Body DNA Methylation and Evolution of Plant Duplicate Genes
Wang, Jun; Marowsky, Nicholas C.; Fan, Chuanzhu
2014-01-01
It has been shown that gene body DNA methylation is associated with gene expression. However, whether and how deviation of gene body DNA methylation between duplicate genes can influence their divergence remains largely unexplored. Here, we aim to elucidate the potential role of gene body DNA methylation in the fate of duplicate genes. We identified paralogous gene pairs from Arabidopsis and rice (Oryza sativa ssp. japonica) genomes and reprocessed their single-base resolution methylome data. We show that methylation in paralogous genes nonlinearly correlates with several gene properties including exon number/gene length, expression level and mutation rate. Further, we demonstrated that divergence of methylation level and pattern in paralogs indeed positively correlate with their sequence and expression divergences. This result held even after controlling for other confounding factors known to influence the divergence of paralogs. We observed that methylation level divergence might be more relevant to the expression divergence of paralogs than methylation pattern divergence. Finally, we explored the mechanisms that might give rise to the divergence of gene body methylation in paralogs. We found that exonic methylation divergence more closely correlates with expression divergence than intronic methylation divergence. We show that genomic environments (e.g., flanked by transposable elements and repetitive sequences) of paralogs generated by various duplication mechanisms are associated with the methylation divergence of paralogs. Overall, our results suggest that the changes in gene body DNA methylation could provide another avenue for duplicate genes to develop differential expression patterns and undergo different evolutionary fates in plant genomes. PMID:25310342
Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling
Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien
2012-01-01
The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3. PMID:22778697
Drosophila Nnf1 paralogs are partially redundant for somatic and germ line kinetochore function.
Blattner, Ariane C; Aguilar-Rodríguez, José; Kränzlin, Marcella; Wagner, Andreas; Lehner, Christian F
2017-02-01
Kinetochores allow attachment of chromosomes to spindle microtubules. Moreover, they host proteins that permit correction of erroneous attachments and prevent premature anaphase onset before bi-orientation of all chromosomes in metaphase has been achieved. Kinetochores are assembled from subcomplexes. Kinetochore proteins as well as the underlying centromere proteins and the centromeric DNA sequences evolve rapidly despite their fundamental importance for faithful chromosome segregation during mitotic and meiotic divisions. During evolution of Drosophila melanogaster, several centromere proteins were lost and a recent gene duplication has resulted in two Nnf1 paralogs, Nnf1a and Nnf1b, which code for alternative forms of a Mis12 kinetochore complex component. The rapid evolutionary divergence of centromere/kinetochore constituents in animals and plants has been proposed to be driven by an intragenome conflict resulting from centromere drive during female meiosis. Thus, a female meiosis-specific paralog might be expected to evolve rapidly under positive selection. While our characterization of the D. melanogaster Nnf1 paralogs hints at some partial functional specialization of Nnf1b for meiosis, we have failed to detect evidence for positive selection in our analysis of Nnf1 sequence evolution in the Drosophilid lineage. Neither paralog is essential, even though we find some clear differences in subcellular localization and expression during development. Loss of both paralogs results in developmental lethality. We therefore conclude that the two paralogs are still in early stages of differentiation.
Shimada, Norimoto; Sato, Shusei; Akashi, Tomoyoshi; Nakamura, Yasukazu; Tabata, Satoshi; Ayabe, Shin-ichi; Aoki, Toshio
2007-01-01
The model legume Lotus japonicus (Regel) K. Larsen is one of the subjects of genome sequencing and functional genomics programs. In the course of targeted approaches to legume genomics, we analyzed the genes encoding enzymes involved in the biosynthesis of the legume-specific 5-deoxyisoflavonoid of L. japonicus, which produces isoflavan phytoalexins on elicitor treatment. The paralogous biosynthetic genes were assigned as comprehensively as possible by biochemical experiments, similarity searches, comparison of the gene structures, and phylogenetic analyses. Among the 10 biosynthetic genes investigated, six comprise multigene families, and in many cases they form gene clusters in the chromosomes. Semi-quantitative reverse transcriptase–PCR analyses showed coordinate up-regulation of most of the genes during phytoalexin induction and complex accumulation patterns of the transcripts in different organs. Some paralogous genes exhibited similar expression specificities, suggesting their genetic redundancy. The molecular evolution of the biosynthetic genes is discussed. The results presented here provide reliable annotations of the genes and genetic markers for comparative and functional genomics of leguminous plants. PMID:17452423
Kocot, Kevin M; Citarella, Mathew R; Moroz, Leonid L; Halanych, Kenneth M
2013-01-01
Molecular phylogenetics relies on accurate identification of orthologous sequences among the taxa of interest. Most orthology inference programs available for use in phylogenomics rely on small sets of pre-defined orthologs from model organisms or phenetic approaches such as all-versus-all sequence comparisons followed by Markov graph-based clustering. Such approaches have high sensitivity but may erroneously include paralogous sequences. We developed PhyloTreePruner, a software utility that uses a phylogenetic approach to refine orthology inferences made using phenetic methods. PhyloTreePruner checks single-gene trees for evidence of paralogy and generates a new alignment for each group containing only sequences inferred to be orthologs. Importantly, PhyloTreePruner takes into account support values on the tree and avoids unnecessarily deleting sequences in cases where a weakly supported tree topology incorrectly indicates paralogy. A test of PhyloTreePruner on a dataset generated from 11 completely sequenced arthropod genomes identified 2,027 orthologous groups sampled for all taxa. Phylogenetic analysis of the concatenated supermatrix yielded a generally well-supported topology that was consistent with the current understanding of arthropod phylogeny. PhyloTreePruner is freely available from http://sourceforge.net/projects/phylotreepruner/.
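The tree-based paralogy screen described above can be illustrated with a simplified sketch. It assumes a rooted gene tree written as nested tuples, leaf labels of the hypothetical form "taxon|sequence_id", and the rule "keep the largest clade in which every taxon appears at most once". This is not PhyloTreePruner's code, and it ignores the branch-support handling that the abstract mentions.

```python
def leaves(tree):
    """Return all leaf labels under a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree:
        out.extend(leaves(child))
    return out


def taxon(leaf):
    """Extract the taxon name from a 'taxon|sequence_id' label."""
    return leaf.split("|")[0]


def largest_paralog_free_clade(tree):
    """Return the leaves of the largest clade with one sequence per taxon."""
    tips = leaves(tree)
    taxa = [taxon(t) for t in tips]
    if len(set(taxa)) == len(taxa):
        return tips  # no taxon is repeated, so no evidence of paralogy here
    # A repeated taxon signals paralogy: descend and keep the best child clade.
    best = []
    for child in tree:
        candidate = largest_paralog_free_clade(child)
        if len(candidate) > len(best):
            best = candidate
    return best


# Hypothetical gene tree: fly and bee each carry two copies (g1 and g2).
gene_tree = ((("fly|g1", "bee|g1"), "mite|g1"), ("fly|g2", "bee|g2"))
print(largest_paralog_free_clade(gene_tree))
# → ['fly|g1', 'bee|g1', 'mite|g1']
```

When two clades tie in size this sketch keeps the first one encountered; a production tool would need an explicit tie-breaking rule and would treat weakly supported nodes more cautiously.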
Carmon, Amber; MacIntyre, Ross
2010-01-01
The genome sequences of 12 Drosophila species contain 3 paralogs for alpha glycerophosphate dehydrogenase (GPDH) and for the mitochondrial alpha glycerophosphate oxidase (GPO). These 2 enzymes participate in the alpha glycerophosphate cycle in the adult thoracic flight muscles. The flight muscle enzymes are encoded by gpdh-1 at 26A2 and gpo-1 at 52C8. In this paper, we show that the GPDH paralogs share the same evolutionarily conserved functional domains and most intron positions, whereas the GPO paralogs share only some of the functional domains of mitochondrial oxidoreductases. The GPO paralogs not expressed in the flight muscles essentially lack introns. GPDH paralogs encoded by gpdh-2 and gpdh-3 and the GPO paralogs encoded by gpo-2 and gpo-3 are expressed only in the testes. Gene trees for the GPDH and GPO paralogs indicate that the genes expressed in the flight muscles are evolving very slowly presumably under strong purifying selection whereas the paralogs expressed in the testes are evolving more rapidly. The concordance between species and gene trees, d(N)/d(S) ratios, phylogenetic analysis by maximum likelihood-based tests, and analyses of radical and conservative substitutions all indicate that the additional GPDH and GPO paralogs are also evolving under purifying selection.
Molecular Evolution and Functional Diversification of Replication Protein A1 in Plants
Aklilu, Behailu B.; Culligan, Kevin M.
2016-01-01
Replication protein A (RPA) is a heterotrimeric, single-stranded DNA binding complex required for eukaryotic DNA replication, repair, and recombination. RPA is composed of three subunits, RPA1, RPA2, and RPA3. In contrast to single RPA subunit genes generally found in animals and yeast, plants encode multiple paralogs of RPA subunits, suggesting subfunctionalization. Genetic analysis demonstrates that five Arabidopsis thaliana RPA1 paralogs (RPA1A to RPA1E) have unique and overlapping functions in DNA replication, repair, and meiosis. We hypothesize here that RPA1 subfunctionalities will be reflected in major structural and sequence differences among the paralogs. To address this, we analyzed amino acid and nucleotide sequences of RPA1 paralogs from 25 complete genomes representing a wide spectrum of plants and unicellular green algae. We find here that the plant RPA1 gene family is divided into three general groups termed RPA1A, RPA1B, and RPA1C, which likely arose from two progenitor groups in unicellular green algae. In the family Brassicaceae the RPA1B and RPA1C groups have further expanded to include two unique sub-functional paralogs RPA1D and RPA1E, respectively. In addition, RPA1 groups have unique domains, motifs, cis-elements, gene expression profiles, and pattern of conservation that are consistent with proposed functions in monocot and dicot species, including a novel C-terminal zinc-finger domain found only in plant RPA1C-like sequences. These results allow for improved prediction of RPA1 subunit functions in newly sequenced plant genomes, and potentially provide a unique molecular tool to improve classification of Brassicaceae species. PMID:26858742
Berg, Jordan A.; Merrill, Bryan D.; Crockett, Justin T.; Esplin, Kyle P.; Evans, Marlee R.; Heaton, Karli E.; Hilton, Jared A.; Hyde, Jonathan R.; McBride, Morgan S.; Schouten, Jordan T.; Simister, Austin R.; Thurgood, Trever L.; Ward, Andrew T.; Breakwell, Donald P.; Hope, Sandra; Grose, Julianne H.
2016-01-01
Brevibacillus laterosporus is a spore-forming bacterium that causes a secondary infection in beehives following European Foulbrood disease. To better understand the contributions of Brevibacillus bacteriophages to the evolution of their hosts, five novel phages (Jenst, Osiris, Powder, SecTim467, and Sundance) were isolated and characterized. When compared with the five Brevibacillus phages currently in NCBI, these phages were assigned to clusters based on whole genome and proteome synteny. Powder and Osiris, both myoviruses, were assigned to the previously described Jimmer-like cluster. SecTim467 and Jenst, both siphoviruses, formed a novel phage cluster. Sundance, a siphovirus, was assigned as a singleton phage along with the previously isolated singleton, Emery. In addition to characterizing the basic relationships between these phages, several genomic features were observed. A motif repeated throughout phages Jenst and SecTim467 was frequently upstream of genes predicted to function in DNA replication, nucleotide metabolism, and transcription, suggesting transcriptional co-regulation. In addition, paralogous gene pairs that encode a putative transcriptional regulator were identified in four Brevibacillus phages. These paralogs likely evolved to bind different DNA sequences due to variation at amino acid residues predicted to bind specific nucleotides. Finally, a putative transposable element was identified in SecTim467 and Sundance that carries genes homologous to those found in Brevibacillus chromosomes. Remnants of this transposable element were also identified in phage Jenst. These discoveries provide a greater understanding of the diversity of phages, their behavior, and their evolutionary relationships to one another and to their host. In addition, they provide a foundation with which further Brevibacillus phages can be compared. PMID:27304881
Nguyen, Hoang T; Merriman, Tony R; Black, Michael A
2014-01-01
Recent advances in high-throughput sequencing technologies have made it possible to accurately assign copy number (CN) at CN variable loci. However, current analytic methods often perform poorly in regions in which complex CN variation is observed. Here we report the development of a read depth-based approach, CNVrd2, for investigation of CN variation using high-throughput sequencing data. This methodology was developed using data from the 1000 Genomes Project from the CCL3L1 locus, and tested using data from the DEFB103A locus. In both cases, samples were selected for which paralog ratio test data were also available for comparison. The CNVrd2 method first uses observed read-count ratios to refine segmentation results in one population. Then a linear regression model is applied to adjust the results across multiple populations, in combination with a Bayesian normal mixture model to cluster segmentation scores into groups for individual CN counts. The performance of CNVrd2 was compared to that of two other read depth-based methods (CNVnator, cn.mops) at the CCL3L1 and DEFB103A loci. The highest concordance with the paralog ratio test method was observed for CNVrd2 (77.8/90.4% for CNVrd2, 36.7/4.8% for cn.mops and 7.2/1% for CNVnator at CCL3L1 and DEFB103A). CNVrd2 is available as an R package as part of the Bioconductor project: http://www.bioconductor.org/packages/release/bioc/html/CNVrd2.html.
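The basic idea behind read depth-based copy number estimation can be shown with a deliberately naive sketch: compare the median depth in a target region with the median depth of a background assumed to be diploid. This is not the CNVrd2 method, which adds segmentation, regression across populations and a Bayesian mixture model; the function name, the window depths and the diploid-background assumption are all hypothetical.

```python
from statistics import median


def copy_number_estimate(region_depths, background_depths, ploidy=2):
    """Naive CN call: ploidy times the ratio of median read depths, rounded."""
    ratio = median(region_depths) / median(background_depths)
    return round(ploidy * ratio)


# Hypothetical per-window read depths for one sample.
background = [28, 30, 32]                            # assumed diploid windows
print(copy_number_estimate([45, 50, 55], background))  # → 3
print(copy_number_estimate([14, 15, 16], background))  # → 1
```

Rounding a single ratio is fragile when depth is noisy or the background itself varies in copy number, which is one reason more elaborate models are applied at complex loci.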
Roman-Padilla, J; Rodríguez-Rua, A; Claros, M G; Hachero-Cruzado, I; Manchado, M
2016-01-01
The apolipoprotein A-IV (ApoA-IV) plays a key role in lipid transport and feed intake regulation. In this work, four cDNA sequences encoding ApoA-IV paralogs were identified. Sequence analysis revealed conserved structural features including the common 33-codon block and nine repeated motifs. Gene structure analysis identified four exons and three introns except for apoA-IVAa1 (with only 3 exons). Synteny analysis showed that the four paralogs were structured into two clusters (cluster A containing apoA-IVAa1 and apoA-IVAa2 and cluster B with apoA-IVBa3 and apoA-IVBa4) linked to an apolipoprotein E. Phylogenetic analysis clearly separated the paralogs according to their cluster organization and revealed four subclades highly conserved in Acanthopterygii. Whole-mount in situ hybridization (WISH) analyses in early larvae (0 and 1 day post-hatch (dph)) showed that the four paralogs were mainly expressed in the yolk syncytial layer surrounding the oil globules. Later, at 3 and 5 dph, the four paralogs were mainly expressed in liver and intestine, although with differences in their relative abundance and temporal expression patterns. Diet supply triggered the intensity of WISH signals in the intestine of the four paralogs. Quantification of mRNA abundance by qPCR using whole larvae only detected the induction by diet at 5 dph. Moreover, transcript levels increased progressively with age except for apoA-IVAa2, which appeared as a low-expressed isoform. Expression analysis in juvenile tissues confirmed that the four paralogs were mainly expressed in liver and intestine and secondarily in other tissues. The role of these ApoA-IV genes in lipid transport and the possible role of apoA-IVAa2 as a regulatory form are discussed.
Expanded microbial genome coverage and improved protein family annotation in the COG database
Galperin, Michael Y.; Makarova, Kira S.; Wolf, Yuri I.; Koonin, Eugene V.
2015-01-01
Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. 
The new version of the COGs is expected to become an important tool for microbial genomics. PMID:25428365
Orthology and paralogy constraints: satisfiability and consistency.
Lafond, Manuel; El-Mabrouk, Nadia
2014-01-01
A variety of methods based on sequence similarity, reconciliation, synteny or functional characteristics, can be used to infer orthology and paralogy relations between genes of a given gene family G. But is a given set C of orthology/paralogy constraints possible, i.e., can they simultaneously co-exist in an evolutionary history for G? While previous studies have focused on full sets of constraints, here we consider the general case where C does not necessarily involve a constraint for each pair of genes. The problem is subdivided in two parts: (1) Is C satisfiable, i.e. can we find an event-labeled gene tree G inducing C? (2) Is there such a G which is consistent, i.e., such that all displayed triplet phylogenies are included in a species tree? Previous results on the Graph sandwich problem can be used to answer to (1), and we provide polynomial-time algorithms for satisfiability and consistency with a given species tree. We also describe a new polynomial-time algorithm for the case of consistency with an unknown species tree and full knowledge of pairwise orthology/paralogy relationships, as well as a branch-and-bound algorithm in the case when unknown relations are present. We show that our algorithms can be used in combination with ProteinOrtho, a sequence similarity-based orthology detection tool, to extract a set of robust orthology/paralogy relationships.
Orthology and paralogy constraints: satisfiability and consistency
2014-01-01
Background A variety of methods based on sequence similarity, reconciliation, synteny or functional characteristics, can be used to infer orthology and paralogy relations between genes of a given gene family G. But is a given set C of orthology/paralogy constraints possible, i.e., can they simultaneously co-exist in an evolutionary history for G? While previous studies have focused on full sets of constraints, here we consider the general case where C does not necessarily involve a constraint for each pair of genes. The problem is subdivided in two parts: (1) Is C satisfiable, i.e. can we find an event-labeled gene tree G inducing C? (2) Is there such a G which is consistent, i.e., such that all displayed triplet phylogenies are included in a species tree? Results Previous results on the Graph sandwich problem can be used to answer to (1), and we provide polynomial-time algorithms for satisfiability and consistency with a given species tree. We also describe a new polynomial-time algorithm for the case of consistency with an unknown species tree and full knowledge of pairwise orthology/paralogy relationships, as well as a branch-and-bound algorithm in the case when unknown relations are present. We show that our algorithms can be used in combination with ProteinOrtho, a sequence similarity-based orthology detection tool, to extract a set of robust orthology/paralogy relationships. PMID:25572629
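For the full-constraint case that the abstract attributes to previous studies, satisfiability has a simple graph-theoretic form. A minimal sketch follows, assuming the characterization from the earlier orthology literature (it is not stated in this abstract): when every gene pair is labeled as orthologous or paralogous, the relations are satisfiable exactly when the orthology graph contains no induced path on four vertices, that is, when it is a cograph. The function names and gene labels are hypothetical, and the brute-force search is only meant for small families. It does not cover the partial-constraint algorithms the paper contributes.

```python
from itertools import combinations

def has_induced_p4(genes, orthologs):
    """Return True if the orthology graph has an induced 4-vertex path."""
    edges = {frozenset(pair) for pair in orthologs}
    for quad in combinations(genes, 4):
        degree = {g: 0 for g in quad}
        for u, v in combinations(quad, 2):
            if frozenset((u, v)) in edges:
                degree[u] += 1
                degree[v] += 1
        # Among 4-vertex graphs, only the path P4 has degrees 1, 1, 2, 2.
        if sorted(degree.values()) == [1, 1, 2, 2]:
            return True
    return False

def full_relation_is_satisfiable(genes, orthologs):
    """Full relation: pairs not listed as orthologs are taken as paralogs."""
    return not has_induced_p4(genes, orthologs)

genes = ["a", "b", "c", "d"]
chain = [("a", "b"), ("b", "c"), ("c", "d")]   # orthology graph is a P4
star = [("a", "b"), ("a", "c"), ("a", "d")]    # orthology graph is a star
chain_ok = full_relation_is_satisfiable(genes, chain)
star_ok = full_relation_is_satisfiable(genes, star)
```

The chain example fails because no event-labeled gene tree can make a-b, b-c and c-d orthologous while keeping a-c, a-d and b-d paralogous. The star example passes.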
Tlapakova, Tereza; Krylov, Vladimir; Macha, Jaroslav
2005-01-01
Two paralogous mitochondrial malate dehydrogenase 2 (Mdh2) genes of Xenopus laevis have been cloned and sequenced, revealing 95% identity. Fluorescence in-situ hybridization (FISH) combined with tyramide amplification discriminates both genes; Mdh2a was localized into chromosome q3 and Mdh2b into chromosome q8. One kb cDNA probes detect both genes with 85% accuracy. The remaining signals were on the paralogous counterpart. Introns interrupt coding sequences at the same nucleotide as defined for mouse. Restriction polymorphism has been detected in the first intron of Mdh2a, while the individual variability in intron 6 of Mdh2b gene is represented by an insertion of incomplete retrotransposon L1Xl. Rates of nucleotide substitutions indicate that both genes are under similar evolutionary constraints. X. laevis Mdh2 genes can be used as markers for physical mapping and linkage analysis.
Johnson, Matthew G.; Gardner, Elliot M.; Liu, Yang; Medina, Rafael; Goffinet, Bernard; Shaw, A. Jonathan; Zerega, Nyree J. C.; Wickett, Norman J.
2016-01-01
Premise of the study: Using sequence data generated via target enrichment for phylogenetics requires reassembly of high-throughput sequence reads into loci, presenting a number of bioinformatics challenges. We developed HybPiper as a user-friendly platform for assembly of gene regions, extraction of exon and intron sequences, and identification of paralogous gene copies. We test HybPiper using baits designed to target 333 phylogenetic markers and 125 genes of functional significance in Artocarpus (Moraceae). Methods and Results: HybPiper implements parallel execution of sequence assembly in three phases: read mapping, contig assembly, and target sequence extraction. The pipeline was able to recover nearly complete gene sequences for all genes in 22 species of Artocarpus. HybPiper also recovered more than 500 bp of nontargeted intron sequence in over half of the phylogenetic markers and identified paralogous gene copies in Artocarpus. Conclusions: HybPiper was designed for Linux and Mac OS X and is freely available at https://github.com/mossmatters/HybPiper. PMID:27437175
Bengtsson, Johan; Eriksson, K Martin; Hartmann, Martin; Wang, Zheng; Shenoy, Belle Damodara; Grelet, Gwen-Aëlle; Abarenkov, Kessy; Petri, Anna; Rosenblad, Magnus Alm; Nilsson, R Henrik
2011-10-01
The ribosomal small subunit (SSU) rRNA gene has emerged as an important genetic marker for taxonomic identification in environmental sequencing datasets. In addition to being present in the nucleus of eukaryotes and the core genome of prokaryotes, the gene is also found in the mitochondria of eukaryotes and in the chloroplasts of photosynthetic eukaryotes. These three sets of genes are conceptually paralogous and should in most situations not be aligned and analyzed jointly. To identify the origin of SSU sequences in complex sequence datasets has hitherto been a time-consuming and largely manual undertaking. However, the present study introduces Metaxa ( http://microbiology.se/software/metaxa/ ), an automated software tool to extract full-length and partial SSU sequences from larger sequence datasets and assign them to an archaeal, bacterial, nuclear eukaryote, mitochondrial, or chloroplast origin. Using data from reference databases and from full-length organelle and organism genomes, we show that Metaxa detects and scores SSU sequences for origin with very low proportions of false positives and negatives. We believe that this tool will be useful in microbial and evolutionary ecology as well as in metagenomics.
Sanzol, Javier
2010-05-14
Gene duplication is central to genome evolution. In plants, genes can be duplicated through small-scale events and large-scale duplications often involving polyploidy. The apple belongs to the subtribe Pyrinae (Rosaceae), a diverse lineage that originated via allopolyploidization. Both small-scale duplications and polyploidy may have been important mechanisms shaping the genome of this species. This study evaluates the gene duplication and polyploidy history of the apple by characterizing duplicated genes in this species using EST data. Overall, 68% of the apple genes were clustered into families with a mean copy-number of 4.6. Analysis of the age distribution of gene duplications supported a continuous mode of small-scale duplications, plus two episodes of large-scale duplicates of vastly different ages. The youngest was consistent with the polyploid origin of the Pyrinae 37-48 MYBP, whereas the older may be related to gamma-triplication; an ancient hexapolyploidization previously characterized in the four sequenced eurosid genomes and basal to the eurosid-asterid divergence. Duplicated genes were studied for functional diversification with an emphasis on young paralogs; those originated during or after the formation of the Pyrinae lineage. Unequal assignment of single-copy genes and gene families to Gene Ontology categories suggested functional bias in the pattern of gene retention of paralogs. Young paralogs related to signal transduction, metabolism, and energy pathways have been preferentially retained. Non-random retention of duplicated genes seems to have mediated the expansion of gene families, some of which may have substantially increased their members after the origin of the Pyrinae. The joint analysis of over-duplicated functional categories and phylogenies, allowed evaluation of the role of both polyploidy and small-scale duplications during this process. 
Finally, gene expression analysis indicated that 82% of duplicated genes, including 80% of young paralogs, showed uncorrelated expression profiles, suggesting extensive subfunctionalization and a role of gene duplication in the acquisition of novel patterns of gene expression. This study reports a genome-wide analysis of the mode of gene duplication in the apple, and provides evidence for its role in genome functional diversification by characterising three major processes: selective retention of paralogs, amplification of gene families, and changes in gene expression.
Starrett, James; Hedin, Marshal; Ayoub, Nadia; Hayashi, Cheryl Y
2013-07-25
Hemocyanins are multimeric copper-containing hemolymph proteins involved in oxygen binding and transport in all major arthropod lineages. Most arachnids have seven primary subunits (encoded by paralogous genes a-g), which combine to form a 24-mer (4×6) quaternary structure. Within some spider lineages, however, hemocyanin evolution has been a dynamic process with extensive paralog duplication and loss. We have obtained hemocyanin gene sequences from numerous representatives of the spider infraorders Mygalomorphae and Araneomorphae in order to infer the evolution of the hemocyanin gene family and estimate spider relationships using these conserved loci. Our hemocyanin gene tree is largely consistent with the previous hypotheses of paralog relationships based on immunological studies, but reveals some discrepancies in which paralog types have been lost or duplicated in specific spider lineages. Analyses of concatenated hemocyanin sequences resolved deep nodes in the spider phylogeny and recovered a number of clades that are supported by other molecular studies, particularly for mygalomorph taxa. The concatenated data set is also used to estimate dates of higher-level spider divergences and suggests that the diversification of extant mygalomorphs preceded that of extant araneomorphs. Spiders are diverse in behavior and respiratory morphology, and our results are beneficial for comparative analyses of spider respiration. Lastly, the conserved hemocyanin sequences allow for the inference of spider relationships and ancient divergence dates. Copyright © 2013 Elsevier B.V. All rights reserved.
Unusual Intron Conservation near Tissue-Regulated Exons Found by Splicing Microarrays
Sugnet, Charles W; Srinivasan, Karpagam; Clark, Tyson A; O'Brien, Georgeann; Cline, Melissa S; Wang, Hui; Williams, Alan; Kulp, David; Blume, John E; Haussler, David; Ares, Manuel
2006-01-01
Alternative splicing contributes to both gene regulation and protein diversity. To discover broad relationships between regulation of alternative splicing and sequence conservation, we applied a systems approach, using oligonucleotide microarrays designed to capture splicing information across the mouse genome. In a set of 22 adult tissues, we observe differential expression of RNA containing at least two alternative splice junctions for about 40% of the 6,216 alternative events we could detect. Statistical comparisons identify 171 cassette exons whose inclusion or skipping is different in brain relative to other tissues and another 28 exons whose splicing is different in muscle. A subset of these exons is associated with unusual blocks of intron sequence whose conservation in vertebrates rivals that of protein-coding exons. By focusing on sets of exons with similar regulatory patterns, we have identified new sequence motifs implicated in brain and muscle splicing regulation. Of note is a motif that is strikingly similar to the branchpoint consensus but is located downstream of the 5′ splice site of exons included in muscle. Analysis of three paralogous membrane-associated guanylate kinase genes reveals that each contains a paralogous tissue-regulated exon with a similar tissue inclusion pattern. While the intron sequences flanking these exons remain highly conserved among mammalian orthologs, the paralogous flanking intron sequences have diverged considerably, suggesting unusually complex evolution of the regulation of alternative splicing in multigene families. PMID:16424921
Jørgensen, Kirsten; Morant, Anne Vinther; Morant, Marc; Jensen, Niels Bjerg; Olsen, Carl Erik; Kannangara, Rubini; Motawia, Mohammed Saddik; Møller, Birger Lindberg; Bak, Søren
2011-01-01
Cassava (Manihot esculenta) is a eudicotyledonous plant that produces the valine- and isoleucine-derived cyanogenic glucosides linamarin and lotaustralin with the corresponding oximes and cyanohydrins as key intermediates. CYP79 enzymes catalyzing amino acid-to-oxime conversion in cyanogenic glucoside biosynthesis are known from several plants including cassava. The enzyme system converting oxime into cyanohydrin has previously only been identified in the monocotyledonous plant great millet (Sorghum bicolor). Using this great millet CYP71E1 sequence as a query in a Basic Local Alignment Search Tool-p search, a putative functional homolog that exhibited an approximately 50% amino acid sequence identity was found in cassava. The corresponding full-length cDNA clone was obtained from a plasmid library prepared from cassava shoot tips and was assigned CYP71E7. Heterologous expression of CYP71E7 in yeast afforded microsomes converting 2-methylpropanal oxime (valine-derived oxime) and 2-methylbutanal oxime (isoleucine-derived oxime) to the corresponding cyanohydrins, which dissociate into acetone and 2-butanone, respectively, and hydrogen cyanide. The volatile ketones were detected as 2,4-dinitrophenylhydrazone derivatives by liquid chromatography-mass spectrometry. A K_S of approximately 0.9 μM was determined for 2-methylbutanal oxime based on substrate-binding spectra. CYP71E7 exhibits low specificity for the side chain of the substrate and catalyzes the conversion of aliphatic and aromatic oximes with turnovers of approximately 21, 17, 8, and 1 min−1 for the oximes derived from valine, isoleucine, tyrosine, and phenylalanine, respectively. A second paralog of CYP71E7 was identified by database searches and showed approximately 90% amino acid sequence identity. 
In tube in situ polymerase chain reaction showed that in nearly unfolded leaves, the CYP71E7 paralogs are preferentially expressed in specific cells in the endodermis and in most cells in the first cortex cell layer. In fully unfolded leaves, the expression is pronounced in the cortex cell layer just beside the epidermis and in specific cells in the vascular tissue cortex cells. Thus, the transcripts of the CYP71E7 paralogs colocalize with CYP79D1 and CYP79D2. We conclude that CYP71E7 is the oxime-metabolizing enzyme in cyanogenic glucoside biosynthesis in cassava. PMID:21045121
DOE Office of Scientific and Technical Information (OSTI.GOV)
Volkov, Oleg A.; Kinch, Lisa; Ariagno, Carson
Catalytically inactive enzyme paralogs occur in many genomes. Some regulate their active counterparts but the structural principles of this regulation remain largely unknown. We report X-ray structures of Trypanosoma brucei S-adenosylmethionine decarboxylase alone and in functional complex with its catalytically dead paralogous partner, prozyme. We show monomeric TbAdoMetDC is inactive because of autoinhibition by its N-terminal sequence. Heterodimerization with prozyme displaces this sequence from the active site through a complex mechanism involving a cis-to-trans proline isomerization, reorganization of a β-sheet, and insertion of the N-terminal α-helix into the heterodimer interface, leading to enzyme activation. We propose that the evolution of this intricate regulatory mechanism was facilitated by the acquisition of the dimerization domain, a single step that can in principle account for the divergence of regulatory schemes in the AdoMetDC enzyme family. These studies elucidate an allosteric mechanism in an enzyme and a plausible scheme by which such complex cooperativity evolved.
Margam, Venu M.; Coates, Brad S.; Bayles, Darrell O.; Hellmich, Richard L.; Agunbiade, Tolulope; Seufferheld, Manfredo J.; Sun, Weilin; Kroemer, Jeremy A.; Ba, Malick N.; Binso-Dabire, Clementine L.; Baoua, Ibrahim; Ishiyaku, Mohammad F.; Covas, Fernando G.; Srinivasan, Ramasamy; Armstrong, Joel; Murdock, Larry L.; Pittendrigh, Barry R.
2011-01-01
The legume pod borer, Maruca vitrata (Lepidoptera: Crambidae), is an insect pest species of crops grown by subsistence farmers in tropical regions of Africa. We present the de novo assembly of 3729 contigs from 454- and Sanger-derived sequencing reads for midgut, salivary, and whole adult tissues of this non-model species. Functional annotation predicted that 1320 M. vitrata protein coding genes are present, of which 631 have orthologs within the Bombyx mori gene model. A homology-based analysis assigned M. vitrata genes into a group of paralogs, but these were subsequently partitioned into putative orthologs following phylogenetic analyses. Following sequence quality filtering, a total of 1542 putative single nucleotide polymorphisms (SNPs) were predicted within M. vitrata contig assemblies. Seventy one of 1078 designed molecular genetic markers were used to screen M. vitrata samples from five collection sites in West Africa. Population substructure may be present with significant implications in the insect resistance management recommendations pertaining to the release of biological control agents or transgenic cowpea that express Bacillus thuringiensis crystal toxins. Mutation data derived from transcriptome sequencing is an expeditious and economical source for genetic markers that allow evaluation of ecological differentiation. PMID:21754987
Expanded microbial genome coverage and improved protein family annotation in the COG database.
Galperin, Michael Y; Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V
2015-01-01
Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. 
The new version of the COGs is expected to become an important tool for microbial genomics. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by US Government employees and is in the public domain in the US.
Rooting phylogenies using gene duplications: an empirical example from the bees (Apoidea).
Brady, Seán G; Litman, Jessica R; Danforth, Bryan N
2011-09-01
The placement of the root node in a phylogeny is fundamental to characterizing evolutionary relationships. The root node of bee phylogeny remains unclear despite considerable previous attention. In order to test alternative hypotheses for the location of the root node in bees, we used the F1 and F2 paralogs of elongation factor 1-alpha (EF-1α) to compare the tree topologies that result when using outgroup versus paralogous rooting. Fifty-two taxa representing each of the seven bee families were sequenced for both copies of EF-1α. Two datasets were analyzed. In the first (the "concatenated" dataset), the F1 and F2 copies for each species were concatenated and the tree was rooted using appropriate outgroups (sphecid and crabronid wasps). In the second dataset (the "duplicated" dataset), the F1 and F2 copies were aligned to each other and the copies for all taxa were treated as separate terminals. In this dataset, the root was placed between the F1 and F2 copies (i.e., paralog rooting). Bayesian analyses demonstrate that the outgroup rooting approach outperforms paralog rooting, recovering deeper clades and showing stronger support for groups well established by both morphological and other molecular data. Sequence characteristics of the two copies were compared at the amino acid level, but little evidence was found to suggest that one copy is more functionally conserved. Although neither approach yields an unambiguous root to the tree, both approaches strongly indicate that the root of bee phylogeny does not fall near Colletidae, as has been previously proposed. We discuss paralog rooting as a general strategy and why this approach performs relatively poorly with our particular dataset. Copyright © 2011 Elsevier Inc. All rights reserved.
Aagaard, Jan E.; Vacquier, Victor D.; MacCoss, Michael J.; Swanson, Willie J.
2010-01-01
Identifying fertilization molecules is key to our understanding of reproductive biology, yet only a few examples of interacting sperm and egg proteins are known. One of the best characterized comes from the invertebrate archeogastropod abalone (Haliotis spp.), where sperm lysin mediates passage through the protective egg vitelline envelope (VE) by binding to the VE protein vitelline envelope receptor for lysin (VERL). Rapid adaptive divergence of abalone lysin and VERL are an example of positive selection on interacting fertilization proteins contributing to reproductive isolation. Previously, we characterized a subset of the abalone VE proteins that share a structural feature, the zona pellucida (ZP) domain, which is common to VERL and the egg envelopes of vertebrates. Here, we use additional expressed sequence tag sequencing and shotgun proteomics to characterize this family of proteins in the abalone egg VE. We expand 3-fold the number of known ZP domain proteins present within the VE (now 30 in total) and identify a paralog of VERL (vitelline envelope zona pellucida domain protein [VEZP] 14) that contains a putative lysin-binding motif. We find that, like VERL, the divergence of VEZP14 among abalone species is driven by positive selection on the lysin-binding motif alone and that these paralogous egg VE proteins bind a similar set of sperm proteins including a rapidly evolving 18-kDa paralog of lysin, which may mediate sperm–egg fusion. This work identifies an egg coat paralog of VERL under positive selection and the candidate sperm proteins with which it may interact during abalone fertilization. PMID:19767347
Heterogeneous conservation of Dlx paralog co-expression in jawed vertebrates.
Debiais-Thibaud, Mélanie; Metcalfe, Cushla J; Pollack, Jacob; Germon, Isabelle; Ekker, Marc; Depew, Michael; Laurenti, Patrick; Borday-Birraux, Véronique; Casane, Didier
2013-01-01
The Dlx gene family encodes transcription factors involved in the development of a wide variety of morphological innovations that first evolved at the origins of vertebrates or of the jawed vertebrates. This gene family expanded with the two rounds of genome duplications that occurred before jawed vertebrates diversified. It includes at least three bigene pairs sharing conserved regulatory sequences in tetrapods and teleost fish, but has been only partially characterized in chondrichthyans, the third major group of jawed vertebrates. Here we take advantage of developmental and molecular tools applied to the shark Scyliorhinus canicula to fill in the gap and provide an overview of the evolution of the Dlx family in the jawed vertebrates. These results are analyzed in the theoretical framework of the DDC (Duplication-Degeneration-Complementation) model. The genomic organisation of the catshark Dlx genes is similar to that previously described for tetrapods. Conserved non-coding elements identified in bony fish were also identified in catshark Dlx clusters and showed regulatory activity in transgenic zebrafish. Gene expression patterns in the catshark showed that there are some expression sites with high conservation of the expressed paralog(s) and other expression sites with events of paralog sub-functionalization during jawed vertebrate diversification, resulting in a wide variety of evolutionary scenarios within this gene family. Dlx gene expression patterns in the catshark show that there has been little neo-functionalization in Dlx genes over gnathostome evolution. In most cases, one tandem duplication and two rounds of vertebrate genome duplication have led to at least six Dlx coding sequences with redundant expression patterns followed by some instances of paralog sub-functionalization. 
Regulatory constraints such as shared enhancers, and functional constraints including gene pleiotropy, may have contributed to the evolutionary inertia leading to high redundancy between gene expression patterns.
Mastretta-Yanes, Alicia; Zamudio, Sergio; Jorgensen, Tove H.; Arrigo, Nils; Alvarez, Nadir; Piñero, Daniel; Emerson, Brent C.
2014-01-01
Gene duplication leads to paralogy, which complicates the de novo assembly of genotyping-by-sequencing (GBS) data. The issue of paralogous genes is exacerbated in plants, because they are particularly prone to gene duplication events. Paralogs are normally filtered from GBS data before undertaking population genomics or phylogenetic analyses. However, gene duplication plays an important role in the functional diversification of genes and it can also lead to the formation of postzygotic barriers. Using populations and closely related species of a tropical mountain shrub, we examine 1) the genomic differentiation produced by putative orthologs, and 2) the distribution of recent gene duplication among lineages and geography. We find high differentiation among populations from isolated mountain peaks and species-level differentiation within what is morphologically described as a single species. The inferred distribution of paralogs among populations is congruent with taxonomy and shows that GBS could be used to examine recent gene duplication as a source of genomic differentiation of nonmodel species. PMID:25223767
Stevens, Charles M; Rayani, Kaveh; Genge, Christine E; Singh, Gurpreet; Liang, Bo; Roller, Janine M; Li, Cindy; Li, Alison Yueh; Tieleman, D Peter; van Petegem, Filip; Tibbits, Glen F
2016-07-12
Zebrafish, as a model for teleost fish, have two paralogous troponin C (TnC) genes that are expressed in the heart differentially in response to temperature acclimation. Upon Ca(2+) binding, TnC changes conformation and exposes a hydrophobic patch that interacts with troponin I and initiates cardiac muscle contraction. Teleost-specific TnC paralogs have not yet been functionally characterized. In this study we have modeled the structures of the paralogs using molecular dynamics simulations at 18°C and 28°C and calculated the different Ca(2+)-binding properties between the teleost cardiac (cTnC or TnC1a) and slow-skeletal (ssTnC or TnC1b) paralogs through potential-of-mean-force calculations. These values are compared with thermodynamic binding properties obtained through isothermal titration calorimetry (ITC). The modeled structures of each of the paralogs are similar at each temperature, with the exception of helix C, which flanks the Ca(2+) binding site; this region is also home to paralog-specific sequence substitutions that we predict have an influence on protein function. The short timescale of the potential-of-mean-force calculation precludes the inclusion of the conformational change on the ΔG of Ca(2+) interaction, whereas the ITC analysis includes the Ca(2+) binding and conformational change of the TnC molecule. ITC analysis has revealed that ssTnC has higher Ca(2+) affinity than cTnC for Ca(2+) overall, whereas each of the paralogs has increased affinity at 28°C compared to 18°C. Microsecond-timescale simulations have calculated that the cTnC paralog transitions from the closed to the open state more readily than the ssTnC paralog, an unfavorable transition that would decrease the ITC-derived Ca(2+) affinity while simultaneously increasing the Ca(2+) sensitivity of the myofilament. 
We propose that the preferential expression of cTnC at lower temperatures increases myofilament Ca(2+) sensitivity by this mechanism, despite the lower Ca(2+) affinity that we have measured by ITC. Copyright © 2016 Biophysical Society. Published by Elsevier Inc. All rights reserved.
Question 7: Comparative Genomics and Early Cell Evolution: A Cautionary Methodological Note
NASA Astrophysics Data System (ADS)
Islas, Sara; Hernández-Morales, Ricardo; Lazcano, Antonio
2007-10-01
Inventories of the gene content of the last common ancestor (LCA), i.e., the cenancestor, include sequences that may have undergone horizontal transfer events, as well as sequences that have originated in different pre-cenancestral epochs. However, the universal distribution of highly conserved genes involved in RNA metabolism provides insights into early stages of cell evolution during which RNA played a much more conspicuous biological role, and is consistent with the hypothesis that extant living systems were preceded by an RNA/protein world. Insights into the traits of primitive entities from which the LCA evolved may be derived from the analysis of paralogous gene families, including those formed by sequences that resulted from internal elongation events. Three major types of paralogous gene families can be recognized. The importance of this grouping for understanding the traits of early cells is discussed.
Axelsen, Jacob Bock; Yan, Koon-Kiu; Maslov, Sergei
2007-01-01
Background The evolution of the full repertoire of proteins encoded in a given genome is mostly driven by gene duplications, deletions, and sequence modifications of existing proteins. Indirect information about relative rates and other intrinsic parameters of these three basic processes is contained in the proteome-wide distribution of sequence identities of pairs of paralogous proteins. Results We introduce a simple mathematical framework based on a stochastic birth-and-death model that allows one to extract some of this information and apply it to the set of all pairs of paralogous proteins in H. pylori, E. coli, S. cerevisiae, C. elegans, D. melanogaster, and H. sapiens. It was found that the histogram of sequence identities p generated by an all-to-all alignment of all protein sequences encoded in a genome is well fitted with a power-law form $\sim p^{-\gamma}$ with the value of the exponent γ around 4 for the majority of organisms used in this study. This implies that the intra-protein variability of substitution rates is best described by the Gamma-distribution with the exponent α ≈ 0.33. Different features of the shape of such histograms allow us to quantify the ratio between the genome-wide average deletion/duplication rates and the amino-acid substitution rate. Conclusion We separately measure the short-term ("raw") duplication and deletion rates $r^{*}_{dup}$, $r^{*}_{del}$, which include gene copies that will be removed soon after the duplication event, and their dramatically reduced long-term counterparts $r_{dup}$, $r_{del}$. High deletion rate among recently duplicated proteins is consistent with a scenario in which they didn't have enough time to significantly change their functional roles and thus are to a large degree disposable. Systematic trends of each of the four duplication/deletion rates with the total number of genes in the genome were analyzed. All but the deletion rate of recent duplicates $r^{*}_{del}$ were shown to systematically increase with $N_{genes}$. 
Abnormally flat shapes of sequence identity histograms observed for yeast and human are consistent with lineages leading to these organisms undergoing one or more whole-genome duplications. This interpretation is corroborated by our analysis of the genome of Paramecium tetraurelia where the $p^{-4}$ profile of the histogram is gradually restored by the successive removal of paralogs generated in its four known whole-genome duplication events. PMID:18039386
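The power-law description of the identity histogram above can be illustrated with a short fit. This is only a sketch under stated assumptions: the function name `fit_power_law_exponent`, the identity range, the bin count, and the use of a log-log least-squares fit are all illustrative choices, not the procedure used by the authors.

```python
import numpy as np


def fit_power_law_exponent(identities, bins=20, p_min=0.3, p_max=1.0):
    """Estimate gamma in counts(p) ~ p**(-gamma) from pairwise identities.

    Hypothetical helper: histogram the identities on [p_min, p_max] and
    fit a straight line to log(count) versus log(bin center).
    """
    counts, edges = np.histogram(identities, bins=bins, range=(p_min, p_max))
    centers = 0.5 * (edges[:-1] + edges[1:])
    mask = counts > 0  # empty bins have no logarithm
    slope, _intercept = np.polyfit(np.log(centers[mask]), np.log(counts[mask]), 1)
    return -slope


def sample_power_law(n, gamma, p_min=0.3, p_max=1.0, seed=0):
    """Draw synthetic identities with density ~ p**(-gamma) (gamma != 1)."""
    rng = np.random.default_rng(seed)
    u = rng.random(n)
    k = 1.0 - gamma
    # Inverse-CDF sampling for a truncated power law.
    return (p_min**k + u * (p_max**k - p_min**k)) ** (1.0 / k)


if __name__ == "__main__":
    synthetic = sample_power_law(200_000, gamma=4.0)
    print("fitted exponent:", fit_power_law_exponent(synthetic))
```

On real data, a flattened histogram of the kind the abstract reports for yeast and human would show up as a fitted exponent well below 4; the synthetic sampler here exists only to exercise the fit.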
The Last Common Ancestor of Most Bilaterian Animals Possessed at Least Nine Opsins
Pairett, Autum N.; Pankey, M. Sabrina; Serb, Jeanne M.; Speiser, Daniel I.; Swafford, Andrew J.
2016-01-01
Abstract The opsin gene family encodes key proteins animals use to sense light and has expanded dramatically since it originated early in animal evolution. Understanding the origins of opsin diversity can offer clues to how separate lineages of animals have repurposed different opsin paralogs for different light-detecting functions. However, the more we look for opsins outside of eyes and from additional animal phyla, the more opsins we uncover, suggesting we still do not know the true extent of opsin diversity, nor the ancestry of opsin diversity in animals. To estimate the number of opsin paralogs present in both the last common ancestor of the Nephrozoa (bilaterians excluding Xenoacoelomorpha), and the ancestor of Cnidaria + Bilateria, we reconstructed a reconciled opsin phylogeny using sequences from 14 animal phyla, especially the traditionally poorly-sampled echinoderms and molluscs. Our analysis strongly supports a repertoire of at least nine opsin paralogs in the bilaterian ancestor and at least four opsin paralogs in the last common ancestor of Cnidaria + Bilateria. Thus, the kernels of extant opsin diversity arose much earlier in animal history than previously known. Further, opsins likely duplicated and were lost many times, with different lineages of animals maintaining different repertoires of opsin paralogs. This phylogenetic information can inform hypotheses about the functions of different opsin paralogs and can be used to understand how and when opsins were incorporated into complex traits like eyes and extraocular sensors. PMID:28172965
Morin, Ryan D.; Chang, Elbert; Petrescu, Anca; Liao, Nancy; Griffith, Malachi; Kirkpatrick, Robert; Butterfield, Yaron S.; Young, Alice C.; Stott, Jeffrey; Barber, Sarah; Babakaiff, Ryan; Dickson, Mark C.; Matsuo, Corey; Wong, David; Yang, George S.; Smailus, Duane E.; Wetherby, Keith D.; Kwong, Peggy N.; Grimwood, Jane; Brinkley, Charles P.; Brown-John, Mabel; Reddix-Dugue, Natalie D.; Mayo, Michael; Schmutz, Jeremy; Beland, Jaclyn; Park, Morgan; Gibson, Susan; Olson, Teika; Bouffard, Gerard G.; Tsai, Miranda; Featherstone, Ruth; Chand, Steve; Siddiqui, Asim S.; Jang, Wonhee; Lee, Ed; Klein, Steven L.; Blakesley, Robert W.; Zeeberg, Barry R.; Narasimhan, Sudarshan; Weinstein, John N.; Pennacchio, Christa Prange; Myers, Richard M.; Green, Eric D.; Wagner, Lukas; Gerhard, Daniela S.; Marra, Marco A.; Jones, Steven J.M.; Holt, Robert A.
2006-01-01
Sequencing of full-insert clones from full-length cDNA libraries from both Xenopus laevis and Xenopus tropicalis has been ongoing as part of the Xenopus Gene Collection Initiative. Here we present 10,967 full ORF verified cDNA clones (8049 from X. laevis and 2918 from X. tropicalis) as a community resource. Because the genome of X. laevis, but not X. tropicalis, has undergone allotetraploidization, comparison of coding sequences from these two clawed (pipid) frogs provides a unique angle for exploring the molecular evolution of duplicate genes. Within our clone set, we have identified 445 gene trios, each comprising an allotetraploidization-derived X. laevis gene pair and their shared X. tropicalis ortholog. Pairwise dN/dS comparisons within trios show strong evidence for purifying selection acting on all three members. However, dN/dS ratios between X. laevis gene pairs are elevated relative to their X. tropicalis ortholog. This difference is highly significant and indicates an overall relaxation of selective pressures on duplicated gene pairs. We have found that the paralogs that have been lost since the tetraploidization event are enriched for several molecular functions, but have found no such enrichment in the extant paralogs. Approximately 14% of the paralogous pairs analyzed here also show differential expression indicative of subfunctionalization. PMID:16672307
Rooting Gene Trees without Outgroups: EP Rooting
Sinsheimer, Janet S.; Little, Roderick J. A.; Lake, James A.
2012-01-01
Gene sequences are routinely used to determine the topologies of unrooted phylogenetic trees, but many of the most important questions in evolution require knowing both the topologies and the roots of trees. However, general algorithms for calculating rooted trees from gene and genomic sequences in the absence of gene paralogs are few. Using the principles of evolutionary parsimony (EP) (Lake JA. 1987a. A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony. Mol Biol Evol. 4:167–181) and its extensions (Cavender, J. 1989. Mechanized derivation of linear invariants. Mol Biol Evol. 6:301–316; Nguyen T, Speed TP. 1992. A derivation of all linear invariants for a nonbalanced transversion model. J Mol Evol. 35:60–76), we explicitly enumerate all linear invariants that solely contain rooting information and derive algorithms for rooting gene trees directly from gene and genomic sequences. These new EP linear rooting invariants allow one to determine rooted trees, even in the complete absence of outgroups and gene paralogs. EP rooting invariants are explicitly derived for three taxon trees, and rules for their extension to four or more taxa are provided. The method is demonstrated using 18S ribosomal DNA to illustrate how the new animal phylogeny (Aguinaldo AMA et al. 1997. Evidence for a clade of nematodes, arthropods, and other moulting animals. Nature 387:489–493; Lake JA. 1990. Origin of the metazoa. Proc Natl Acad Sci USA 87:763–766) may be rooted directly from sequences, even when they are short and paralogs are unavailable. These results are consistent with the current root (Philippe H et al. 2011. Acoelomorph flatworms are deuterostomes related to Xenoturbella. Nature 470:255–260). PMID:22593551
Mastretta-Yanes, Alicia; Zamudio, Sergio; Jorgensen, Tove H; Arrigo, Nils; Alvarez, Nadir; Piñero, Daniel; Emerson, Brent C
2014-09-14
Gene duplication leads to paralogy, which complicates the de novo assembly of genotyping-by-sequencing (GBS) data. The issue of paralogous genes is exacerbated in plants, because they are particularly prone to gene duplication events. Paralogs are normally filtered from GBS data before undertaking population genomics or phylogenetic analyses. However, gene duplication plays an important role in the functional diversification of genes and it can also lead to the formation of postzygotic barriers. Using populations and closely related species of a tropical mountain shrub, we examine 1) the genomic differentiation produced by putative orthologs, and 2) the distribution of recent gene duplication among lineages and geography. We find high differentiation among populations from isolated mountain peaks and species-level differentiation within what is morphologically described as a single species. The inferred distribution of paralogs among populations is congruent with taxonomy and shows that GBS could be used to examine recent gene duplication as a source of genomic differentiation of nonmodel species. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Barta, Andrea; Kalyna, Maria; Reddy, Anireddy S N
2010-09-01
Growing interest in alternative splicing in plants and the extensive sequencing of new plant genomes necessitate more precise definition and classification of genes coding for splicing factors. SR proteins are a family of RNA binding proteins, which function as essential factors for constitutive and alternative splicing. We propose a unified nomenclature for plant SR proteins, taking into account the newly revised nomenclature of the mammalian SR proteins and a number of plant-specific properties of the plant proteins. We identify six subfamilies of SR proteins in Arabidopsis thaliana and rice (Oryza sativa), three of which are plant specific. The proposed subdivision of plant SR proteins into different subfamilies will allow grouping of paralogous proteins and simple assignment of newly discovered SR orthologs from other plant species and will promote functional comparisons in diverse plant species.
USDA-ARS's Scientific Manuscript database
Salmonid genomes are considered to be in a pseudo-tetraploid state as a result of an evolutionarily recent genome duplication event. This situation complicates single nucleotide polymorphism (SNP) discovery in rainbow trout as many putative SNPs are actually paralogous sequence variants (PSVs) and ...
Adebali, Ogun; Reznik, Alexander O.; Ory, Daniel S.; ...
2016-02-18
Here, predicting the phenotypic effects of mutations has become an important application in clinical genetic diagnostics. Computational tools evaluate the behavior of the variant over evolutionary time and assume that variations seen during the course of evolution are probably benign in humans. However, current tools do not take into account orthologous/paralogous relationships. Paralogs have dramatically different roles in Mendelian diseases. For example, whereas inactivating mutations in the NPC1 gene cause the neurodegenerative disorder Niemann-Pick C, inactivating mutations in its paralog NPC1L1 are not disease-causing and, moreover, are implicated in protection from coronary heart disease. Methods: We identified major events in NPC1 evolution and revealed and compared orthologs and paralogs of the human NPC1 gene through phylogenetic and protein sequence analyses. We predicted whether an amino acid substitution affects protein function by reducing the organism's fitness. As a result, removing the paralogs and distant homologs improved the overall performance of categorizing disease-causing and benign amino acid substitutions. In conclusion, the results show that a thorough evolutionary analysis followed by identification of orthologs improves the accuracy in predicting disease-causing missense mutations. We anticipate that this approach will be used as a reference in the interpretation of variants in other genetic diseases as well.
Hellmuth, Marc; Wieseke, Nicolas; Lechner, Marcus; Lenhof, Hans-Peter; Middendorf, Martin; Stadler, Peter F.
2015-01-01
Phylogenomics heavily relies on well-curated sequence data sets that comprise, for each gene, exclusively 1:1 orthologs. Paralogs are treated as a dangerous nuisance that has to be detected and removed. We show here that this severe restriction of the data sets is not necessary. Building upon recent advances in mathematical phylogenetics, we demonstrate that gene duplications convey meaningful phylogenetic information and allow the inference of plausible phylogenetic trees, provided orthologs and paralogs can be distinguished with a degree of certainty. Starting from tree-free estimates of orthology, cograph editing can sufficiently reduce the noise to find correct event-annotated gene trees. The information of gene trees can then directly be translated into constraints on the species trees. Although the resolution is very poor for individual gene families, we show that genome-wide data sets are sufficient to generate fully resolved phylogenetic trees, even in the presence of horizontal gene transfer. PMID:25646426
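Cograph editing presupposes a test for whether an estimated orthology graph is already a cograph, that is, a graph with no induced path on four vertices. The sketch below checks this with the standard characterization that every induced subgraph on two or more vertices must be disconnected or have a disconnected complement. The function names and the edge-list input format are assumptions made for illustration; the editing step itself (choosing which edges to add or remove) is not shown and is not the authors' implementation.

```python
def _components(vertices, adj):
    """Connected components of the graph given by adjacency sets."""
    seen, comps = set(), []
    for start in vertices:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def is_cograph(vertices, edges):
    """Return True if the graph induced on `vertices` is a cograph.

    `edges` is an iterable of 2-tuples (u, w); edges touching vertices
    outside `vertices` are ignored, which gives the induced subgraph.
    """
    vertices = set(vertices)
    if len(vertices) <= 1:
        return True
    edges = list(edges)
    adj = {v: set() for v in vertices}
    for u, w in edges:
        if u in vertices and w in vertices and u != w:
            adj[u].add(w)
            adj[w].add(u)
    comps = _components(vertices, adj)
    if len(comps) == 1:
        # Connected: try splitting along components of the complement.
        co_adj = {v: vertices - adj[v] - {v} for v in vertices}
        comps = _components(vertices, co_adj)
        if len(comps) == 1:
            return False  # graph and complement both connected
    return all(is_cograph(c, edges) for c in comps)


if __name__ == "__main__":
    # Genes a-d with orthology edges forming a path a-b-c-d (an induced P4).
    print(is_cograph("abcd", [("a", "b"), ("b", "c"), ("c", "d")]))
```

A graph that fails this test cannot be explained by any single event-annotated gene tree, which is why noisy orthology estimates need editing before tree inference.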
Extensive Local Gene Duplication and Functional Divergence among Paralogs in Atlantic Salmon
Warren, Ian A.; Ciborowski, Kate L.; Casadei, Elisa; Hazlerigg, David G.; Martin, Sam; Jordan, William C.; Sumner, Seirian
2014-01-01
Many organisms can generate alternative phenotypes from the same genome, enabling individuals to exploit diverse and variable environments. A prevailing hypothesis is that such adaptation has been favored by gene duplication events, which generate redundant genomic material that may evolve divergent functions. Vertebrate examples of recent whole-genome duplications are sparse although one example is the salmonids, which have undergone a whole-genome duplication event within the last 100 Myr. The life-cycle of the Atlantic salmon, Salmo salar, depends on the ability to produce alternating phenotypes from the same genome, to facilitate migration and maintain its anadromous life history. Here, we investigate the hypothesis that genome-wide and local gene duplication events have contributed to the salmonid adaptation. We used high-throughput sequencing to characterize the transcriptomes of three key organs involved in regulating migration in S. salar: brain, pituitary, and olfactory epithelium. We identified over 10,000 undescribed S. salar sequences and designed an analytic workflow to distinguish between paralogs originating from local gene duplication events or from whole-genome duplication events. These data reveal that substantial local gene duplications took place shortly after the whole-genome duplication event. Many of the identified paralog pairs have either diverged in function or become noncoding. Future functional genomics studies will reveal to what extent this rich source of divergence in genetic sequence is likely to have facilitated the evolution of extreme phenotypic plasticity required for an anadromous life-cycle. PMID:24951567
[Divergence of paralogous growth-hormone-encoding genes and their promoters in Salmonidae].
Kamenskaya, D N; Pankova, M V; Atopkin, D M; Brykov, V A
2017-01-01
In many fish species, including salmonids, the growth hormone is encoded by two duplicated paralogous genes, gh1 and gh2. Both genes were already in place at the time of divergence of species in this group. A comparison of the entire sequence of these genes in salmonids has shown that their conserved regions are associated with exons, while their most variable regions correspond to introns. Introns C and D include putative regulatory elements (sites Pit-1, CRE, and ERE) that are also conserved. In chars, the degree of polymorphism of the gh2 gene is 2-3 times as large as that of the gh1 gene. However, a comparison across all Salmonidae species does not extend this observation to other species. In both of these genes in chars, the promoters are conserved mainly because they correspond to putative regulatory sequences (TATA box, binding sites for the pituitary transcription factor Pit-1 (F1-F4), CRE, GRE and RAR/RXR elements). The promoter of the gh2 gene has a greater degree of polymorphism than the gh1 gene promoter in all investigated species of salmonids. The observed differences in the rates of accumulation of changes in growth hormone encoding paralogs could be explained by differences in the intensity of selection.
Intragenome Diversity of Gene Families Encoding Toxin-like Proteins in Venomous Animals.
Rodríguez de la Vega, Ricardo C; Giraud, Tatiana
2016-11-01
The evolution of venoms is the story of how toxins arise and of the processes that generate and maintain their diversity. For animal venoms these processes include recruitment for expression in the venom gland, neofunctionalization, paralogous expansions, and functional divergence. The systematic study of these processes requires the reliable identification of the venom components involved in antagonistic interactions. High-throughput sequencing has the potential of uncovering the entire set of toxins in a given organism, yet the existence of non-venom toxin paralogs and the misleading effects of partial census of the molecular diversity of toxins make it necessary to collect complementary evidence to distinguish true toxins from their non-venom paralogs. Here, we analyzed the whole genomes of two scorpions, one spider and one snake, aiming at the identification of the full repertoires of genes encoding toxin-like proteins. We classified the entire set of protein-coding genes into paralogous groups and monotypic genes, identified genes encoding toxin-like proteins based on known toxin families, and quantified their expression in both venom-glands and pooled tissues. Our results confirm that genes encoding toxin-like proteins are part of multigene families, and that these families arise by recruitment events from non-toxin genes followed by limited expansions of the toxin-like protein coding genes. We also show that failing to account for sequence similarity with non-toxin proteins has a considerable misleading effect that can be greatly reduced by comparative transcriptomics. Our study overall contributes to the understanding of the evolutionary dynamics of proteins involved in antagonistic interactions. © The Author 2016. Published by Oxford University Press on behalf of the Society for Integrative and Comparative Biology. All rights reserved. For permissions please email: journals.permissions@oup.com.
Lyubetsky, Vassily; Gershgorin, Roman; Gorbunov, Konstantin
2017-12-06
Chromosome structure is a very limited model of the genome including the information about its chromosomes such as their linear or circular organization, the order of genes on them, and the DNA strand encoding a gene. Gene lengths, nucleotide composition, and intergenic regions are ignored. Although highly incomplete, such structure can be used in many cases, e.g., to reconstruct phylogeny and evolutionary events, to identify gene synteny, regulatory elements and promoters (considering highly conserved elements), etc. Three problems are considered; all assume unequal gene content and the presence of gene paralogs. The distance problem is to determine the minimum number of operations required to transform one chromosome structure into another and the corresponding transformation itself including the identification of paralogs in two structures. We use the DCJ model which is one of the most studied combinatorial rearrangement models. Double-, sesqui-, and single-operations as well as deletion and insertion of a chromosome region are considered in the model; the single ones comprise cut and join. In the reconstruction problem, a phylogenetic tree with chromosome structures in the leaves is given. It is necessary to assign the structures to inner nodes of the tree to minimize the sum of distances between terminal structures of each edge and to identify the mutual paralogs in a fairly large set of structures. A linear algorithm is known for the distance problem without paralogs, while the presence of paralogs makes it NP-hard. If paralogs are allowed but the insertion and deletion operations are missing (and special constraints are imposed), the reduction of the distance problem to integer linear programming is known. Apparently, the reconstruction problem is NP-hard even in the absence of paralogs. The problem of contigs is to find the optimal arrangements for each given set of contigs, which also includes the mutual identification of paralogs. 
We proved that these problems can be reduced to integer linear programming formulations, which allows an algorithm to redefine the problems to implement a very special case of the integer linear programming tool. The results were tested on synthetic and biological samples. Three well-known problems were reduced to a very special case of integer linear programming, which is a new method of their solutions. Integer linear programming is clearly among the main computational methods and, as generally accepted, is fast on average; in particular, computation systems specifically targeted at it are available. The challenges are to reduce the size of the corresponding integer linear programming formulations and to incorporate a more detailed biological concept in our model of the reconstruction.
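The abstract notes that a linear algorithm is known for the distance problem without paralogs. A minimal sketch of that paralog-free case follows, restricted further to circular chromosomes and equal gene content, where the DCJ distance equals the number of genes minus the number of cycles formed by overlaying the adjacencies of the two genomes. The function names and the signed-integer genome encoding are illustrative assumptions; linear chromosomes, insertions, deletions, and paralogs (the cases the paper actually addresses with integer linear programming) are not handled.

```python
def adjacencies(genome):
    """Map each gene extremity to its neighbor in a circular genome.

    `genome` is a list of circular chromosomes, each a list of nonzero
    signed integers; an extremity is (gene, 'h') or (gene, 't').
    """
    adj = {}
    for chrom in genome:
        n = len(chrom)
        for i in range(n):
            a, b = chrom[i], chrom[(i + 1) % n]
            right = (abs(a), "h" if a > 0 else "t")  # end where we leave a
            left = (abs(b), "t" if b > 0 else "h")   # end where we enter b
            adj[right] = left
            adj[left] = right
    return adj


def dcj_distance(genome_a, genome_b):
    """DCJ distance for circular genomes on the same gene set, no paralogs."""
    adj_a, adj_b = adjacencies(genome_a), adjacencies(genome_b)
    if adj_a.keys() != adj_b.keys():
        raise ValueError("genomes must contain exactly the same genes")
    n_genes = len(adj_a) // 2
    visited, cycles = set(), 0
    for start in adj_a:
        if start in visited:
            continue
        cycles += 1
        cur = start
        while True:
            # Follow one adjacency of A, then one adjacency of B.
            visited.add(cur)
            nxt = adj_a[cur]
            visited.add(nxt)
            cur = adj_b[nxt]
            if cur == start:
                break
    return n_genes - cycles


if __name__ == "__main__":
    # A single inversion of gene 2 separates these two circular genomes.
    print(dcj_distance([[1, 2, 3]], [[1, -2, 3]]))
```

Each dictionary lookup is constant time and every extremity is visited once, so the computation is linear in the number of genes, matching the complexity the abstract cites for this restricted setting.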
Guimaraes, Ana M S; Toth, Balazs; Santos, Andrea P; do Nascimento, Naíla C; Kritchevsky, Janice E; Messick, Joanne B
2012-11-01
We report the complete genome sequence of "Candidatus Mycoplasma haemolamae," an endemic red-cell pathogen of camelids. The single, circular chromosome has 756,845 bp, a 39.3% G+C content, and 925 coding sequences (CDSs). A great proportion (49.1%) of these CDSs are organized into paralogous gene families, which can now be further explored with regard to antigenic variation.
The unusual S locus of Leavenworthia is composed of two sets of paralogous loci.
Chantha, Sier-Ching; Herman, Adam C; Castric, Vincent; Vekemans, Xavier; Marande, William; Schoen, Daniel J
2017-12-01
The Leavenworthia self-incompatibility locus (S locus) consists of paralogs (Lal2, SCRL) of the canonical Brassicaceae S locus genes (SRK, SCR), and is situated in a genomic position that differs from the ancestral one in the Brassicaceae. Unexpectedly, in a small number of Leavenworthia alabamica plants examined, sequences closely resembling exon 1 of SRK have been found, but the function of these has remained unclear. BAC cloning and expression analyses were employed to characterize these SRK-like sequences. An SRK-positive Bacterial Artificial Chromosome clone was found to contain complete SRK and SCR sequences located close by one another in the derived genomic position of the Leavenworthia S locus, and in place of the more typical Lal2 and SCRL sequences. These sequences are expressed in stigmas and anthers, respectively, and crossing data show that the SRK/SCR haplotype is functional in self-incompatibility. Population surveys indicate that < 5% of Leavenworthia S loci possess such alleles. An ancestral translocation or recombination event involving SRK/SCR and Lal2/SCRL likely occurred, together with neofunctionalization of Lal2/SCRL, and both haplotype groups now function as Leavenworthia S locus alleles. These findings suggest that S locus alleles can have distinctly different evolutionary origins. © 2017 The Authors. New Phytologist © 2017 New Phytologist Trust.
Yue, Y; Grossmann, B; Tsend-Ayush, E; Grützner, F; Ferguson-Smith, M A; Yang, F; Haaf, T
2005-01-01
Intrachromosomal duplications play a significant role in human genome pathology and evolution. To better understand the molecular basis of evolutionary chromosome rearrangements, we performed molecular cytogenetic and sequence analyses of the breakpoint region that distinguishes human chromosome 3p12.3 and orangutan chromosome 2. FISH with region-specific BAC clones demonstrated that the breakpoint-flanking sequences are duplicated intrachromosomally on orangutan 2 and human 3q21 as well as at many pericentromeric and subtelomeric sites throughout the genomes. Breakage and rearrangement of the human 3p12.3-homologous region in the orangutan lineage were associated with a partial loss of duplicated sequences in the breakpoint region. Consistent with our FISH mapping results, computational analysis of the human chromosome 3 genomic sequence revealed three 3p12.3-paralogous sequence blocks on human chromosome 3q21 and smaller blocks on the short arm end 3p26→p25. This is consistent with the view that sequences from an ancestral site at 3q21 were duplicated at 3p12.3 in a common ancestor of orangutan and humans. Our results show that evolutionary chromosome rearrangements are associated with microduplications and microdeletions, contributing to the DNA differences between closely related species. Copyright (c) 2005 S. Karger AG, Basel.
Phylogenomics of plant genomes: a methodology for genome-wide searches for orthologs in plants
Conte, Matthieu G; Gaillard, Sylvain; Droc, Gaetan; Perin, Christophe
2008-01-01
Background Gene ortholog identification is now a major objective for mining the increasing amount of sequence data generated by complete or partial genome sequencing projects. Comparative and functional genomics urgently need a method for ortholog detection to reduce gene function inference and to aid in the identification of conserved or divergent genetic pathways between several species. As gene functions change during evolution, reconstructing the evolutionary history of genes should be a more accurate way to differentiate orthologs from paralogs. Phylogenomics takes into account phylogenetic information from high-throughput genome annotation and is the most straightforward way to infer orthologs. However, procedures for automatic detection of orthologs are still scarce and suffer from several limitations. Results We developed a procedure for ortholog prediction between Oryza sativa and Arabidopsis thaliana. Firstly, we established an efficient method to cluster A. thaliana and O. sativa full proteomes into gene families. Then, we developed an optimized phylogenomics pipeline for ortholog inference. We validated the full procedure using test sets of orthologs and paralogs to demonstrate that our method outperforms pairwise methods for ortholog predictions. Conclusion Our procedure achieved a high level of accuracy in predicting ortholog and paralog relationships. Phylogenomic predictions for all validated gene families in both species were easily achieved and we can conclude that our methodology outperforms similarly based methods. PMID:18426584
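One common way to turn a gene tree into ortholog and paralog calls is the species-overlap rule: an internal node whose two subtrees share a species is treated as a duplication, otherwise as a speciation. The sketch below applies that rule to a toy tree. It is an assumption-laden illustration, not the pipeline described in the abstract; the nested-tuple tree encoding, the `species|gene` leaf labels, and the gene names are all hypothetical.

```python
from itertools import product


def species_of(leaf):
    """Species prefix of a leaf label of the form 'species|gene'."""
    return leaf.split("|")[0]


def classify_pairs(tree):
    """Return (leaves, ortholog_pairs, paralog_pairs) for a binary tree.

    `tree` is either a leaf label (str) or a 2-tuple of subtrees.
    Pairs are frozensets of two leaf labels.
    """
    if isinstance(tree, str):
        return [tree], set(), set()
    left, right = tree
    l_leaves, l_orth, l_para = classify_pairs(left)
    r_leaves, r_orth, r_para = classify_pairs(right)
    orth, para = l_orth | r_orth, l_para | r_para
    pairs = {frozenset(p) for p in product(l_leaves, r_leaves)}
    shared = {species_of(x) for x in l_leaves} & {species_of(x) for x in r_leaves}
    if shared:
        para |= pairs  # duplication node: genes across it are paralogs
    else:
        orth |= pairs  # speciation node: genes across it are orthologs
    return l_leaves + r_leaves, orth, para


if __name__ == "__main__":
    # Toy family: a duplication predating the Arabidopsis/rice split.
    toy = (("Ath|geneA1", "Osa|geneO1"), ("Ath|geneA2", "Osa|geneO2"))
    _, orthologs, paralogs = classify_pairs(toy)
    for pair in sorted(tuple(sorted(p)) for p in orthologs):
        print("ortholog pair:", pair)
```

In this toy family a pairwise best-hit comparison could pair geneA1 with geneO2 if their sequences happened to be most similar, whereas the tree-based rule labels that pair as paralogous; this is the kind of distinction the abstract attributes to phylogenomic inference.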
Insights into three whole-genome duplications gleaned from the Paramecium caudatum genome sequence.
McGrath, Casey L; Gout, Jean-Francois; Doak, Thomas G; Yanagi, Akira; Lynch, Michael
2014-08-01
Paramecium has long been a model eukaryote. The sequence of the Paramecium tetraurelia genome reveals a history of three successive whole-genome duplications (WGDs), and the sequences of P. biaurelia and P. sexaurelia suggest that these WGDs are shared by all members of the aurelia species complex. Here, we present the genome sequence of P. caudatum, a species closely related to the P. aurelia species group. P. caudatum shares only the most ancient of the three WGDs with the aurelia complex. We found that P. caudatum maintains twice as many paralogs from this early event as the P. aurelia species, suggesting that post-WGD gene retention is influenced by subsequent WGDs and supporting the importance of selection for dosage in gene retention. The availability of P. caudatum as an outgroup allows an expanded analysis of the aurelia intermediate and recent WGD events. Both the Guanine+Cytosine (GC) content and the expression level of preduplication genes are significant predictors of duplicate retention. We find widespread asymmetrical evolution among aurelia paralogs, which is likely caused by gradual pseudogenization rather than by neofunctionalization. Finally, cases of divergent resolution of intermediate WGD duplicates between aurelia species implicate this process as an ongoing reinforcement mechanism of reproductive isolation long after a WGD event. Copyright © 2014 by the Genetics Society of America.
Conlon, J Michael
2008-10-01
Frogs belonging to the extensive family Ranidae represent a valuable source of antimicrobial peptides with therapeutic potential but there is currently no consistent system of nomenclature to describe these peptides. Terminology based solely on species name does not reflect the evolutionary relationships existing between peptides encoded by orthologous and paralogous genes. On the basis of limited structural similarity, at least 14 well-established peptide families have been identified (brevinin-1, brevinin-2, esculentin-1, esculentin-2, japonicin-1, japonicin-2, nigrocin-2, palustrin-1, palustrin-2, ranacyclin, ranalexin, ranatuerin-1, ranatuerin-2, temporin). It is proposed that terms that are synonymous with these names should no longer be used. Orthologous peptides from different species may be characterized by the initial letter of that species, set in upper case, with paralogs belonging to the same peptide family being assigned letters set in lower case, e.g. brevinin-1Pa, brevinin-1Pb, etc. When two species begin with the same initial letter, two letters may be used, e.g. P for pipiens and PL for palustris. Species names and assignments to genera may be obtained from Amphibian Species of the World Electronic Database, accessible at http://research.amnh.org/herpetology/amphibia/index.php. American Museum of Natural History, New York, USA.
El-Sherry, Shiem; Ogedengbe, Mosun E; Hafeez, Mian A; Barta, John R
2013-07-01
Multiple 18S rDNA sequences were obtained from two single-oocyst-derived lines of each of Eimeria meleagrimitis and Eimeria adenoeides. After analysing the 15 new 18S rDNA sequences from two lines of E. meleagrimitis and 17 new sequences from two lines of E. adenoeides, there were clear indications that divergent, paralogous 18S rDNA copies existed within the nuclear genome of E. meleagrimitis. In contrast, mitochondrial cytochrome c oxidase subunit I (COI) partial sequences from all lines of a particular Eimeria sp. were identical and, in phylogenetic analyses, COI sequences clustered unambiguously in monophyletic and highly-supported clades specific to individual Eimeria sp. Phylogenetic analysis of the new 18S rDNA sequences from E. meleagrimitis showed that they formed two distinct clades: Type A with four new sequences; and Type B with nine new sequences; both Types A and B sequences were obtained from each of the single-oocyst-derived lines of E. meleagrimitis. Together these rDNA types formed a well-supported E. meleagrimitis clade. Types A and B 18S rDNA sequences from E. meleagrimitis had a mean sequence identity of only 97.4% whereas mean sequence identity within types was 99.1-99.3%. The observed intraspecific sequence divergence among E. meleagrimitis 18S rDNA sequence types was even higher (approximately 2.6%) than the interspecific sequence divergence present between some well-recognized species such as Eimeria tenella and Eimeria necatrix (1.1%). Our observations suggest that, unlike COI sequences, 18S rDNA sequences are not reliable molecular markers to be used alone for species identification with coccidia, although 18S rDNA sequences have clear utility for phylogenetic reconstruction of apicomplexan parasites at the genus and higher taxonomic ranks. Copyright © 2013. Published by Elsevier Ltd.
Law, Sheran Hiu Wan; Redelings, Benjamin David; Kullman, Seth William
2012-01-15
The availability of multiple teleost (bony fish) genomes is providing unprecedented opportunities to understand the diversity and function of gene duplication events using comparative genomics. Here we examine multiple paralogous genes of γ-glutamyl transferase (GGT) in several distantly related teleost species including medaka, stickleback, green spotted pufferfish, fugu, and zebrafish. Through mining genome databases, we have identified multiple GGT orthologs. Duplicate (paralogous) GGT sequences for GGT1 (GGT1 a and b), GGTL1 (GGTL1 a and b), and GGTL3 (GGTL3 a and b) were identified for each species. Phylogenetic analysis suggests that GGTs are ancient proteins conserved across most metazoan phyla and those paralogous GGTs in teleosts likely arose from the serial 3R genome duplication events. A third GGTL1 gene (GGTL1c) was found in green spotted pufferfish; however, this gene is not present in medaka, stickleback, or fugu. Similarly, one or both paralogs of GGTL3 appear to have been lost in green spotted pufferfish, fugu, and zebrafish. Syntenic relationships were highly maintained between duplicated teleost chromosomes, among teleosts and across ray-finned (Actinopterygii) and lobe-finned (Sarcopterygii) species. To assess subfunction partitioning, six medaka GGT genes were cloned and assessed for developmental and tissue-specific expression. On the basis of these data, we propose a modification of the "duplication-degeneration-complementation" model of subfunction partitioning where quantitative differences rather than absolute differences in gene expression are observed between gene paralogs. Our results demonstrate that multiple GGT genes have been retained within teleost genomes. Questions remain, however, regarding the functional roles of multiple GGTs in these species. Copyright © 2011 Wiley Periodicals, Inc., A Wiley Company.
Complexity of Gene Expression Evolution after Duplication: Protein Dosage Rebalancing
Rogozin, Igor B.
2014-01-01
Ongoing debates about functional importance of gene duplications have been recently intensified by a heated discussion of the “ortholog conjecture” (OC). Under the OC, which is central to functional annotation of genomes, orthologous genes are functionally more similar than paralogous genes at the same level of sequence divergence. However, a recent study challenged the OC by reporting a greater functional similarity, in terms of gene ontology (GO) annotations and expression profiles, among within-species paralogs compared to orthologs. These findings were taken to indicate that functional similarity of homologous genes is primarily determined by the cellular context of the genes, rather than evolutionary history. Subsequent studies suggested that the OC appears to be generally valid when applied to mammalian evolution but the complete picture of evolution of gene expression also has to incorporate lineage-specific aspects of paralogy. The observed complexity of gene expression evolution after duplication can be explained through selection for gene dosage effect combined with the duplication-degeneration-complementation model. This paper discusses expression divergence of recent duplications occurring before functional divergence of proteins encoded by duplicate genes. PMID:25197576
Mapping proteins in the presence of paralogs using units of coevolution
2013-01-01
Background We study the problem of mapping proteins between two protein families in the presence of paralogs. This problem occurs as a difficult subproblem in coevolution-based computational approaches for protein-protein interaction prediction. Results Similar to prior approaches, our method is based on the idea that coevolution implies equal rates of sequence evolution among the interacting proteins, and we provide a first attempt to quantify this notion in a formal statistical manner. We call the units that are central to this quantification scheme the units of coevolution. A unit consists of two mapped protein pairs and its score quantifies the coevolution of the pairs. This quantification allows us to provide a maximum likelihood formulation of the paralog mapping problem and to cast it into a binary quadratic programming formulation. Conclusion CUPID, our software tool based on a Lagrangian relaxation of this formulation, makes it, for the first time, possible to compute state-of-the-art quality pairings in a few minutes of runtime. In summary, we suggest a novel alternative to the earlier available approaches, which is statistically sound and computationally feasible. PMID:24564758
Nagano, Yukio; Furuhashi, Hirofumi; Inaba, Takehito; Sasaki, Yukiko
2001-01-01
Complementary DNA encoding a DNA-binding protein, designated PLATZ1 (plant AT-rich sequence- and zinc-binding protein 1), was isolated from peas. The amino acid sequence of the protein is similar to those of other uncharacterized proteins predicted from the genome sequences of higher plants. However, no paralogous sequences have been found outside the plant kingdom. Multiple alignments among these paralogous proteins show that several cysteine and histidine residues are invariant, suggesting that these proteins are a novel class of zinc-dependent DNA-binding proteins with two distantly located regions, C-x2-H-x11-C-x2-C-x(4–5)-C-x2-C-x(3–7)-H-x2-H and C-x2-C-x(10–11)-C-x3-C. In an electrophoretic mobility shift assay, the zinc chelator 1,10-o-phenanthroline inhibited DNA binding, and two distant zinc-binding regions were required for DNA binding. A protein blot with 65ZnCl2 showed that both regions are required for zinc-binding activity. The PLATZ1 protein non-specifically binds to A/T-rich sequences, including the upstream region of the pea GTPase pra2 and plastocyanin petE genes. Expression of the PLATZ1 repressed those of the reporter constructs containing the coding sequence of luciferase gene driven by the cauliflower mosaic virus (CaMV) 35S90 promoter fused to the tandem repeat of the A/T-rich sequences. These results indicate that PLATZ1 is a novel class of plant-specific zinc-dependent DNA-binding protein responsible for A/T-rich sequence-mediated transcriptional repression. PMID:11600698
2011-01-01
Background The genus Pyrus belongs to the tribe Pyreae (the former subfamily Maloideae) of the family Rosaceae, and includes one of the most important commercial fruit crops, pear. The phylogeny of Pyrus has not been definitively reconstructed. In our previous efforts, the internal transcribed spacer region (ITS) revealed a poorly resolved phylogeny due to non-concerted evolution of nrDNA arrays. Therefore, introns of low copy nuclear genes (LCNG) are explored here for improved resolution. However, paralogs and lineage sorting are still two challenges for applying LCNGs in phylogenetic studies, and at least two independent nuclear loci should be compared. In this work the second intron of LEAFY and the alcohol dehydrogenase gene (Adh) were selected to investigate their molecular evolution and phylogenetic utility. Results DNA sequence analyses revealed a complex ortholog and paralog structure of Adh genes in Pyrus and Malus, the pears and apples. Comparisons between sequences from RT-PCR and genomic PCR indicate that some Adh homologs are putatively nonfunctional. A partial region of Adh1 was sequenced for 18 Pyrus species and three subparalogs representing Adh1-1 were identified. These led to poorly resolved phylogenies due to low sequence divergence and the inclusion of putative recombinants. For the second intron of LEAFY, multiple inparalogs were discovered for both LFY1int2 and LFY2int2. LFY1int2 is inadequate for phylogenetic analysis due to lineage sorting of two inparalogs. LFY2int2-N, however, showed a relatively high sequence divergence and led to the best-resolved phylogeny. This study documents the coexistence of outparalogs and inparalogs, and lineage sorting of these paralogs and orthologous copies. It reveals putative recombinants that can lead to incorrect phylogenetic inferences, and presents an improved phylogenetic resolution of Pyrus using LFY2int2-N. Conclusions Our study represents the first phylogenetic analyses based on LCNGs in Pyrus. Ancient and recent duplications lead to a complex structure of Adh outparalogs and inparalogs in Pyrus and Malus, resulting in neofunctionalization, nonfunctionalization and possible subfunctionalization. Among all investigated orthologs, LFY2int2-N is the best nuclear marker for phylogenetic reconstruction of Pyrus due to suitable sequence divergence and the absence of lineage sorting. PMID:21917170
Comprehensive analysis of orthologous protein domains using the HOPS database.
Storm, Christian E V; Sonnhammer, Erik L L
2003-10-01
One of the most reliable methods for protein function annotation is to transfer experimentally known functions from orthologous proteins in other organisms. Most methods for identifying orthologs operate on a subset of organisms with a completely sequenced genome, and treat proteins as single-domain units. However, it is well known that proteins are often made up of several independent domains, and there is a wealth of protein sequences from genomes that are not completely sequenced. A comprehensive set of protein domain families is found in the Pfam database. We wanted to apply orthology detection to Pfam families, but first some issues needed to be addressed. First, orthology detection becomes impractical and unreliable when too many species are included. Second, shorter domains contain less information. It is therefore important to assess the quality of the orthology assignment and avoid very short domains altogether. We present a database of orthologous protein domains in Pfam called HOPS: Hierarchical grouping of Orthologous and Paralogous Sequences. Orthology is inferred in a hierarchic system of phylogenetic subgroups using ortholog bootstrapping. To avoid the frequent errors stemming from horizontally transferred genes in bacteria, the analysis is presently limited to eukaryotic genes. The results are accessible in the graphical browser NIFAS, a Java tool originally developed for analyzing phylogenetic relations within Pfam families. The method was tested on a set of curated orthologs with experimentally verified function. In comparison to tree reconciliation with a complete species tree, our approach finds significantly more orthologs in the test set. Examples for investigating gene fusions and domain recombination using HOPS are given.
Genes encoding calmodulin-binding proteins in the Arabidopsis genome
NASA Technical Reports Server (NTRS)
Reddy, Vaka S.; Ali, Gul S.; Reddy, Anireddy S N.
2002-01-01
Analysis of the recently completed Arabidopsis genome sequence indicates that approximately 31% of the predicted genes could not be assigned to functional categories, as they do not show any sequence similarity with proteins of known function from other organisms. Calmodulin (CaM), a ubiquitous and multifunctional Ca(2+) sensor, interacts with a wide variety of cellular proteins and modulates their activity/function in regulating diverse cellular processes. However, the primary amino acid sequence of the CaM-binding domain in different CaM-binding proteins (CBPs) is not conserved. One way to identify most of the CBPs in the Arabidopsis genome is by protein-protein interaction-based screening of expression libraries with CaM. Here, using a mixture of radiolabeled CaM isoforms from Arabidopsis, we screened several expression libraries prepared from flower meristem, seedlings, or tissues treated with hormones, an elicitor, or a pathogen. Sequence analysis of 77 positive clones that interact with CaM in a Ca(2+)-dependent manner revealed 20 CBPs, including 14 previously unknown CBPs. In addition, by searching the Arabidopsis genome sequence with the newly identified and known plant or animal CBPs, we identified a total of 27 CBPs. Among these, 16 CBPs are represented by families with 2-20 members in each family. Gene expression analysis revealed that CBPs and CBP paralogs are expressed differentially. Our data suggest that Arabidopsis has a large number of CBPs including several plant-specific ones. Although CaM is highly conserved between plants and animals, only a few CBPs are common to both plants and animals. Analysis of Arabidopsis CBPs revealed the presence of a variety of interesting domains. Our analyses identified several hypothetical proteins in the Arabidopsis genome as CaM targets, suggesting their involvement in Ca(2+)-mediated signaling networks.
Functional Annotations of Paralogs: A Blessing and a Curse
Zallot, Rémi; Harrison, Katherine J.; Kolaczkowski, Bryan; de Crécy-Lagard, Valérie
2016-01-01
Gene duplication followed by mutation is a classic mechanism of neofunctionalization, producing gene families with functional diversity. In some cases, a single point mutation is sufficient to change the substrate specificity and/or the chemistry performed by an enzyme, making it difficult to accurately separate enzymes with identical functions from homologs with different functions. Because sequence similarity is often used as a basis for assigning functional annotations to genes, non-isofunctional gene families pose a great challenge for genome annotation pipelines. Here we describe how integrating evolutionary and functional information such as genome context, phylogeny, metabolic reconstruction and signature motifs may be required to correctly annotate multifunctional families. These integrative analyses can also lead to the discovery of novel gene functions, as hints from specific subgroups can guide the functional characterization of other members of the family. We demonstrate how careful manual curation processes using comparative genomics can disambiguate subgroups within large multifunctional families and discover their functions. We present the COG0720 protein family as a case study. We also discuss strategies to automate this process to improve the accuracy of genome functional annotation pipelines. PMID:27618105
Corradi, Nicolas; Hijri, Mohamed; Fumagalli, Luca; Sanders, Ian R
2004-11-01
The genes encoding alpha- and beta-tubulins have been widely sampled in most major fungal phyla and they are useful tools for fungal phylogeny. Here, we report the first isolation of alpha-tubulin sequences from arbuscular mycorrhizal fungi (AMF). In parallel, AMF beta-tubulins were sampled and analysed to identify the presence of paralogs of this gene. The AMF alpha-tubulin amino acid phylogeny was congruent with the results previously reported for AMF beta-tubulins and showed that AMF tubulins group together at a basal position in the fungal clade and showed high sequence similarities with members of the Chytridiomycota. This is in contrast with phylogenies for other regions of the AMF genome. The amount and nature of substitutions are consistent with an ancient divergence of both orthologs and paralogs of AMF tubulins. At the amino acid level, however, AMF tubulins have hardly evolved from those of the chytrids. This is remarkable given that these two groups are ancient and the monophyletic Glomeromycota probably diverged from basal fungal ancestors at least 500 million years ago. The specific primers we designed for the AMF tubulins, together with the high molecular variation we found among the AMF species we analysed, make AMF tubulin sequences potentially useful for AMF identification purposes.
Cuartas, Paola E.; Barrera, Gloria P.; Belaich, Mariano N.; Barreto, Emiliano; Ghiringhelli, Pablo D.; Villamizar, Laura F.
2015-01-01
Spodoptera frugiperda (Lepidoptera: Noctuidae) is a major pest in maize crops in Colombia, and affects several regions in America. A granulovirus isolated from S. frugiperda (SfGV VG008) has potential as an enhancer of insecticidal activity of previously described nucleopolyhedrovirus from the same insect species (SfMNPV). The SfGV VG008 genome was sequenced and analyzed, showing circular double-stranded DNA of 140,913 bp encoding 146 putative ORFs that include 37 Baculoviridae core genes, 88 shared with betabaculoviruses, two shared only with betabaculoviruses from Noctuidae insects, two shared with alphabaculoviruses, three copies of own genes (paralogs) and the other 14 corresponding to unique genes without representation in the other baculovirus species. In particular, the genome encodes important virulence factors such as 4 chitinases and 2 enhancins. The sequence analysis revealed the existence of eight homologous regions (hrs) and also suggests processes of gene acquisition by horizontal transfer including the SfGV VG008 ORFs 046/047 (paralogs), 059, 089 and 099. The bioinformatics evidence indicates that the genome donors of the mentioned genes could be alpha- and/or betabaculovirus species. The previously reported ability of SfGV VG008 to naturally co-infect the same host with other viruses shows a possible mechanism to capture genes and thus improve its fitness. PMID:25609309
Detecting false positive sequence homology: a machine learning approach.
Fujimoto, M Stanley; Suvorov, Anton; Jensen, Nicholas O; Clement, Mark J; Bybee, Seth M
2016-02-24
Accurate detection of homologous relationships of biological sequences (DNA or amino acid) amongst organisms is an important and often difficult task that is essential to various evolutionary studies, ranging from building phylogenies to predicting functional gene annotations. There are many existing heuristic tools, most commonly based on bidirectional BLAST searches, that are used to identify homologous genes and combine them into two fundamentally distinct classes: orthologs and paralogs. Because they use only heuristic filtering based on significance score cutoffs and have no cluster post-processing tools available, these methods can often produce multiple clusters constituting unrelated (non-homologous) sequences. Therefore sequencing data extracted from incomplete genome/transcriptome assemblies that originate from low-coverage sequencing or are produced by de novo processes without a reference genome are susceptible to high false positive rates of homology detection. In this paper we develop biologically informative features that can be extracted from multiple sequence alignments of putative homologous genes (orthologs and paralogs) and further utilized in the context of guided experimentation to verify false positive outcomes. We demonstrate that our machine learning method, trained on both known homology clusters obtained from OrthoDB and randomly generated sequence alignments (non-homologs), successfully determines apparent false positives inferred by heuristic algorithms, especially among proteomes recovered from low-coverage RNA-seq data. Approximately 42% and 25% of the putative homologies predicted by InParanoid and HaMStR, respectively, were classified as false positives on the experimental data set. Our process increases the quality of output from other clustering algorithms by providing a novel post-processing method that is both fast and efficient at removing low quality clusters of putative homologous genes recovered by heuristic-based approaches.
Al-Saadi, Abdulwahid; Reddy, Joseph D; Duan, Yong P; Brunings, Asha M; Yuan, Qiaoping; Gabriel, Dean W
2007-08-01
Citrus canker disease is caused by five groups of Xanthomonas citri strains that are distinguished primarily by host range: three from Asia (A, A*, and A(w)) and two that form a phylogenetically distinct clade and originated in South America (B and C). Every X. citri strain carries multiple DNA fragments that hybridize with pthA, which is essential for the pathogenicity of wide-host-range X. citri group A strain 3213. DNA fragments that hybridized with pthA were cloned from a representative strain from all five groups. Each strain carried one and only one pthA homolog that functionally complemented a knockout mutation of pthA in 3213. Every complementing homolog was of identical size to pthA and carried 17.5 nearly identical, direct tandem repeats, including three new genes from narrow-host-range groups C (pthC), A(w) (pthAW), and A* (pthA*). Every noncomplementing paralog was of a different size; one of these was sequenced from group A* (pthA*-2) and was found to have an intact promoter and full-length reading frame but with 15.5 repeats. None of the complementing homologs nor any of the noncomplementing paralogs conferred avirulence to 3213 on grapefruit or suppressed avirulence of a group A* strain on grapefruit. A knockout mutation of pthC in a group C strain resulted in loss of pathogenicity on lime, but the strain was unaffected in ability to elicit an HR on grapefruit. This pthC- mutant was fully complemented by pthA, pthB, or pthC. Analysis of the predicted amino-acid sequences of all functional pthA homologs and nonfunctional paralogs indicated that the specific sequence of the 17th repeat may be essential for pathogenicity of X. citri on citrus.
Ardui, Simon; Ameur, Adam; Vermeesch, Joris R; Hestand, Matthew S
2018-01-01
Short read massive parallel sequencing has emerged as a standard diagnostic tool in the medical setting. However, short read technologies have inherent limitations such as GC bias, difficulties mapping to repetitive elements, trouble discriminating paralogous sequences, and difficulties in phasing alleles. Long read single molecule sequencers resolve these obstacles. Moreover, they offer higher consensus accuracies and can detect epigenetic modifications from native DNA. The first commercially available long read single molecule platform was the RS system based on PacBio's single molecule real-time (SMRT) sequencing technology, which has since evolved into their RSII and Sequel systems. Here we capsulize how SMRT sequencing is revolutionizing constitutional, reproductive, cancer, microbial and viral genetic testing. PMID:29401301
Rana, Satiander; Lattoo, Surrinder K.; Dhar, Niha; Razdan, Sumeer; Bhat, Wajid Waheed; Dhar, Rekha S.; Vishwakarma, Ram
2013-01-01
Withania somnifera (L.) Dunal, a highly reputed medicinal plant, synthesizes a large array of steroidal lactone triterpenoids called withanolides. Although its chemical profile and pharmacological activities have been studied extensively during the last two decades, limited attempts have been made to decipher the biosynthetic route and identify the key regulatory genes involved in withanolide biosynthesis. Cytochrome P450 reductase is the most imperative redox partner of multiple P450s involved in primary and secondary metabolite biosynthesis. We describe here the cloning and characterization of two paralogs of cytochrome P450 reductase from W. somnifera. The full length paralogs of WsCPR1 and WsCPR2 have open reading frames of 2058 and 2142 bp encoding 685 and 713 amino acid residues, respectively. Phylogenetic analysis demonstrated that grouping of dual CPRs was in accordance with class I and class II of eudicotyledon CPRs. The corresponding coding sequences were expressed in Escherichia coli as glutathione-S-transferase fusion proteins, purified and characterized. Recombinant proteins of both the paralogs were purified with their intact membrane anchor regions, which is hitherto unreported for other CPRs that have been purified from the microsomal fraction. Southern blot analysis suggested that two divergent isoforms of CPR exist independently in the Withania genome. Quantitative real-time PCR analysis indicated that both genes were widely expressed in leaves, stalks, roots, flowers and berries, with a higher expression level of WsCPR2 in comparison to WsCPR1. Similar to CPRs of other plant species, WsCPR1 was un-inducible while WsCPR2 transcript level increased in a time-dependent manner after elicitor treatments. High performance liquid chromatography of withanolides extracted from elicitor-treated samples showed a significant increase in two of the key withanolides, withanolide A and withaferin A, possibly indicating the role of WsCPR2 in withanolide biosynthesis. The present investigation is so far the only report of characterization of CPR paralogs from W. somnifera. PMID:23437311
Dos Santos, Helena G; Siltberg-Liberles, Jessica
2016-09-19
One of the largest multigene families in Metazoa are the tyrosine kinases (TKs). These are important multifunctional proteins that have evolved as dynamic switches that perform tyrosine phosphorylation and other noncatalytic activities regulated by various allosteric mechanisms. TKs interact with each other and with other molecules, ultimately activating and inhibiting different signaling pathways. TKs are implicated in cancer and almost 30 FDA-approved TK inhibitors are available. However, specific binding is a challenge when targeting an active site that has been conserved in multiple protein paralogs for millions of years. A cassette domain (CD) containing SH3-SH2-Tyrosine Kinase domains reoccurs in vertebrate nonreceptor TKs. Although part of the CD function is shared between TKs, it also presents TK specific features. Here, the evolutionary dynamics of sequence, structure, and phosphorylation across the CD in 17 TK paralogs have been investigated in a large-scale study. We establish that TKs often have ortholog-specific structural disorder and phosphorylation patterns, while secondary structure elements, as expected, are highly conserved. Further, domain-specific differences are at play. Notably, we found the catalytic domain to fluctuate more in certain secondary structure elements than the regulatory domains. By elucidating how different properties evolve after gene duplications and which properties are specifically conserved within orthologs, the mechanistic understanding of protein evolution is enriched and regions supposedly critical for functional divergence across paralogs are highlighted. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Samson, Marie-Laure
2008-01-01
Background The Drosophila gene embryonic lethal abnormal visual system (elav) is the prototype of a gene family present in all metazoans. Its members encode structurally conserved neuronal proteins with three RNA Recognition Motifs (RRM) but they paradoxically act at diverse levels of post-transcriptional regulation. In an attempt to understand the history of this family, we searched for orthologs in eleven completely sequenced genomes, including those of humans, D. melanogaster and C. elegans, for which cDNAs are available. Results We analyzed 23 orthologs/paralogs of elav, and found evidence of gain/loss of gene copy number. For one set of genes, including elav itself, the coding sequences are free of introns and their products most resemble ELAV. The remaining genes show remarkable conservation of their exon organization, and their products most resemble FNE and RBP9, proteins encoded by the two elav paralogs of Drosophila. Remarkably, three of the conserved exon junctions are both close to structural elements, involved respectively in protein-RNA interactions and in the regulation of sub-cellular localization, and in the vicinity of diverse sequence variations. Conclusion The data indicate that the essential elav gene of Drosophila is newly emerged, restricted to dipterans and of retrotransposed origin. We propose that the conserved exon junctions constitute potential sites for sequence/function modifications, and that RRM binding proteins, whose function relies upon plastic RNA-protein interactions, may have played an important role in brain evolution. PMID:18715504
Vidal, Ramon Oliveira; Mondego, Jorge Maurício Costa; Pot, David; Ambrósio, Alinne Batista; Andrade, Alan Carvalho; Pereira, Luiz Filipe Protasio; Colombo, Carlos Augusto; Vieira, Luiz Gonzaga Esteves; Carazzolle, Marcelo Falsarella; Pereira, Gonçalo Amarante Guimarães
2010-01-01
Polyploidization constitutes a common mode of evolution in flowering plants. This event provides the raw material for the divergence of function in homeologous genes, leading to phenotypic novelty that can contribute to the success of polyploids in nature or their selection for use in agriculture. Mounting evidence underlined the existence of homeologous expression biases in polyploid genomes; however, strategies to analyze such transcriptome regulation remained scarce. Important factors regarding homeologous expression biases remain to be explored, such as whether this phenomenon influences specific genes, how paralogs are affected by genome doubling, and what is the importance of the variability of homeologous expression bias to genotype differences. This study reports the expressed sequence tag assembly of the allopolyploid Coffea arabica and one of its direct ancestors, Coffea canephora. The assembly was used for the discovery of single nucleotide polymorphisms through the identification of high-quality discrepancies in overlapped expressed sequence tags and for gene expression information indirectly estimated by the transcript redundancy. Sequence diversity profiles were evaluated within C. arabica (Ca) and C. canephora (Cc) and used to deduce the transcript contribution of the Coffea eugenioides (Ce) ancestor. The assignment of the C. arabica haplotypes to the C. canephora (CaCc) or C. eugenioides (CaCe) ancestral genomes allowed us to analyze gene expression contributions of each subgenome in C. arabica. In silico data were validated by the quantitative polymerase chain reaction and allele-specific combination TaqMAMA-based method. The presence of differential expression of C. arabica homeologous genes and its implications in coffee gene expression, ontology, and physiology are discussed. PMID:20864545
Moore, Abigail J; Vos, Jurriaan M De; Hancock, Lillian P; Goolsby, Eric; Edwards, Erika J
2018-05-01
Hybrid enrichment is an increasingly popular approach for obtaining hundreds of loci for phylogenetic analysis across many taxa quickly and cheaply. The genes targeted for sequencing are typically single-copy loci, which facilitate a more straightforward sequence assembly and homology assignment process. However, this approach limits the inclusion of most genes of functional interest, which often belong to multi-gene families. Here, we demonstrate the feasibility of including large gene families in hybrid enrichment protocols for phylogeny reconstruction and subsequent analyses of molecular evolution, using a new set of bait sequences designed for the "portullugo" (Caryophyllales), a moderately sized lineage of flowering plants (~2200 species) that includes the cacti and harbors many evolutionary transitions to C4 and CAM photosynthesis. Including multi-gene families allowed us to simultaneously infer a robust phylogeny and construct a dense sampling of sequences for a major enzyme of C4 and CAM photosynthesis, which revealed the accumulation of adaptive amino acid substitutions associated with C4 and CAM origins in particular paralogs. Our final set of matrices for phylogenetic analyses included 75-218 loci across 74 taxa, with ~50% matrix completeness across data sets. Phylogenetic resolution was greatly improved across the tree, at both shallow and deep levels. Concatenation and coalescent-based approaches both resolve the sister lineage of the cacti with strong support: Anacampserotaceae + Portulacaceae, two lineages of mostly diminutive succulent herbs of warm, arid regions. In spite of this congruence, BUCKy concordance analyses demonstrated strong and conflicting signals across gene trees. Our results add to the growing number of examples illustrating the complexity of phylogenetic signals in genomic-scale data.
Antanaviciute, Laima; Fernández-Fernández, Felicidad; Jansen, Johannes; Banchi, Elisa; Evans, Katherine M; Viola, Roberto; Velasco, Riccardo; Dunwell, Jim M; Troggio, Michela; Sargent, Daniel J
2012-05-25
A whole-genome genotyping array has previously been developed for Malus using SNP data from 28 Malus genotypes. This array offers the prospect of high throughput genotyping and linkage map development for any given Malus progeny. To test the applicability of the array for mapping in diverse Malus genotypes, we applied the array to the construction of a SNP-based linkage map of an apple rootstock progeny. Of the 7,867 Malus SNP markers on the array, 1,823 (23.2%) were heterozygous in one of the two parents of the progeny, 1,007 (12.8%) were heterozygous in both parental genotypes, whilst just 2.8% of the 921 Pyrus SNPs were heterozygous. A linkage map spanning 1,282.2 cM was produced comprising 2,272 SNP markers, 306 SSR markers and the S-locus. The length of the M432 linkage map was increased by 52.7 cM with the addition of the SNP markers, whilst marker density increased from 3.8 cM/marker to 0.5 cM/marker. Just three regions in excess of 10 cM remain where no markers were mapped. We compared the positions of the mapped SNP markers on the M432 map with their predicted positions on the 'Golden Delicious' genome sequence. A total of 311 markers (13.7% of all mapped markers) mapped to positions that conflicted with their predicted positions on the 'Golden Delicious' pseudo-chromosomes, indicating the presence of paralogous genomic regions or mis-assignments of genome sequence contigs during the assembly and anchoring of the genome sequence. We incorporated data for the 2,272 SNP markers onto the map of the M432 progeny and have presented the most complete and saturated map of the full 17 linkage groups of M. pumila to date. The data were generated rapidly in a high-throughput semi-automated pipeline, permitting significant savings in time and cost over linkage map construction using microsatellites. 
The application of the array will permit linkage maps to be developed for QTL analyses in a cost-effective manner, and the identification of SNPs that have been assigned erroneous positions on the 'Golden Delicious' reference sequence will assist in the continued improvement of the genome sequence assembly for that variety.
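The percentages and marker density quoted in the abstract above can be re-derived from its own counts. The following is a minimal sketch of that arithmetic; the variable names are illustrative, and it assumes that the reported density counts every mapped marker (SNPs, SSRs and the S-locus), which the abstract does not state explicitly.

```python
# Counts quoted in the abstract above; variable names are illustrative only.
malus_snps_on_array = 7867
het_one_parent = 1823
het_both_parents = 1007
snp_markers = 2272
ssr_markers = 306
s_locus = 1
conflicting_markers = 311
map_length_cm = 1282.2


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Assumption: density is map length divided by all mapped markers.
total_markers = snp_markers + ssr_markers + s_locus
density_cm_per_marker = round(map_length_cm / total_markers, 1)

print(pct(het_one_parent, malus_snps_on_array))    # share heterozygous in one parent
print(pct(het_both_parents, malus_snps_on_array))  # share heterozygous in both parents
print(pct(conflicting_markers, snp_markers))       # share of SNPs with conflicting positions
print(density_cm_per_marker)                       # cM per marker
```

Under that assumption the computed values agree with the figures reported in the abstract (23.2%, 12.8%, 13.7% and 0.5 cM/marker).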
BAC sequencing using pooled methods.
Saski, Christopher A; Feltus, F Alex; Parida, Laxmi; Haiminen, Niina
2015-01-01
Shotgun sequencing and assembly of a large, complex genome can be expensive, and accurately reconstructing the true genome sequence is challenging. Repetitive DNA arrays, paralogous sequences, polyploidy, and heterozygosity are the main factors that plague de novo genome sequencing projects, which typically yield highly fragmented assemblies from which it is difficult to extract biological meaning. Targeted, sub-genomic sequencing offers complexity reduction by removing distal segments of the genome and a systematic mechanism for exploring prioritized genomic content through BAC sequencing. If one isolates and sequences the genome fraction that encodes the relevant biological information, then it is possible to reduce overall sequencing costs and efforts that target a genomic segment. This chapter describes the sub-genome assembly protocol for an organism based upon a BAC tiling path derived from a genome-scale physical map or from fine mapping using BACs to target sub-genomic regions. Methods that are described include BAC isolation and mapping, DNA sequencing, and sequence assembly.
Kongchum, Pawapol; Hallerman, Eric M; Hulata, Gideon; David, Lior; Palti, Yniv
2011-01-01
Induction of innate immune pathways is critical for early host defense, but there is limited understanding of how teleost fishes recognize pathogen molecules and activate these pathways. In mammals, cells of the innate immune system detect pathogenic molecular structures using pattern recognition receptors (PRRs). TLR9 functions as a PRR that recognizes CpG motifs in bacterial and viral DNA and requires adaptor molecules MyD88 and TRAF6 for signal transduction. Here we report full-length cDNA isolation, structural characterization and tissue mRNA expression analysis of the common carp (cc) TLR9, MyD88 and TRAF6 gene orthologs. The ccTLR9 open-reading frame (ORF) is predicted to encode a 1064-amino acid (aa) protein. We found that MyD88 and TRAF6 genes are duplicated in common carp. This is the first report of TRAF6 duplication in a vertebrate genome and stronger evidence in support of MyD88 duplication is provided. The ccMyD88a and b ORFs are predicted to encode 288-aa and 284-aa peptides, respectively. They share 91% aa sequence identity between paralogs. The ccTRAF6a and b ORFs are both predicted to encode 543-aa peptides sharing 95% aa sequence identity between paralogs. The ccTLR9 gene is contained in a single large exon. The ccMyD88a and ccMyD88b coding sequences span five exons. The TRAF6b gene spans six exons. PCR amplification to obtain the entire coding sequence of ccTRAF6a gene was not successful. The 2104-bp fragment amplified covers the 3' end of the gene and it contains a partial sequence of one exon and three complete exons. The predicted protein domains of the ccTLR9, ccMyD88 and ccTRAF6 are conserved and resemble orthologs from other vertebrates. Real-time quantitative PCR assays of the ccTLR9, MyD88a and b, and TRAF6a and b gene transcripts in healthy common carp indicated that mRNA expression varied between tissues. 
Differential expression of duplicate copies was found for ccMyD88 and ccTRAF6 in white and red muscle tissues, suggesting that paralogs may have evolved and attained a new function. The genomic information we describe in this paper provides evidence of sequence and structural conservation of immune response genes in common carp. Published by Elsevier Ltd.
Manzanilla, Vincent; Bruneau, Anne
2012-10-01
The Caesalpinieae grade (Leguminosae) forms a morphologically and ecologically diverse group of mostly tropical tree species with a complex evolutionary history. This grade comprises several distinct lineages, but the exact delimitation of the group relative to subfamily Mimosoideae and other members of subfamily Caesalpinioideae, as well as phylogenetic relationships among the lineages are uncertain. With the aim of better resolving phylogenetic relationships within the Caesalpinieae grade, we investigated the utility of several nuclear markers developed from genomic studies in the Papilionoideae. We cloned and sequenced the low copy nuclear gene sucrose synthase (SUSY) and combined the data with plastid trnL and matK sequences. SUSY has two paralogs in the Caesalpinieae grade and in the Mimosoideae, but occurs as a single copy in all other legumes tested. Bayesian and maximum likelihood phylogenetic analyses suggest the two nuclear markers are congruent with plastid DNA data. The Caesalpinieae grade is divided into four well-supported clades (Cassia, Caesalpinia, Tachigali and Peltophorum clades), a poorly supported clade of Dimorphandra Group genera, and two paraphyletic groups, one with other Dimorphandra Group genera and the other comprising genera previously recognized as the Umtiza clade. A selection analysis of the paralogs, using selection models from PAML, suggests that SUSY genes are subjected to a purifying selection. One of the SUSY paralogs, under slightly stronger positive selection, may be undergoing subfunctionalization. The low copy SUSY gene is useful for phylogeny reconstruction in the Caesalpinieae despite the presence of duplicate copies. This study confirms that the Caesalpinieae grade is an artificial group, and highlights the need for further analyses of lineages at the base of the Mimosoideae. Copyright © 2012 Elsevier Inc. All rights reserved.
The FRAGILE FIBER8 gene was previously shown to be required for the biosynthesis of the reducing end tetrasaccharide sequence of glucuronoxylan (GX) in Arabidopsis thaliana. Here, we demonstrate that F8H, a close homolog of FRA8, is a functional ortholog of FRA8 involved in GX bi...
Analysis of the reptile CD1 genes: evolutionary implications.
Yang, Zhi; Wang, Chunyan; Wang, Tao; Bai, Jianhui; Zhao, Yu; Liu, Xuhan; Ma, Qingwei; Wu, Xiaobing; Guo, Ying; Zhao, Yaofeng; Ren, Liming
2015-06-01
CD1, the third family of antigen-presenting molecules, was previously found only in mammals and chickens, which suggests that the chicken and mammalian CD1 shared a common ancestral gene emerging at least 310 million years ago. Here, we describe CD1 genes in the green anole lizard and Crocodylia, demonstrating that CD1 is ubiquitous in mammals, birds, and reptiles. Although the reptilian CD1 protein structures are predicted to be similar to human CD1d and chicken CD1.1, CD1 isotypes are not found to be orthologous between mammals, birds, and reptiles according to phylogenetic analyses, suggesting an independent diversification of CD1 isotypes during the speciation of mammals, birds, and reptiles. In the green anole lizard, although the single CD1 locus and MHC I gene are located on the same chromosome, there is an approximately 10-Mb-long sequence in between, and interestingly, several genes flanking the CD1 locus belong to the MHC paralogous region on human chromosome 19. The CD1 genes in Crocodylia are located in two loci, respectively linked to the MHC region and MHC paralogous region (corresponding to the MHC paralogous region on chromosome 19). These results provide new insights for studying the origin and evolution of CD1.
Evolution of the vertebrate insulin receptor substrate (Irs) gene family.
Al-Salam, Ahmad; Irwin, David M
2017-06-23
Insulin receptor substrate (Irs) proteins are essential for insulin signaling as they allow downstream effectors to dock with, and be activated by, the insulin receptor. A family of four Irs proteins has been identified in mice; however, the gene for one of these, IRS3, has been pseudogenized in humans. While it is known that the Irs gene family originated in vertebrates, it is not known when it originated and which members are most closely related to each other. A better understanding of the evolution of Irs genes and proteins should provide insight into the regulation of metabolism by insulin. Multiple genes for Irs proteins were identified in a wide variety of vertebrate species. Phylogenetic and genomic neighborhood analyses indicate that this gene family originated very early in vertebrate evolution. Most Irs genes were duplicated and retained in fish after the fish-specific genome duplication. Irs genes have been lost on various lineages, including Irs3 in primates and birds and Irs1 in most fish. Irs3 and Irs4 experienced an episode of more rapid protein sequence evolution on the ancestral mammalian lineage. Comparisons of the conservation of the protein sequences among Irs paralogs show that domains involved in binding to the plasma membrane and insulin receptors are most strongly conserved, while divergence has occurred in sequences involved in interacting with downstream effector proteins. The Irs gene family originated very early in vertebrate evolution, likely through genome duplications, and in parallel with duplications of other components of the insulin signaling pathway, including insulin and the insulin receptor. While the N-terminal sequences of these proteins are conserved among the paralogs, changes in the C-terminal sequences likely allowed changes in biological function.
Increased taxon sampling reveals thousands of hidden orthologs in flatworms
2017-01-01
Gains and losses shape the gene complement of animal lineages and are a fundamental aspect of genomic evolution. Acquiring a comprehensive view of the evolution of gene repertoires is limited by the intrinsic limitations of common sequence similarity searches and available databases. Thus, a subset of the gene complement of an organism consists of hidden orthologs, i.e., those with no apparent homology to sequenced animal lineages—mistakenly considered new genes—but actually representing rapidly evolving orthologs or undetected paralogs. Here, we describe Leapfrog, a simple automated BLAST pipeline that leverages increased taxon sampling to overcome long evolutionary distances and identify putative hidden orthologs in large transcriptomic databases by transitive homology. As a case study, we used 35 transcriptomes of 29 flatworm lineages to recover 3427 putative hidden orthologs, some unidentified by OrthoFinder and HaMStR, two common orthogroup inference algorithms. Unexpectedly, we do not observe a correlation between the number of putative hidden orthologs in a lineage and its “average” evolutionary rate. Hidden orthologs do not show unusual sequence composition biases that might account for systematic errors in sequence similarity searches. Instead, gene duplication with divergence of one paralog and weak positive selection appear to underlie hidden orthology in Platyhelminthes. By using Leapfrog, we identify key centrosome-related genes and homeodomain classes previously reported as absent in free-living flatworms, e.g., planarians. Altogether, our findings demonstrate that hidden orthologs comprise a significant proportion of the gene repertoire in flatworms, qualifying the impact of gene losses and gains in gene complement evolution. PMID:28400424
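The idea of identifying hidden orthologs by transitive homology can be illustrated with a toy sketch. This is not the Leapfrog implementation: the function name, the input dictionaries and the gene identifiers below are all hypothetical, and the sketch assumes that similarity-search hits have already been parsed into plain Python mappings.

```python
def transitive_candidates(direct_hits, query_to_bridge, bridge_to_ref):
    """Return query genes that reach a reference gene only via a bridge taxon.

    direct_hits:     query gene -> set of reference genes hit directly
    query_to_bridge: query gene -> set of genes hit in an intermediate
                     ("bridge") transcriptome
    bridge_to_ref:   bridge gene -> set of reference genes hit
    All three mappings are hypothetical inputs for illustration.
    """
    candidates = {}
    for query, bridges in query_to_bridge.items():
        if direct_hits.get(query):
            continue  # homology is already detectable without a bridge
        reachable = set()
        for bridge in bridges:
            reachable |= bridge_to_ref.get(bridge, set())
        if reachable:
            candidates[query] = reachable
    return candidates


# Toy data: geneA has no direct hit but reaches REF1 through bridge gene b1.
direct = {"geneA": set(), "geneB": {"REF2"}}
to_bridge = {"geneA": {"b1"}, "geneB": {"b2"}, "geneC": {"b3"}}
to_ref = {"b1": {"REF1"}, "b2": {"REF2"}}

hidden = transitive_candidates(direct, to_bridge, to_ref)
print(hidden)
```

In this toy run only `geneA` is reported: `geneB` is excluded because it already has a direct hit, and `geneC` because its bridge gene hits nothing in the reference. A real pipeline would additionally apply score thresholds and reciprocal checks, which are omitted here.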
DOE Office of Scientific and Technical Information (OSTI.GOV)
Devos, Nicolas; Szövényi, Péter; Weston, David J.
The goal of this research was to investigate whether there has been a whole-genome duplication (WGD) in the ancestry of Sphagnum (peatmoss) or the class Sphagnopsida, and to determine if the timing of any such duplication(s) and patterns of paralog retention could help explain the rapid radiation and current ecological dominance of peatmosses.
Orthologs and paralogs - we need to get it right
Jensen, Roy A
2001-01-01
A response to Homologuephobia, by Gregory A Petsko, Genome Biology 2001 2:comment1002.1-1002.2, to An apology for orthologs - or brave new memes by Eugene V Koonin, Genome Biology 2001, 2:comment1005.1-1005.2, and to Can sequence determine function? by John A Gerlt and Patricia C Babbitt, Genome Biology 2000, 1:reviews0005.1-0005.10. PMID:11532207
Devos, Nicolas; Szövényi, Péter; Weston, David J.; ...
2016-02-22
The goal of this research was to investigate whether there has been a whole-genome duplication (WGD) in the ancestry of Sphagnum (peatmoss) or the class Sphagnopsida, and to determine if the timing of any such duplication(s) and patterns of paralog retention could help explain the rapid radiation and current ecological dominance of peatmosses.
Shiina, Takashi; Ando, Asako; Suto, Yumiko; Kasai, Fumio; Shigenari, Atsuko; Takishima, Nobusada; Kikkawa, Eri; Iwata, Kyoko; Kuwano, Yuko; Kitamura, Yuka; Matsuzawa, Yumiko; Sano, Kazumi; Nogami, Masahiro; Kawata, Hisako; Li, Suyun; Fukuzumi, Yasuhito; Yamazaki, Masaaki; Tashiro, Hiroyuki; Tamiya, Gen; Kohda, Atsushi; Okumura, Katsuzumi; Ikemura, Toshimichi; Soeda, Eiichi; Mizuki, Nobuhisa; Kimura, Minoru; Bahram, Seiamak; Inoko, Hidetoshi
2001-01-01
Human chromosomes 1q21–q25, 6p21.3–22.2, 9q33–q34, and 19p13.1–p13.4 carry clusters of paralogous loci, to date best defined by the flagship 6p MHC region. They have presumably been created by two rounds of large-scale genomic duplications around the time of vertebrate emergence. Phylogenetically, the 1q21–25 region seems most closely related to the 6p21.3 MHC region, as it is only the MHC paralogous region that includes bona fide MHC class I genes, the CD1 and MR1 loci. Here, to clarify the genomic structure of this model MHC paralogous region as well as to gain insight into the evolutionary dynamics of the entire quadriplication process, a detailed analysis of a critical 1.7 megabase (Mb) region was performed. To this end, a composite, deep, YAC, BAC, and PAC contig encompassing all five CD1 genes and linking the centromeric +P5 locus to the telomeric KRTC7 locus was constructed. Within this contig a 1.1-Mb BAC and PAC core segment joining CD1D to FCER1A was fully sequenced and thoroughly analyzed. This led to the mapping of a total of 41 genes (12 expressed genes, 12 possibly expressed genes, and 17 pseudogenes), among which 31 were novel. The latter include 20 olfactory receptor (OR) genes, 9 of which are potentially expressed. Importantly, CD1, SPTA1, OR, and FCERIA belong to multigene families, which have paralogues in the other three regions. Furthermore, it is noteworthy that 12 of the 13 expressed genes in the 1q21–q22 region around the CD1 loci are immunologically relevant. In addition to CD1A-E, these include SPTA1, MNDA, IFI-16, AIM2, BL1A, FY and FCERIA. This functional convergence of structurally unrelated genes is reminiscent of the 6p MHC region, and perhaps represents the emergence of yet another antigen presentation gene cluster, in this case dedicated to lipid/glycolipid antigens rather than antigen-derived peptides. 
[The nucleotide sequence data reported in this paper have been submitted to the DDBJ, EMBL, and GenBank databases under accession nos. AB045357–AB045365.] PMID:11337475
Balasubramaniam, Shandiya; Bray, Rebecca D; Mulder, Raoul A; Sunnucks, Paul; Pavlova, Alexandra; Melville, Jane
2016-05-21
The major histocompatibility complex (MHC) plays a crucial role in the adaptive immune system and has been extensively studied across vertebrate taxa. Although the function of MHC genes appears to be conserved across taxa, there is great variation in the number and organisation of these genes. Among avian species, for instance, there are notable differences in MHC structure between passerine and non-passerine lineages: passerines typically have a high number of highly polymorphic MHC paralogs whereas non-passerines have fewer loci and lower levels of polymorphism. Although the occurrence of highly polymorphic MHC paralogs in passerines is well documented, their evolutionary origins are relatively unexplored. The majority of studies have focussed on the more derived passerine lineages and there is very little empirical information on the diversity of the MHC in basal passerine lineages. We undertook a study of MHC diversity and evolutionary relationships across seven species from four families (Climacteridae, Maluridae, Pardalotidae, Meliphagidae) that comprise a prominent component of the basal passerine lineages. We aimed to determine if highly polymorphic MHC paralogs have an early evolutionary origin within passerines or are a more derived feature of the infraorder Passerida. We identified 177 alleles of the MHC class II β exon 2 in seven basal passerine species, with variation in numbers of alleles across individuals and species. Overall, we found evidence of multiple gene loci, pseudoalleles, trans-species polymorphism and high allelic diversity in these basal lineages. Phylogenetic reconstruction of avian lineages based on MHC class II β exon 2 sequences strongly supported the monophyletic grouping of basal and derived passerine species. 
Our study provides evidence of a large number of highly polymorphic MHC paralogs in seven basal passerine species, with strong similarities to the MHC described in more derived passerine lineages rather than the simpler MHC in non-passerine lineages. These findings indicate an early evolutionary origin of highly polymorphic MHC paralogs in passerines and shed light on the evolutionary forces shaping the avian MHC.
Hu, Ruibo; Chi, Xiaoyuan; Chai, Guohua; Kong, Yingzhen; He, Guo; Wang, Xiaoyu; Shi, Dachuan; Zhang, Dongyuan; Zhou, Gongke
2012-01-01
Background Homeodomain-leucine zipper (HD-ZIP) proteins are plant-specific transcriptional factors known to play crucial roles in plant development. Although sequence phylogeny analysis of Populus HD-ZIPs was carried out in a previous study, no systematic analysis incorporating genome organization, gene structure, and expression compendium has been conducted in model tree species Populus thus far. Principal Findings In this study, a comprehensive analysis of Populus HD-ZIP gene family was performed. Sixty-three full-length HD-ZIP genes were found in Populus genome. These Populus HD-ZIP genes were phylogenetically clustered into four distinct subfamilies (HD-ZIP I–IV) and predominately distributed across 17 linkage groups (LG). Fifty genes from 25 Populus paralogous pairs were located in the duplicated blocks of Populus genome and then preferentially retained during the sequential evolutionary courses. Genomic organization analyses indicated that purifying selection has played a pivotal role in the retention and maintenance of Populus HD-ZIP gene family. Microarray analysis has shown that 21 Populus paralogous pairs have been differentially expressed across different tissues and under various stresses, with five paralogous pairs showing nearly identical expression patterns, 13 paralogous pairs being partially redundant and three paralogous pairs diversifying significantly. Quantitative real-time RT-PCR (qRT-PCR) analysis performed on 16 selected Populus HD-ZIP genes in different tissues and under both drought and salinity stresses confirms their tissue-specific and stress-inducible expression patterns. Conclusions Genomic organizations indicated that segmental duplications contributed significantly to the expansion of Populus HD-ZIP gene family. Exon/intron organization and conserved motif composition of Populus HD-ZIPs are highly conservative in the same subfamily, suggesting the members in the same subfamilies may also have conservative functionalities. 
Microarray and qRT-PCR analyses showed that 89% (56 out of 63) of Populus HD-ZIPs were duplicate genes that might have been retained by substantial subfunctionalization. Taken together, these observations may lay the foundation for future functional analysis of Populus HD-ZIP genes to unravel their biological roles. PMID:22359569
Zhao, Man; Gu, Yongzhe; He, Lingli; Chen, Qingshan; He, Chaoying
2015-05-15
The DA1 gene family is plant-specific and Arabidopsis DA1 regulates seed and organ size, but the functions in soybeans are unknown. The cultivated soybean (Glycine max) is believed to be domesticated from the annual wild soybeans (Glycine soja). To evaluate whether DA1-like genes were involved in the evolution of soybeans, we compared variation at both sequence and expression levels of DA1-like genes from G. max (GmaDA1) and G. soja (GsoDA1). Sequence identities were extremely high between the orthologous pairs between soybeans, while the paralogous copies in a soybean species showed a relatively high divergence. Moreover, the expression variation of DA1-like paralogous genes in soybean was much greater than the orthologous gene pairs between the wild and cultivated soybeans during development and challenging abiotic stresses such as salinity. We further found that overexpressing GsoDA1 genes did not affect seed size. Nevertheless, overexpressing them reduced transgenic Arabidopsis seed germination sensitivity to salt stress. Moreover, most of these genes could improve salt tolerance of the transgenic Arabidopsis plants, corroborated by a detection of expression variation of several key genes in the salt-tolerance pathways. Our work suggested that expression diversification of DA1-like genes is functionally associated with adaptive radiation of soybeans, reinforcing that the plant-specific DA1 gene family might have contributed to the successful adaption to complex environments and radiation of the plants.
Preston, Jill C.; Kellogg, Elizabeth A.
2006-01-01
Gene duplication is an important mechanism for the generation of evolutionary novelty. Paralogous genes that are not silenced may evolve new functions (neofunctionalization) that will alter the developmental outcome of preexisting genetic pathways, partition ancestral functions (subfunctionalization) into divergent developmental modules, or function redundantly. Functional divergence can occur by changes in the spatio-temporal patterns of gene expression and/or by changes in the activities of their protein products. We reconstructed the evolutionary history of two paralogous monocot MADS-box transcription factors, FUL1 and FUL2, and determined the evolution of sequence and gene expression in grass AP1/FUL-like genes. Monocot AP1/FUL-like genes duplicated at the base of Poaceae and codon substitutions occurred under relaxed selection mostly along the branch leading to FUL2. Following the duplication, FUL1 was apparently lost from early diverging taxa, a pattern consistent with major changes in grass floral morphology. Overlapping gene expression patterns in leaves and spikelets indicate that FUL1 and FUL2 probably share some redundant functions, but that FUL2 may have become temporally restricted under partial subfunctionalization to particular stages of floret development. These data have allowed us to reconstruct the history of AP1/FUL-like genes in Poaceae and to hypothesize a role for this gene duplication in the evolution of the grass spikelet. PMID:16816429
Gene duplications in prokaryotes can be associated with environmental adaptation
2010-01-01
Background Gene duplication is a normal evolutionary process. If there is no selective advantage in keeping the duplicated gene, it is usually reduced to a pseudogene and disappears from the genome. However, some paralogs are retained. These gene products are likely to be beneficial to the organism, e.g. in adaptation to new environmental conditions. The aim of our analysis is to investigate the properties of paralog-forming genes in prokaryotes, and to analyse the role of these retained paralogs by relating gene properties to life style of the corresponding prokaryotes. Results Paralogs were identified in a number of prokaryotes, and these paralogs were compared to singletons of persistent orthologs based on functional classification. This showed that the paralogs were associated with for example energy production, cell motility, ion transport, and defence mechanisms. A statistical overrepresentation analysis of gene and protein annotations was based on paralogs of the 200 prokaryotes with the highest fraction of paralog-forming genes. Biclustering of overrepresented gene ontology terms versus species was used to identify clusters of properties associated with clusters of species. The clusters were classified using similarity scores on properties and species to identify interesting clusters, and a subset of clusters were analysed by comparison to literature data. This analysis showed that paralogs often are associated with properties that are important for survival and proliferation of the specific organisms. This includes processes like ion transport, locomotion, chemotaxis and photosynthesis. However, the analysis also showed that the gene ontology terms sometimes were too general, imprecise or even misleading for automatic analysis. Conclusions Properties described by gene ontology terms identified in the overrepresentation analysis are often consistent with individual prokaryote lifestyles and are likely to give a competitive advantage to the organism. 
Paralogs and singletons dominate different categories of functional classification, where paralogs in particular seem to be associated with processes involving interaction with the environment. PMID:20961426
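The abstract above does not say which test was used for the overrepresentation analysis. As an illustration only, the sketch below assumes a one-sided hypergeometric test, a common choice for gene ontology enrichment. The function name and the toy counts are hypothetical and are not taken from the study.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term.

    Assumed model (not stated in the abstract): the n paralog-forming genes
    are treated as a random draw from the N genes of the genome.
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the exact hypergeometric probabilities from k up to the maximum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# Toy numbers (hypothetical): 1000 genes, 50 annotated with a term,
# 100 paralog-forming genes, 15 of which carry the term (5 expected by chance).
p_value = hypergeom_upper_tail(k=15, K=50, n=100, N=1000)
print(f"p = {p_value:.2e}")
```

A small p-value here would flag the term as overrepresented among paralogs. A real analysis over many terms and 200 genomes would also need a multiple-testing correction.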
Gene duplications in prokaryotes can be associated with environmental adaptation.
Bratlie, Marit S; Johansen, Jostein; Sherman, Brad T; Huang, Da Wei; Lempicki, Richard A; Drabløs, Finn
2010-10-20
Gene duplication is a normal evolutionary process. If there is no selective advantage in keeping the duplicated gene, it is usually reduced to a pseudogene and disappears from the genome. However, some paralogs are retained. These gene products are likely to be beneficial to the organism, e.g. in adaptation to new environmental conditions. The aim of our analysis is to investigate the properties of paralog-forming genes in prokaryotes, and to analyse the role of these retained paralogs by relating gene properties to life style of the corresponding prokaryotes. Paralogs were identified in a number of prokaryotes, and these paralogs were compared to singletons of persistent orthologs based on functional classification. This showed that the paralogs were associated with for example energy production, cell motility, ion transport, and defence mechanisms. A statistical overrepresentation analysis of gene and protein annotations was based on paralogs of the 200 prokaryotes with the highest fraction of paralog-forming genes. Biclustering of overrepresented gene ontology terms versus species was used to identify clusters of properties associated with clusters of species. The clusters were classified using similarity scores on properties and species to identify interesting clusters, and a subset of clusters were analysed by comparison to literature data. This analysis showed that paralogs often are associated with properties that are important for survival and proliferation of the specific organisms. This includes processes like ion transport, locomotion, chemotaxis and photosynthesis. However, the analysis also showed that the gene ontology terms sometimes were too general, imprecise or even misleading for automatic analysis. Properties described by gene ontology terms identified in the overrepresentation analysis are often consistent with individual prokaryote lifestyles and are likely to give a competitive advantage to the organism. 
Paralogs and singletons dominate different categories of functional classification, where paralogs in particular seem to be associated with processes involving interaction with the environment.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mangelsen, Elke; Kilian, Joachim; Berendzen, Kenneth W.
2008-02-01
WRKY proteins belong to the WRKY-GCM1 superfamily of zinc finger transcription factors that have been subject to a large plant-specific diversification. For the cereal crop barley (Hordeum vulgare), three different WRKY proteins have been characterized so far, as regulators in sucrose signaling, in pathogen defense, and in response to cold and drought, respectively. However, their phylogenetic relationship remained unresolved. In this study, we used the available sequence information to identify a minimum number of 45 barley WRKY transcription factor (HvWRKY) genes. According to their structural features the HvWRKY factors were classified into the previously defined polyphyletic WRKY subgroups 1 to 3. Furthermore, we could assign putative orthologs of the HvWRKY proteins in Arabidopsis and rice. While in most cases clades of orthologous proteins were formed within each group or subgroup, other clades were composed of paralogous proteins for the grasses and Arabidopsis only, which is indicative of specific gene radiation events. To gain insight into their putative functions, we examined expression profiles of WRKY genes from publicly available microarray data resources and found group specific expression patterns. While putative orthologs of the HvWRKY transcription factors have been inferred from phylogenetic sequence analysis, we performed a comparative expression analysis of WRKY genes in Arabidopsis and barley. Indeed, highly correlative expression profiles were found between some of the putative orthologs. HvWRKY genes have not only undergone radiation in monocot or dicot species, but exhibit evolutionary traits specific to grasses. HvWRKY proteins exhibited not only sequence similarities between orthologs with Arabidopsis, but also relatedness in their expression patterns. This correlative expression is indicative for a putative conserved function of related WRKY proteins in mono- and dicot species.
Devos, Nicolas; Szövényi, Péter; Weston, David J; Rothfels, Carl J; Johnson, Matthew G; Shaw, A Jonathan
2016-07-01
The goal of this research was to investigate whether there has been a whole-genome duplication (WGD) in the ancestry of Sphagnum (peatmoss) or the class Sphagnopsida, and to determine if the timing of any such duplication(s) and patterns of paralog retention could help explain the rapid radiation and current ecological dominance of peatmosses. RNA sequencing (RNA-seq) data were generated for nine taxa in Sphagnopsida (Bryophyta). Analyses of frequency plots for synonymous substitutions per synonymous site (Ks) between paralogous gene pairs and reconciliation of 578 gene trees were conducted to assess evidence of large-scale or genome-wide duplication events in each transcriptome. Both Ks frequency plots and gene tree-based analyses indicate multiple duplication events in the history of the Sphagnopsida. The most recent WGD event predates divergence of Sphagnum from the two other genera of Sphagnopsida. Duplicate retention is highly variable across species, which might be best explained by local adaptation. Our analyses indicate that the last WGD could have been an important factor underlying the diversification of peatmosses and facilitated their rise to ecological dominance in peatlands. The timing of the duplication events and their significance in the evolutionary history of peat mosses are discussed. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.
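To make the idea of a Ks frequency plot concrete, the sketch below bins Ks values of paralog pairs and reports bins that stand above both neighbours, which is roughly how a duplication burst shows up as a peak. The function names, bin width and toy Ks values are hypothetical. Published analyses usually fit mixture models rather than using this naive peak rule.

```python
from collections import Counter

def ks_histogram(ks_values, bin_width=0.1, max_ks=2.0):
    """Count paralog pairs per Ks bin; values outside [0, max_ks) are ignored."""
    counts = Counter()
    for ks in ks_values:
        if 0.0 <= ks < max_ks:
            counts[int(ks / bin_width)] += 1
    n_bins = int(round(max_ks / bin_width))
    return [counts.get(i, 0) for i in range(n_bins)]

def local_peaks(hist):
    """Indices of bins strictly higher than both neighbouring bins."""
    return [i for i in range(1, len(hist) - 1)
            if hist[i] > hist[i - 1] and hist[i] > hist[i + 1]]

# Toy Ks values (hypothetical), chosen mid-bin to avoid floating-point edge cases.
toy_ks = [0.05, 0.15, 0.45, 0.45, 0.45, 0.55, 0.55, 0.65,
          1.25, 1.35, 1.35, 1.35, 1.45]
hist = ks_histogram(toy_ks)
peaks = local_peaks(hist)
print(peaks)  # bin indices; multiply by bin_width for the lower Ks edge of each bin
```

With these toy values, two separated peaks would be read as two candidate duplication events of different ages, the older one at the higher Ks.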
Ancient DNA sequence revealed by error-correcting codes.
Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo
2015-07-10
A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228
Zimmer, Christoph T; Garrood, William T; Singh, Kumar Saurabh; Randall, Emma; Lueke, Bettina; Gutbrod, Oliver; Matthiesen, Svend; Kohler, Maxie; Nauen, Ralf; Davies, T G Emyr; Bass, Chris
2018-01-22
Gene duplication is a major source of genetic variation that has been shown to underpin the evolution of a wide range of adaptive traits [1, 2]. For example, duplication or amplification of genes encoding detoxification enzymes has been shown to play an important role in the evolution of insecticide resistance [3-5]. In this context, gene duplication performs an adaptive function as a result of its effects on gene dosage and not as a source of functional novelty [3, 6-8]. Here, we show that duplication and neofunctionalization of a cytochrome P450, CYP6ER1, led to the evolution of insecticide resistance in the brown planthopper. Considerable genetic variation was observed in the coding sequence of CYP6ER1 in populations of brown planthopper collected from across Asia, but just two sequence variants are highly overexpressed in resistant strains and metabolize imidacloprid. Both variants are characterized by profound amino-acid alterations in substrate recognition sites, and the introduction of these mutations into a susceptible P450 sequence is sufficient to confer resistance. CYP6ER1 is duplicated in resistant strains with individuals carrying paralogs with and without the gain-of-function mutations. Despite numerical parity in the genome, the susceptible and mutant copies exhibit marked asymmetry in their expression with the resistant paralogs overexpressed. In the primary resistance-conferring CYP6ER1 variant, this results from an extended region of novel sequence upstream of the gene that provides enhanced expression. Our findings illustrate the versatility of gene duplication in providing opportunities for functional and regulatory innovation during the evolution of an adaptive trait. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
Troggio, Michela; Surbanovski, Nada; Bianco, Luca; Moretto, Marco; Giongo, Lara; Banchi, Elisa; Viola, Roberto; Fernández, Felicdad Fernández; Costa, Fabrizio; Velasco, Riccardo; Cestaro, Alessandro; Sargent, Daniel James
2013-01-01
High throughput arrays for the simultaneous genotyping of thousands of single-nucleotide polymorphisms (SNPs) have made the rapid genetic characterisation of plant genomes and the development of saturated linkage maps a realistic prospect for many plant species of agronomic importance. However, the correct calling of SNP genotypes in divergent polyploid genomes using array technology can be problematic due to paralogy, and to divergence in probe sequences causing changes in probe binding efficiencies. An Illumina Infinium II whole-genome genotyping array was recently developed for the cultivated apple and used to develop a molecular linkage map for an apple rootstock progeny (M432), but a large proportion of segregating SNPs were not mapped in the progeny, due to unexpected genotype clustering patterns. To investigate the causes of this unexpected clustering we performed BLAST analysis of all probe sequences against the 'Golden Delicious' genome sequence and discovered evidence for paralogous annealing sites and probe sequence divergence for a high proportion of probes contained on the array. Following visual re-evaluation of the genotyping data generated for 8,788 SNPs for the M432 progeny using the array, we manually re-scored genotypes at 818 loci and mapped a further 797 markers to the M432 linkage map. The newly mapped markers included the majority of those that could not be mapped previously, as well as loci that were previously scored as monomorphic, but which segregated due to divergence leading to heterozygosity in probe annealing sites. An evaluation of the 8,788 probes in a diverse collection of Malus germplasm showed that more than half the probes returned genotype clustering patterns that were difficult or impossible to interpret reliably, highlighting implications for the use of the array in genome-wide association studies.
2012-01-01
Background A whole-genome genotyping array has previously been developed for Malus using SNP data from 28 Malus genotypes. This array offers the prospect of high throughput genotyping and linkage map development for any given Malus progeny. To test the applicability of the array for mapping in diverse Malus genotypes, we applied the array to the construction of a SNP-based linkage map of an apple rootstock progeny. Results Of the 7,867 Malus SNP markers on the array, 1,823 (23.2%) were heterozygous in one of the two parents of the progeny, 1,007 (12.8%) were heterozygous in both parental genotypes, whilst just 2.8% of the 921 Pyrus SNPs were heterozygous. A linkage map spanning 1,282.2 cM was produced comprising 2,272 SNP markers, 306 SSR markers and the S-locus. The length of the M432 linkage map was increased by 52.7 cM with the addition of the SNP markers, whilst marker density increased from 3.8 cM/marker to 0.5 cM/marker. Just three regions in excess of 10 cM remain where no markers were mapped. We compared the positions of the mapped SNP markers on the M432 map with their predicted positions on the ‘Golden Delicious’ genome sequence. A total of 311 markers (13.7% of all mapped markers) mapped to positions that conflicted with their predicted positions on the ‘Golden Delicious’ pseudo-chromosomes, indicating the presence of paralogous genomic regions or mis-assignments of genome sequence contigs during the assembly and anchoring of the genome sequence. Conclusions We incorporated data for the 2,272 SNP markers onto the map of the M432 progeny and have presented the most complete and saturated map of the full 17 linkage groups of M. pumila to date. The data were generated rapidly in a high-throughput semi-automated pipeline, permitting significant savings in time and cost over linkage map construction using microsatellites. 
The application of the array will permit linkage maps to be developed for QTL analyses in a cost-effective manner, and the identification of SNPs that have been assigned erroneous positions on the ‘Golden Delicious’ reference sequence will assist in the continued improvement of the genome sequence assembly for that variety. PMID:22631220
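The percentages and the final marker density reported in the abstract above can be checked with a few lines of arithmetic. The sketch assumes that density is total map length divided by the total number of mapped markers (SNPs, SSRs and the S-locus). The abstract does not state the formula, and the variable names are illustrative.

```python
def pct(part, whole):
    """Percentage of `part` in `whole`."""
    return 100.0 * part / whole

# Figures quoted in the abstract.
map_length_cm = 1282.2
snp_markers = 2272
ssr_markers = 306
s_locus = 1

# Assumption: density = map length / all mapped markers.
total_markers = snp_markers + ssr_markers + s_locus
density_cm_per_marker = map_length_cm / total_markers

het_one_parent = pct(1823, 7867)    # SNPs heterozygous in one parent
het_both_parents = pct(1007, 7867)  # SNPs heterozygous in both parents
conflicting = pct(311, snp_markers)  # mapped SNPs conflicting with the reference

print(f"{total_markers} markers, {density_cm_per_marker:.2f} cM/marker")
print(f"{het_one_parent:.1f}%  {het_both_parents:.1f}%  {conflicting:.1f}%")
```

Under that assumption the values agree with the rounded figures in the abstract (0.5 cM/marker, 23.2%, 12.8% and 13.7%).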
Vermaak, Danielle; Henikoff, Steven; Malik, Harmit S
2005-01-01
Heterochromatin comprises a significant component of many eukaryotic genomes. In comparison to euchromatin, heterochromatin is gene poor, transposon rich, and late replicating. It serves many important biological roles, from gene silencing to accurate chromosome segregation, yet little is known about the evolutionary constraints that shape heterochromatin. A complementary approach to the traditional one of directly studying heterochromatic DNA sequence is to study the evolution of proteins that bind and define heterochromatin. One of the best markers for heterochromatin is the heterochromatin protein 1 (HP1), which is an essential, nonhistone chromosomal protein. Here we investigate the molecular evolution of five HP1 paralogs present in Drosophila melanogaster. Three of these paralogs have ubiquitous expression patterns in adult Drosophila tissues, whereas HP1D/rhino and HP1E are expressed predominantly in ovaries and testes respectively. The HP1 paralogs also have distinct localization preferences in Drosophila cells. Thus, Rhino localizes to the heterochromatic compartment in Drosophila tissue culture cells, but in a pattern distinct from HP1A and lysine-9 dimethylated H3. Using molecular evolution and population genetic analyses, we find that rhino has been subject to positive selection in all three domains of the protein: the N-terminal chromo domain, the C-terminal chromo-shadow domain, and the hinge region that connects these two modules. Maximum likelihood analysis of rhino sequences from 20 species of Drosophila reveals that a small number of residues of the chromo and shadow domains have been subject to repeated positive selection. The rapid and positive selection of rhino is highly unusual for a gene encoding a chromosomal protein and suggests that rhino is involved in a genetic conflict that affects the germline, belying the notion that heterochromatin is simply a passive recipient of “junk DNA” in eukaryotic genomes. PMID:16103923
Guselnikov, S.V.; Grayfer, L.; De Jesús Andino, F.; Rogozin, I.B.; Robert, J.; Taranin, A.V.
2015-01-01
The ITAM-bearing transmembrane signaling subunits (TSS) are indispensable components of activating leukocyte receptor complexes. The TSS-encoding genes map to paralogous chromosomal regions, which are thought to arise from ancient genome tetraploidization(s). To assess a possible role of tetraploidization in the TSS evolution, we studied TSS and other functionally linked genes in the amphibian species Xenopus laevis whose genome was duplicated about 40 MYR ago. We found that X. laevis has retained a duplicated set of sixteen TSS genes, all except one being transcribed. Furthermore, duplicated TCRα loci and genes encoding TSS-coupling protein kinases have also been retained. No clear evidence for functional divergence of the TSS paralogs was obtained from gene expression and sequence analyses. We suggest that the main factor of maintenance of duplicated TSS genes in X. laevis was a protein dosage effect and that this effect might have facilitated the TSS set expansion in early vertebrates. PMID:26170006
Human structural variation: mechanisms of chromosome rearrangements
Weckselblatt, Brooke; Rudd, M. Katharine
2015-01-01
Chromosome structural variation (SV) is a normal part of variation in the human genome, but some classes of SV can cause neurodevelopmental disorders. Analysis of the DNA sequence at SV breakpoints can reveal mutational mechanisms and risk factors for chromosome rearrangement. Large-scale SV breakpoint studies have become possible recently owing to advances in next-generation sequencing (NGS) including whole-genome sequencing (WGS). These findings have shed light on complex forms of SV such as triplications, inverted duplications, insertional translocations, and chromothripsis. Sequence-level breakpoint data resolve SV structure and determine how genes are disrupted, fused, and/or misregulated by breakpoints. Recent improvements in breakpoint sequencing have also revealed non-allelic homologous recombination (NAHR) between paralogous long interspersed nuclear element (LINE) or human endogenous retrovirus (HERV) repeats as a cause of deletions, duplications, and translocations. This review covers the genomic organization of simple and complex constitutional SVs, as well as the molecular mechanisms of their formation. PMID:26209074
Burkholderia xenovorans LB400 harbors a multi-replicon, 9.73-Mbp genome shaped for versatility
Chain, Patrick S. G.; Denef, Vincent J.; Konstantinidis, Konstantinos T.; Vergez, Lisa M.; Agulló, Loreine; Reyes, Valeria Latorre; Hauser, Loren; Córdova, Macarena; Gómez, Luis; González, Myriam; Land, Miriam; Lao, Victoria; Larimer, Frank; LiPuma, John J.; Mahenthiralingam, Eshwar; Malfatti, Stephanie A.; Marx, Christopher J.; Parnell, J. Jacob; Ramette, Alban; Richardson, Paul; Seeger, Michael; Smith, Daryl; Spilker, Theodore; Sul, Woo Jun; Tsoi, Tamara V.; Ulrich, Luke E.; Zhulin, Igor B.; Tiedje, James M.
2006-01-01
Burkholderia xenovorans LB400 (LB400), a well studied, effective polychlorinated biphenyl-degrader, has one of the two largest known bacterial genomes and is the first nonpathogenic Burkholderia isolate sequenced. From an evolutionary perspective, we find significant differences in functional specialization between the three replicons of LB400, as well as a more relaxed selective pressure for genes located on the two smaller vs. the largest replicon. High genomic plasticity, diversity, and specialization within the Burkholderia genus are exemplified by the conservation of only 44% of the genes between LB400 and Burkholderia cepacia complex strain 383. Even among four B. xenovorans strains, genome size varies from 7.4 to 9.73 Mbp. The latter is largely explained by our findings that >20% of the LB400 sequence was recently acquired by means of lateral gene transfer. Although a range of genetic factors associated with in vivo survival and intercellular interactions are present, these genetic factors are likely related to niche breadth rather than determinants of pathogenicity. The presence of at least eleven “central aromatic” and twenty “peripheral aromatic” pathways in LB400, among the highest in any sequenced bacterial genome, supports this hypothesis. Finally, in addition to the experimentally observed redundancy in benzoate degradation and formaldehyde oxidation pathways, the fact that 17.6% of proteins have a better LB400 paralog than an ortholog in a different genome highlights the importance of gene duplication and repeated acquirement, which, coupled with their divergence, raises questions regarding the role of paralogs and potential functional redundancies in large-genome microbes. PMID:17030797
Bode, Nadine J; Chan, Kun-Wei; Kong, Xiang-Peng; Pearson, Melanie M
2016-08-01
Proteus mirabilis contributes to a significant number of catheter-associated urinary tract infections, where coordinated regulation of adherence and motility is critical for ascending disease progression. Previously, the mannose-resistant Proteus-like (MR/P) fimbria-associated transcriptional regulator MrpJ has been shown to both repress motility and directly induce the transcription of its own operon; in addition, it affects the expression of a wide range of cellular processes. Interestingly, 14 additional mrpJ paralogs are included in the P. mirabilis genome. Looking at a selection of MrpJ paralogs, we discovered that these proteins, which consistently repress motility, also have nonidentical functions that include cross-regulation of fimbrial operons. A subset of paralogs, including AtfJ (encoded by the ambient temperature fimbrial operon), Fim8J, and MrpJ, are capable of autoinduction. We identified an element of the atf promoter extending from 487 to 655 nucleotides upstream of the transcriptional start site that is responsive to AtfJ, and we found that AtfJ directly binds this fragment. Mutational analysis of AtfJ revealed that its two identified functions, autoregulation and motility repression, are not invariably linked. Residues within the DNA-binding helix-turn-helix domain are required for motility repression but not necessarily autoregulation. Likewise, the C-terminal domain is dispensable for motility repression but is essential for autoregulation. Supported by a three-dimensional (3D) structural model, we hypothesize that the C-terminal domain confers unique regulatory capacities on the AtfJ family of regulators. Balancing adherence with motility is essential for uropathogens to successfully establish a foothold in their host. Proteus mirabilis uses a fimbria-associated transcriptional regulator to switch between these antagonistic processes by increasing fimbrial adherence while simultaneously downregulating flagella. 
The discovery of multiple related proteins, many of which also function as motility repressors, encoded in the P. mirabilis genome has raised considerable interest as to their functionality and potential redundancy in this organism. This study provides an important advance in this field by elucidating the nonidentical effects of these paralogs on a molecular level. Our mechanistic studies of one member of this group, AtfJ, shed light on how these differing functions may be conferred despite the limited sequence variety exhibited by the paralogous proteins. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
Prigoda, Nadia L; Nassuth, Annette; Mable, Barbara K
2005-07-01
The highly divergent alleles of the SRK gene in outcrossing Arabidopsis lyrata have provided important insights into the evolutionary history of self-incompatibility (SI) alleles and serve as an ideal model for studies of the evolutionary and molecular interactions between alleles in cell-cell recognition systems in general. One tantalizing question is how new specificities arise in systems that require coordination between male and female components. Allelic recruitment via gene conversion has been proposed as one possibility, based on the division of DNA sequences at the SRK locus into two distinctive groups: (1) sequences whose relationships are not well resolved and display the long branch lengths expected for a gene under balancing selection (Class A); and (2) sequences falling into a well-supported group with shorter branch lengths (Class B) that are closely related to an unlinked paralogous locus. The purpose of this study was to determine if differences in phenotype (site of expression assayed using allele-specific reverse transcription-polymerase chain reaction) or function (dominance relationships assayed through controlled pollinations) accompany the sequence-based classification. Expression of Class A alleles was restricted to floral tissues, as predicted for genes involved in the SI response. In contrast, Class B alleles, despite being tightly linked to the SI phenotype, were unexpectedly expressed in both leaves and floral tissues; the same pattern found for a related unlinked paralogous sequence. Whereas Class A included haplotypes in three different dominance classes, all Class B haplotypes were found to be recessive to all except one Class A haplotype. In addition, mapping of expression and dominance patterns onto an S-domain-based genealogy suggested that allelic dominance may be determined more by evolutionary history than by frequency-dependent selection for lowered dominance as some theories suggest. 
The possibility that interlocus gene conversion might have contributed to allelic diversity is discussed.
Tümpel, Stefan; Maconochie, Mark; Wiedemann, Leanne M; Krumlauf, Robb
2002-06-01
The Hoxa2 and Hoxb2 genes are members of paralogy group II and display segmental patterns of expression in the developing vertebrate hindbrain and cranial neural crest cells. Functional analyses have demonstrated that these genes play critical roles in regulating morphogenetic pathways that direct the regional identity and anteroposterior character of hindbrain rhombomeres and neural crest-derived structures. Transgenic regulatory studies have also begun to characterize enhancers and cis-elements for those mouse and chicken genes that direct restricted patterns of expression in the hindbrain and neural crest. In light of the conserved role of Hoxa2 in neural crest patterning in vertebrates and the similarities between paralogs, it is important to understand the extent to which common regulatory networks and elements have been preserved between species and between paralogs. To investigate this problem, we have cloned and sequenced the intergenic region between Hoxa2 and Hoxa3 in the chick HoxA complex and used it for making comparative analyses with the respective human, mouse, and horn shark regions. We have also used transgenic assays in mouse and chick embryos to test the functional activity of Hoxa2 enhancers in heterologous species. Our analysis reveals that three of the critical individual components of the Hoxa2 enhancer region from mouse necessary for hindbrain expression (Krox20, BoxA, and TCT motifs) have been partially conserved. However, their number and organization are highly varied for the same gene in different species and between paralogs within a species. Other essential mouse elements appear to have diverged or are absent in chick and shark. We find the mouse r3/r5 enhancer fails to work in chick embryos and the chick enhancer works poorly in mice. This implies that new motifs have been recruited or utilized to mediate restricted activity of the enhancer in other species. 
With respect to neural crest regulation, cis-components are embedded among the hindbrain control elements and are highly diverged between species. Hence, there has been no widespread conservation of sequence identity over the entire enhancer domain from shark to humans, despite the common function of these genes in head patterning. This provides insight into how apparently equivalent regulatory regions from the same gene in different species have evolved different components to potentiate their activity in combination with a selection of core components. (c) 2002 Elsevier Science (USA).
COGNAT: a web server for comparative analysis of genomic neighborhoods.
Klimchuk, Olesya I; Konovalov, Kirill A; Perekhvatov, Vadim V; Skulachev, Konstantin V; Dibrova, Daria V; Mulkidjanian, Armen Y
2017-11-22
In prokaryotic genomes, functionally coupled genes can be organized in conserved gene clusters enabling their coordinated regulation. Such clusters could contain one or several operons, which are groups of co-transcribed genes. Those genes that evolved from a common ancestral gene by speciation (i.e. orthologs) are expected to have similar genomic neighborhoods in different organisms, whereas those copies of the gene that are responsible for dissimilar functions (i.e. paralogs) could be found in dissimilar genomic contexts. Comparative analysis of genomic neighborhoods facilitates the prediction of co-regulated genes and helps to discern different functions in large protein families. We intended, building on the attribution of gene sequences to the clusters of orthologous groups of proteins (COGs), to provide a method for visualization and comparative analysis of genomic neighborhoods of evolutionary related genes, as well as a respective web server. Here we introduce the COmparative Gene Neighborhoods Analysis Tool (COGNAT), a web server for comparative analysis of genomic neighborhoods. The tool is based on the COG database, as well as the Pfam protein families database. As an example, we show the utility of COGNAT in identifying a new type of membrane protein complex that is formed by paralog(s) of one of the membrane subunits of the NADH:quinone oxidoreductase of type 1 (COG1009) and a cytoplasmic protein of unknown function (COG3002). This article was reviewed by Drs. Igor Zhulin, Uri Gophna and Igor Rogozin.
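The premise that orthologs share similar genomic neighborhoods while paralogs may not can be illustrated with a set-overlap score. The sketch below is not COGNAT's algorithm, which the abstract does not describe. It only assumes that each genome is an ordered list of COG labels and compares the flanking genes of a target with a Jaccard index. Apart from COG1009 and COG3002, which the abstract names, every label is a placeholder.

```python
def neighborhood(genome, index, flank=2):
    """Set of labels within `flank` positions of genome[index], target excluded."""
    start = max(0, index - flank)
    return set(genome[start:index] + genome[index + 1:index + 1 + flank])

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical gene orders; "cogA".."cogH" are placeholder labels.
genome_a = ["cogA", "COG3002", "COG1009", "cogB", "cogC"]
genome_b = ["cogD", "COG3002", "COG1009", "cogB", "cogE"]
genome_c = ["cogF", "cogG", "COG1009", "cogH", "cogA2"]

target = "COG1009"
n_a = neighborhood(genome_a, genome_a.index(target))
n_b = neighborhood(genome_b, genome_b.index(target))
n_c = neighborhood(genome_c, genome_c.index(target))

print(jaccard(n_a, n_b))  # shared context, e.g. a conserved COG3002 partner
print(jaccard(n_a, n_c))  # no shared context
```

A high score between two genomes suggests a conserved context for the target gene. A score near zero is what one might expect for a paralog that sits in a different functional cluster.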
When outgroups fail; phylogenomics of rooting the emerging pathogen, Coxiella burnetii.
Pearson, Talima; Hornstra, Heidie M; Sahl, Jason W; Schaack, Sarah; Schupp, James M; Beckstrom-Sternberg, Stephen M; O'Neill, Matthew W; Priestley, Rachael A; Champion, Mia D; Beckstrom-Sternberg, James S; Kersh, Gilbert J; Samuel, James E; Massung, Robert F; Keim, Paul
2013-09-01
Rooting phylogenies is critical for understanding evolution, yet the importance, intricacies and difficulties of rooting are often overlooked. For rooting, polymorphic characters among the group of interest (ingroup) must be compared to those of a relative (outgroup) that diverged before the last common ancestor (LCA) of the ingroup. Problems arise if an outgroup does not exist, is unknown, or is so distant that few characters are shared, in which case duplicated genes originating before the LCA can be used as proxy outgroups to root diverse phylogenies. Here, we describe a genome-wide expansion of this technique that can be used to solve problems at the other end of the evolutionary scale: where ingroup individuals are all very closely related to each other, but the next closest relative is very distant. We used shared orthologous single nucleotide polymorphisms (SNPs) from 10 whole genome sequences of Coxiella burnetii, the causative agent of Q fever in humans, to create a robust, but unrooted phylogeny. To maximize the number of characters informative about the rooting, we searched entire genomes for polymorphic duplicated regions where orthologs of each paralog could be identified so that the paralogs could be used to root the tree. Recent radiations, such as those of emerging pathogens, often pose rooting challenges due to a lack of ingroup variation and large genomic differences with known outgroups. Using a phylogenomic approach, we created a robust, rooted phylogeny for C. burnetii. [Coxiella burnetii; paralog SNPs; pathogen evolution; phylogeny; recent radiation; root; rooting using duplicated genes.].
Banday, Abdul Rouf; Baumgartner, Marybeth; Al Seesi, Sahar; Karunakaran, Devi Krishna Priya; Venkatesh, Aditya; Congdon, Sean; Lemoine, Christopher; Kilcollins, Ashley M; Mandoiu, Ion; Punzo, Claudio; Kanadia, Rahul N
2014-01-01
In the mammalian genome, each histone family contains multiple replication-dependent paralogs, which are found in clusters where their transcription is thought to be coupled to the cell cycle. Here, we wanted to interrogate the transcriptional regulation of these paralogs during retinal development and aging. We employed deep sequencing, quantitative PCR, in situ hybridization (ISH), and microarray analysis, which revealed that replication-dependent histone genes were not only transcribed in progenitor cells but also in differentiating neurons. Specifically, by ISH analysis we found that different histone genes were actively transcribed in a subset of neurons between postnatal day 7 and 14. Interestingly, within a histone family, not all paralogs were transcribed at the same level during retinal development. For example, expression of Hist1h1b was higher embryonically, while that of Hist1h1c was higher postnatally. Finally, expression of replication-dependent histone genes was also observed in the aging retina. Moreover, transcription of replication-dependent histones was independent of rapamycin-mediated mTOR pathway inactivation. Overall, our data suggest the existence of variant nucleosomes produced by the differential expression of the replication-dependent histone genes across retinal development. Also, the expression of a subset of replication-dependent histone isotypes in senescent neurons warrants re-examining these genes as “replication-dependent.” Thus, our findings underscore the importance of understanding the transcriptional regulation of replication-dependent histone genes in the maintenance and functioning of neurons. PMID:25486194
The evolution of duplicate gene expression in mammalian organs
Guschanski, Katerina; Warnefors, Maria; Kaessmann, Henrik
2017-01-01
Gene duplications generate genomic raw material that allows the emergence of novel functions, likely facilitating adaptive evolutionary innovations. However, global assessments of the functional and evolutionary relevance of duplicate genes in mammals were until recently limited by the lack of appropriate comparative data. Here, we report a large-scale study of the expression evolution of DNA-based functional gene duplicates in three major mammalian lineages (placental mammals, marsupials, egg-laying monotremes) and birds, on the basis of RNA sequencing (RNA-seq) data from nine species and eight organs. We observe dynamic changes in tissue expression preference of paralogs with different duplication ages, suggesting differential contribution of paralogs to specific organ functions during vertebrate evolution. Specifically, we show that paralogs that emerged in the common ancestor of bony vertebrates are enriched for genes with brain-specific expression and provide evidence for differential forces underlying the preferential emergence of young testis- and liver-specific expressed genes. Further analyses uncovered that the overall spatial expression profiles of gene families tend to be conserved, with several exceptions of pronounced tissue specificity shifts among lineage-specific gene family expansions. Finally, we trace new lineage-specific genes that may have contributed to the specific biology of mammalian organs, including the little-studied placenta. Overall, our study provides novel and taxonomically broad evidence for the differential contribution of duplicate genes to tissue-specific transcriptomes and for their importance for the phenotypic evolution of vertebrates. PMID:28743766
Paralogic Hermeneutics and the Possibilities of Rhetoric.
ERIC Educational Resources Information Center
Kent, Thomas
1989-01-01
Explains how the Sophistic tradition, an alternative to the Platonic-Aristotelian rhetorical tradition, provides the historical foundation for a paralogic rhetoric that treats discourse production and analysis as open-ended dialogic activities and not as a codifiable system. Argues that teachers must examine the powerful paralogic/hermeneutic…
Modos, Dezso; Brooks, Johanne; Fazekas, David; Ari, Eszter; Vellai, Tibor; Csermely, Peter; Korcsmaros, Tamas; Lenti, Katalin
2016-01-01
Extensive cross-talk between signaling pathways is required to integrate the myriad of extracellular signal combinations at the cellular level. Gene duplication events may lead to the emergence of novel functions, leaving groups of similar genes - termed paralogs - in the genome. To distinguish critical paralog groups (CPGs) from other paralogs in human signaling networks, we developed a signaling network-based method using cross-talk annotation and tissue-specific signaling flow analysis. 75 CPGs were found with higher degree, betweenness centrality, closeness, and ‘bowtieness’ when compared to other paralogs or other proteins in the signaling network. CPGs had higher diversity in all these measures, with more varied biological functions and more specific post-transcriptional regulation than non-critical paralog groups (non-CPG). Using TGF-beta, Notch and MAPK pathways as examples, SMAD2/3, NOTCH1/2/3 and MEK3/6-p38 CPGs were found to regulate the signaling flow of their respective pathways. Additionally, CPGs showed a higher mutation rate in both inherited diseases and cancer, and were enriched in drug targets. In conclusion, the results revealed two distinct types of paralog groups in the signaling network, CPGs and non-CPGs, highlighting the importance of CPGs as compared to non-CPGs in drug discovery and disease pathogenesis. PMID:27922122
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cuomo, Christina A.; Guldener, Ulrich; Xu, Jin Rong
2007-09-07
We sequenced and annotated the genome of the filamentous fungus Fusarium graminearum, a major pathogen of cultivated cereals. Very few repetitive sequences were detected, and the process of repeat-induced point mutation, in which duplicated sequences are subject to extensive mutation, may partially account for the reduced repeat content and apparent low number of paralogous (ancestrally duplicated) genes. A second strain of F. graminearum contained more than 10,000 single-nucleotide polymorphisms, which were frequently located near telomeres and within other discrete chromosomal segments. Many highly polymorphic regions contained sets of genes implicated in plant-fungus interactions and were unusually divergent, with higher rates of recombination. These regions of genome innovation may result from selection due to interactions of F. graminearum with its plant hosts.
Sánchez, Cecilia Castaño; Smith, Timothy P L; Wiedmann, Ralph T; Vallejo, Roger L; Salem, Mohamed; Yao, Jianbo; Rexroad, Caird E
2009-11-25
To enhance capabilities for genomic analyses in rainbow trout, such as genomic selection, a large suite of polymorphic markers that are amenable to high-throughput genotyping protocols must be identified. Expressed Sequence Tags (ESTs) have been used for single nucleotide polymorphism (SNP) discovery in salmonids. In those strategies, the salmonid semi-tetraploid genomes often led to assemblies of paralogous sequences and therefore resulted in a high rate of false positive SNP identification. Sequencing genomic DNA using primers identified from ESTs proved to be an effective but time-consuming methodology of SNP identification in rainbow trout, and therefore not suitable for high-throughput SNP discovery. In this study, we employed a high-throughput strategy that used pyrosequencing technology to generate data from a reduced representation library constructed with genomic DNA pooled from 96 unrelated rainbow trout that represent the National Center for Cool and Cold Water Aquaculture (NCCCWA) broodstock population. The reduced representation library consisted of 440 bp fragments resulting from complete digestion with the restriction enzyme HaeIII; sequencing produced 2,000,000 reads providing an average 6-fold coverage of the estimated 150,000 unique genomic restriction fragments (300,000 fragment ends). Three independent data analyses identified 22,022 to 47,128 putative SNPs on 13,140 to 24,627 independent contigs. A set of 384 putative SNPs, randomly selected from the sets produced by the three analyses, were genotyped on individual fish to determine the validation rate of putative SNPs among analyses, distinguish apparent SNPs that actually represent paralogous loci in the tetraploid genome, examine Mendelian segregation, and place the validated SNPs on the rainbow trout linkage map. Approximately 48% (183) of the putative SNPs were validated; 167 markers were successfully incorporated into the rainbow trout linkage map. In addition, 2% of the sequences from the validated markers were associated with rainbow trout transcripts. The use of reduced representation libraries and pyrosequencing technology proved to be an effective strategy for the discovery of a high number of putative SNPs in rainbow trout; however, modifications to the technique to decrease the false discovery rate resulting from the evolutionarily recent genome duplication would be desirable.
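As a rough check on the figures quoted in this abstract, the coverage and validation rate can be recomputed from the reported counts. This is a back-of-the-envelope sketch, not the authors' pipeline; it assumes every read samples exactly one fragment end, and all variable names are illustrative.

```python
# Counts taken from the abstract above.
reads = 2_000_000          # pyrosequencing reads
fragment_ends = 300_000    # 150,000 unique restriction fragments, two ends each
genotyped = 384            # putative SNPs selected for genotyping
validated = 183            # SNPs that validated
mapped = 167               # validated SNPs placed on the linkage map

# Assumption: one read covers one fragment end, sampled uniformly.
coverage = reads / fragment_ends
validation_rate = validated / genotyped
mapped_fraction = mapped / validated

print(f"mean coverage per fragment end: {coverage:.2f}x")
print(f"validation rate: {validation_rate:.1%}")
print(f"validated SNPs placed on the map: {mapped_fraction:.1%}")
```

Under that assumption the mean coverage comes out slightly above the reported average of about 6-fold, and 183 of 384 matches the stated validation rate of approximately 48%.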
Ream, Thomas S.; Haag, Jeremy R.; Pontvianne, Frederic; Nicora, Carrie D.; Norbeck, Angela D.; Paša-Tolić, Ljiljana; Pikaard, Craig S.
2015-01-01
Using affinity purification and mass spectrometry, we identified the subunits of Arabidopsis thaliana multisubunit RNA polymerases I and III (abbreviated as Pol I and Pol III), the first analysis of their physical compositions in plants. In all eukaryotes examined to date, AC40 and AC19 subunits are common to Pol I (a.k.a. Pol A) and Pol III (a.k.a. Pol C) and are encoded by single genes. Surprisingly, A. thaliana and related species express two distinct AC40 paralogs, one of which assembles into Pol I and the other of which assembles into Pol III. Changes at eight amino acid positions correlate with the functional divergence of Pol I- and Pol III-specific AC40 paralogs. Two genes encode homologs of the yeast C53 subunit and either protein can assemble into Pol III. By contrast, only one of two potential C17 variants, and one of two potential C31 variants were detected in Pol III. We introduce a new nomenclature system for plant Pol I and Pol III subunits in which the 12 subunits that are structurally and functionally homologous among Pols I through V are assigned equivalent numbers. PMID:25813043
Walker, Michael B; King, Benjamin L; Paigen, Kenneth
2012-01-01
Arrangements of genes along chromosomes are a product of evolutionary processes, and we can expect that preferable arrangements will prevail over the span of evolutionary time, often being reflected in the non-random clustering of structurally and/or functionally related genes. Such non-random arrangements can arise by two distinct evolutionary processes: duplications of DNA sequences that give rise to clusters of genes sharing both sequence similarity and common sequence features and the migration together of genes related by function, but not by common descent. To provide a background for distinguishing between the two, which is important for future efforts to unravel the evolutionary processes involved, we here provide a description of the extent to which ancestrally related genes are found in proximity. Towards this purpose, we combined information from five genomic datasets, InterPro, SCOP, PANTHER, Ensembl protein families, and Ensembl gene paralogs. The results are provided in publicly available datasets (http://cgd.jax.org/datasets/clustering/paraclustering.shtml) describing the extent to which ancestrally related genes are in proximity beyond what is expected by chance (i.e. form paraclusters) in the human and nine other vertebrate genomes, as well as the D. melanogaster, C. elegans, A. thaliana, and S. cerevisiae genomes. With the exception of Saccharomyces, paraclusters are a common feature of the genomes we examined. In the human genome they are estimated to include at least 22% of all protein coding genes. Paraclusters are far more prevalent among some gene families than others, are highly species or clade specific and can evolve rapidly, sometimes in response to environmental cues. Altogether, they account for a large portion of the functional clustering previously reported in several genomes.
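The idea of related genes lying "in proximity beyond what is expected by chance" can be illustrated with a simple permutation test. The abstract does not say how the chance expectation was computed, so the sketch below is only one plausible way to frame it: the function names, the window size and the toy gene order are all hypothetical.

```python
import random

def count_close_pairs(families, window=2):
    """Count gene pairs from the same family lying within `window` positions."""
    n = len(families)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, min(i + window + 1, n))
        if families[i] == families[j]
    )

def permutation_p_value(families, window=2, n_perm=1000, seed=0):
    """Compare the observed count with counts from shuffled gene orders."""
    rng = random.Random(seed)          # fixed seed keeps the sketch repeatable
    observed = count_close_pairs(families, window)
    shuffled = list(families)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_close_pairs(shuffled, window) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Toy chromosome: family labels in gene order (hypothetical data).
toy = ["A", "A", "A", "B", "C", "B", "D", "E", "F", "G"]
observed, p_value = permutation_p_value(toy)
print(observed, p_value)
```

Shuffling family labels keeps family sizes fixed while destroying positional structure, which is why it serves as a simple null model here; a real analysis would also need to account for chromosome boundaries and gene density.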
Recruitment of the proneural gene scute to the Drosophila sex-determination pathway.
Wrischnik, Lisa A; Timmer, John R; Megna, Lisa A; Cline, Thomas W
2003-01-01
In flies, scute (sc) works with its paralogs in the achaete-scute-complex (ASC) to direct neuronal development. However, in the family Drosophilidae, sc also acquired a role in the primary event of sex determination, X chromosome counting, by becoming an X chromosome signal element (XSE), an evolutionary step shown here to have occurred after sc diverged from its closest paralog, achaete (ac). Two temperature-sensitive alleles, sc(sisB2) and sc(sisB3), which disrupt only sex determination, were recovered in a powerful F1 genetic selection and used to investigate how sc was recruited to the sex-determination pathway. sc(sisB2) revealed 3' nontranscribed regulatory sequences likely to be involved. The sc(sisB2) lesion abolished XSE activity when combined with mutations engineered in a sequence upstream of all XSEs. In contrast, changes in Sc protein sequence seem not to have been important for recruitment. The observation that the other new allele, sc(sisB3), eliminates the C-terminal half of Sc without affecting neurogenesis and that sc(sisB1), the most XSE-specific allele previously available, is a nonsense mutant, would seem to suggest the opposite, but we show that housefly Sc can substitute for fruit fly Sc in sex determination, despite lacking Drosophilidae-specific conserved residues in its C-terminal half. Lack of synergistic lethality among mutations in sc, twist, and dorsal argues against a proposed role for sc in mesoderm formation that had seemed potentially relevant to sex-pathway recruitment. The screen that yielded new sc alleles also generated autosomal duplications that argue against the textbook view that fruit fly sex signal evolution recruited a set of autosomal signal elements comparable to the XSEs. PMID:14704182
Karn, Robert C; Laukaitis, Christina M
2012-01-01
Three proteinaceous pheromone families, the androgen-binding proteins (ABPs), the exocrine-gland secreting peptides (ESPs) and the major urinary proteins (MUPs) are encoded by large gene families in the genomes of Mus musculus and Rattus norvegicus. We studied the evolutionary histories of the Mup and Esp genes and compared them with what is known about the Abp genes. Apparently gene conversion has played little if any role in the expansion of the mouse Class A and Class B Mup genes and pseudogenes, and the rat Mups. By contrast, we found evidence of extensive gene conversion in many Esp genes although not in all of them. Our studies of selection identified at least two amino acid sites in β-sheets as having evolved under positive selection in the mouse Class A and Class B MUPs and in rat MUPs. We show that selection may have acted on the ESPs by determining K(a)/K(s) for Exon 3 sequences with and without the converted sequence segment. While it appears that purifying selection acted on the ESP signal peptides, the secreted portions of the ESPs probably have undergone much more rapid evolution. When the inner gene converted fragment sequences were removed, eleven Esp paralogs were present in two or more pairs with K(a)/K(s) >1.0 and thus we propose that positive selection is detectable by this means in at least some mouse Esp paralogs. We compare and contrast the evolutionary histories of all three mouse pheromone gene families in light of their proposed functions in mouse communication.
Grone, Brian P; Maruska, Karen P
2015-05-01
To investigate the origins of the vertebrate stress-response system, we searched sequenced vertebrate genomes for genes resembling corticotropin-releasing hormone (CRH). We found that vertebrate genomes possess, in addition to CRH, another gene that resembles CRH in sequence and syntenic environment. This paralogous gene was previously identified only in the elephant shark (a holocephalan), but we find it also in marsupials, monotremes, lizards, turtles, birds, and fishes. We examined the relationship of this second vertebrate CRH gene, which we name CRH2, to CRH1 (previously known as CRH) and urocortin1/urotensin1 (UCN1/UTS1) in primitive fishes, teleosts, and tetrapods. The paralogs CRH1 and CRH2 likely evolved via duplication of CRH during a whole-genome duplication early in the vertebrate lineage. CRH2 was subsequently lost in both teleost fishes and eutherian mammals but retained in other lineages. To determine where CRH2 is expressed relative to CRH1 and UTS1, we used in situ hybridization on brain tissue from spotted gar (Lepisosteus oculatus), a neopterygian fish closely related to teleosts. In situ hybridization revealed widespread distribution of both crh1 and uts1 in the brain. Expression of crh2 was restricted to the putative secondary gustatory/secondary visceral nucleus, which also expressed calcitonin-related polypeptide alpha (calca), a marker of parabrachial nucleus in mammals. Thus, the evolutionary history of CRH2 includes restricted expression in the brain, sequence changes, and gene loss, likely reflecting release of selective constraints following whole-genome duplication. The discovery of CRH2 opens many new possibilities for understanding the diverse functions of the CRH family of peptides across vertebrates. © 2015 Wiley Periodicals, Inc.
UHRF2 regulates local 5-methylcytosine and suppresses spontaneous seizures
Liu, Yidan; Zhang, Bin; Meng, Xiaoyu; Korn, Matthew J.; Parent, Jack M.; Lu, Lin-Yu; Yu, Xiaochun
2017-01-01
The 5-methylcytosine (5mC) modification regulates multiple cellular processes and is faithfully maintained following DNA replication. In addition to DNA methyltransferase (DNMT) family proteins, ubiquitin-like PHD and ring finger domain-containing protein 1 (UHRF1) plays an important role in the maintenance of 5mC levels. Loss of UHRF1 abolishes 5mC in cells and leads to embryonic lethality in mice. Interestingly, UHRF1 has a paralog, UHRF2, that has similar sequence and domain architecture, but its biologic function is not clear. Here, we have generated Uhrf2 knockout mice and characterized the role of UHRF2 in vivo. Uhrf2 knockout mice are viable, but the adult mice develop frequent spontaneous seizures and display abnormal electrical activities in brain. Despite no global DNA methylation changes, 5mC levels are decreased at certain genomic loci in the brains of Uhrf2 knockout mice. Therefore, our study has revealed a unique role of UHRF2 in the maintenance of local 5mC levels in brain that is distinct from that of its paralog UHRF1. PMID:28402695
Sequence Search and Comparative Genomic Analysis of SUMO-Activating Enzymes Using CoGe.
Carretero-Paulet, Lorenzo; Albert, Victor A
2016-01-01
The growing number of genome sequences completed during the last few years has made necessary the development of bioinformatics tools for the easy access and retrieval of sequence data, as well as for downstream comparative genomic analyses. Some of these are implemented as online platforms that integrate genomic data produced by different genome sequencing initiatives with data mining tools as well as various comparative genomic and evolutionary analysis possibilities. Here, we use the online comparative genomics platform CoGe (http://www.genomevolution.org/coge/) (Lyons and Freeling. Plant J 53:661-673, 2008; Tang and Lyons. Front Plant Sci 3:172, 2012) (1) to retrieve the entire complement of orthologous and paralogous genes belonging to the SUMO-Activating Enzymes 1 (SAE1) gene family from a set of species representative of the Brassicaceae plant eudicot family with genomes fully sequenced, and (2) to investigate the history, timing, and molecular mechanisms of the gene duplications driving the evolutionary expansion and functional diversification of the SAE1 family in Brassicaceae.
When Outgroups Fail; Phylogenomics of Rooting the Emerging Pathogen, Coxiella burnetii
Pearson, Talima; Hornstra, Heidie M.; Sahl, Jason W.; Schaack, Sarah; Schupp, James M.; Beckstrom-Sternberg, Stephen M.; O'Neill, Matthew W.; Priestley, Rachael A.; Champion, Mia D.; Beckstrom-Sternberg, James S.; Kersh, Gilbert J.; Samuel, James E.; Massung, Robert F.; Keim, Paul
2013-01-01
Rooting phylogenies is critical for understanding evolution, yet the importance, intricacies and difficulties of rooting are often overlooked. For rooting, polymorphic characters among the group of interest (ingroup) must be compared to those of a relative (outgroup) that diverged before the last common ancestor (LCA) of the ingroup. Problems arise if an outgroup does not exist, is unknown, or is so distant that few characters are shared, in which case duplicated genes originating before the LCA can be used as proxy outgroups to root diverse phylogenies. Here, we describe a genome-wide expansion of this technique that can be used to solve problems at the other end of the evolutionary scale: where ingroup individuals are all very closely related to each other, but the next closest relative is very distant. We used shared orthologous single nucleotide polymorphisms (SNPs) from 10 whole genome sequences of Coxiella burnetii, the causative agent of Q fever in humans, to create a robust, but unrooted phylogeny. To maximize the number of characters informative about the rooting, we searched entire genomes for polymorphic duplicated regions where orthologs of each paralog could be identified so that the paralogs could be used to root the tree. Recent radiations, such as those of emerging pathogens, often pose rooting challenges due to a lack of ingroup variation and large genomic differences with known outgroups. Using a phylogenomic approach, we created a robust, rooted phylogeny for C. burnetii. [Coxiella burnetii; paralog SNPs; pathogen evolution; phylogeny; recent radiation; root; rooting using duplicated genes.] PMID:23736103
Discriminating the reaction types of plant type III polyketide synthases
Shimizu, Yugo; Ogata, Hiroyuki; Goto, Susumu
2017-01-01
Motivation: Functional prediction of paralogs is challenging in bioinformatics because of rapid functional diversification after gene duplication events combined with parallel acquisitions of similar functions by different paralogs. Plant type III polyketide synthases (PKSs), producing various secondary metabolites, represent a paralogous family that has undergone gene duplication and functional alteration. Currently, there is no computational method available for the functional prediction of type III PKSs. Results: We developed a plant type III PKS reaction predictor, pPAP, based on the recently proposed classification of type III PKSs. pPAP combines two kinds of similarity measures: one calculated by profile hidden Markov models (pHMMs) built from functionally and structurally important partial sequence regions, and the other based on mutual information between residue positions. pPAP targets PKSs acting on ring-type starter substrates, and classifies their functions into four reaction types. The pHMM approach discriminated two reaction types with high accuracy (97.5%, 39/40), but its accuracy decreased when discriminating three reaction types (87.8%, 43/49). When combined with a correlation-based approach, all 49 PKSs were correctly discriminated, and pPAP was still highly accurate (91.4%, 64/70) even after adding other reaction types. These results suggest pPAP, which is based on linear discriminant analyses of similarity measures, is effective for plant type III PKS function prediction. Availability and Implementation: pPAP is freely available at ftp://ftp.genome.jp/pub/tools/ppap/ Contact: goto@kuicr.kyoto-u.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28334262
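The accuracy figures in this abstract are simple proportions of correctly classified PKSs, and they can be recomputed from the quoted counts. The short sketch below only restates that arithmetic; the dictionary labels are illustrative and not taken from the pPAP software.

```python
# (correct, total) pairs as quoted in the abstract; labels are illustrative.
results = {
    "pHMM, two reaction types": (39, 40),
    "pHMM, three reaction types": (43, 49),
    "pHMM + correlation, three reaction types": (49, 49),
    "pPAP, with other reaction types added": (64, 70),
}

def accuracy_percent(correct, total):
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)

accuracies = {name: accuracy_percent(c, t) for name, (c, t) in results.items()}
for name, value in accuracies.items():
    print(f"{name}: {value}%")
```

Stating the counts alongside the percentages matters here because the test sets are small: a single misclassified enzyme shifts accuracy by about two percentage points when the total is 49.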
MicroRNA duplication accelerates the recruitment of new targets during vertebrate evolution
Luo, Junjie; Wang, Yirong; Yuan, Jian
2018-01-01
The repertoire of miRNAs has considerably expanded during metazoan evolution, and duplication is an important mechanism for generating new functional miRNAs. However, relatively little is known about the functional divergence between paralogous miRNAs and the possible coevolution between duplicated miRNAs and the genomic contexts. By systematically examining small RNA expression profiles across various human tissues and interrogating the publicly available miRNA:mRNA pairing chimeras, we found that changes in expression patterns and targeting preferences are widespread for duplicated miRNAs in vertebrates. Both the empirical interactions and target predictions suggest that evolutionarily conserved homo-seed duplicated miRNAs pair with significantly higher numbers of target sites compared to the single-copy miRNAs. Our birth-and-death evolutionary analysis revealed that the new target sites of miRNAs experienced frequent gains and losses during function development. Our results suggest that a newly emerged target site has a higher probability to be functional and maintained by natural selection if it is paired to a seed shared by multiple paralogous miRNAs rather than being paired to a single-copy miRNA. We experimentally verified the divergence in target repression between two paralogous miRNAs by transfecting let-7a and let-7b mimics into kidney-derived cell lines of four mammalian species and measuring the resulting transcriptome alterations by extensive high-throughput sequencing. Our results also suggest that the gains and losses of let-7 target sites might be associated with the evolution of repressiveness of let-7 across mammalian species. PMID:29511046
Bae, Hansol; Kim, Sung Keun; Cho, Seok Keun; Kang, Bin Goo; Kim, Woo Taek
2011-06-01
CaRma1H1 was previously identified as a hot pepper drought-induced RING E3 Ub ligase. We have identified five putative proteins that display a significant sequence identity with CaRma1H1 in the rice genome database (http://signal.salk.edu/cgi-bin/RiceGE). These five rice paralogs possess a single RING motif in their N-terminal regions, consistent with the notion that RING proteins are encoded by a multi-gene family. Therefore, these proteins were named OsRDCPs (Oryza sativa RING domain-containing proteins). Among these paralogs, OsRDCP1 was induced by drought stress, whereas the other OsRDCP members were constitutively expressed, with OsRDCP4 transcripts expressed at the highest level in rice seedlings. osrdcp1 loss-of-function knockout mutant and OsRDCP1-overexpressing transgenic rice plants were developed. Phenotypic analysis showed that wild-type plants and the homozygous osrdcp1 G2 mutant line displayed similar phenotypes under normal growth conditions and in response to drought stress. This may be due to complementation by other OsRDCP paralogs. In contrast, 35S:OsRDCP1 T2 transgenic rice plants exhibited improved tolerance to severe water deficits. Although the physiological function of OsRDCP1 remains unclear, there are several possible mechanisms for its involvement in a subset of physiological responses to counteract dehydration stress in rice plants. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Pereira, Joana; Johnson, Warren E.; O’Brien, Stephen J.; Jarvis, Erich D.; Zhang, Guojie; Gilbert, M. Thomas P.; Vasconcelos, Vitor; Antunes, Agostinho
2014-01-01
The Hedgehog (Hh) gene family codes for a class of secreted proteins composed of two active domains that act as signalling molecules during embryo development, namely for the development of the nervous and skeletal systems and the formation of the testis cord. While only one Hh gene is typically found in invertebrate genomes, most vertebrate species have three (Sonic hedgehog – Shh; Indian hedgehog – Ihh; and Desert hedgehog – Dhh), each with different expression patterns and functions, which likely helped promote the increasing complexity of vertebrates and their successful diversification. In this study, we used comparative genomic and adaptive evolutionary analyses to characterize the evolution of the Hh genes in vertebrates following the two major whole genome duplication (WGD) events. To overcome the lack of Hh-coding sequences in publicly available avian databases, we used an extensive dataset of 45 avian and three non-avian reptilian genomes to show that birds have all three Hh paralogs. We find suggestions that following the WGD events, vertebrate Hh paralogous genes evolved independently within similar linkage groups and under different evolutionary rates, especially within the catalytic domain. The structural regions around the ion-binding site were identified to be under positive selection in the signaling domain. These findings contrast with those observed in invertebrates, where different lineages that experienced gene duplication retained similar selective constraints in the Hh orthologs. Our results provide new insights on the evolutionary history of the Hh gene family, the functional roles of these paralogs in vertebrate species, and on the location of mutational hotspots. PMID:25549322
Ewen-Campen, Ben; Mohr, Stephanie E; Hu, Yanhui; Perrimon, Norbert
2017-10-09
Single-gene knockout experiments can fail to reveal function in the context of redundancy, which is frequently observed among duplicated genes (paralogs) with overlapping functions. We discuss the complexity associated with studying paralogs and outline how recent advances in CRISPR will help address the "phenotype gap" and impact biomedical research. Copyright © 2017 Elsevier Inc. All rights reserved.
Sun, Lei; Yang, Xiaowei; Chen, Feifei; Li, Rongpeng; Li, Xuesong; Liu, Zhenxing; Gu, Yuyu; Gong, Xiaoyan; Liu, Zhonghua; Wei, Hua; Huang, Ying; Yuan, Sheng
2013-01-01
In log phase, fission yeast cells express Rpl32-2 at a high level and Rpl32-1 at a low level; in contrast, when cells enter stationary phase, expression of Rpl32-1 rises to a peak while Rpl32-2 is downregulated to a low basal level. Overexpression of Rpl32-1 inhibits cell growth, whereas overexpression of Rpl32-2 does not. Deleting rpl32-2 impairs cell growth more severely than deleting rpl32-1. The growth defect caused by deleting either paralog can be rescued completely by reintroducing rpl32-2, but only partly by rpl32-1. Overexpression of Rpl32-1 inhibits cell division, yielding cells with 4C DNA content and multiple septa, while overexpressed Rpl32-2 promotes it. Transcriptomic analysis showed that the Rpl32 paralogs regulate the expression of a subset of genes related to cell division and stress response in distinct ways. This functional difference between the two paralogs is due to a difference at the 95th amino acid residue. The significance of competitive inhibition between the Rpl32 paralogs on their expression is discussed. PMID:23577148
Genomic organization of plant aminopropyl transferases.
Rodríguez-Kessler, Margarita; Delgado-Sánchez, Pablo; Rodríguez-Kessler, Gabriela Theresia; Moriguchi, Takaya; Jiménez-Bremont, Juan Francisco
2010-07-01
Aminopropyl transferases like spermidine synthase (SPDS; EC 2.5.1.16), spermine synthase and thermospermine synthase (SPMS, tSPMS; EC 2.5.1.22) belong to a class of widely distributed enzymes that use decarboxylated S-adenosylmethionine as an aminopropyl donor and putrescine or spermidine as an amino acceptor to form in that order spermidine, spermine or thermospermine. We describe the analysis of plant genomic sequences encoding SPDS, SPMS, tSPMS and PMT (putrescine N-methyltransferase; EC 2.1.1.53). Genome organization (including exon size, gain and loss, as well as intron number, size, loss, retention, placement and phase, and the presence of transposons) of plant aminopropyl transferase genes were compared between the genomic sequences of SPDS, SPMS and tSPMS from Zea mays, Oryza sativa, Malus x domestica, Populus trichocarpa, Arabidopsis thaliana and Physcomitrella patens. In addition, the genomic organization of plant PMT genes, proposed to be derived from SPDS during the evolution of alkaloid metabolism, is illustrated. Herein, a particular conservation and arrangement of exon and intron sequences between plant SPDS, SPMS and PMT genes that clearly differs with that of ACL5 genes, is shown. The possible acquisition of the plant SPMS exon II and, in particular exon XI in the monocot SPMS genes, is a remarkable feature that allows their differentiation from SPDS genes. In accordance with our in silico analysis, functional complementation experiments of the maize ZmSPMS1 enzyme (previously considered to be SPDS) in yeast demonstrated its spermine synthase activity. Another significant aspect is the conservation of intron sequences among SPDS and PMT paralogs. In addition the existence of microsynteny among some SPDS paralogs, especially in P. trichocarpa and A. thaliana, supports duplication events of plant SPDS genes. 
Based on our analysis, we hypothesize that SPMS genes appeared with the divergence of vascular plants by a process of gene duplication and the acquisition of unique exons of as-yet unknown origin.
Troggio, Michela; Šurbanovski, Nada; Bianco, Luca; Moretto, Marco; Giongo, Lara; Banchi, Elisa; Viola, Roberto; Fernández, Felicdad Fernández; Costa, Fabrizio; Velasco, Riccardo; Cestaro, Alessandro; Sargent, Daniel James
2013-01-01
High throughput arrays for the simultaneous genotyping of thousands of single-nucleotide polymorphisms (SNPs) have made the rapid genetic characterisation of plant genomes and the development of saturated linkage maps a realistic prospect for many plant species of agronomic importance. However, the correct calling of SNP genotypes in divergent polyploid genomes using array technology can be problematic due to paralogy, and to divergence in probe sequences causing changes in probe binding efficiencies. An Illumina Infinium II whole-genome genotyping array was recently developed for the cultivated apple and used to develop a molecular linkage map for an apple rootstock progeny (M432), but a large proportion of segregating SNPs were not mapped in the progeny, due to unexpected genotype clustering patterns. To investigate the causes of this unexpected clustering we performed BLAST analysis of all probe sequences against the ‘Golden Delicious’ genome sequence and discovered evidence for paralogous annealing sites and probe sequence divergence for a high proportion of probes contained on the array. Following visual re-evaluation of the genotyping data generated for 8,788 SNPs for the M432 progeny using the array, we manually re-scored genotypes at 818 loci and mapped a further 797 markers to the M432 linkage map. The newly mapped markers included the majority of those that could not be mapped previously, as well as loci that were previously scored as monomorphic, but which segregated due to divergence leading to heterozygosity in probe annealing sites. An evaluation of the 8,788 probes in a diverse collection of Malus germplasm showed that more than half the probes returned genotype clustering patterns that were difficult or impossible to interpret reliably, highlighting implications for the use of the array in genome-wide association studies. PMID:23826289
A novel sodium bicarbonate cotransporter-like gene in an ancient duplicated region: SLC4A9 at 5q31
Lipovich, Leonard; Lynch, Eric D; Lee, Ming K; King, Mary-Claire
2001-01-01
Background: Sodium bicarbonate cotransporter (NBC) genes encode proteins that execute coupled Na+ and HCO3- transport across epithelial cell membranes. We report the discovery, characterization, and genomic context of a novel human NBC-like gene, SLC4A9, on chromosome 5q31. Results: SLC4A9 was initially discovered by genomic sequence annotation and further characterized by sequencing of long-insert cDNA library clones. The predicted protein of 990 amino acids has 12 transmembrane domains and high sequence similarity to other NBCs. The 23-exon gene has 14 known mRNA isoforms. In three regions, mRNA sequence variation is generated by the inclusion or exclusion of portions of an exon. Noncoding SLC4A9 cDNAs were recovered multiple times from different libraries. The 3' untranslated region is fragmented into six alternatively spliced exons and contains expressed Alu, LINE and MER repeats. SLC4A9 has two alternative stop codons and six polyadenylation sites. Its expression is largely restricted to the kidney. In silico approaches were used to characterize two additional novel SLC4A genes and to place SLC4A9 within the context of multiple paralogous gene clusters containing members of the epidermal growth factor (EGF), ankyrin (ANK) and fibroblast growth factor (FGF) families. Seven human EGF-SLC4A-ANK-FGF clusters were found. Conclusion: The novel sodium bicarbonate cotransporter-like gene SLC4A9 demonstrates abundant alternative mRNA processing. It belongs to a growing class of functionally diverse genes characterized by inefficient highly variable splicing. The evolutionary history of the EGF-SLC4A-ANK-FGF gene clusters involves multiple rounds of duplication, apparently followed by large insertions and deletions at paralogous loci and genome-wide gene shuffling. PMID:11305939
Pan, Shu-Ting; Xue, Danfeng; Li, Zhi-Ling; Zhou, Zhi-Wei; He, Zhi-Xu; Yang, Yinxue; Yang, Tianxin; Qiu, Jia-Xuan; Zhou, Shu-Feng
2016-01-01
The human cytochrome P450 (CYP) superfamily consisting of 57 functional genes is the most important group of Phase I drug metabolizing enzymes that oxidize a large number of xenobiotics and endogenous compounds, including therapeutic drugs and environmental toxicants. The CYP superfamily has expanded through gene duplication, and some duplicates have become pseudogenes through mutation. Orthologs and paralogs are homologous genes resulting from speciation or duplication, respectively. To explore the evolutionary and functional relationships of human CYPs, we conducted this bioinformatic study to identify their corresponding paralogs, homologs, and orthologs. Their functional implications and their relevance to drug discovery and evolutionary biology are then discussed. GeneCards and Ensembl were used to identify the paralogs of human CYPs. We used a panel of online databases to identify the orthologs of human CYP genes: NCBI, Ensembl Compara, GeneCards, OMA (“Orthologous MAtrix”) Browser, PANTHER, TreeFam, EggNOG, and Roundup. The results from GeneCards and Ensembl show that the number of paralogs and orthologs varies among human CYPs. For example, the paralogs of CYP2A6 include CYP2A7, 2A13, 2B6, 2C8, 2C9, 2C18, 2C19, 2D6, 2E1, 2F1, 2J2, 2R1, 2S1, 2U1, and 2W1; CYP11A1 has 6 paralogs including CYP11B1, 11B2, 24A1, 27A1, 27B1, and 27C1; CYP51A1 has only three paralogs: CYP26A1, 26B1, and 26C1; while CYP20A1 has no paralog. The majority of human CYPs are well conserved from plants, amphibians, fishes, or mammals to humans due to their important functions in physiology and xenobiotic disposition. The data from the different approaches are cross-validated against each other and, where available, against experimental data. These findings facilitate our understanding of the evolutionary relationships and functional implications of the human CYP superfamily in drug discovery. PMID:27367670
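The distinction drawn above between orthologs (speciation) and paralogs (duplication) can be illustrated with a minimal sketch. It assumes a gene tree whose internal nodes have already been labelled with the event they represent; the `Node` class, the `classify` function, and the gene names are hypothetical and are not taken from any of the databases listed in the abstract.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    event: str = "leaf"            # "speciation", "duplication" or "leaf"
    name: str = ""                 # gene name, used for leaves only
    children: list = field(default_factory=list)

def leaf_names(node):
    """Return the names of all leaves below a node."""
    if node.event == "leaf":
        return [node.name]
    return [n for child in node.children for n in leaf_names(child)]

def classify(node, out=None):
    """Label each pair of genes by the event at their last common ancestor."""
    if out is None:
        out = {}
    if node.event == "leaf":
        return out
    label = "orthologs" if node.event == "speciation" else "paralogs"
    for i, left in enumerate(node.children):
        for right in node.children[i + 1:]:
            for a in leaf_names(left):
                for b in leaf_names(right):
                    out[frozenset((a, b))] = label
    for child in node.children:
        classify(child, out)
    return out

# Hypothetical family: one ancient duplication (X vs Y), then a speciation.
tree = Node("duplication", children=[
    Node("speciation", children=[Node(name="human_X"), Node(name="mouse_X")]),
    Node("speciation", children=[Node(name="human_Y"), Node(name="mouse_Y")]),
])
pairs = classify(tree)
```

In this toy tree, `human_X` and `mouse_X` are labelled orthologs, while `human_X` and `human_Y` (and also `human_X` and `mouse_Y`) are labelled paralogs because their last common ancestor is the duplication node.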
High-throughput physical mapping of chromosomes using automated in situ hybridization.
George, Phillip; Sharakhova, Maria V; Sharakhov, Igor V
2012-06-28
Projects to obtain whole-genome sequences for 10,000 vertebrate species and for 5,000 insect and related arthropod species are expected to take place over the next 5 years. For example, the sequencing of the genomes for 15 malaria mosquito species is currently being done using an Illumina platform. This Anopheles species cluster includes both vectors and non-vectors of malaria. When the genome assemblies become available, researchers will have the unique opportunity to perform comparative analysis for inferring evolutionary changes relevant to vector ability. However, it has proven difficult to use next-generation sequencing reads to generate high-quality de novo genome assemblies. Moreover, the existing genome assemblies for Anopheles gambiae, although obtained using the Sanger method, are gapped or fragmented. Success of comparative genomic analyses will be limited if researchers deal with numerous sequencing contigs, rather than with chromosome-based genome assemblies. Fragmented, unmapped sequences create problems for genomic analyses because: (i) unidentified gaps cause incorrect or incomplete annotation of genomic sequences; (ii) unmapped sequences lead to confusion between paralogous genes and genes from different haplotypes; and (iii) the lack of chromosome assignment and orientation of the sequencing contigs does not allow for reconstructing rearrangement phylogeny and studying chromosome evolution. Developing high-resolution physical maps for species with newly sequenced genomes is a timely and cost-effective investment that will facilitate genome annotation, evolutionary analysis, and re-sequencing of individual genomes from natural populations. Here, we present innovative approaches to chromosome preparation, fluorescent in situ hybridization (FISH), and imaging that facilitate rapid development of physical maps. Using An. gambiae as an example, we demonstrate that the development of physical chromosome maps can potentially improve genome assemblies and, thus, the quality of genomic analyses. First, we use a high-pressure method to prepare polytene chromosome spreads. This method, originally developed for Drosophila, allows the user to visualize more details on chromosomes than the regular squashing technique. Second, a fully automated, front-end system for FISH is used for high-throughput physical genome mapping. The automated slide staining system runs multiple assays simultaneously and dramatically reduces hands-on time. Third, an automatic fluorescent imaging system, which includes a motorized slide stage, automatically scans and photographs labeled chromosomes after FISH. This system is especially useful for identifying and visualizing multiple chromosomal plates on the same slide. In addition, the scanning process captures a more uniform FISH result. Overall, the automated high-throughput physical mapping protocol is more efficient than a standard manual protocol.
Phylogenetic Analysis of Mitochondrial Outer Membrane β-Barrel Channels
Wojtkowska, Małgorzata; Jąkalski, Marcin; Pieńkowska, Joanna R.; Stobienia, Olgierd; Karachitos, Andonis; Przytycka, Teresa M.; Weiner, January; Kmita, Hanna; Makałowski, Wojciech
2012-01-01
Transport of molecules across the mitochondrial outer membrane is pivotal for the proper function of mitochondria. The transport pathways across the membrane are formed by ion channels that participate in metabolite exchange between mitochondria and cytoplasm (voltage-dependent anion-selective channel, VDAC) as well as in import of proteins encoded by nuclear genes (Tom40 and Sam50/Tob55). VDAC, Tom40, and Sam50/Tob55 are present in all eukaryotic organisms, encoded in the nuclear genome, and have β-barrel topology. We have compiled data sets of these protein sequences and studied their phylogenetic relationships with a special focus on the position of Amoebozoa. Additionally, we identified these protein-coding genes in Acanthamoeba castellanii and Dictyostelium discoideum to complement our data set and verify the phylogenetic position of these model organisms. Our analyses show that mitochondrial β-barrel channels from Archaeplastida (plants) and Opisthokonta (animals and fungi) experienced many duplication events that resulted in multiple paralogous isoforms and form well-defined monophyletic clades that match the current model of eukaryotic evolution. However, in representatives of Amoebozoa, Chromalveolata, and Excavata (former Protista), they do not form clearly distinguishable clades, although they locate basally to the plant and algae branches. In most cases, they do not possess paralogs and their sequences appear to have evolved quickly or degenerated. Consequently, the obtained phylogenies of mitochondrial outer membrane β-channels do not entirely reflect the recent eukaryotic classification system involving the six supergroups: Chromalveolata, Excavata, Archaeplastida, Rhizaria, Amoebozoa, and Opisthokonta. PMID:22155732
Xu, Qifang; Dunbrack, Roland L
2012-11-01
Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are not as long as the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often erroneously result in two copies of the domain assigned to the query. Divergent repeat sequences in proteins are often missed. We have developed a multilevel procedure to produce nearly complete assignments of protein families of an existing classification system to a large set of sequences. We apply this to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, thus, leading to erroneous assignment at the Pfam family level. A greedy algorithm allowing for partial overlaps was, thus, applied first to sequence/HMM alignments, then HMM-HMM alignments and then structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignment of repeat Pfams with weaker E-values was allowed after stronger assignments of the repeat HMM. Our database of assignments, presented in a database called PDBfam, contains Pfams for 99.4% of chains >50 residues. The Pfam assignment data in PDBfam are available at http://dunbrack2.fccc.edu/ProtCid/PDBfam, which can be searched by PDB codes and Pfam identifiers. They will be updated regularly.
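The greedy step described above (accept the strongest hits first, tolerating partial overlaps) can be sketched as follows. This is only an illustration under stated assumptions, not the PDBfam implementation: the `Hit` record, the `max_overlap` threshold of 10 residues, and the family names are hypothetical, and the joining of alignments split by insertions is not modelled.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    family: str
    start: int      # first aligned residue (1-based, inclusive)
    end: int        # last aligned residue (inclusive)
    evalue: float

def overlap(a: Hit, b: Hit) -> int:
    """Number of residues shared by two hits on the same chain."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)

def greedy_assign(hits, max_overlap=10):
    """Accept hits from best to worst E-value if they overlap every
    previously accepted hit by at most max_overlap residues."""
    accepted = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if all(overlap(hit, kept) <= max_overlap for kept in accepted):
            accepted.append(hit)
    return sorted(accepted, key=lambda h: h.start)

# Hypothetical hits on one chain.
hits = [
    Hit("FAM_A", 1, 100, 1e-30),
    Hit("FAM_B", 95, 200, 1e-20),   # overlaps FAM_A by 6 residues: kept
    Hit("FAM_C", 50, 150, 1e-5),    # overlaps FAM_A by 51 residues: dropped
]
assigned = greedy_assign(hits)
```

Here `assigned` contains `FAM_A` and `FAM_B`. Running the same selection in successive passes over different alignment types, as the abstract describes, would amount to calling the function again with the already accepted hits seeded into `accepted`.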
Kauzlaric, Annamaria; Ecco, Gabriela; Cassano, Marco; Duc, Julien; Imbeault, Michael; Trono, Didier
2017-01-01
KRAB-containing poly-zinc finger proteins (KZFPs) constitute the largest family of transcription factors encoded by mammalian genomes, and growing evidence indicates that they fulfill functions critical to both embryonic development and maintenance of adult homeostasis. KZFP genes underwent broad and independent waves of expansion in many higher vertebrate lineages, yet comprehensive studies of members harbored by a given species are scarce. Here we present a thorough analysis of KZFP genes and related units in the murine genome. We first identified about twice as many elements as previously annotated as either KZFP genes or pseudogenes, notably by assigning to this family an entity formerly considered as a large group of Satellite repeats. We could then delineate an organization in clusters distributed throughout the genome, with signs of recombination, translocation, duplication and seeding of new sites by retrotransposition of KZFP genes and related genetic units (KZFP/rGUs). Moreover, we gathered evidence indicating that closely related paralogs had evolved through both drifting and shifting of sequences encoding zinc finger arrays. Finally, we could demonstrate that the KAP1-SETDB1 repressor complex tames the expression of KZFP/rGUs within clusters, yet that the primary targets of this regulation are not the KZFP/rGUs themselves but enhancers contained in neighboring endogenous retroelements and that, underneath, KZFPs conserve highly individualized patterns of expression. PMID:28334004
Dunbrack, Roland L.
2012-01-01
Motivation: Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are not as long as the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often erroneously result in two copies of the domain assigned to the query. Divergent repeat sequences in proteins are often missed. Results: We have developed a multilevel procedure to produce nearly complete assignments of protein families of an existing classification system to a large set of sequences. We apply this to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, thus, leading to erroneous assignment at the Pfam family level. A greedy algorithm allowing for partial overlaps was, thus, applied first to sequence/HMM alignments, then HMM–HMM alignments and then structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignment of repeat Pfams with weaker E-values was allowed after stronger assignments of the repeat HMM. Our database of assignments, presented in a database called PDBfam, contains Pfams for 99.4% of chains >50 residues. Availability: The Pfam assignment data in PDBfam are available at http://dunbrack2.fccc.edu/ProtCid/PDBfam, which can be searched by PDB codes and Pfam identifiers. They will be updated regularly. Contact: Roland.Dunbracks@fccc.edu PMID:22942020
Functional specificity of a Hox protein mediated by the recognition of minor groove structure.
Joshi, Rohit; Passner, Jonathan M; Rohs, Remo; Jain, Rinku; Sosinsky, Alona; Crickmore, Michael A; Jacob, Vinitha; Aggarwal, Aneel K; Honig, Barry; Mann, Richard S
2007-11-02
The recognition of specific DNA-binding sites by transcription factors is a critical yet poorly understood step in the control of gene expression. Members of the Hox family of transcription factors bind DNA by making nearly identical major groove contacts via the recognition helices of their homeodomains. In vivo specificity, however, often depends on extended and unstructured regions that link Hox homeodomains to a DNA-bound cofactor, Extradenticle (Exd). Using a combination of structure determination, computational analysis, and in vitro and in vivo assays, we show that Hox proteins recognize specific Hox-Exd binding sites via residues located in these extended regions that insert into the minor groove but only when presented with the correct DNA sequence. Our results suggest that these residues, which are conserved in a paralog-specific manner, confer specificity by recognizing a sequence-dependent DNA structure instead of directly reading a specific DNA sequence.
Sousa, Filipa L; Parente, Daniel J; Hessman, Jacob A; Chazelle, Allen; Teichmann, Sarah A; Swint-Kruse, Liskin
2016-09-01
The AlloRep database (www.AlloRep.org) (Sousa et al., 2016) [1] compiles extensive sequence, mutagenesis, and structural information for the LacI/GalR family of transcription regulators. Sequence alignments are presented for >3000 proteins in 45 paralog subfamilies and as a subsampled alignment of the whole family. Phenotypic and biochemical data on almost 6000 mutants have been compiled from an exhaustive search of the literature; citations for these data are included herein. These data include information about oligomerization state, stability, DNA binding and allosteric regulation. Protein structural data for 65 proteins are presented as easily-accessible, residue-contact networks. Finally, this article includes example queries to enable the use of the AlloRep database. See the related article, "AlloRep: a repository of sequence, structural and mutagenesis data for the LacI/GalR transcription regulators" (Sousa et al., 2016) [1].
Treetrimmer: a method for phylogenetic dataset size reduction.
Maruyama, Shinichiro; Eveleigh, Robert J M; Archibald, John M
2013-04-12
With rapid advances in genome sequencing and bioinformatics, it is now possible to generate phylogenetic trees containing thousands of operational taxonomic units (OTUs) from a wide range of organisms. However, use of rigorous tree-building methods on such large datasets is prohibitive and manual 'pruning' of sequence alignments is time consuming and raises concerns over reproducibility. There is a need for bioinformatic tools with which to objectively carry out such pruning procedures. Here we present 'TreeTrimmer', a bioinformatics procedure that removes unnecessary redundancy in large phylogenetic datasets, alleviating the size effect on more rigorous downstream analyses. The method identifies and removes user-defined 'redundant' sequences, e.g., orthologous sequences from closely related organisms and 'recently' evolved lineage-specific paralogs. Representative OTUs are retained for more rigorous re-analysis. TreeTrimmer reduces the OTU density of phylogenetic trees without sacrificing taxonomic diversity while retaining the original tree topology, thereby speeding up downstream computer-intensive analyses, e.g., Bayesian and maximum likelihood tree reconstructions, in a reproducible fashion.
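The pruning idea described above can be sketched on a toy tree. This is a simplified illustration and not the TreeTrimmer code: it assumes the tree is given as nested tuples, that leaves are labelled `Taxon|gene`, and that "redundant" means a clade whose leaves all fall in one user-defined group; the `trim` function, the group table, and the choice of representative are hypothetical.

```python
def leaves(tree):
    """Return all leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]

def trim(tree, group_of):
    """Collapse every clade whose leaves share one group into a single
    representative leaf; all other branching is left untouched."""
    if isinstance(tree, str):
        return tree
    groups = {group_of(leaf) for leaf in leaves(tree)}
    if len(groups) == 1:
        # Arbitrary but reproducible choice of representative.
        return min(leaves(tree))
    return tuple(trim(child, group_of) for child in tree)

# Hypothetical user-defined groups (taxon -> group).
GROUPS = {
    "Homo": "primates", "Pan": "primates", "Mus": "rodents",
    "Arabidopsis": "Arabidopsis", "Oryza": "Oryza",
}

def group_of(leaf):
    return GROUPS[leaf.split("|")[0]]

tree = (
    (("Homo|g1", "Pan|g1"), "Mus|g1"),
    (("Arabidopsis|a1", "Arabidopsis|a2"), "Oryza|o1"),
)
trimmed = trim(tree, group_of)
```

In this example the two primate orthologs collapse to one leaf and the two lineage-specific Arabidopsis paralogs collapse to one leaf, while the remaining branching order is unchanged. A real tool would typically also weigh branch support and let the user choose which sequence represents each collapsed clade.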
Palmisano, Aldo N.; Winton, James R.; Dickhoff, Walton W.
1999-01-01
We cloned and sequenced a chinook salmon Hsp90 cDNA; sequence analysis shows it to be Hsp90α. Phylogenetic analysis supports the hypothesis that α and β paralogs of Hsp90 arose as a result of a gene duplication event and that they diverged early in the evolution of vertebrates, before tetrapods separated from the teleost lineage. Among several differences distinguishing poikilothermic Hsp90α sequences from their bird and mammal orthologs, the teleost versions specifically lack a characteristic QTQDQP phosphorylation site near the N-terminus. We used the cDNA to develop an RNA (Northern) blot to quantify cellular Hsp90 mRNA levels. Chinook salmon embryonic (CHSE-214) cells responded to heat shock with a rapid rise in Hsp90 mRNA through 4 h, followed by a gradual decline over the next 20 h. Hsp90 mRNA level may be useful as a stress indicator, especially in a laboratory setting or in response to acute heat stress.
The genome of melon (Cucumis melo L.)
Garcia-Mas, Jordi; Benjak, Andrej; Sanseverino, Walter; Bourgeois, Michael; Mir, Gisela; González, Víctor M.; Hénaff, Elizabeth; Câmara, Francisco; Cozzuto, Luca; Lowy, Ernesto; Alioto, Tyler; Capella-Gutiérrez, Salvador; Blanca, Jose; Cañizares, Joaquín; Ziarsolo, Pello; Gonzalez-Ibeas, Daniel; Rodríguez-Moreno, Luis; Droege, Marcus; Du, Lei; Alvarez-Tejado, Miguel; Lorente-Galdos, Belen; Melé, Marta; Yang, Luming; Weng, Yiqun; Navarro, Arcadi; Marques-Bonet, Tomas; Aranda, Miguel A.; Nuez, Fernando; Picó, Belén; Gabaldón, Toni; Roma, Guglielmo; Guigó, Roderic; Casacuberta, Josep M.; Arús, Pere; Puigdomènech, Pere
2012-01-01
We report the genome sequence of melon, an important horticultural crop worldwide. We assembled 375 Mb of the double-haploid line DHL92, representing 83.3% of the estimated melon genome. We predicted 27,427 protein-coding genes, which we analyzed by reconstructing 22,218 phylogenetic trees, allowing mapping of the orthology and paralogy relationships of sequenced plant genomes. We observed the absence of recent whole-genome duplications in the melon lineage since the ancient eudicot triplication, and our data suggest that transposon amplification may in part explain the increased size of the melon genome compared with the close relative cucumber. A low number of nucleotide-binding site–leucine-rich repeat disease resistance genes were annotated, suggesting the existence of specific defense mechanisms in this species. The DHL92 genome was compared with that of its parental lines allowing the quantification of sequence variability in the species. The use of the genome sequence in future investigations will facilitate the understanding of evolution of cucurbits and the improvement of breeding strategies. PMID:22753475
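The assembly figures quoted above imply an estimated genome size that the abstract does not state directly. The short calculation below derives it from the two reported numbers; the variable names are illustrative, and the gene density is computed per assembled megabase only.

```python
assembled_mb = 375.0        # assembled sequence reported in the abstract (Mb)
fraction_covered = 0.833    # assembly as a fraction of the estimated genome
predicted_genes = 27_427    # predicted protein-coding genes

# Estimated genome size implied by the two figures above.
estimated_genome_mb = assembled_mb / fraction_covered

# Average gene density over the assembled sequence.
genes_per_mb = predicted_genes / assembled_mb

print(f"estimated genome size: {estimated_genome_mb:.0f} Mb")
print(f"gene density: {genes_per_mb:.1f} genes per assembled Mb")
```

The implied genome size is roughly 450 Mb, with on average about 73 predicted genes per assembled megabase.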
Law, Adrienne; Boulanger, Martin J.
2011-01-01
The phenylacetic acid (PAA) degradation pathway is the sole aerobic route for phenylacetic acid metabolism in bacteria and facilitates degradation of environmental pollutants such as styrene and ethylbenzene. The PAA pathway also is implicated in promoting Burkholderia cenocepacia infections in cystic fibrosis patients. Intriguingly, the first enzyme in the PAA pathway is present in two copies (paaK1 and paaK2), yet each subsequent enzyme is present in only a single copy. Furthermore, sequence divergence indicates that PaaK1 and PaaK2 form a unique subgroup within the adenylate-forming enzyme (AFE) superfamily. To establish a biochemical rationale for the existence of the PaaK paralogs in B. cenocepacia, we present high resolution x-ray crystal structures of a selenomethionine derivative of PaaK1 in complex with ATP and adenylated phenylacetate intermediate complexes of PaaK1 and PaaK2 in distinct conformations. Structural analysis reveals a novel N-terminal microdomain that may serve to recruit subsequent PAA enzymes, whereas a bifunctional role is proposed for the P-loop in stabilizing the C-terminal domain in conformation 2. The potential for different kinetic profiles was suggested by a structurally divergent extension of the aryl substrate pocket in PaaK1 relative to PaaK2. Functional characterization confirmed this prediction, with PaaK1 possessing a lower Km for phenylacetic acid and better able to accommodate 3′ and 4′ substitutions on the phenyl ring. Collectively, these results offer detailed insight into the reaction mechanism of a novel subgroup of the AFE superfamily and provide a clear biochemical rationale for the presence of paralogous copies of PaaK of B. cenocepacia. PMID:21388965
Shirak, A; Golik, M; Lee, B-Y; Howe, A E; Kocher, T D; Hulata, G; Ron, M; Seroussi, E
2008-11-01
Lipocalins are involved in the binding of small molecules like sex steroids. We show here that the previously reported tilapia male-specific protein (MSP) is a lipocalin encoded by a variety of paralogous and homologous genes in different tilapia species. Exon-intron boundaries of MSP genes were typical of the six-exon genomic structure of lipocalins, and the transcripts were capable of encoding 200 amino-acid polypeptides that consisted of a putative signal peptide and a lipocalin domain. Cysteine residues are conserved in positions analogous to those forming the three disulfide bonds characteristic of the ligand pocket. The calculated molecular mass of the secreted MSP (20.4 kDa) was less than half of that observed, suggesting that it is highly glycosylated like its homologue tributyltin-binding protein. Analysis of sequence variations revealed three types of paralogs: MSPA, MSPB and MSPC. Expression of both MSPA and MSPB was detected in testis. In haploid Oreochromis niloticus embryos, each of these types consisted of two closely related paralogs, and asymmetry between MSP copy numbers on the maternal (six copies) and the paternal (three copies) chromosomes was observed. Using this polymorphism we mapped MSPA and MSPC to linkage group 12 of an F(2) mapping family derived from a cross between O. niloticus and Oreochromis aureus. Females with high MSP copy number were more than twice as frequent as males. Gender-MSPC combinations showed significant deviation from expected Mendelian segregation (P=0.009), suggesting elimination of males with MSPC copies. We discuss different hypotheses to explain this elimination, including the possibility of allelic conflict resulting from the hybridization.
Differential paralog divergence modulates genome evolution across yeast species
Lynch, Bryony; Huang, Mei; Alcantara, Erica; DeSevo, Christopher G.; Pai, Dave A.; Hoang, Margaret L.
2017-01-01
Evolutionary outcomes depend not only on the selective forces acting upon a species, but also on the genetic background. However, large timescales and uncertain historical selection pressures can make it difficult to discern such important background differences between species. Experimental evolution is one tool to compare evolutionary potential of known genotypes in a controlled environment. Here we utilized a highly reproducible evolutionary adaptation in Saccharomyces cerevisiae to investigate whether experimental evolution of other yeast species would select for similar adaptive mutations. We evolved populations of S. cerevisiae, S. paradoxus, S. mikatae, S. uvarum, and interspecific hybrids between S. uvarum and S. cerevisiae for ~200–500 generations in sulfate-limited continuous culture. Wild-type S. cerevisiae cultures invariably amplify the high affinity sulfate transporter gene, SUL1. However, while amplification of the SUL1 locus was detected in S. paradoxus and S. mikatae populations, S. uvarum cultures instead selected for amplification of the paralog, SUL2. We measured the relative fitness of strains bearing deletions and amplifications of both SUL genes from different species, confirming that, converse to S. cerevisiae, S. uvarum SUL2 contributes more to fitness in sulfate limitation than S. uvarum SUL1. By measuring the fitness and gene expression of chimeric promoter-ORF constructs, we were able to delineate the cause of this differential fitness effect primarily to the promoter of S. uvarum SUL1. Our data show evidence of differential sub-functionalization among the sulfate transporters across Saccharomyces species through recent changes in noncoding sequence. Furthermore, these results show a clear example of how such background differences due to paralog divergence can drive changes in genome evolution. PMID:28196070
Adhesive Properties of YapV and Paralogous Autotransporter Proteins of Yersinia pestis
Nair, Manoj K. M.; De Masi, Leon; Yue, Min; Galván, Estela M.; Chen, Huaiqing; Wang, Fang
2015-01-01
Yersinia pestis is the causative agent of plague. This bacterium evolved from an ancestral enteroinvasive Yersinia pseudotuberculosis strain by gene loss and acquisition of new genes, allowing it to use fleas as transmission vectors. Infection frequently leads to a rapidly lethal outcome in humans, a variety of rodents, and cats. This study focuses on the Y. pestis KIM yapV gene and its product, recognized as an autotransporter protein by its typical sequence, outer membrane localization, and amino-terminal surface exposure. Comparison of Yersinia genomes revealed that DNA encoding YapV or each of three individual paralogous proteins (YapK, YapJ, and YapX) was present as a gene or pseudogene in a strain-specific manner and only in Y. pestis and Y. pseudotuberculosis. YapV acted as an adhesin for alveolar epithelial cells and specific extracellular matrix (ECM) proteins, as shown with recombinant Escherichia coli, Y. pestis, or purified passenger domains. Like YapV, YapK and YapJ demonstrated adhesive properties, suggesting that their previously related in vivo activity is due to their capacity to modulate binding properties of Y. pestis in its hosts, in conjunction with other adhesins. A differential host-specific type of binding to ECM proteins by YapV, YapK, and YapJ suggested that these proteins participate in broadening the host range of Y. pestis. A phylogenic tree including 36 Y. pestis strains highlighted an association between the gene profile for the four paralogous proteins and the geographic location of the corresponding isolated strains, suggesting an evolutionary adaption of Y. pestis to specific local animal hosts or reservoirs. PMID:25690102
Opazo, Juan C.; Toloza-Villalobos, Jessica; Burmester, Thorsten; Venkatesh, Byrappa; Storz, Jay F.
2015-01-01
Comparative analyses of vertebrate genomes continue to uncover a surprising diversity of genes in the globin gene superfamily, some of which have very restricted phyletic distributions despite their antiquity. Genomic analysis of the globin gene repertoire of cartilaginous fish (Chondrichthyes) should be especially informative about the duplicative origins and ancestral functions of vertebrate globins, as divergence between Chondrichthyes and bony vertebrates represents the most basal split within the jawed vertebrates. Here, we report a comparative genomic analysis of the vertebrate globin gene family that includes the complete globin gene repertoire of the elephant shark (Callorhinchus milii). Using genomic sequence data from representatives of all major vertebrate classes, integrated analyses of conserved synteny and phylogenetic relationships revealed that the last common ancestor of vertebrates possessed a repertoire of at least seven globin genes: single copies of androglobin and neuroglobin, four paralogous copies of globin X, and the single-copy progenitor of the entire set of vertebrate-specific globins. Combined with expression data, the genomic inventory of elephant shark globins yielded four especially surprising findings: 1) there is no trace of the neuroglobin gene (a highly conserved gene that is present in all other jawed vertebrates that have been examined to date), 2) myoglobin is highly expressed in heart, but not in skeletal muscle (reflecting a possible ancestral condition in vertebrates with single-circuit circulatory systems), 3) elephant shark possesses two highly divergent globin X paralogs, one of which is preferentially expressed in gonads, and 4) elephant shark possesses two structurally distinct α-globin paralogs, one of which is preferentially expressed in the brain. 
Expression profiles of elephant shark globin genes reveal distinct specializations of function relative to orthologs in bony vertebrates and suggest hypotheses about ancestral functions of vertebrate globins. PMID:25743544
López-Igual, Rocío; Wilson, Adjélé; Bourcier de Carbon, Céline; Sutter, Markus; Turmo, Aiko
2016-01-01
The photoactive Orange Carotenoid Protein (OCP) is involved in cyanobacterial photoprotection. Its N-terminal domain (NTD) is responsible for interaction with the antenna and induction of excitation energy quenching, while the C-terminal domain is the regulatory domain that senses light and induces photoactivation. In most nitrogen-fixing cyanobacterial strains, there are one to four paralogous genes coding for homologs to the NTD of the OCP. The functions of these proteins are unknown. Here, we study the expression, localization, and function of these genes in Anabaena sp. PCC 7120. We show that the four genes present in the genome are expressed in both vegetative cells and heterocysts but do not seem to have an essential role in heterocyst formation. This study establishes that all four Anabaena NTD-like proteins can bind a carotenoid and the different paralogs have distinct functions. Surprisingly, only one paralog (All4941) was able to interact with the antenna and to induce permanent thermal energy dissipation. Two of the other Anabaena paralogs (All3221 and Alr4783) were shown to be very good singlet oxygen quenchers. The fourth paralog (All1123) does not seem to be involved in photoprotection. Structural homology modeling allowed us to propose specific features responsible for the different functions of these soluble carotenoid-binding proteins. PMID:27208286
Ream, Thomas S.; Haag, Jeremy R.; Pontvianne, Frederic; ...
2015-05-02
Using affinity purification and mass spectrometry, we identified the subunits of Arabidopsis thaliana multisubunit RNA Polymerases I and III (abbreviated as Pol I and Pol III), providing the first description of their physical compositions in plants. AC40 and AC19 subunits are typically common to Pol I (a.k.a. Pol A) and Pol III (a.k.a. Pol C) and are encoded by single genes whose mutation, in humans, is a cause of the craniofacial disorder, Treacher-Collins Syndrome. Surprisingly, A. thaliana, and related species, express two distinct AC40 paralogs, one of which assembles into Pol I and the other of which assembles into Pol III. Changes at eight amino acid positions correlate with this functional divergence of Pol I and Pol III-specific AC40 paralogs. Two genes encode homologs of the yeast C53 subunit, and either variant can assemble into Pol III. By contrast, only one of two potential C17 variants, and one of two potential C31 variants were detected in Pol III. We introduce a new nomenclature system for plant Pol I and Pol III subunits in which the twelve subunits that are structurally and functionally homologous among Pols I through V are assigned equivalent numbers.
CODEHOP (COnsensus-DEgenerate Hybrid Oligonucleotide Primer) PCR primer design
Rose, Timothy M.; Henikoff, Jorja G.; Henikoff, Steven
2003-01-01
We have developed a new primer design strategy for PCR amplification of distantly related gene sequences based on consensus-degenerate hybrid oligonucleotide primers (CODEHOPs). An interactive program has been written to design CODEHOP PCR primers from conserved blocks of amino acids within multiply-aligned protein sequences. Each CODEHOP consists of a pool of related primers containing all possible nucleotide sequences encoding 3–4 highly conserved amino acids within a 3′ degenerate core. A longer 5′ non-degenerate clamp region contains the most probable nucleotide predicted for each flanking codon. CODEHOPs are used in PCR amplification to isolate distantly related sequences encoding the conserved amino acid sequence. The primer design software and the CODEHOP PCR strategy have been utilized for the identification and characterization of new gene orthologs and paralogs in different plant, animal and bacterial species. In addition, this approach has been successful in identifying new pathogen species. The CODEHOP designer (http://blocks.fhcrc.org/codehop.html) is linked to BlockMaker and the Multiple Alignment Processor within the Blocks Database World Wide Web (http://blocks.fhcrc.org). PMID:12824413
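The size of the degenerate 3′ core pool described above can be illustrated with a short sketch. This is not the CODEHOP program itself: it assumes the standard genetic code and simply enumerates every full-codon sequence encoding a short conserved peptide. The function names (`codons_for`, `core_degeneracy`, `core_pool`) and the example peptides are hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# organism-specific code or codon-usage weighting is applied here).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO)
}


def codons_for(aa):
    """Return all codons that encode the one-letter amino acid `aa`."""
    return sorted(c for c, a in CODON_TABLE.items() if a == aa)


def core_degeneracy(peptide):
    """Number of distinct nucleotide sequences encoding `peptide`."""
    n = 1
    for aa in peptide:
        n *= len(codons_for(aa))
    return n


def core_pool(peptide):
    """Enumerate every nucleotide sequence encoding `peptide`."""
    return ["".join(p) for p in product(*(codons_for(aa) for aa in peptide))]


if __name__ == "__main__":
    for pep in ("MWDK", "LSR"):
        print(pep, core_degeneracy(pep))
```

Under these assumptions, a core rich in low-degeneracy residues such as Met and Trp yields a small pool, whereas a core of Leu, Ser and Arg yields a much larger one, which suggests why restricting degeneracy to a short core of 3–4 highly conserved residues keeps the primer pool manageable.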
Vatanparast, Mohammad; Powell, Adrian; Doyle, Jeff J; Egan, Ashley N
2018-03-01
The development of pipelines for locus discovery has spurred the use of target enrichment for plant phylogenomics. However, few studies have compared pipelines from locus discovery and bait design, through validation, to tree inference. We compared three methods within Leguminosae (Fabaceae) and present a workflow for future efforts. Using 30 transcriptomes, we compared Hyb-Seq, MarkerMiner, and the Yang and Smith (Y&S) pipelines for locus discovery, validated 7501 baits targeting 507 loci across 25 genera via Illumina sequencing, and inferred gene and species trees via concatenation- and coalescent-based methods. Hyb-Seq discovered loci with the longest mean length. MarkerMiner discovered the most conserved loci with the least flagged as paralogous. Y&S offered the most parsimony-informative sites and putative orthologs. Target recovery averaged 93% across taxa. We optimized our targeted locus set based on a workflow designed to minimize paralog/ortholog conflation and thus present 423 loci for legume phylogenomics. Methods differed across criteria important for phylogenetic marker development. We recommend Hyb-Seq as a method that may be useful for most phylogenomic projects. Our targeted locus set is a resource for future, community-driven efforts to reconstruct the legume tree of life.
Identification and characterization of two wheat Glycogen Synthase Kinase 3/ SHAGGY-like kinases.
Bittner, Thomas; Campagne, Sarah; Neuhaus, Gunther; Rensing, Stefan A; Fischer-Iglesias, Christiane
2013-04-18
Plant Glycogen Synthase Kinase 3/ SHAGGY-like kinases (GSKs) have been implicated in numerous biological processes ranging from embryonic, flower, stomata development to stress and wound responses. They are key regulators of brassinosteroid signaling and are also involved in the cross-talk between auxin and brassinosteroid pathways. In contrast to the human genome that contains two genes, plant GSKs are encoded by a multigene family. Little is known about Liliopsida resp. Poaceae in comparison to Brassicaceae GSKs. Here, we report the identification and structural characterization of two GSK homologs named TaSK1 and TaSK2 in the hexaploid wheat genome as well as a widespread phylogenetic analysis of land plant GSKs. Genomic and cDNA sequence alignments as well as chromosome localization using nullisomic-tetrasomic lines provided strong evidence for three expressed gene copies located on homoeolog chromosomes for TaSK1 as well as for TaSK2. Predicted proteins displayed a clear GSK signature. In vitro kinase assays showed that TaSK1 and TaSK2 possessed kinase activity. A phylogenetic analysis of land plant GSKs indicated that TaSK1 and TaSK2 belong to clade II of plant GSKs, the Arabidopsis members of which are all involved in Brassinosteroid signaling. Based on a single ancestral gene in the last common ancestor of all land plants, paralogs were acquired and retained through paleopolyploidization events, resulting in six to eight genes in angiosperms. More recent duplication events have increased the number up to ten in some lineages. To account for plant diversity in terms of functionality, morphology and development, attention has to be devoted to Liliopsida resp Poaceae GSKs in addition to Arabidopsis GSKs. 
In this study, molecular characterization, chromosome localization, kinase activity test and phylogenetic analysis (1) clarified the homologous/paralogous versus homoeologous status of TaSK sequences, (2) pointed out their affiliation to the GSK multigene family, (3) showed a functional kinase activity, (4) allowed a classification in clade II, members of which are involved in BR signaling and (5) provided information on the acquisition and retention of GSK paralogs in angiosperms in the context of whole genome duplication events. Our results provide a framework to explore Liliopsida resp Poaceae GSKs functions in development.
Molecular cloning and expression analysis of WRKY transcription factor genes in Salvia miltiorrhiza.
Li, Caili; Li, Dongqiao; Shao, Fenjuan; Lu, Shanfa
2015-03-17
WRKY proteins comprise a large family of transcription factors and play important regulatory roles in plant development and defense response. The WRKY gene family in Salvia miltiorrhiza has not been characterized. A total of 61 SmWRKYs were cloned from S. miltiorrhiza. Multiple sequence alignment showed that SmWRKYs could be classified into 3 groups and 8 subgroups. Sequence features, the WRKY domain and other motifs of SmWRKYs are largely conserved with Arabidopsis AtWRKYs. Each group of WRKY domains contains characteristic conserved sequences, and group-specific motifs might contribute to functional divergence of WRKYs. A total of 17 pairs of orthologous SmWRKY and AtWRKY genes and 21 pairs of paralogous SmWRKY genes were identified. Maximum likelihood analysis showed that SmWRKYs had undergone strong selective pressure for adaptive evolution. Functional divergence analysis suggested that the SmWRKY subgroup genes and many paralogous SmWRKY gene pairs were divergent in functions. Various critical amino acids that contributed to functional divergence among subgroups were detected. Of the 61 SmWRKYs, 22, 13, 4 and 1 were predominantly expressed in roots, stems, leaves, and flowers, respectively. The other 21 were mainly expressed in at least two tissues analyzed. In S. miltiorrhiza roots treated with MeJA, significant changes of gene expression were observed for 49 SmWRKYs, of which 26 were up-regulated, 18 were down-regulated, while the other 5 were either up-regulated or down-regulated at different time-points of treatment. Analysis of published RNA-seq data showed that 42 of the 61 identified SmWRKYs were yeast extract and Ag(+)-responsive. Through a systematic analysis, SmWRKYs potentially involved in tanshinone biosynthesis were predicted. These results provide insights into functional conservation and diversification of SmWRKYs and are useful information for further elucidating SmWRKY functions.
Casimiro-Soriguer, Inés; Narbona, Eduardo; Buide, M. L.; del Valle, José C.; Whittall, Justen B.
2016-01-01
Flower color polymorphisms are widely used as model traits from genetics to ecology, yet determining the biochemical and molecular basis can be challenging. Anthocyanin-based flower color variations can be caused by at least 12 structural and three regulatory genes in the anthocyanin biosynthetic pathway (ABP). We use mRNA-Seq to simultaneously sequence and estimate expression of these candidate genes in nine samples of Silene littorea representing three color morphs (dark pink, light pink and white) across three developmental stages in hopes of identifying the cause of flower color variation. We identified 29 putative paralogs for the 15 candidate genes in the ABP. We assembled complete coding sequences for 16 structural loci and nine of ten regulatory loci. Among these 29 putative paralogs, we identified 622 SNPs, yet only nine synonymous SNPs in Ans had allele frequencies that differentiated pigmented petals (dark pink and light pink) from white petals. These Ans allele frequency differences were further investigated with an expanded sequencing survey of 38 individuals, yet no SNPs consistently differentiated the color morphs. We also found one locus, F3h1, with strong differential expression between pigmented and white samples (>42x). This may be caused by decreased expression of Myb1a in white petal buds. Myb1a in S. littorea is a regulatory locus closely related to Subgroup 7 Mybs known to regulate F3h and other loci in the first half of the ABP in model species. We then compare the mRNA-Seq results with petal biochemistry which revealed cyanidin as the primary anthocyanin and five flavonoid intermediates. Concentrations of three of the flavonoid intermediates were significantly lower in white petals than in pigmented petals (rutin, quercetin and isovitexin). The biochemistry results for rutin, quercetin, luteolin and apigenin are consistent with the transcriptome results suggesting a blockage at F3h, possibly caused by downregulation of Myb1a. PMID:26973662
Armesto, Paula; Infante, Carlos; Cousin, Xavier; Ponce, Marian; Manchado, Manuel
2015-04-01
In the present work, seven genes encoding Na(+),K(+)-ATPase (NKA) β-subunits in the teleost Solea senegalensis are described for the first time. Sequence analysis of the predicted polypeptides revealed a high degree of conservation with those of other vertebrate species and maintenance of important motifs involved in structure and function. Phylogenetic analysis clustered the seven genes into four main clades: β1 (atp1b1a and atp1b1b), β2 (atp1b2a and atp1b2b), β3 (atp1b3a and atp1b3b) and β4 (atp1b4). In juveniles, all paralogous transcripts were detected in the nine tissues examined albeit with different expression patterns. The most ubiquitously expressed gene was atp1b1a whereas atp1b1b was mainly detected in osmoregulatory organs (gill, kidney and intestine), and atp1b2a, atp1b2b, atp1b3a, atp1b3b and atp1b4 in brain. An expression analysis in three brain regions and pituitary revealed that β1-type transcripts were more abundant in pituitary than the other β paralogs with slight differences between brain regions. Quantification of mRNA abundance in gills after a salinity challenge showed an activation of atp1b1a and atp1b1b at high salinity water (60 ppt) and atp1b3a and atp1b3b in response to low salinity (5 ppt). Transcriptional analysis during larval development showed specific expression patterns for each paralog. Moreover, no differences in the expression profiles between larvae cultivated at 10 and 35 ppt were observed except for atp1b4 with higher mRNA levels at 10 than 35 ppt at 18 days post hatch. Whole-mount in situ hybridization analysis revealed that atp1b1b was mainly localized in gut, pronephric tubule, gill, otic vesicle, and chordacentrum of newly hatched larvae. All these data suggest distinct roles of NKA β subunits in tissues, during development and osmoregulation with β1 subunits involved in the adaptation to hyperosmotic conditions and β3 subunits to hypoosmotic environments. Copyright © 2014 Elsevier Inc. All rights reserved.
Castillo, Andreína I; Andreína Pacheco, M; Escalante, Ananias A
2017-06-01
Malaria parasites (genus Plasmodium) are a diverse group found in many species of vertebrate hosts. These parasites invade red blood cells in a complex process comprising several proteins, many encoded by multigene families, one of which is merozoite surface protein 7 (msp7). In the case of Plasmodium vivax, the most geographically widespread human-infecting species, differences in the number of paralogs within multigene families have been previously explained, at least in part, as potential adaptations to the human host. To explore this in msp7, we studied its orthologs in closely related nonhuman primate parasites; investigating both paralog evolutionary history and genetic polymorphism. The emerging patterns were then compared with the human parasite Plasmodium falciparum. We found that the evolution of the msp7 family is consistent with a birth-and-death model, where duplications, pseudogenizations, and gene loss events are common. However, all paralogs in P. vivax and P. falciparum had orthologs in their closely related species in non-human primates indicating that the ancestors of those paralogs precede the events leading to their origins as human parasites. Thus, the number of paralogs cannot be explained as an adaptation to human hosts. Although there is no functional information for msp7 in P. vivax, we found evidence for purifying selection in the genetic polymorphism of some of its paralogs as well as their orthologs in closely related non-human primate parasites. We also found evidence indicating that a few of P. vivax's paralogs may have diverged from their orthologs in non-human primates by episodic positive selection. Hence, they may have been under selection when the lineage leading to P. vivax diverged from the Asian non-human primates and switched into Homininae. All these lines of evidence suggest that msp7 is functionally important in P. vivax. Copyright © 2017 Elsevier B.V. All rights reserved.
Karn, Robert C.; Chung, Amanda G.; Laukaitis, Christina M.
2014-01-01
The Androgen-binding protein (Abp) region of the mouse genome contains 30 Abpa genes encoding alpha subunits and 34 Abpbg genes encoding betagamma subunits, their products forming dimers composed of an alpha and a betagamma subunit. We endeavored to determine how many Abp genes are expressed as proteins in tears and saliva, and as transcripts in the exocrine glands producing them. Using standard PCR, we amplified Abp transcripts from cDNA libraries of C57BL/6 mice and found fifteen Abp gene transcripts in the lacrimal gland and five in the submandibular gland. Proteomic analyses identified proteins corresponding to eleven of the lacrimal gland transcripts, all of them different from the three salivary ABPs reported previously. Our qPCR results showed that five of the six transcripts that lacked corresponding proteins are expressed at very low levels compared to those transcripts with proteins. We found 1) no overlap in the repertoires of expressed Abp paralogs in lacrimal gland/tears and salivary glands/saliva; 2) substantial sex-limited expression of lacrimal gland/tear expressed-paralogs in males but no sex-limited expression in females; and 3) that the lacrimal gland/tear expressed-paralogs are found exclusively in ancestral clades 1, 2 and 3 of the five clades described previously while the salivary glands/saliva expressed-paralogs are found only in clade 5. The number of instances of extremely low levels of transcription without corresponding protein production in paralogs specific to tears and saliva suggested the role of subfunctionalization, a derived condition wherein genes that may have been expressed highly in both glands ancestrally were down-regulated subsequent to duplication. Thus, evidence for subfunctionalization can be seen in our data and we argue that the partitioning of paralog expression between lacrimal and salivary glands that we report here occurred as the result of adaptive evolution. PMID:25531410
Gao, Weixia; Zhang, Zhongxiong; Feng, Jun; Dang, Yulei; Quan, Yufen; Gu, Yanyan; Wang, Shufang; Song, Cunjiang
2016-09-01
Actin-like MreB paralogs play important roles in cell shape maintenance, cell wall synthesis and the regulation of the D,L-endopeptidases, CwlO and LytE. The Gram-positive bacterium Bacillus amyloliquefaciens LL3 is a poly-γ-glutamic acid (γ-PGA) producing strain that contains three MreB paralogs: MreB, Mbl and MreBH. In B. amyloliquefaciens, CwlO and LytE can degrade γ-PGA. In this study, we aimed to test the hypothesis that modulating transcript levels of MreB paralogs would alter the synthesis and degradation of γ-PGA. The results showed that overexpression or inhibition of MreB, Mbl or MreBH had distinct effects on cell morphology and the molecular weight of the γ-PGA products. In fermentation medium, cells of the mreB inhibition mutant were 50.2% longer than LL3, and the γ-PGA titer increased by 55.7%. However, changing the expression level of mbl showed only slight effects on the morphology, γ-PGA molecular weight and titer. In the mreBH inhibition mutant, γ-PGA production and its molecular weight increased by 56.7% and 19.4%, respectively. These results confirmed our hypothesis that suppressing the expression of MreB paralogs might reduce γ-PGA degradation, and that increasing the cell size could strengthen γ-PGA synthesis. This is the first report of enhanced γ-PGA production via suppression of actin-like MreB paralogs. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Mikami, Suzuka; Kanaba, Teppei; Ito, Yutaka; Mishima, Masaki
2013-10-01
The transcriptional corepressor SMRT/HDAC1-associated repressor protein (SHARP) recruits histone deacetylases. Human SHARP protein is thought to function in processes involving steroid hormone responses and the Notch signaling pathway. SHARP consists of RNA recognition motifs (RRMs) in the N-terminal region and the spen paralog and ortholog C-terminal (SPOC) domain in the C-terminal region. It is known that the SPOC domain binds the LSD motif in the C-terminal tail of corepressors silencing mediator for retinoid and thyroid receptor (SMRT)/nuclear receptor corepressor (NcoR). We are interested in delineating the mechanism by which the SPOC domain recognizes the LSD motif of the C-terminal tail of SMRT/NcoR. To this end, we are investigating the tertiary structure of the SPOC/SMRT peptide using NMR. Herein, we report on the (1)H, (13)C and (15)N resonance assignments of the SPOC domain in complex with a SMRT peptide, which contributes towards a structural understanding of the SPOC/SMRT peptide and its molecular recognition.
Childs, Kevin L; Konganti, Kranti; Buell, C Robin
2012-01-01
Major feedstock sources for future biofuel production are likely to be high biomass producing plant species such as poplar, pine, switchgrass, sorghum and maize. One active area of research in these species is genome-enabled improvement of lignocellulosic biofuel feedstock quality and yield. To facilitate genomic-based investigations in these species, we developed the Biofuel Feedstock Genomic Resource (BFGR), a database and web-portal that provides high-quality, uniform and integrated functional annotation of gene and transcript assembly sequences from species of interest to lignocellulosic biofuel feedstock researchers. The BFGR includes sequence data from 54 species and permits researchers to view, analyze and obtain annotation at the gene, transcript, protein and genome level. Annotation of biochemical pathways permits the identification of key genes and transcripts central to the improvement of lignocellulosic properties in these species. The integrated nature of the BFGR in terms of annotation methods, orthologous/paralogous relationships and linkage to seven species with complete genome sequences allows comparative analyses for biofuel feedstock species with limited sequence resources. Database URL: http://bfgr.plantbiology.msu.edu.
Patel, Hardik J.; Patel, Pallav D.; Ochiana, Stefan O.; Yan, Pengrong; Sun, Weilin; Patel, Maulik R.; Shah, Smit K.; Tramentozzi, Elisa; Brooks, James; Bolaender, Alexander; Shrestha, Liza; Stephani, Ralph; Finotti, Paola; Leifer, Cynthia; Li, Zihai; Gewirth, Daniel T.; Taldone, Tony; Chiosis, Gabriela
2015-01-01
Grp94 is involved in the regulation of a restricted number of proteins and represents a potential target in a host of diseases, including cancer, septic shock, autoimmune diseases, chronic inflammatory conditions, diabetes, coronary thrombosis, and stroke. We have recently identified a novel allosteric pocket located in the Grp94 N-terminal binding site that can be used to design ligands with a 2-log selectivity over the other Hsp90 paralogs. Here we perform extensive SAR investigations in this ligand series and rationalize the affinity and paralog selectivity of choice derivatives by molecular modeling. We then use this to design 18c, a derivative with good potency for Grp94 (IC50 = 0.22 μM) and selectivity over other paralogs (>100- and 33-fold for Hsp90α/β and Trap-1, respectively). The paralog selectivity and target-mediated activity of 18c was confirmed in cells through several functional readouts. Compound 18c was also inert when tested against a large panel of kinases. We show that 18c has biological activity in several cellular models of inflammation and cancer and also present here for the first time the in vivo profile of a Grp94 inhibitor. PMID:25901531
Hox11 paralogous genes are essential for metanephric kidney induction
Wellik, Deneen M.; Hawkes, Patrick J.; Capecchi, Mario R.
2002-01-01
The mammalian Hox complex is divided into four linkage groups containing 13 sets of paralogous genes. These paralogous genes have retained functional redundancy during evolution. For this reason, loss of only one or two Hox genes within a paralogous group often results in incompletely penetrant phenotypes which are difficult to interpret by molecular analysis. For example, mice individually mutant for Hoxa11 or Hoxd11 show no discernible kidney abnormalities. Hoxa11/Hoxd11 double mutants, however, demonstrate hypoplasia of the kidneys. As described in this study, removal of the last Hox11 paralogous member, Hoxc11, results in the complete loss of metanephric kidney induction. In these triple mutants, the metanephric blastema condenses, and expression of early patterning genes, Pax2 and Wt1, is unperturbed. Eya1 expression is also intact. Six2 expression, however, is absent, as is expression of the inducing growth factor, Gdnf. In the absence of Gdnf, ureteric bud formation is not initiated. Molecular analysis of this phenotype demonstrates that Hox11 control of early metanephric induction is accomplished by the interaction of Hox11 genes with the pax-eya-six regulatory cascade, a pathway that may be used by Hox genes more generally for the induction of multiple structures along the anteroposterior axis. PMID:12050119
Hox11 paralogous genes are essential for metanephric kidney induction.
Wellik, Deneen M; Hawkes, Patrick J; Capecchi, Mario R
2002-06-01
The mammalian Hox complex is divided into four linkage groups containing 13 sets of paralogous genes. These paralogous genes have retained functional redundancy during evolution. For this reason, loss of only one or two Hox genes within a paralogous group often results in incompletely penetrant phenotypes which are difficult to interpret by molecular analysis. For example, mice individually mutant for Hoxa11 or Hoxd11 show no discernible kidney abnormalities. Hoxa11/Hoxd11 double mutants, however, demonstrate hypoplasia of the kidneys. As described in this study, removal of the last Hox11 paralogous member, Hoxc11, results in the complete loss of metanephric kidney induction. In these triple mutants, the metanephric blastema condenses, and expression of early patterning genes, Pax2 and Wt1, is unperturbed. Eya1 expression is also intact. Six2 expression, however, is absent, as is expression of the inducing growth factor, Gdnf. In the absence of Gdnf, ureteric bud formation is not initiated. Molecular analysis of this phenotype demonstrates that Hox11 control of early metanephric induction is accomplished by the interaction of Hox11 genes with the pax-eya-six regulatory cascade, a pathway that may be used by Hox genes more generally for the induction of multiple structures along the anteroposterior axis.
Hall, Jennifer R; Clow, Kathy A; Rise, Matthew L; Driedzic, William R
2015-09-01
Aquaglyceroporins (GLPs) are integral membrane proteins that facilitate passive movement of water, glycerol and urea across cellular membranes. In this study, GLP-encoding genes were characterized in rainbow smelt (Osmerus mordax mordax), an anadromous teleost that accumulates high glycerol and modest urea levels in plasma and tissues as an adaptive cryoprotectant mechanism in sub-zero temperatures. We report the gene and promoter sequences for two aqp10b paralogs (aqp10ba, aqp10bb) that are 82% identical at the predicted amino acid level, and aqp9b. Aqp10bb and aqp9b have the 6 exon structure common to vertebrate GLPs. Aqp10ba has 8 exons; there are two additional exons at the 5' end, and the promoter sequence is different from aqp10bb. Molecular phylogenetic analysis suggests that the aqp10b paralogs arose from a gene duplication event specific to the smelt lineage. Smelt GLP transcripts are ubiquitously expressed; however, aqp10ba transcripts were highest in kidney, aqp10bb transcripts were highest in kidney, intestine, pyloric caeca and brain, and aqp9b transcripts were highest in spleen, liver, red blood cells and kidney. In cold-temperature challenge experiments, plasma glycerol and urea levels were significantly higher in cold- compared to warm-acclimated smelt; however, GLP transcript levels were generally either significantly lower or remained constant. The exception was significantly higher aqp10ba transcript levels in kidney. High aqp10ba transcripts in smelt kidney that increase significantly in response to cold temperature in congruence with plasma urea suggest that this gene duplicate may have evolved to allow the re-absorption of urea to concomitantly conserve nitrogen and prevent freezing. Copyright © 2015 Elsevier Inc. All rights reserved.
Bek-Thomsen, Malene; Poulsen, Knud; Kilian, Mogens
2012-01-01
ABSTRACT The distribution, genome location, and evolution of the four paralogous zinc metalloproteases, IgA1 protease, ZmpB, ZmpC, and ZmpD, in Streptococcus pneumoniae and related commensal species were studied by in silico analysis of whole genomes and by activity screening of 154 representatives of 20 species. ZmpB was ubiquitous in the Mitis and Salivarius groups of the genus Streptococcus and in the genera Gemella and Granulicatella, with the exception of a fragmented gene in Streptococcus thermophilus, the only species with a nonhuman habitat. IgA1 protease activity was observed in all members of S. pneumoniae, S. pseudopneumoniae, S. oralis, S. sanguinis, and Gemella haemolysans, was variably present in S. mitis and S. infantis, and absent in S. gordonii, S. parasanguinis, S. cristatus, S. oligofermentans, S. australis, S. peroris, and S. suis. Phylogenetic analysis of 297 zmp sequences and representative housekeeping genes provided evidence for an unprecedented selection for genetic diversification of the iga, zmpB, and zmpD genes in S. pneumoniae and evidence of very frequent intraspecies transfer of entire genes and combination of genes. Presumably due to their adaptation to a commensal lifestyle, largely unaffected by adaptive mucosal immune factors, the corresponding genes in commensal streptococci have remained conserved. The widespread distribution and significant sequence diversity indicate an ancient origin of the zinc metalloproteases predating the emergence of the humanoid species. zmpB, which appears to be the ancestral gene, subsequently duplicated and successfully diversified into distinct functions, is likely to serve an important but yet unknown housekeeping function associated with the human host. PMID:23033471
DOE Office of Scientific and Technical Information (OSTI.GOV)
Maeder, Dennis L.; Anderson, Iain; Brettin, Thomas S.
2006-05-19
We report here a comparative analysis of the genome sequence of Methanosarcina barkeri with those of Methanosarcina acetivorans and Methanosarcina mazei. All three genomes share a conserved double origin of replication and many gene clusters. M. barkeri is distinguished by having an organization that is well conserved with respect to the other Methanosarcinae in the region proximal to the origin of replication with interspecies gene similarities as high as 95%. However, it is disordered and marked by increased transposase frequency and decreased gene synteny and gene density in the proximal semi-genome. Of the 3680 open reading frames in M. barkeri, 678 had paralogs with better than 80% similarity to both M. acetivorans and M. mazei while 128 nonhypothetical orfs were unique (non-paralogous) amongst these species including a complete formate dehydrogenase operon, two genes required for N-acetylmuramic acid synthesis, a 14 gene gas vesicle cluster and a bacterial P450-specific ferredoxin reductase cluster not previously observed or characterized in this genus. A cryptic 36 kbp plasmid sequence was detected in M. barkeri that contains an orc1 gene flanked by a presumptive origin of replication consisting of 38 tandem repeats of a 143 nt motif. Three-way comparison of these genomes reveals differing mechanisms for the accrual of changes. Elongation of the large M. acetivorans is the result of multiple gene-scale insertions and duplications uniformly distributed in that genome, while M. barkeri is characterized by localized inversions associated with the loss of gene content. In contrast, the relatively short M. mazei most closely approximates the ancestral organizational state.
Biederman, Michelle K; Nelson, Megan M; Asalone, Kathryn C; Pedersen, Alyssa L; Saldanha, Colin J; Bracht, John R
2018-05-21
Developmentally programmed genome rearrangements are rare in vertebrates, but have been reported in scattered lineages including the bandicoot, hagfish, lamprey, and zebra finch (Taeniopygia guttata) [1]. In the finch, a well-studied animal model for neuroendocrinology and vocal learning [2], one such programmed genome rearrangement involves a germline-restricted chromosome, or GRC, which is found in germlines of both sexes but eliminated from mature sperm [3, 4]. Transmitted only through the oocyte, it displays uniparental female-driven inheritance, and early in embryonic development is apparently eliminated from all somatic tissue in both sexes [3, 4]. The GRC comprises the longest finch chromosome at over 120 million base pairs [3], and previously the only known GRC-derived sequence was repetitive and non-coding [5]. Because the zebra finch genome project was sourced from male muscle (somatic) tissue [6], the remaining genomic sequence and protein-coding content of the GRC remain unknown. Here we report the first protein-coding gene from the GRC: a member of the α-soluble N-ethylmaleimide sensitive fusion protein (NSF) attachment protein (α-SNAP) family hitherto missing from zebra finch gene annotations. In addition to the GRC-encoded α-SNAP, we find an additional paralogous α-SNAP residing in the somatic genome (a somatolog), making the zebra finch the first example in which α-SNAP is not a single-copy gene. We show divergent, sex-biased expression for the paralogs and also that positive selection is detectable across the bird α-SNAP lineage, including the GRC-encoded α-SNAP. This study presents the identification and evolutionary characterization of the first protein-coding GRC gene in any organism. Copyright © 2018 Elsevier Ltd. All rights reserved.
Blazier, J Chris; Ruhlman, Tracey A; Weng, Mao-Lun; Rehman, Sumaiyah K; Sabir, Jamal S M; Jansen, Robert K
2016-04-18
Genes for the plastid-encoded RNA polymerase (PEP) persist in the plastid genomes of all photosynthetic angiosperms. However, three unrelated lineages (Annonaceae, Passifloraceae and Geraniaceae) have been identified with unusually divergent open reading frames (ORFs) in the conserved region of rpoA, the gene encoding the PEP α subunit. We used sequence-based approaches to evaluate whether these genes retain function. Both gene sequences and complete plastid genome sequences were assembled and analyzed from each of the three angiosperm families. Multiple lines of evidence indicated that the rpoA sequences are likely functional despite retaining as low as 30% nucleotide sequence identity with rpoA genes from outgroups in the same angiosperm order. The ratio of non-synonymous to synonymous substitutions indicated that these genes are under purifying selection, and bioinformatic prediction of conserved domains indicated that functional domains are preserved. One of the lineages (Pelargonium, Geraniaceae) contains species with multiple rpoA-like ORFs that show evidence of ongoing inter-paralog gene conversion. The plastid genomes containing these divergent rpoA genes have experienced extensive structural rearrangement, including large expansions of the inverted repeat. We propose that illegitimate recombination, not positive selection, has driven the divergence of rpoA.
Churova, Maria V; Meshcheryakova, Olga V; Ruchev, Mikhail; Nemova, Nina N
2017-09-01
This study was conducted to characterize the features of muscle-specific gene expression during development of brown trout (Salmo trutta) inhabiting the river Krivoy ruchey (Kola Peninsula, Russia). Gene expression levels of myogenic regulatory factors (MRFs: the MyoD1 paralogs MyoD1a, MyoD1b and MyoD1c, Myf5, and myogenin), myostatin paralogs (MSTN-1a, MSTN-1b, MSTN-2a), and fast skeletal myosin heavy chain (MyHC) were measured in the white muscles of brown trout parr of ages 0+ (under-yearling), 1+ (yearling) and 2+ (two-year-old) and smolts of age 2+. Multidirectional changes in the expression of MyoD1 and MSTN paralogs, along with myogenin, Myf5 and MyHC expression levels, were revealed in the white muscles of trout parr with age. The expression of MyoD1c, myogenin and MSTN-2a was highest in 0+ parr and then decreased. MyoD1a/b expression levels did not differ between age groups. A simultaneous elevation of MyHC, Myf5, MSTN-1a and MSTN-1b was found in trout yearlings. In smolts, expression levels of MSTN paralogs, MyHC, Myf5 and MyoD1a were lower than in parr. In contrast, MyoD1c and myogenin mRNA levels were higher in smolts. The study revealed definite patterns of simultaneous muscle-specific gene expression in age groups of parr and smolts. As the expression of MyoD and MSTN paralogs changed differently depending on age and stage, it was suggested that paralogs of the same gene complementarily control myogenesis during development. Copyright © 2017 Elsevier Inc. All rights reserved.
Expression and responses to dehydration and salinity stresses of V-PPase gene members in wheat.
Wang, Yuezhi; Xu, Haibin; Zhang, Guangxiang; Zhu, Huilan; Zhang, Lixia; Zhang, Zhengzhi; Zhang, Caiqin; Ma, Zhengqiang
2009-12-01
Vacuolar H(+)-translocating pyrophosphatase (V-PPase) is a key enzyme related to plant growth as well as abiotic stress tolerance. In this work, wheat V-PPase genes TaVP1, TaVP2 and TaVP3 were identified. TaVP1 and TaVP2 are more similar to each other than to TaVP3. Their deduced polypeptide sequences preserve the topological structure and essential residues of V-PPases. Phylogenetic studies suggested that monocot plants, at least monocot grasses, have three VP paralogs. TaVP3 transcripts were only detected in developing seeds, and no TaVP2 transcripts were found in germinating seeds. TaVP2 was mainly expressed in shoot tissues and down-regulated in leaves under dehydration. Its expression was up-regulated in roots under high salinity. TaVP1 was relatively more ubiquitously and evenly expressed than TaVP2. Its expression level in roots was highest among the tissues examined, and was inducible by salinity stress. These results indicated that the V-PPase gene paralogs in wheat are differentially regulated spatially and in response to dehydration and salinity stresses. 2009 Institute of Genetics and Developmental Biology and the Genetics Society of China. Published by Elsevier Ltd. All rights reserved.
Guide RNA selection for CRISPR-Cas9 transfections in Plasmodium falciparum.
Ribeiro, Jose M; Garriga, Meera; Potchen, Nicole; Crater, Anna K; Gupta, Ankit; Ito, Daisuke; Desai, Sanjay A
2018-06-12
CRISPR-Cas9 mediated genome editing is addressing key limitations in the transfection of malaria parasites. While this method has already simplified the needed molecular cloning and reduced the time required to generate mutants in the human pathogen Plasmodium falciparum, optimal selection of required guide RNAs and guidelines for successful transfections have not been well characterized, leading workers to use time-consuming trial and error approaches. We used a genome-wide computational approach to create a comprehensive and publicly accessible database of possible guide RNA sequences in the P. falciparum genome. For each guide, we report on-target efficiency and specificity scores as well as information about the genomic site relevant to optimal design of CRISPR-Cas9 transfections to modify, disrupt, or conditionally knockdown any gene. As many antimalarial drug and vaccine targets are encoded by multigene families, we also developed a new paralog specificity score that should facilitate modification of either a single family member of interest or multiple paralogs that serve overlapping roles. Finally, we tabulated features of successful transfections in our laboratory, providing broadly useful guidelines for parasite transfections. Molecular studies aimed at understanding parasite biology or characterizing drug and vaccine targets in P. falciparum should be facilitated by this comprehensive database. Published by Elsevier Ltd.
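The genome-wide enumeration of candidate guides described in the abstract above can be illustrated with a minimal sketch. It assumes the commonly used SpCas9 convention of a 20-nt protospacer followed by an NGG PAM, which the abstract does not state; the function name `find_guides` is hypothetical, and the on-target, specificity, and paralog specificity scores reported by the database are not reproduced here.

```python
import re

def find_guides(seq, guide_len=20):
    """List candidate guides on both strands (assumed SpCas9-style NGG PAM).

    Returns tuples (start, guide, pam, strand); `start` is the offset on the
    strand that was scanned ("+" = given sequence, "-" = reverse complement).
    """
    seq = seq.upper()
    complement = str.maketrans("ACGT", "TGCA")
    hits = []
    for strand, s in (("+", seq), ("-", seq.translate(complement)[::-1])):
        # A lookahead is used so that overlapping candidate sites are all reported.
        pattern = r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len
        for m in re.finditer(pattern, s):
            hits.append((m.start(), m.group(1), m.group(2), strand))
    return hits
```

In an AT-rich genome such as that of P. falciparum, a scan like this would be expected to return relatively few sites per kilobase, which is one reason a precomputed, scored catalogue is convenient.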
Jojic, Borka; Amodeo, Simona; Bregy, Irina; Ochsenreiter, Torsten
2018-05-10
The translationally controlled tumor protein (TCTP; also known as TPT1 in mammals) is highly conserved and ubiquitously expressed in eukaryotes. It is involved in growth and development, cell cycle progression, protection against cellular stresses and apoptosis, indicating the multifunctional role of the protein. Here, for the first time, we characterize the expression and function of TCTP in the human and animal pathogen, Trypanosoma brucei. We identified two paralogs (TCTP1 and TCTP2) that are differentially expressed in the life cycle of the parasite. The genes have identical 5' untranslated regions (UTRs) and almost identical open-reading frames. The 3'UTRs differ substantially in sequence and length, and are sufficient for the exclusive expression of TCTP1 in procyclic- and TCTP2 in bloodstream-form parasites. Furthermore, we characterize which parts of the 3'UTR are needed for TCTP2 mRNA stability. RNAi experiments demonstrate that TCTP1 and TCTP2 expression is essential for normal cell growth in procyclic- and bloodstream-form parasites, respectively. Depletion of TCTP1 in the procyclic form cells leads to aberrant cell and mitochondrial organelle morphology, as well as enlarged, and a reduced number of, acidocalcisomes. © 2018. Published by The Company of Biologists Ltd.
Structural Insights into Functional Overlapping and Differentiation among Myosin V Motors
Nascimento, Andrey F. Z.; Trindade, Daniel M.; Tonoli, Celisa C. C.; de Giuseppe, Priscila O.; Assis, Leandro H. P.; Honorato, Rodrigo V.; de Oliveira, Paulo S. L.; Mahajan, Pravin; Burgess-Brown, Nicola A.; von Delft, Frank; Larson, Roy E.; Murakami, Mario T.
2013-01-01
Myosin V (MyoV) motors have been implicated in the intracellular transport of diverse cargoes including vesicles, organelles, RNA-protein complexes, and regulatory proteins. Here, we have solved the cargo-binding domain (CBD) structures of the three human MyoV paralogs (Va, Vb, and Vc), revealing subtle structural changes that drive functional differentiation and a novel redox mechanism controlling the CBD dimerization process, which is unique for the MyoVc subclass. Moreover, the cargo- and motor-binding sites were structurally assigned, indicating the conservation of residues involved in the recognition of adaptors for peroxisome transport and providing high resolution insights into motor domain inhibition by CBD. These results contribute to understanding the structural requirements for cargo transport, autoinhibition, and regulatory mechanisms in myosin V motors. PMID:24097982
Ongoing resolution of duplicate gene functions shapes the diversification of a metabolic network
Kuang, Meihua Christina; Hutchins, Paul D; Russell, Jason D; Coon, Joshua J; Hittinger, Chris Todd
2016-01-01
The evolutionary mechanisms leading to duplicate gene retention are well understood, but the long-term impacts of paralog differentiation on the regulation of metabolism remain underappreciated. Here we experimentally dissect the functions of two pairs of ancient paralogs of the GALactose sugar utilization network in two yeast species. We show that the Saccharomyces uvarum network is more active, even as over-induction is prevented by a second co-repressor that the model yeast Saccharomyces cerevisiae lacks. Surprisingly, removal of this repression system leads to a strong growth arrest, likely due to overly rapid galactose catabolism and metabolic overload. Alternative sugars, such as fructose, circumvent metabolic control systems and exacerbate this phenotype. We further show that S. cerevisiae experiences homologous metabolic constraints that are subtler due to how the paralogs have diversified. These results show how the functional differentiation of paralogs continues to shape regulatory network architectures and metabolic strategies long after initial preservation. DOI: http://dx.doi.org/10.7554/eLife.19027.001 PMID:27690225
Conn, Caitlin E; Bythell-Douglas, Rohan; Neumann, Drexel; Yoshida, Satoko; Whittington, Bryan; Westwood, James H; Shirasu, Ken; Bond, Charles S; Dyer, Kelly A; Nelson, David C
2015-07-31
Obligate parasitic plants in the Orobanchaceae germinate after sensing plant hormones, strigolactones, exuded from host roots. In Arabidopsis thaliana, the α/β-hydrolase D14 acts as a strigolactone receptor that controls shoot branching, whereas its ancestral paralog, KAI2, mediates karrikin-specific germination responses. We observed that KAI2, but not D14, is present at higher copy numbers in parasitic species than in nonparasitic relatives. KAI2 paralogs in parasites are distributed into three phylogenetic clades. The fastest-evolving clade, KAI2d, contains the majority of KAI2 paralogs. Homology models predict that the ligand-binding pockets of KAI2d resemble D14. KAI2d transgenes confer strigolactone-specific germination responses to Arabidopsis thaliana. Thus, the KAI2 paralogs D14 and KAI2d underwent convergent evolution of strigolactone recognition, respectively enabling developmental responses to strigolactones in angiosperms and host detection in parasites. Copyright © 2015, American Association for the Advancement of Science.
Ongoing resolution of duplicate gene functions shapes the diversification of a metabolic network
Kuang, Meihua Christina; Hutchins, Paul D.; Russell, Jason D.; ...
2016-09-30
The evolutionary mechanisms leading to duplicate gene retention are well understood, but the long-term impacts of paralog differentiation on the regulation of metabolism remain underappreciated. Here we experimentally dissect the functions of two pairs of ancient paralogs of the GALactose sugar utilization network in two yeast species. Here, we show that the Saccharomyces uvarum network is more active, even as over-induction is prevented by a second co-repressor that the model yeast Saccharomyces cerevisiae lacks. Surprisingly, removal of this repression system leads to a strong growth arrest, likely due to overly rapid galactose catabolism and metabolic overload. Alternative sugars, such as fructose, circumvent metabolic control systems and exacerbate this phenotype. Furthermore, we show that S. cerevisiae experiences homologous metabolic constraints that are subtler due to how the paralogs have diversified. Our results show how the functional differentiation of paralogs continues to shape regulatory network architectures and metabolic strategies long after initial preservation.
Opazo, Juan C; Lee, Alison P; Hoffmann, Federico G; Toloza-Villalobos, Jessica; Burmester, Thorsten; Venkatesh, Byrappa; Storz, Jay F
2015-07-01
Comparative analyses of vertebrate genomes continue to uncover a surprising diversity of genes in the globin gene superfamily, some of which have very restricted phyletic distributions despite their antiquity. Genomic analysis of the globin gene repertoire of cartilaginous fish (Chondrichthyes) should be especially informative about the duplicative origins and ancestral functions of vertebrate globins, as divergence between Chondrichthyes and bony vertebrates represents the most basal split within the jawed vertebrates. Here, we report a comparative genomic analysis of the vertebrate globin gene family that includes the complete globin gene repertoire of the elephant shark (Callorhinchus milii). Using genomic sequence data from representatives of all major vertebrate classes, integrated analyses of conserved synteny and phylogenetic relationships revealed that the last common ancestor of vertebrates possessed a repertoire of at least seven globin genes: single copies of androglobin and neuroglobin, four paralogous copies of globin X, and the single-copy progenitor of the entire set of vertebrate-specific globins. Combined with expression data, the genomic inventory of elephant shark globins yielded four especially surprising findings: 1) there is no trace of the neuroglobin gene (a highly conserved gene that is present in all other jawed vertebrates that have been examined to date), 2) myoglobin is highly expressed in heart, but not in skeletal muscle (reflecting a possible ancestral condition in vertebrates with single-circuit circulatory systems), 3) elephant shark possesses two highly divergent globin X paralogs, one of which is preferentially expressed in gonads, and 4) elephant shark possesses two structurally distinct α-globin paralogs, one of which is preferentially expressed in the brain. 
Expression profiles of elephant shark globin genes reveal distinct specializations of function relative to orthologs in bony vertebrates and suggest hypotheses about ancestral functions of vertebrate globins. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Porter, Teresita M; Gibson, Joel F; Shokralla, Shadi; Baird, Donald J; Golding, G Brian; Hajibabaei, Mehrdad
2014-01-01
Current methods to identify unknown insect (class Insecta) cytochrome c oxidase (COI barcode) sequences often rely on thresholds of distances that can be difficult to define, sequence similarity cut-offs, or monophyly. Some of the most commonly used metagenomic classification methods do not provide a measure of confidence for the taxonomic assignments they provide. The aim of this study was to use a naïve Bayesian classifier (Wang et al. Applied and Environmental Microbiology, 2007; 73: 5261) to automate taxonomic assignments for large batches of insect COI sequences such as data obtained from high-throughput environmental sequencing. This method provides rank-flexible taxonomic assignments with an associated bootstrap support value, and it is faster than the blast-based methods commonly used in environmental sequence surveys. We have developed and rigorously tested the performance of three different training sets using leave-one-out cross-validation, two field data sets, and targeted testing of Lepidoptera, Diptera and Mantodea sequences obtained from the Barcode of Life Data system. We found that type I error rates, incorrect taxonomic assignments with a high bootstrap support, were already relatively low but could be lowered further by ensuring that all query taxa are actually present in the reference database. Choosing bootstrap support cut-offs according to query length and summarizing taxonomic assignments to more inclusive ranks can also help to reduce error while retaining the maximum number of assignments. Additionally, we highlight gaps in the taxonomic and geographic representation of insects in public sequence databases that will require further work by taxonomists to improve the quality of assignments generated using any method.
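The idea of a naïve Bayesian assignment with an associated bootstrap support value can be sketched as follows. This is a simplified illustration loosely modeled on k-mer based classifiers, not the implementation used in the study: the constant 0.5 smoothing term, the subsample size, sampling without replacement, and all function names (`kmers`, `train`, `classify`, `assign`) are assumptions made for the example.

```python
import math
import random
from collections import Counter, defaultdict

def kmers(seq, k):
    """Set of distinct k-length words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def train(refs, k):
    """refs: iterable of (taxon, sequence). Count, per taxon, how many
    reference sequences contain each word."""
    counts = defaultdict(Counter)
    n_refs = Counter()
    for taxon, seq in refs:
        n_refs[taxon] += 1
        counts[taxon].update(kmers(seq, k))
    return counts, n_refs

def classify(words, counts, n_refs):
    """Taxon with the highest summed log word probability (smoothed)."""
    def score(taxon):
        return sum(math.log((counts[taxon][w] + 0.5) / (n_refs[taxon] + 1))
                   for w in words)
    return max(sorted(n_refs), key=score)

def assign(query, counts, n_refs, k, trials=100, seed=0):
    """Return (taxon, bootstrap support in [0, 1])."""
    rng = random.Random(seed)
    words = sorted(kmers(query, k))
    best = classify(words, counts, n_refs)
    size = max(1, len(words) // k)  # assumed subsample size per trial
    agree = sum(classify(rng.sample(words, size), counts, n_refs) == best
                for _ in range(trials))
    return best, agree / trials
```

A support cut-off chosen according to query length, as the abstract suggests, would then be applied to the second element of the returned pair before an assignment is accepted.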
Automated sequence-specific protein NMR assignment using the memetic algorithm MATCH.
Volk, Jochen; Herrmann, Torsten; Wüthrich, Kurt
2008-07-01
MATCH (Memetic Algorithm and Combinatorial Optimization Heuristics) is a new memetic algorithm for automated sequence-specific polypeptide backbone NMR assignment of proteins. MATCH employs local optimization for tracing partial sequence-specific assignments within a global, population-based search environment, where the simultaneous application of local and global optimization heuristics guarantees high efficiency and robustness. MATCH thus makes combined use of the two predominant concepts in use for automated NMR assignment of proteins. Dynamic transition and inherent mutation are new techniques that enable automatic adaptation to variable quality of the experimental input data. The concept of dynamic transition is incorporated in all major building blocks of the algorithm, where it enables switching between local and global optimization heuristics at any time during the assignment process. Inherent mutation restricts the intrinsically required randomness of the evolutionary algorithm to those regions of the conformation space that are compatible with the experimental input data. Using intact and artificially deteriorated APSY-NMR input data of proteins, MATCH performed sequence-specific resonance assignment with high efficiency and robustness.
USDA-ARS's Scientific Manuscript database
The biocontrol agent, Trichoderma virens, has the ability to protect plants from pathogens by eliciting plant defense responses, involvement in mycoparasitism, or secreting antagonistic secondary metabolites. SM1, an elicitor of induced systemic resistance (ISR), was found to have three paralogs wi...
Saisawang, Chonticha; Ketterman, Albert J.
2014-01-01
Glutathione transferases (GST) are an ancient superfamily comprising a large number of paralogous proteins in a single organism. This multiplicity of GSTs has allowed the copies to diverge for neofunctionalization with proposed roles ranging from detoxication and oxidative stress response to involvement in signal transduction cascades. We performed a comparative genomic analysis using FlyBase annotations and Drosophila melanogaster GST sequences as templates to further annotate the GST orthologs in the 12 Drosophila sequenced genomes. We found that GST genes in the Drosophila subgenera have undergone repeated local duplications followed by transposition, inversion, and micro-rearrangements of these copies. The colinearity and orientations of the orthologous GST genes appear to be unique in many of the species which suggests that genomic rearrangement events have occurred multiple times during speciation. The high micro-plasticity of the genomes appears to have a functional contribution utilized for evolution of this gene family. PMID:25310450
Comparative mapping in the Fagaceae and beyond with EST-SSRs
2012-01-01
Background Genetic markers and linkage mapping are basic prerequisites for comparative genetic analyses, QTL detection and map-based cloning. A large number of mapping populations have been developed for oak, but few gene-based markers are available for constructing integrated genetic linkage maps and comparing gene order and QTL location across related species. Results We developed a set of 573 expressed sequence tag-derived simple sequence repeats (EST-SSRs) and located 397 markers (EST-SSRs and genomic SSRs) on the 12 oak chromosomes (2n = 2x = 24) on the basis of Mendelian segregation patterns in 5 full-sib mapping pedigrees of two species: Quercus robur (pedunculate oak) and Quercus petraea (sessile oak). Consensus maps for the two species were constructed and aligned. They showed a high degree of macrosynteny between these two sympatric European oaks. We assessed the transferability of EST-SSRs to other Fagaceae genera and a subset of these markers was mapped in Castanea sativa, the European chestnut. Reasonably high levels of macrosynteny were observed between oak and chestnut. We also obtained diversity statistics for a subset of EST-SSRs, to support further population genetic analyses with gene-based markers. Finally, based on the orthologous relationships between the oak, Arabidopsis, grape, poplar, Medicago, and soybean genomes and the paralogous relationships between the 12 oak chromosomes, we propose an evolutionary scenario of the 12 oak chromosomes from the eudicot ancestral karyotype. Conclusions This study provides map locations for a large set of EST-SSRs in two oak species of recognized biological importance in natural ecosystems. This first step toward the construction of a gene-based linkage map will facilitate the assignment of future genome scaffolds to pseudo-chromosomes. This study also provides an indication of the potential utility of new gene-based markers for population genetics and comparative mapping within and beyond the Fagaceae. 
PMID:22931513
A Fluorescent In Vitro Assay to Investigate Paralog-Specific SUMO Conjugation.
Eisenhardt, Nathalie; Chaugule, Viduth K; Pichler, Andrea
2016-01-01
Protein modification with the small ubiquitin-related modifier SUMO is a potent regulatory mechanism implicated in a variety of biological pathways. In vitro sumoylation reactions have emerged as a versatile tool to identify and characterize novel SUMO enzymes as well as their substrates. Here, we present detailed protocols for the purification and fluorescent labeling of mammalian SUMO paralogs for their application in sumoylation assays. These assays provide a fast readout for in vitro SUMO chain formation activity of E3 ligases in a paralog-specific manner. Finally, we critically analyze the application of fluorescent SUMO proteins to study substrate modification in vitro revealing also the drawbacks of the system.
Dröge, J.; Gregor, I.; McHardy, A. C.
2015-01-01
Motivation: Metagenomics characterizes microbial communities by random shotgun sequencing of DNA isolated directly from an environment of interest. An essential step in computational metagenome analysis is taxonomic sequence assignment, which allows identifying the sequenced community members and reconstructing taxonomic bins with sequence data for the individual taxa. For the massive datasets generated by next-generation sequencing technologies, this cannot be performed with de-novo phylogenetic inference methods. We describe an algorithm and the accompanying software, taxator-tk, which performs taxonomic sequence assignment by fast approximate determination of evolutionary neighbors from sequence similarities. Results: Taxator-tk was precise in its taxonomic assignment across all ranks and taxa for a range of evolutionary distances and for short as well as for long sequences. In addition to the taxonomic binning of metagenomes, it is well suited for profiling microbial communities from metagenome samples because it identifies bacterial, archaeal and eukaryotic community members without being affected by varying primer binding strengths, as in marker gene amplification, or copy number variations of marker genes across different taxa. Taxator-tk has an efficient, parallelized implementation that allows the assignment of 6 Gb of sequence data per day on a standard multiprocessor system with 10 CPU cores and microbial RefSeq as the genomic reference data. Availability and implementation: Taxator-tk source and binary program files are publicly available at http://algbio.cs.uni-duesseldorf.de/software/. Contact: Alice.McHardy@uni-duesseldorf.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25388150
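One generic way to turn a set of similar reference sequences into a taxonomic assignment is to report the deepest taxon shared by all of them. The sketch below shows only that general lowest-common-taxon idea; it is not taxator-tk's algorithm, and the function name and the root-to-leaf list representation of a lineage are assumptions made for illustration.

```python
def lowest_common_taxon(lineages):
    """Shared prefix of root-to-leaf lineages (each a list of taxon names)."""
    shared = []
    for ranks in zip(*lineages):
        # Stop at the first rank where the neighbors disagree.
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared
```

Assigning to a more inclusive rank when neighbors disagree trades resolution for precision, which is the usual motivation for this kind of rule.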
Bystry, Vojtech; Agathangelidis, Andreas; Bikos, Vasilis; Sutton, Lesley Ann; Baliakas, Panagiotis; Hadzidimitriou, Anastasia; Stamatopoulos, Kostas; Darzentas, Nikos
2015-12-01
An ever-increasing body of evidence supports the importance of B cell receptor immunoglobulin (BcR IG) sequence restriction, alias stereotypy, in chronic lymphocytic leukemia (CLL). This phenomenon accounts for ∼30% of studied cases, one in eight of which belong to major subsets, and extends beyond restricted sequence patterns to shared biologic and clinical characteristics and, generally, outcome. Thus, the robust assignment of new cases to major CLL subsets is a critical, and yet unmet, requirement. We introduce a novel application, ARResT/AssignSubsets, which enables the robust assignment of BcR IG sequences from CLL patients to major stereotyped subsets. ARResT/AssignSubsets uniquely combines expert immunogenetic sequence annotation from IMGT/V-QUEST with curation to safeguard quality, statistical modeling of sequence features from more than 7500 CLL patients, and results from multiple perspectives to allow for both objective and subjective assessment. We validated our approach on the learning set, and evaluated its real-world applicability on a new representative dataset comprising 459 sequences from a single institution. ARResT/AssignSubsets is freely available on the web at http://bat.infspire.org/arrest/assignsubsets/. Contact: nikos.darzentas@gmail.com. Supplementary data are available at Bioinformatics online.
Pomel, Sébastien; Diogon, Marie; Bouchard, Philippe; Pradel, Lydie; Ravet, Viviane; Coffe, Gérard; Viguès, Bernard
2006-02-01
Previous attempts to identify the membrane skeleton of Paramecium cells have revealed a protein pattern that is both complex and specific. The most prominent structural elements, epiplasmic scales, are centered around ciliary units and are closely apposed to the cytoplasmic side of the inner alveolar membrane. We sought to characterize epiplasmic scale proteins (epiplasmins) at the molecular level. PCR approaches enabled the cloning and sequencing of two closely related genes by amplifications of sequences from a macronuclear genomic library. Using these two genes (EPI-1 and EPI-2), we have contributed to the annotation of the Paramecium tetraurelia macronuclear genome and identified 39 additional (paralogous) sequences. Two orthologous sequences were found in the Tetrahymena thermophila genome. Structural analysis of the 43 sequences indicates that the hallmark of this new multigenic family is a 79 aa domain flanked by two Q-, P- and V-rich stretches of sequence that are much more variable in amino-acid composition. Such features clearly distinguish members of the multigenic family from epiplasmic proteins previously sequenced in other ciliates. The expression of Green Fluorescent Protein (GFP)-tagged epiplasmin showed significant labeling of epiplasmic scales as well as oral structures. We expect that the GFP construct described herein will prove to be a useful tool for comparative subcellular localization of different putative epiplasmins in Paramecium.
A Theory of Utility Conditionals: Paralogical Reasoning from Decision-Theoretic Leakage
ERIC Educational Resources Information Center
Bonnefon, Jean-Francois
2009-01-01
Many "if p, then q" conditionals have decision-theoretic features, such as antecedents or consequents that relate to the utility functions of various agents. These decision-theoretic features leak into reasoning processes, resulting in various paralogical conclusions. The theory of utility conditionals offers a unified account of the various forms…
Genome sequence and genetic diversity of European ash trees.
Sollars, Elizabeth S A; Harper, Andrea L; Kelly, Laura J; Sambles, Christine M; Ramirez-Gonzalez, Ricardo H; Swarbreck, David; Kaithakottil, Gemy; Cooper, Endymion D; Uauy, Cristobal; Havlickova, Lenka; Worswick, Gemma; Studholme, David J; Zohren, Jasmin; Salmon, Deborah L; Clavijo, Bernardo J; Li, Yi; He, Zhesi; Fellgett, Alison; McKinney, Lea Vig; Nielsen, Lene Rostgaard; Douglas, Gerry C; Kjær, Erik Dahl; Downie, J Allan; Boshier, David; Lee, Steve; Clark, Jo; Grant, Murray; Bancroft, Ian; Caccamo, Mario; Buggs, Richard J A
2017-01-12
Ash trees (genus Fraxinus, family Oleaceae) are widespread throughout the Northern Hemisphere, but are being devastated in Europe by the fungus Hymenoscyphus fraxineus, causing ash dieback, and in North America by the herbivorous beetle Agrilus planipennis. Here we sequence the genome of a low-heterozygosity Fraxinus excelsior tree from Gloucestershire, UK, annotating 38,852 protein-coding genes of which 25% appear ash specific when compared with the genomes of ten other plant species. Analyses of paralogous genes suggest a whole-genome duplication shared with olive (Olea europaea, Oleaceae). We also re-sequence 37 F. excelsior trees from Europe, finding evidence for apparent long-term decline in effective population size. Using our reference sequence, we re-analyse association transcriptomic data, yielding improved markers for reduced susceptibility to ash dieback. Surveys of these markers in British populations suggest that reduced susceptibility to ash dieback may be more widespread in Great Britain than in Denmark. We also present evidence that susceptibility of trees to H. fraxineus is associated with their iridoid glycoside levels. This rapid, integrated, multidisciplinary research response to an emerging health threat in a non-model organism opens the way for mitigation of the epidemic.
Chaw, R. Crystal; Collin, Matthew; Wimmer, Marjorie; Helmrick, Kara-Leigh; Hayashi, Cheryl Y.
2017-01-01
Spiders swath their eggs with silk to protect developing embryos and hatchlings. Egg case silks, like other fibrous spider silks, are primarily composed of proteins called spidroins (spidroin = spider-fibroin). Silks, and thus spidroins, are important throughout the lives of spiders, yet the evolution of spidroin genes has been relatively understudied. Spidroin genes are notoriously difficult to sequence because they are typically very long (≥ 10 kb of coding sequence) and highly repetitive. Here, we investigate the evolution of spider silk genes through long-read sequencing of Bacterial Artificial Chromosome (BAC) clones. We demonstrate that the silver garden spider Argiope argentata has multiple egg case spidroin loci with a loss of function at one locus. We also use degenerate PCR primers to search the genomic DNA of congeneric species and find evidence for multiple egg case spidroin loci in other Argiope spiders. Comparative analyses show that these multiple loci are more similar at the nucleotide level within a species than between species. This pattern is consistent with concerted evolution homogenizing gene copies within a genome. More complicated explanations include convergent evolution or recent independent gene duplications within each species. PMID:29127108
2013-01-01
Background Snake venoms generally show sequence and quantitative variation within and between species, but some rattlesnakes have undergone exceptionally rapid, dramatic shifts in the composition, lethality, and pharmacological effects of their venoms. Such shifts have occurred within species, most notably in Mojave (Crotalus scutulatus), South American (C. durissus), and timber (C. horridus) rattlesnakes, resulting in some populations with extremely potent, neurotoxic venoms without the hemorrhagic effects typical of rattlesnake bites. Results To better understand the evolutionary changes that resulted in the potent venom of a population of C. horridus from northern Florida, we sequenced the venom-gland transcriptome of an animal from this population for comparison with the previously described transcriptome of the eastern diamondback rattlesnake (C. adamanteus), a congener with a more typical rattlesnake venom. Relative to the toxin transcription of C. adamanteus, which consisted primarily of snake-venom metalloproteinases, C-type lectins, snake-venom serine proteinases, and myotoxin-A, the toxin transcription of C. horridus was far simpler in composition and consisted almost entirely of snake-venom serine proteinases, phospholipases A2, and bradykinin-potentiating and C-type natriuretic peptides. Crotalus horridus lacked significant expression of the hemorrhagic snake-venom metalloproteinases and C-type lectins. Evolution of shared toxin families involved differential expansion and loss of toxin clades within each species and pronounced differences in the highly expressed toxin paralogs. Toxin genes showed significantly higher rates of nonsynonymous substitution than nontoxin genes. The expression patterns of nontoxin genes were conserved between species, despite the vast differences in toxin expression. 
Conclusions Our results represent the first complete, sequence-based comparison between the venoms of closely related snake species and reveal in unprecedented detail the rapid evolution of snake venoms. We found that the difference in venom properties resulted from major changes in expression levels of toxin gene families, differential gene-family expansion and loss, changes in which paralogs within gene families were expressed at high levels, and higher nonsynonymous substitution rates in the toxin genes relative to nontoxins. These massive alterations in the genetics of the venom phenotype emphasize the evolutionary lability and flexibility of this ecologically critical trait. PMID:23758969
Gautier, Philippe; Loosli, Felix; Tay, Boon-Hui; Tay, Alice; Murdoch, Emma; Coutinho, Pedro; van Heyningen, Veronica; Brenner, Sydney; Venkatesh, Byrappa; Kleinjan, Dirk A.
2013-01-01
Pax6 is a developmental control gene essential for eye development throughout the animal kingdom. In addition, Pax6 plays key roles in other parts of the CNS, olfactory system, and pancreas. In mammals a single Pax6 gene encoding multiple isoforms delivers these pleiotropic functions. Here we provide evidence that the genomes of many other vertebrate species contain multiple Pax6 loci. We sequenced Pax6-containing BACs from the cartilaginous elephant shark (Callorhinchus milii) and found two distinct Pax6 loci. Pax6.1 is highly similar to mammalian Pax6, while Pax6.2 encodes a paired-less Pax6. Using synteny relationships, we identify homologs of this novel paired-less Pax6.2 gene in lizard and in frog, as well as in zebrafish and in other teleosts. In zebrafish two full-length Pax6 duplicates were known previously, originating from the fish-specific genome duplication (FSGD) and expressed in divergent patterns due to paralog-specific loss of cis-elements. We show that teleosts other than zebrafish also maintain duplicate full-length Pax6 loci, but differences in gene and regulatory domain structure suggest that these Pax6 paralogs originate from a more ancient duplication event and are hence renamed as Pax6.3. Sequence comparisons between mammalian and elephant shark Pax6.1 loci highlight the presence of short- and long-range conserved noncoding elements (CNEs). Functional analysis demonstrates the ancient role of long-range enhancers for Pax6 transcription. We show that the paired-less Pax6.2 ortholog in zebrafish is expressed specifically in the developing retina. Transgenic analysis of elephant shark and zebrafish Pax6.2 CNEs with homology to the mouse NRE/Pα internal promoter revealed highly specific retinal expression. Finally, morpholino depletion of zebrafish Pax6.2 resulted in a “small eye” phenotype, supporting a role in retinal development. 
In summary, our study reveals that the pleiotropic functions of Pax6 in vertebrates are served by a divergent family of Pax6 genes, forged by ancient duplication events and by independent, lineage-specific gene losses. PMID:23359656
Bargues, M Dolores; Zuriaga, M Angeles; Mas-Coma, Santiago
2014-01-01
A pseudogene, paralogous to rDNA 5.8S and ITS-2, is described in Meccus dimidiata dimidiata, M. d. capitata, M. d. maculipennis, M. d. hegneri, M. sp. aff. dimidiata, M. p. phyllosoma, M. p. longipennis, M. p. pallidipennis, M. p. picturata, M. p. mazzottii, Triatoma mexicana, Triatoma nitida and Triatoma sanguisuga, covering North America, Central America and northern South America. Such a nuclear rDNA pseudogene is very rare. In the 5.8S gene, criteria for pseudogene identification included length variability, lower GC content, mutations regarding the functional uniform sequence, and relatively high base substitutions in evolutionary conserved sites. At ITS-2 level, criteria were the shorter sequence and large proportion of insertions and deletions (indels). Pseudogenic 5.8S and ITS-2 secondary structures were different from the functional foldings and different from one another, showing less negative values for minimum free energy (mfe) and centroid predictions, and lower fit between mfe, partition function, and centroid structures. A complete characterization indicated a processed pseudogenic unit of the ghost type, escaping from rDNA concerted evolution and with functionality subject to constraints instead of evolving free by neutral drift. Despite a high indel number, low mutation number and an evolutionary rate similar to the functional ITS-2, that pseudogene distinguishes different taxa and furnishes coherent phylogenetic topologies with resolution similar to the functional ITS-2. The discovery of a pseudogene in many phylogenetically related species is unique in animals and allowed for an estimation of its palaeobiogeographical origin based on molecular clock data, inheritance pathways, evolutionary rate and pattern, and geographical spread.
In addition to the technical risk to be considered henceforth, this relict pseudogene, designated as "ps(5.8S+ITS-2)", proves to be a valuable marker for specimen classification, phylogenetic analyses, and systematic/taxonomic studies. It opens a new research field, Chagas disease epidemiology and control included, given its potential relationships with triatomine fitness, behaviour and adaptability.
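One of the identification criteria listed in the abstract above is lower GC content in the pseudogenic 5.8S copy. The following is a minimal sketch of how such a comparison could be computed; the function name `gc_content` and the toy sequences in the usage note are hypothetical illustrations and are not taken from the study.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T/U).

    Hypothetical helper for illustration; ambiguous symbols such as N
    are ignored rather than counted in the denominator.
    """
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = gc + sum(seq.count(base) for base in "ATU")
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / total


def has_lower_gc(candidate: str, functional: str) -> bool:
    """True when the candidate copy has lower GC content than the functional copy."""
    return gc_content(candidate) < gc_content(functional)
```

For example, `has_lower_gc("GCATAT", "GCGCAT")` compares two made-up six-base strings. GC content alone would not identify a pseudogene; the abstract combines it with length variability, substitutions at conserved sites, and secondary-structure criteria.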
Jain, Aditi; Anand, Saurabh; Singh, Neer K; Das, Sandip
2018-03-12
The impact of polyploidy on functional diversification of cis-regulatory elements is poorly understood. This is primarily due to the lack of a well-defined structure of cis-elements and of a universal regulatory code. To the best of our knowledge, this is the first report on characterization of sequence and functional diversification of paralogous and homeologous promoter elements associated with MIR164 from Brassica. The availability of whole genome sequence allowed us to identify and isolate a total of 42 homologous copies of MIR164 from the diploid species Brassica rapa (A-genome), Brassica nigra (B-genome) and Brassica oleracea (C-genome), and from the allopolyploids Brassica juncea (AB-genome), Brassica carinata (BC-genome) and Brassica napus (AC-genome). Additionally, we retrieved homologous sequences based on comparative genomics from Arabidopsis lyrata, Capsella rubella, and Thellungiella halophila, spanning ca. 45 million years of evolutionary history of Brassicaceae. Sequence comparison across Brassicaceae revealed lineage-, karyotype-, species-, and sub-genome-specific changes providing a snapshot of evolutionary dynamics of miRNA promoters in polyploids. Tree topology of cis-elements associated with MIR164 was found to recapitulate the species and family evolutionary history. Phylogenetic shadowing identified transcription factor binding sites (TFBS) conserved across Brassicaceae, some of which are already known as regulators of MIR164 expression. Some of the TFBS were found to be distributed in a sub-genome-specific (e.g., SOX specific to promoter of MIR164c from MF2 sub-genome), lineage-specific (YABBY binding motif, specific to C. rubella in MIR164b), or species-specific (e.g., VOZ in A. thaliana MIR164a) manner, which might contribute towards genetic and adaptive variation.
Reporter activity driven by promoters associated with MIR164 paralogs and homeologs was largely in agreement with the known role of miR164 in leaf shaping, regulation of lateral root development and senescence, and revealed one previously undescribed role in trichome. The impact of polyploidy was most profound when reporter activity across three MIR164c homeologs was compared, revealing negligible overlap, whereas reporter activity among two homeologs of MIR164a displayed significant overlap. A copy-number-dependent cis-regulatory divergence thus exists in MIR164 genes in Brassica juncea. The full extent of regulatory diversification towards adaptive strategies will only be known when future endeavors analyze the promoter function under stress and hormonal regimes.
SALAD database: a motif-based database of protein annotations for plant comparative genomics
Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi
2010-01-01
Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209 529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named ‘SALAD on ARRAYs’ to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis. PMID:19854933
Niskanen, Einari A; Hytönen, Vesa P; Grapputo, Alessandro; Nordlund, Henri R; Kulomaa, Markku S; Laitinen, Olli H
2005-01-01
Background A chicken egg contains several biotin-binding proteins (BBPs), whose complete DNA and amino acid sequences are not known. In order to identify and characterise these genes and proteins we studied chicken cDNAs and genes available in the NCBI database and chicken genome database using the reported N-terminal amino acid sequences of chicken egg-yolk BBPs as search strings. Results Two separate hits showing significant homology for these N-terminal sequences were discovered. For one of these hits, the chromosomal location in the immediate proximity of the avidin gene family was found. Both of these hits encode proteins having high sequence similarity with avidin, suggesting that chicken BBPs are paralogous to the avidin family. In particular, almost all residues corresponding to biotin binding in avidin are conserved in these putative BBP proteins. One of the found DNA sequences, however, seems to encode a carboxy-terminal extension not present in avidin. Conclusion We describe here the predicted properties of the putative BBP genes and proteins. Our present observations link BBP genes together with the avidin gene family and shed more light on the genetic arrangement and variability of this family. In addition, comparative modelling revealed the potential structural elements important for the functional and structural properties of the putative BBP proteins. PMID:15777476
Blazier, J. Chris; Ruhlman, Tracey A.; Weng, Mao-Lun; Rehman, Sumaiyah K.; Sabir, Jamal S. M.; Jansen, Robert K.
2016-01-01
Genes for the plastid-encoded RNA polymerase (PEP) persist in the plastid genomes of all photosynthetic angiosperms. However, three unrelated lineages (Annonaceae, Passifloraceae and Geraniaceae) have been identified with unusually divergent open reading frames (ORFs) in the conserved region of rpoA, the gene encoding the PEP α subunit. We used sequence-based approaches to evaluate whether these genes retain function. Both gene sequences and complete plastid genome sequences were assembled and analyzed from each of the three angiosperm families. Multiple lines of evidence indicated that the rpoA sequences are likely functional despite retaining as low as 30% nucleotide sequence identity with rpoA genes from outgroups in the same angiosperm order. The ratio of non-synonymous to synonymous substitutions indicated that these genes are under purifying selection, and bioinformatic prediction of conserved domains indicated that functional domains are preserved. One of the lineages (Pelargonium, Geraniaceae) contains species with multiple rpoA-like ORFs that show evidence of ongoing inter-paralog gene conversion. The plastid genomes containing these divergent rpoA genes have experienced extensive structural rearrangement, including large expansions of the inverted repeat. We propose that illegitimate recombination, not positive selection, has driven the divergence of rpoA. PMID:27087667
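The abstract above infers purifying selection from the ratio of non-synonymous to synonymous substitutions (dN/dS, often written ω). The reasoning can be sketched as follows: ω well below 1 suggests purifying selection, ω near 1 suggests neutral evolution, and ω above 1 suggests positive selection. The helper name `classify_selection`, the `tolerance` band and the example values are assumptions made for illustration; they are not the authors' method or data, and published analyses typically rely on likelihood-based tests rather than a fixed cutoff.

```python
def classify_selection(dn: float, ds: float, tolerance: float = 0.1) -> str:
    """Classify the selection regime implied by dN and dS.

    Hypothetical helper: `tolerance` defines an arbitrary band around
    omega = 1 that is treated as effectively neutral.
    """
    if dn < 0 or ds <= 0:
        raise ValueError("dN must be non-negative and dS must be positive")
    omega = dn / ds  # ratio of non-synonymous to synonymous substitution rates
    if omega < 1 - tolerance:
        return "purifying"
    if omega > 1 + tolerance:
        return "positive"
    return "neutral"
```

With made-up inputs, `classify_selection(0.05, 0.5)` gives ω = 0.1 and is labelled purifying, which is the pattern the abstract reports for the divergent rpoA genes.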
SALAD database: a motif-based database of protein annotations for plant comparative genomics.
Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi
2010-01-01
Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209,529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named 'SALAD on ARRAYs' to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mavromatis, K; Doyle, C Kuyler; Lykidis, A
2006-01-01
Ehrlichia canis, a small obligately intracellular, tick-transmitted, gram-negative, α-proteobacterium, is the primary etiologic agent of globally distributed canine monocytic ehrlichiosis. Complete genome sequencing revealed that the E. canis genome consists of a single circular chromosome of 1,315,030 bp predicted to encode 925 proteins, 40 stable RNA species, 17 putative pseudogenes, and a substantial proportion of noncoding sequence (27%). Interesting genome features include a large set of proteins with transmembrane helices and/or signal sequences and a unique serine-threonine bias associated with the potential for O glycosylation that was prominent in proteins associated with pathogen-host interactions. Furthermore, two paralogous protein families associated with immune evasion were identified, one of which contains poly(G-C) tracts, suggesting that they may play a role in phase variation and facilitation of persistent infections. Genes associated with pathogen-host interactions were identified, including a small group encoding proteins (n = 12) with tandem repeats and another group encoding proteins with eukaryote-like ankyrin domains (n = 7).
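The genome figures quoted in the abstract above can be turned into a few derived quantities. The following is a small arithmetic sketch using only the numbers stated in the abstract (1,315,030 bp, 27% noncoding, 925 proteins, 40 stable RNAs, 17 pseudogenes); the function name `genome_summary` is hypothetical, and because 27% is a rounded value the noncoding base count is only approximate.

```python
GENOME_BP = 1_315_030        # chromosome length stated in the abstract
NONCODING_FRACTION = 0.27    # rounded proportion stated in the abstract
PROTEINS = 925
STABLE_RNAS = 40
PSEUDOGENES = 17


def genome_summary() -> dict:
    """Derive approximate summary statistics from the abstract's figures."""
    noncoding_bp = round(GENOME_BP * NONCODING_FRACTION)
    return {
        "noncoding_bp": noncoding_bp,                      # approximate
        "remaining_bp": GENOME_BP - noncoding_bp,          # approximate
        "annotated_features": PROTEINS + STABLE_RNAS + PSEUDOGENES,
        "proteins_per_kb": PROTEINS / (GENOME_BP / 1000),  # predicted proteins per kb
    }
```

Calling `genome_summary()` returns a dictionary of these values; the protein density works out to roughly 0.7 predicted proteins per kilobase.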
Developmentally distinct MYB genes encode functionally equivalent proteins in Arabidopsis.
Lee, M M; Schiefelbein, J
2001-05-01
The duplication and divergence of developmental control genes is thought to have driven morphological diversification during the evolution of multicellular organisms. To examine the molecular basis of this process, we analyzed the functional relationship between two paralogous MYB transcription factor genes, WEREWOLF (WER) and GLABROUS1 (GL1), in Arabidopsis. The WER and GL1 genes specify distinct cell types and exhibit non-overlapping expression patterns during Arabidopsis development. Nevertheless, reciprocal complementation experiments with a series of gene fusions showed that WER and GL1 encode functionally equivalent proteins, and their unique roles in plant development are entirely due to differences in their cis-regulatory sequences. Similar experiments with a distantly related MYB gene (MYB2) showed that its product cannot functionally substitute for WER or GL1. Furthermore, an analysis of the WER and GL1 proteins shows that conserved sequences correspond to specific functional domains. These results provide new insights into the evolution of the MYB gene family in Arabidopsis, and, more generally, they demonstrate that novel developmental gene function may arise solely by the modification of cis-regulatory sequences.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mavromatis, K.; Kuyler Doyle, C.; Lykidis, A.
2005-09-01
Ehrlichia canis, a small obligately intracellular, tick-transmitted, gram-negative, α-proteobacterium, is the primary etiologic agent of globally distributed canine monocytic ehrlichiosis. Complete genome sequencing revealed that the E. canis genome consists of a single circular chromosome of 1,315,030 bp predicted to encode 925 proteins, 40 stable RNA species, 17 putative pseudogenes, and a substantial proportion of non-coding sequence (27 percent). Interesting genome features include a large set of proteins with transmembrane helices and/or signal sequences, and a unique serine-threonine bias associated with the potential for O-glycosylation that was prominent in proteins associated with pathogen-host interactions. Furthermore, two paralogous protein families associated with immune evasion were identified, one of which contains poly G:C tracts, suggesting that they may play a role in phase variation and facilitation of persistent infections. Proteins associated with pathogen-host interactions were identified including a small group of proteins (12) with tandem repeats and another with eukaryotic-like ankyrin domains (7).
Molecular evolution of cyclin proteins in animals and fungi
2011-01-01
Background The passage through the cell cycle is controlled by complexes of cyclins, the regulatory units, with cyclin-dependent kinases, the catalytic units. It is also known that cyclins form several families, which differ considerably in primary structure from one eukaryotic organism to another. Despite these lines of evidence, the relationship between the evolution of cyclins and their function is an open issue. Here we present the results of our study on the molecular evolution of A-, B-, D-, E-type cyclin proteins in animals and fungi. Results We constructed phylogenetic trees for these proteins, their ancestral sequences and analyzed patterns of amino acid replacements. The analysis of infrequently fixed atypical amino acid replacements in cyclins evidenced that accelerated evolution proceeded predominantly during paralog duplication or after it in animals and fungi and that it was related to aromorphic changes in animals. It was shown also that evolutionary flexibility of cyclin function may be provided by consequential reorganization of regions on protein surface remote from CDK binding sites in animal and fungal cyclins and by functional differentiation of paralogous cyclins formed in animal evolution. Conclusions The results suggested that changes in the number and/or nature of cyclin-binding proteins may underlie the evolutionary role of the alterations in the molecular structure of cyclins and their involvement in diverse molecular-genetic events. PMID:21798004
Expansion by whole genome duplication and evolution of the sox gene family in teleost fish
Naville, Magali; Volff, Jean-Nicolas
2017-01-01
It is now recognized that several rounds of whole genome duplication (WGD) have occurred during the evolution of vertebrates, but the link between WGDs and phenotypic diversification remains unresolved. We have investigated in this study the impact of the teleost-specific WGD on the evolution of the sox gene family in teleostean fishes. The sox gene family, which encodes transcription factors, has an essential role in the morphology, physiology and behavior of vertebrates, including teleosts, currently the largest group of vertebrates. We first retraced the evolution of all sox genes identified in eleven teleost genomes using a comparative genomic approach including phylogenetic and synteny analyses. We noticed, compared to tetrapods, an important expansion of the sox family: 58% (11/19) of sox genes are duplicated in teleost genomes. Furthermore, all duplicated sox genes, except sox17 paralogs, are derived from the teleost-specific WGD. Then, focusing on five sox genes, analyzing the evolution of coding and non-coding sequences, as well as the expression patterns in fish embryos and adult tissues, we demonstrated that these paralogs followed lineage-specific evolutionary trajectories in teleost genomes. This work, based on whole genome data from multiple teleostean species, supports the contribution of WGDs to the expansion of gene families, as well as to the emergence of genomic differences between lineages that might promote genetic and phenotypic diversity in teleosts. PMID:28738066
NUCKS1 is a novel RAD51AP1 paralog important for homologous recombination and genome stability
Parplys, Ann C.; Zhao, Weixing; Sharma, Neelam; ...
2015-08-31
NUCKS1 (nuclear casein kinase and cyclin-dependent kinase substrate 1) is a 27 kD chromosomal, vertebrate-specific protein, for which limited functional data exist. Here, we demonstrate that NUCKS1 shares extensive sequence homology with RAD51AP1 (RAD51 associated protein 1), suggesting that these two proteins are paralogs. Similar to the phenotypic effects of RAD51AP1 knockdown, we find that depletion of NUCKS1 in human cells impairs DNA repair by homologous recombination (HR) and chromosome stability. Depletion of NUCKS1 also results in greatly increased cellular sensitivity to mitomycin C (MMC), and in increased levels of spontaneous and MMC-induced chromatid breaks. NUCKS1 is critical to maintaining wild type HR capacity, and, as observed for a number of proteins involved in the HR pathway, functional loss of NUCKS1 leads to a slowdown in DNA replication fork progression with a concomitant increase in the utilization of new replication origins. Interestingly, recombinant NUCKS1 shares the same DNA binding preference as RAD51AP1, but binds to DNA with reduced affinity when compared to RAD51AP1. Finally, our results show that NUCKS1 is a chromatin-associated protein with a role in the DNA damage response and in HR, a DNA repair pathway critical for tumor suppression.
Fang, Chong; Nagy-Staroń, Anna; Grafe, Martin; Heermann, Ralf; Jung, Kirsten; Gebhard, Susanne; Mascher, Thorsten
2017-04-01
BceRS and PsdRS are paralogous two-component systems in Bacillus subtilis controlling the response to antimicrobial peptides. In the presence of extracellular bacitracin and nisin, respectively, the two response regulators (RRs) bind their target promoters, PbceA or PpsdA, resulting in a strong up-regulation of target gene expression and ultimately antibiotic resistance. Despite high sequence similarity between the RRs BceR and PsdR and their known binding sites, no cross-regulation has been observed between them. We therefore investigated the specificity determinants of PbceA and PpsdA that ensure the insulation of these two paralogous pathways at the RR-promoter interface. In vivo and in vitro analyses demonstrate that the regulatory regions within these two promoters contain three important elements: in addition to the known (main) binding site, we identified a linker region and a secondary binding site that are crucial for functionality. Initial binding to the high-affinity, low-specificity main binding site is a prerequisite for the subsequent highly specific binding of a second RR dimer to the low-affinity secondary binding site. In addition to this hierarchical cooperative binding, discrimination requires a competition of the two RRs for their respective binding site mediated by only slight differences in binding affinities.
Functional Evolution in Orthologous Cell-encoded RNA-dependent RNA Polymerases.
Qian, Xinlei; Hamid, Fursham M; El Sahili, Abbas; Darwis, Dina Amallia; Wong, Yee Hwa; Bhushan, Shashi; Makeyev, Eugene V; Lescar, Julien
2016-04-22
Many eukaryotic organisms encode more than one RNA-dependent RNA polymerase (RdRP) that probably emerged as a result of gene duplication. Such RdRP paralogs often participate in distinct RNA silencing pathways and show characteristic repertoires of enzymatic activities in vitro. However, to what extent members of individual paralogous groups can undergo functional changes during speciation remains an open question. We show that orthologs of QDE-1, an RdRP component of the quelling pathway in Neurospora crassa, have rapidly diverged in evolution at the amino acid sequence level. Analyses of purified QDE-1 polymerases from N. crassa (QDE-1(Ncr)) and related fungi, Thielavia terrestris (QDE-1(Tte)) and Myceliophthora thermophila (QDE-1(Mth)), show that all three enzymes can synthesize RNA, but the precise modes of their action differ considerably. Unlike their QDE-1(Ncr) counterpart favoring processive RNA synthesis, QDE-1(Tte) and QDE-1(Mth) produce predominantly short RNA copies via primer-independent initiation. Surprisingly, a 3.19 Å resolution crystal structure of QDE-1(Tte) reveals a quasisymmetric dimer similar to QDE-1(Ncr). Further electron microscopy analyses confirm that QDE-1(Tte) occurs as a dimer in solution and retains this status upon interaction with a template. We conclude that divergence of orthologous RdRPs can result in functional innovation while retaining overall protein fold and quaternary structure. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.
Functional Evolution in Orthologous Cell-encoded RNA-dependent RNA Polymerases
Qian, Xinlei; Hamid, Fursham M.; El Sahili, Abbas; Darwis, Dina Amallia; Wong, Yee Hwa; Bhushan, Shashi; Makeyev, Eugene V.; Lescar, Julien
2016-01-01
Many eukaryotic organisms encode more than one RNA-dependent RNA polymerase (RdRP) that probably emerged as a result of gene duplication. Such RdRP paralogs often participate in distinct RNA silencing pathways and show characteristic repertoires of enzymatic activities in vitro. However, to what extent members of individual paralogous groups can undergo functional changes during speciation remains an open question. We show that orthologs of QDE-1, an RdRP component of the quelling pathway in Neurospora crassa, have rapidly diverged in evolution at the amino acid sequence level. Analyses of purified QDE-1 polymerases from N. crassa (QDE-1Ncr) and related fungi, Thielavia terrestris (QDE-1Tte) and Myceliophthora thermophila (QDE-1Mth), show that all three enzymes can synthesize RNA, but the precise modes of their action differ considerably. Unlike their QDE-1Ncr counterpart favoring processive RNA synthesis, QDE-1Tte and QDE-1Mth produce predominantly short RNA copies via primer-independent initiation. Surprisingly, a 3.19 Å resolution crystal structure of QDE-1Tte reveals a quasisymmetric dimer similar to QDE-1Ncr. Further electron microscopy analyses confirm that QDE-1Tte occurs as a dimer in solution and retains this status upon interaction with a template. We conclude that divergence of orthologous RdRPs can result in functional innovation while retaining overall protein fold and quaternary structure. PMID:26907693
Olson, Nathan D.; Lund, Steven P.; Zook, Justin M.; Rojas-Cornejo, Fabiola; Beck, Brian; Foy, Carole; Huggett, Jim; Whale, Alexandra S.; Sui, Zhiwei; Baoutina, Anna; Dobeson, Michael; Partis, Lina; Morrow, Jayne B.
2015-01-01
This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants performed sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved position, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies. PMID:27077030
Quality Control Test for Sequence-Phenotype Assignments
Ortiz, Maria Teresa Lara; Rosario, Pablo Benjamín Leon; Luna-Nevarez, Pablo; Gamez, Alba Savin; Martínez-del Campo, Ana; Del Rio, Gabriel
2015-01-01
Relating a gene mutation to a phenotype is a common task in different disciplines such as protein biochemistry. In this endeavour, it is common to find false relationships arising from mutations introduced by cells that may be depurated using a phenotypic assay; yet, such phenotypic assays may introduce additional false relationships arising from experimental errors. Here we introduce the use of high-throughput DNA sequencers and statistical analysis aimed to identify incorrect DNA sequence-phenotype assignments and observed that 10–20% of these false assignments are expected in large screenings aimed to identify critical residues for protein function. We further show that this level of incorrect DNA sequence-phenotype assignments may significantly alter our understanding about the structure-function relationship of proteins. We have made available an implementation of our method at http://bis.ifc.unam.mx/en/software/chispas. PMID:25700273
Leaché, Adam D.; Chavez, Andreas S.; Jones, Leonard N.; Grummer, Jared A.; Gottscho, Andrew D.; Linkem, Charles W.
2015-01-01
Sequence capture and restriction site associated DNA sequencing (RADseq) are popular methods for obtaining large numbers of loci for phylogenetic analysis. These methods are typically used to collect data at different evolutionary timescales; sequence capture is primarily used for obtaining conserved loci, whereas RADseq is designed for discovering single nucleotide polymorphisms (SNPs) suitable for population genetic or phylogeographic analyses. Phylogenetic questions that span both “recent” and “deep” timescales could benefit from either type of data, but studies that directly compare the two approaches are lacking. We compared phylogenies estimated from sequence capture and double digest RADseq (ddRADseq) data for North American phrynosomatid lizards, a species-rich and diverse group containing nine genera that began diversifying approximately 55 Ma. Sequence capture resulted in 584 loci that provided a consistent and strong phylogeny using concatenation and species tree inference. However, the phylogeny estimated from the ddRADseq data was sensitive to the bioinformatics steps used for determining homology, detecting paralogs, and filtering missing data. The topological conflicts among the SNP trees were not restricted to any particular timescale, but instead were associated with short internal branches. Species tree analysis of the largest SNP assembly, which also included the most missing data, supported a topology that matched the sequence capture tree. This preferred phylogeny provides strong support for the paraphyly of the earless lizard genera Holbrookia and Cophosaurus, suggesting that the earless morphology either evolved twice or evolved once and was subsequently lost in Callisaurus. PMID:25663487
King, Jay D; Leprince, Jérôme; Vaudry, Hubert; Coquet, Laurent; Jouenne, Thierry; Conlon, J Michael
2008-08-01
Peptidomic analysis of norepinephrine-stimulated skin secretions from the Caribbean frog Leptodactylus validus Garman, 1888 led to the identification of three peptides with previously undescribed sequences that were structurally similar to those of antimicrobial peptides isolated from other species of leptodactylid frogs. These paralogs have been termed ocellatin-V1 (GVVDILKGAGKDLLAHALSKLSEKV.NH2), ocellatin-V2 (GVLDILKGAGKDLLAHALSKISEKV.NH2), and ocellatin-V3 (GVLDILTGAGKDLLAHALSKLSEKV.NH2). The very low antimicrobial potency (MIC > 200 µM) against Escherichia coli and Staphylococcus aureus associated with the peptides is probably a consequence of their lack of amphipathicity and reduced cationicity compared with active members of the ocellatin family from related species.
Ramirez, Agnese; Crisafulli, Sebastiano G.; Rizzuti, Mafalda; Bresolin, Nereo; Comi, Giacomo P.; Corti, Stefania
2018-01-01
Spinal muscular atrophy (SMA) is an autosomal-recessive childhood motor neuron disease and the main genetic cause of infant mortality. SMA is caused by deletions or mutations in the survival motor neuron 1 (SMN1) gene, which results in SMN protein deficiency. Only one approved drug has recently become available and allows for the correction of aberrant splicing of the paralogous SMN2 gene by antisense oligonucleotides (ASOs), leading to production of full-length SMN protein. We have already demonstrated that a sequence of an ASO variant, Morpholino (MO), is particularly suitable because of its safety and efficacy profile and is both able to increase SMN levels and rescue the murine SMA phenotype. Here, we optimized this strategy by testing the efficacy of four new MO sequences targeting SMN2. Two out of the four new MO sequences showed better efficacy in terms of SMN protein production both in SMA induced pluripotent stem cells (iPSCs) and SMAΔ7 mice. Further, the effect was enhanced when different MO sequences were administered in combination. Our data provide an important insight for MO-based treatment for SMA. Optimization of the target sequence and validation of a treatment based on a combination of different MO sequences could support further pre-clinical studies and the progression toward future clinical trials. PMID:29316633
Ramirez, Agnese; Crisafulli, Sebastiano G; Rizzuti, Mafalda; Bresolin, Nereo; Comi, Giacomo P; Corti, Stefania; Nizzardo, Monica
2018-01-06
Spinal muscular atrophy (SMA) is an autosomal-recessive childhood motor neuron disease and the main genetic cause of infant mortality. SMA is caused by deletions or mutations in the survival motor neuron 1 (SMN1) gene, which results in SMN protein deficiency. Only one approved drug has recently become available and allows for the correction of aberrant splicing of the paralogous SMN2 gene by antisense oligonucleotides (ASOs), leading to production of full-length SMN protein. We have already demonstrated that a sequence of an ASO variant, Morpholino (MO), is particularly suitable because of its safety and efficacy profile and is both able to increase SMN levels and rescue the murine SMA phenotype. Here, we optimized this strategy by testing the efficacy of four new MO sequences targeting SMN2. Two out of the four new MO sequences showed better efficacy in terms of SMN protein production both in SMA induced pluripotent stem cells (iPSCs) and SMAΔ7 mice. Further, the effect was enhanced when different MO sequences were administered in combination. Our data provide an important insight for MO-based treatment for SMA. Optimization of the target sequence and validation of a treatment based on a combination of different MO sequences could support further pre-clinical studies and the progression toward future clinical trials.
Interactions involving the Rad51 paralogs Rad51C and XRCC3 in human cells
NASA Technical Reports Server (NTRS)
Wiese, Claudia; Collins, David W.; Albala, Joanna S.; Thompson, Larry H.; Kronenberg, Amy; Schild, David; Chatterjee, A. (Principal Investigator)
2002-01-01
Homologous recombinational repair of DNA double-strand breaks and crosslinks in human cells is likely to require Rad51 and the five Rad51 paralogs (XRCC2, XRCC3, Rad51B/Rad51L1, Rad51C/Rad51L2 and Rad51D/Rad51L3), as has been shown in chicken and rodent cells. Previously, we reported on the interactions among these proteins using baculovirus and two- and three-hybrid yeast systems. To test for interactions involving XRCC3 and Rad51C, stable human cell lines have been isolated that express (His)6-tagged versions of XRCC3 or Rad51C. Ni2+-binding experiments demonstrate that XRCC3 and Rad51C interact in human cells. In addition, we find that Rad51C, but not XRCC3, interacts directly or indirectly with Rad51B, Rad51D and XRCC2. These results argue that there are at least two complexes of Rad51 paralogs in human cells (Rad51C-XRCC3 and Rad51B-Rad51C-Rad51D-XRCC2), both containing Rad51C. Moreover, Rad51 is not found in these complexes. X-ray treatment did not alter either the level of any Rad51 paralog or the observed interactions between paralogs. However, the endogenous level of Rad51C is moderately elevated in the XRCC3-overexpressing cell line, suggesting that dimerization between these proteins might help stabilize Rad51C.
Solis-Escalante, Daniel; Kuijpers, Niels G. A.; Barrajon-Simancas, Nuria; van den Broek, Marcel; Pronk, Jack T.
2015-01-01
As a result of ancestral whole-genome and small-scale duplication events, the genomes of Saccharomyces cerevisiae and many eukaryotes still contain a substantial fraction of duplicated genes. In all investigated organisms, metabolic pathways, and more particularly glycolysis, are specifically enriched for functionally redundant paralogs. In ancestors of the Saccharomyces lineage, the duplication of glycolytic genes is purported to have played an important role leading to S. cerevisiae's current lifestyle favoring fermentative metabolism even in the presence of oxygen and characterized by a high glycolytic capacity. In modern S. cerevisiae strains, the 12 glycolytic reactions leading to the biochemical conversion from glucose to ethanol are encoded by 27 paralogs. In order to experimentally explore the physiological role of this genetic redundancy, a yeast strain with a minimal set of 14 paralogs was constructed (the “minimal glycolysis” [MG] strain). Remarkably, a combination of a quantitative systems approach and semiquantitative analysis in a wide array of growth environments revealed the absence of a phenotypic response to the cumulative deletion of 13 glycolytic paralogs. This observation indicates that duplication of glycolytic genes is not a prerequisite for achieving the high glycolytic fluxes and fermentative capacities that are characteristic of S. cerevisiae and essential for many of its industrial applications and argues against gene dosage effects as a means of fixing minor glycolytic paralogs in the yeast genome. The MG strain was carefully designed and constructed to provide a robust prototrophic platform for quantitative studies and has been made available to the scientific community. PMID:26071034
Inferring Higher Functional Information for RIKEN Mouse Full-Length cDNA Clones With FACTS
Nagashima, Takeshi; Silva, Diego G.; Petrovsky, Nikolai; Socha, Luis A.; Suzuki, Harukazu; Saito, Rintaro; Kasukawa, Takeya; Kurochkin, Igor V.; Konagaya, Akihiko; Schönbach, Christian
2003-01-01
FACTS (Functional Association/Annotation of cDNA Clones from Text/Sequence Sources) is a semiautomated knowledge discovery and annotation system that integrates molecular function information derived from sequence analysis results (sequence inferred) with functional information extracted from text. Text-inferred information was extracted from keyword-based retrievals of MEDLINE abstracts and by matching of gene or protein names to OMIM, BIND, and DIP database entries. Using FACTS, we found that 47.5% of the 60,770 RIKEN mouse cDNA FANTOM2 clone annotations were informative for text searches. MEDLINE queries yielded molecular interaction-containing sentences for 23.1% of the clones. When disease MeSH and GO terms were matched with retrieved abstracts, 22.7% of clones were associated with potential diseases, and 32.5% with GO identifiers. A significant number (23.5%) of disease MeSH-associated clones were also found to have a hereditary disease association (OMIM Morbidmap). Inferred neoplastic and nervous system disease represented 49.6% and 36.0% of disease MeSH-associated clones, respectively. A comparison of sequence-based GO assignments with informative text-based GO assignments revealed that for 78.2% of clones, identical GO assignments were provided for that clone by either method, whereas for 21.8% of clones, the assignments differed. In contrast, for OMIM assignments, only 28.5% of clones had identical sequence-based and text-based OMIM assignments. Sequence, sentence, and term-based functional associations are included in the FACTS database (http://facts.gsc.riken.go.jp/), which permits results to be annotated and explored through web-accessible keyword and sequence search interfaces. The FACTS database will be a critical tool for investigating the functional complexity of the mouse transcriptome, cDNA-inferred interactome (molecular interactions), and pathome (pathologies). PMID:12819151
[Multiplexing mapping of human cDNAs]. Final report, September 1, 1991--February 28, 1994
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
Using PCR with automated product analysis, 329 human brain cDNA sequences have been assigned to individual human chromosomes. Primers were designed from single-pass cDNA sequences, that is, expressed sequence tags (ESTs). Primers were used in PCR reactions with DNA from somatic cell hybrid mapping panels as templates, often with multiplexing. Many of the mapped ESTs match sequence database records. To evaluate these matches, the position of the primers relative to the matching region (In), the BLAST scores and the Poisson probability values of the EST/sequence record match were determined. In cases where the gene product stringently identified by the sequence match had already been mapped, the gene locus determined by EST was consistent with the previous position, which strongly supports the validity of assigning unknown genes to human chromosomes based on the EST sequence matches. In the present cases, mapping the ESTs to a chromosome can also be considered to have mapped the known gene product: rolipram-sensitive cAMP phosphodiesterase, chromosome 1; protein phosphatase 2Aβ, chromosome 4; alpha-catenin, chromosome 5; the ELE1 oncogene, chromosome 10q11.2 or q2.1-q23; MXII protein, chromosome 10q24-qter; ribosomal protein L18a homologue, chromosome 14; ribosomal protein L3, chromosome 17; and moesin, Xp11-cen. There were also ESTs mapped that were closely related to non-human sequence records. These matches therefore can be considered to identify human counterparts of known gene products, or members of known gene families. Examples of these include membrane proteins, translation-associated proteins, structural proteins, and enzymes. These data then demonstrate that single-pass sequence information is sufficient to design PCR primers useful for assigning cDNA sequences to human chromosomes. When the EST sequence matches previous sequence database records, the chromosome assignments of the EST can be used to make preliminary assignments of the human gene to a chromosome.
Streaming fragment assignment for real-time analysis of sequencing experiments
Roberts, Adam; Pachter, Lior
2013-01-01
We present eXpress, a software package for highly efficient probabilistic assignment of ambiguously mapping sequenced fragments. eXpress uses a streaming algorithm with linear run time and constant memory use. It can determine abundances of sequenced molecules in real time, and can be applied to ChIP-seq, metagenomics and other large-scale sequencing data. We demonstrate its use on RNA-seq data, showing greater efficiency than other quantification methods. PMID:23160280
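The idea of assigning ambiguously mapping fragments probabilistically in a single pass can be illustrated with a very small sketch. This is not eXpress's actual algorithm or API: the function name `stream_assign`, the pseudo-count `prior`, and the input layout (each fragment given as a list of compatible target indices) are assumptions made purely for illustration, and refinements such as fragment-length or bias modelling are left out. Each fragment is read once and split among its compatible targets in proportion to the abundance estimates accumulated so far, so run time grows linearly with the number of fragments and memory depends only on the number of targets.

```python
def stream_assign(fragments, n_targets, prior=1.0):
    """One-pass probabilistic assignment of ambiguous fragments (illustrative only).

    fragments : iterable of lists; each list holds the indices of the targets
                a fragment maps to (hypothetical input layout).
    n_targets : number of candidate target sequences.
    prior     : pseudo-count given to every target before any data are seen.
    Returns the estimated relative abundance of each target.
    """
    counts = [prior] * n_targets
    for compatible in fragments:
        # Posterior share of each compatible target under the current estimate.
        total = sum(counts[t] for t in compatible)
        shares = [counts[t] / total for t in compatible]
        # Update after all shares are computed, so the split uses one snapshot.
        for t, share in zip(compatible, shares):
            counts[t] += share
    grand_total = sum(counts)
    return [c / grand_total for c in counts]


if __name__ == "__main__":
    # Two fragments map uniquely to target 0; the third is ambiguous.
    print(stream_assign([[0], [0], [0, 1]], n_targets=2))
```

Because the fragments are consumed from an iterator and never stored, the same function could be fed directly from an aligner's output stream.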
Dlugosch, Katrina M.; Lai, Zhao; Bonin, Aurélie; Hierro, José; Rieseberg, Loren H.
2013-01-01
Transcriptome sequences are becoming more broadly available for multiple individuals of the same species, providing opportunities to derive population genomic information from these datasets. Using the 454 Life Science Genome Sequencer FLX and FLX-Titanium next-generation platforms, we generated 11−430 Mbp of sequence for normalized cDNA for 40 wild genotypes of the invasive plant Centaurea solstitialis, yellow starthistle, from across its worldwide distribution. We examined the impact of sequencing effort on transcriptome recovery and overlap among individuals. To do this, we developed two novel publicly available software pipelines: SnoWhite for read cleaning before assembly, and AllelePipe for clustering of loci and allele identification in assembled datasets with or without a reference genome. AllelePipe is designed specifically for cases in which read depth information is not appropriate or available to assist with disentangling closely related paralogs from allelic variation, as in transcriptome or previously assembled libraries. We find that modest applications of sequencing effort recover most of the novel sequences present in the transcriptome of this species, including single-copy loci and a representative distribution of functional groups. In contrast, the coverage of variable sites, observation of heterozygosity, and overlap among different libraries are all highly dependent on sequencing effort. Nevertheless, the information gained from overlapping regions was informative regarding coarse population structure and variation across our small number of population samples, providing the first genetic evidence in support of hypothesized invasion scenarios. PMID:23390612
Hoxa2 and Hoxb2 control dorsoventral patterns of neuronal development in the rostral hindbrain.
Davenne, M; Maconochie, M K; Neun, R; Pattyn, A; Chambon, P; Krumlauf, R; Rijli, F M
1999-04-01
Little is known about how the generation of specific neuronal types at stereotypic positions within the hindbrain is linked to Hox gene-mediated patterning. Here, we show that during neurogenesis, Hox paralog group 2 genes control both anteroposterior (A-P) and dorsoventral (D-V) patterning. Hoxa2 and Hoxb2 differentially regulate, in a rhombomere-specific manner, the expression of several genes in broad D-V-restricted domains or narrower longitudinal columns of neuronal progenitors, immature neurons, and differentiating neuronal subtypes. Moreover, Hoxa2 and Hoxb2 can functionally synergize in controlling the development of ventral neuronal subtypes in rhombomere 3 (r3). Thus, in addition to their roles in A-P patterning, Hoxa2 and Hoxb2 have distinct and restricted functions along the D-V axis during neurogenesis, providing insights into how neuronal fates are assigned at stereotypic positions within the hindbrain.
Torruella, Guifré; Derelle, Romain; Paps, Jordi; Lang, B. Franz; Roger, Andrew J.; Shalchian-Tabrizi, Kamran; Ruiz-Trillo, Iñaki
2012-01-01
Many of the eukaryotic phylogenomic analyses published to date were based on alignments of hundreds to thousands of genes. In such analyses, the most realistic evolutionary models currently available are often used to minimize the impact of systematic error. However, controversy remains over whether or not idiosyncratic gene family dynamics (i.e., gene duplications and losses) and incorrect orthology assignments are always appropriately taken into account. In this paper, we present an innovative strategy for overcoming orthology assignment problems. Rather than identifying and eliminating genes with paralogy problems, we have constructed a data set comprised exclusively of conserved single-copy protein domains that, unlike most of the commonly used phylogenomic data sets, should be less confounded by orthology misassignments. To evaluate the power of this approach, we performed maximum likelihood and Bayesian analyses to infer the evolutionary relationships within the opisthokonts (which includes Metazoa, Fungi, and related unicellular lineages). We used this approach to test 1) whether Filasterea and Ichthyosporea form a clade, 2) the interrelationships of early-branching metazoans, and 3) the relationships among early-branching fungi. We also assessed the impact of some methods that are known to minimize systematic error, including reducing the distance between the outgroup and ingroup taxa or using the CAT evolutionary model. Overall, our analyses support the Filozoa hypothesis in which Ichthyosporea are the first holozoan lineage to emerge followed by Filasterea, Choanoflagellata, and Metazoa. Blastocladiomycota appears as a lineage separate from Chytridiomycota, although this result is not strongly supported. These results represent independent tests of previous phylogenetic hypotheses, highlighting the importance of sophisticated approaches for orthology assignment in phylogenomic analyses. PMID:21771718
Concerted action of two dlx paralogs in sensory placode formation.
Solomon, Keely S; Fritz, Andreas
2002-07-01
Sensory placodes are ectodermal thickenings that give rise to elements of the vertebrate cranial sensory nervous system, including the inner ear and nose. Although mutations have been described in humans, mice and zebrafish that perturb ear and nose development, no mutation is known to prevent sensory placode formation. Thus, it has been postulated that a functional redundancy exists in the genetic mechanisms that govern sensory placode development. We describe a zebrafish deletion mutation, b380, which results in a lack of both otic and olfactory placodes. The b380 deletion removes several known genes and expressed sequence tags, including dlx3 and dlx7, two transcription factors that share a homoeobox domain similar in sequence to the Drosophila Distal-less gene. dlx3 and dlx7 are expressed in an overlapping pattern in the regions that produce the otic and olfactory placodes in zebrafish. We present evidence suggesting that it is specifically the removal of these two genes that leads to the otic and olfactory phenotype of b380 mutants. Using morpholinos, antisense oligonucleotides that effectively block translation of target genes, we find that functional reduction of both dlx genes contributes to placode loss. Expression patterns of the otic marker pax2.1, olfactory marker anxV and eya1, a marker of both placodes, in morpholino-injected embryos recapitulate the reduced expression of these genes seen in b380 mutants. We also examine expression of dlx3 and dlx7 in the morpholino-injected embryos and present evidence for existence of auto- and cross-regulatory control of expression among these genes. We demonstrate that dlx3 is necessary and sufficient for proper otic and olfactory placode development. However, our results indicate that dlx3 and dlx7 act in concert and their importance in placode formation is only revealed by inactivating both paralogs.
Tharuka, M D Neranjan; Bathige, S D N K; Lee, Jehee
2017-12-01
Glutathione S-transferases (GSTs, EC 2.5.1.18) are important Phase II detoxifying enzymes that catalyze the conjugation of reduced glutathione (GSH) to hydrophobic, electrophilic xenobiotic substances. In this study, the GSTμ and GSTρ paralogs of GST in the big belly seahorse (Hippocampus abdominalis; HaGSTρ, HaGSTμ) were biochemically, molecularly, and functionally characterized to determine their detoxification range and protective capacities upon different pathogenic stresses. HaGSTρ and HaGSTμ have coding sequences of 681 bp and 654 bp, which encode proteins of 225 and 217 amino acids with predicted molecular masses of 26.06 kDa and 25.74 kDa, respectively. Sequence analysis revealed that both HaGSTs comprise the characteristic GSH-binding site in the thioredoxin-like N-terminal domain and substrate binding site in the C-terminal domain. The recombinant HaGSTρ and HaGSTμ proteins catalyzed conjugation of the model GST substrate 1-chloro-2,4-dinitrobenzene (CDNB). Enzyme kinetic analysis revealed different Km and Vmax values for each rHaGST, suggesting that they have different conjugation rates. The optimum conditions (pH, temperature) and inhibitory assays of each protein demonstrated different optimal ranges. HaGSTμ was highly expressed in the ovary and gill, whereas HaGSTρ was highly expressed in the gill and pouch. mRNA expression of HaGSTρ and HaGSTμ was significantly elevated upon lipopolysaccharide, Poly (I:C), and Edwardsiella tarda challenges in liver and in blood cells as well as with Streptococcus iniae challenge in blood cells. From these collective experimental results, we propose that HaGSTρ and HaGSTμ are effective in detoxifying xenobiotic toxic agents, and importantly, their mRNA expression could be stimulated by immunological stress signals in the aquatic environment. Copyright © 2017 Elsevier Inc. All rights reserved.
Convergence of the transcriptional responses to heat shock and singlet oxygen stresses.
Dufour, Yann S; Imam, Saheed; Koo, Byoung-Mo; Green, Heather A; Donohue, Timothy J
2012-09-01
Cells often mount transcriptional responses and activate specific sets of genes in response to stress-inducing signals such as heat or reactive oxygen species. Transcription factors in the RpoH family of bacterial alternative σ factors usually control gene expression during a heat shock response. Interestingly, several α-proteobacteria possess two or more paralogs of RpoH, suggesting some functional distinction. We investigated the target promoters of Rhodobacter sphaeroides RpoH(I) and RpoH(II) using genome-scale data derived from gene expression profiling and the direct interactions of each protein with DNA in vivo. We found that the RpoH(I) and RpoH(II) regulons have both distinct and overlapping gene sets. We predicted DNA sequence elements that dictate promoter recognition specificity by each RpoH paralog. We found that several bases in the highly conserved TTG in the -35 element are important for activity with both RpoH homologs; that the T-9 position, which is over-represented in the RpoH(I) promoter sequence logo, is critical for RpoH(I)-dependent transcription; and that several bases in the predicted -10 element were important for activity with either RpoH(II) or both RpoH homologs. Genes that are transcribed by both RpoH(I) and RpoH(II) are predicted to encode for functions involved in general cell maintenance. The functions specific to the RpoH(I) regulon are associated with a classic heat shock response, while those specific to RpoH(II) are associated with the response to the reactive oxygen species, singlet oxygen. We propose that a gene duplication event followed by changes in promoter recognition by RpoH(I) and RpoH(II) allowed convergence of the transcriptional responses to heat and singlet oxygen stress in R. sphaeroides and possibly other bacteria.
Zhao, Chuan-Li; Hsu, Hua-Feng
2014-01-01
This paper considers single machine scheduling and due date assignment with setup time. The setup time is proportional to the length of the already processed jobs; that is, the setup time is past-sequence-dependent (p-s-d). It is assumed that a job's processing time depends on its position in a sequence. The objective functions include total earliness, the weighted number of tardy jobs, and the cost of due date assignment. We analyze these problems with two different due date assignment methods. We first consider the model with job-dependent position effects. For each case, by converting the problem to a series of assignment problems, we proved that the problems can be solved in O(n^4) time. For the model with job-independent position effects, we proved that the problems can be solved in O(n^3) time by providing a dynamic programming algorithm. PMID:25258727
Zhao, Chuan-Li; Hsu, Chou-Jung; Hsu, Hua-Feng
2014-01-01
This paper considers single machine scheduling and due date assignment with setup time. The setup time is proportional to the length of the already processed jobs; that is, the setup time is past-sequence-dependent (p-s-d). It is assumed that a job's processing time depends on its position in a sequence. The objective functions include total earliness, the weighted number of tardy jobs, and the cost of due date assignment. We analyze these problems with two different due date assignment methods. We first consider the model with job-dependent position effects. For each case, by converting the problem to a series of assignment problems, we proved that the problems can be solved in O(n^4) time. For the model with job-independent position effects, we proved that the problems can be solved in O(n^3) time by providing a dynamic programming algorithm.
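As an illustration of the past-sequence-dependent setup described in this abstract, the following minimal Python sketch evaluates a fixed job sequence. It assumes the setup before each job equals a constant `delta` times the total processing time of the jobs already run; `delta`, the function names, and the example data are hypothetical and not taken from the paper. Position-dependent processing times and the due date assignment cost are left out, and the sketch only evaluates a given sequence; it is not the authors' O(n^4) or O(n^3) algorithm.

```python
from typing import List, Sequence, Tuple


def completion_times(proc_times: Sequence[float], delta: float) -> List[float]:
    """Completion times on one machine with p-s-d setup times.

    Assumed form: the setup before a job is delta times the total
    processing time of all jobs that ran before it.
    """
    completions: List[float] = []
    processed = 0.0  # total processing time of jobs already run
    clock = 0.0      # current time on the machine
    for p in proc_times:
        setup = delta * processed
        clock += setup + p
        completions.append(clock)
        processed += p
    return completions


def earliness_and_tardy_count(
    completions: Sequence[float], due_dates: Sequence[float]
) -> Tuple[float, int]:
    """Total earliness and (unweighted) number of tardy jobs."""
    earliness = sum(max(0.0, d - c) for c, d in zip(completions, due_dates))
    tardy = sum(1 for c, d in zip(completions, due_dates) if c > d)
    return earliness, tardy
```

For example, `completion_times([2, 3, 1], 0.5)` gives setups of 0, 1 and 2.5 time units before the three jobs, so later jobs are pushed back more as the processed workload grows.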
Dauchot, Nicolas; Raulier, Pierre; Maudoux, Olivier; Notté, Christine; Draye, Xavier; Van Cutsem, Pierre
2015-01-01
Key Message: The loss of mini-exon 2 in the 1-FEH IIb glycosyl-hydrolase results in a putative non-functional allele. This loss of function has a strong impact on the susceptibility to post-harvest inulin depolymerization. Significant variation of copy number was identified in its close paralog 1-FEH IIa, but no quantitative effect of copy number on carbohydrates-related phenotypes was detected. Inulin polyfructan is the second most abundant storage carbohydrate in flowering plants. After harvest, it is depolymerized by fructan exohydrolases (FEHs) as an adaptive response to end-season cold temperatures. In chicory, the intensity of this depolymerization differs between cultivars but also between individuals within a cultivar. Regarding this phenotypic variability, we recently identified statistically significant associations between inulin degradation and genetic polymorphisms located in three FEHs. We present here new results of a systematic analysis of copy number variation (CNV) in five key members of the chicory (Cichorium intybus) GH32 multigenic family, including three FEH genes and the two inulin biosynthesis genes: 1-SST and 1-FFT. qPCR analysis identified a significant variability of relative copy number only in the 1-FEH IIa gene. However, this CNV had no quantitative effect. Instead, cloning of the full length gDNA of a close paralogous sequence (1-FEH IIb) identified a 1028 bp deletion in lines less susceptible to post-harvest inulin depolymerization. This region comprises a 9 bp mini-exon containing one of the three conserved residues of the active site. This results in a putative non-functional 1-FEH IIb allele and an observed lower inulin depolymerization. Extensive genotyping confirmed that the loss of mini-exon 2 in 1-FEH IIb and the previously identified 47 bp duplication located in the 3′UTR of 1-FEH IIa belong to a single haplotype, both being statistically associated with reduced susceptibility to post-harvest inulin depolymerization. 
Emergence of these haplotypes is discussed. PMID:26157446
Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P; Panitz, Frank; Bendixen, Christian; Nielsen, Rasmus; Willerslev, Eske
2007-02-14
The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. We use conventional PCR with 5'-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequence 20 DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5' tag analysis. We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (misassignment rate < 0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in a single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regard to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5' primer tagging is a useful method in which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products. 
The new approach will be of value to a broad range of research areas, such as those of comparative genomics, complete mitochondrial analyses, population genetics, and phylogenetics.
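The tag-based source assignment described in this abstract can be sketched in Python as follows. This is a minimal illustration, assuming each read begins with a dinucleotide tag that maps to exactly one specimen; the function name, the tag-to-specimen mapping, and the example reads are hypothetical. It does not model the sequencing anomalies or the tag-dependent bias that the authors report.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def assign_reads_by_tag(
    reads: Iterable[str],
    tag_to_source: Dict[str, str],
    tag_length: int = 2,
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group reads by their 5' tag and strip the tag from each read.

    Reads whose leading bases match no known tag are returned
    separately rather than being guessed at.
    """
    assigned: Dict[str, List[str]] = defaultdict(list)
    unassigned: List[str] = []
    for read in reads:
        tag = read[:tag_length].upper()
        source = tag_to_source.get(tag)
        if source is None:
            unassigned.append(read)
        else:
            assigned[source].append(read[tag_length:])
    return dict(assigned), unassigned
```

Keeping unmatched reads in their own list makes it easy to count how many sequences could not be traced to a specimen, which is the kind of figure a misassignment or dropout rate would be computed from.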
Counts, Jenna T; Hester, Tasha M; Rouhana, Labib
2017-12-01
Chaperonin-containing Tail-less complex polypeptide 1 (CCT) is a highly conserved, hetero-oligomeric complex that ensures proper folding of actin, tubulin, and regulators of mitosis. Eight subunits (CCT1-8) make up this complex, and every subunit has a homolog expressed in the testes and somatic tissue of the planarian flatworm Schmidtea mediterranea. Gene duplications of four subunits in the genomes of S. mediterranea and other planarian flatworms created paralogs to CCT1, CCT3, CCT4, and CCT8 that are expressed exclusively in the testes. Functional analyses revealed that each CCT subunit expressed in the S. mediterranea soma is essential for homeostatic integrity and survival, whereas sperm elongation defects were observed upon knockdown of each individual testis-specific paralog (Smed-cct1B; Smed-cct3B; Smed-cct4A; and Smed-cct8B), regardless of potential redundancy with paralogs expressed in both testes and soma (Smed-cct1A; Smed-cct3A; Smed-cct4B; and Smed-cct8A). Yet, no detriment was observed in the number of adult somatic stem cells (neoblasts) that maintain differentiated tissue in planarians. Thus, expression of all eight CCT subunits is required to execute the essential functions of the CCT complex. Furthermore, expression of the somatic paralogs in planarian testes is not sufficient to complete spermatogenesis when testis-specific paralogs are knocked down, suggesting that the evolution of chaperonin subunits may drive changes in the development of sperm structure and that correct CCT subunit stoichiometry is crucial for spermiogenesis. © 2017 Wiley Periodicals, Inc.
Junqueira-de-Azevedo, Inácio L.M.; Bastos, Carolina Mancini Val; Ho, Paulo Lee; Luna, Milene Schmidt; Yamanouye, Norma; Casewell, Nicholas R.
2015-01-01
Attempts to reconstruct the evolutionary history of snake toxins in the context of their co-option to the venom gland rarely account for nonvenom snake genes that are paralogous to toxins, and which therefore represent important connectors to ancestral genes. In order to reevaluate this process, we conducted a comparative transcriptomic survey on body tissues from a venomous snake. A nonredundant set of 33,000 unigenes (assembled transcripts of reference genes) was independently assembled from six organs of the medically important viperid snake Bothrops jararaca, providing a reference list of 82 full-length toxins from the venom gland and specific products from other tissues, such as pancreatic digestive enzymes. Unigenes were then screened for nontoxin transcripts paralogous to toxins revealing 1) low level coexpression of approximately 20% of toxin genes (e.g., bradykinin-potentiating peptide, C-type lectin, snake venom metalloproteinase, snake venom nerve growth factor) in body tissues, 2) the identity of the closest paralogs to toxin genes in eight classes of toxins, 3) the location and level of paralog expression, indicating that, in general, co-expression occurs in a higher number of tissues and at lower levels than observed for toxin genes, and 4) strong evidence of a toxin gene reverting back to selective expression in a body tissue. In addition, our differential gene expression analyses identify specific cellular processes that make the venom gland a highly specialized secretory tissue. Our results demonstrate that the evolution and production of venom in snakes is a complex process that can only be understood in the context of comparative data from other snake tissues, including the identification of genes paralogous to venom toxins. PMID:25502939
Whole genome sequencing revealed host adaptation-focused genomic plasticity of pathogenic Leptospira
Xu, Yinghua; Zhu, Yongzhang; Wang, Yuezhu; Chang, Yung-Fu; Zhang, Ying; Jiang, Xiugao; Zhuang, Xuran; Zhu, Yongqiang; Zhang, Jinlong; Zeng, Lingbing; Yang, Minjun; Li, Shijun; Wang, Shengyue; Ye, Qiang; Xin, Xiaofang; Zhao, Guoping; Zheng, Huajun; Guo, Xiaokui; Wang, Junzhi
2016-01-01
Leptospirosis, caused by pathogenic Leptospira spp., has recently been recognized as an emerging infectious disease worldwide. Despite its severity and global importance, knowledge about the molecular pathogenesis and virulence evolution of Leptospira spp. remains limited. Here we sequenced and analyzed 102 isolates representing global sources. High genomic variability was observed among different Leptospira species, which was attributed to massive gene gain and loss events allowing for adaptation to specific niche conditions and changing host environments. Horizontal gene transfer and gene duplication allowed the stepwise acquisition of virulence factors in pathogenic Leptospira that evolved from a recent common ancestor. More importantly, the abundant expansion of specific virulence-related protein families, such as metalloproteases-associated paralogs, was identified exclusively in pathogenic species, reflecting the importance of these protein families in the pathogenesis of leptospirosis. Our observations also indicated that positive selection played a crucial role in the adaptation of these bacteria to their hosts. These novel findings may lead to greater understanding of the global diversity and virulence evolution of Leptospira spp. PMID:26833181
The changing of the guard: Molecular diversity and rapid evolution of beta-defensins.
Semple, Colin A; Gautier, Phillipe; Taylor, Karen; Dorin, Julia R
2006-11-01
Defensins are small cationic peptides involved in innate immunity and are components of the first line of defence against invading pathogens. beta-defensins are a subgroup of the defensin family that display a particular cysteine spacing and pattern of intramolecular bonding. These molecules are produced mostly by epithelia lining exposed surfaces and appear to have both antimicrobial and cell signalling functions. The unusually high degree of sequence variation in the mature peptide produced by the paralogous and in some cases orthologous genes implies extensive specialisation and species specific adaptation. Here we review recent functional data that are an important addition to our knowledge of the innate immune response and novel antibiotic design. We also consider the organisation and evolution of the genomic loci harbouring these genes where radical and rapid changes in beta-defensin sequences have been shown to result from the interplay of both positive and negative selection. Consequently these genes provide some unusually clear glimpses of the processes of duplication and specialisation that have shaped the mammalian genome.
Unusual Gene Order and Organization of the Sea Urchin Hox Cluster
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cameron, R A; Rowen, L; Nesbitt, R
2005-10-11
The highly consistent gene order and axial colinear expression patterns found in vertebrate hox gene clusters are less well conserved across the rest of bilaterians. We report the first deuterostome instance of an intact hox cluster with a unique gene order where the paralog groups are not expressed in a sequential manner. The finished sequence from BAC clones from the genome of the sea urchin, Strongylocentrotus purpuratus, reveals a gene order wherein the anterior genes (Hox1, Hox2 and Hox3) lie nearest the posterior genes in the cluster such that the most 3' gene is Hox5. (The gene order is 5'-Hox1, 2, 3, 11/13c, 11/13b, 11/13a, 9/10, 8, 7, 6, 5-3'). The finished sequence result is corroborated by restriction mapping evidence and BAC-end scaffold analyses. Comparisons with a putative ancestral deuterostome Hox gene cluster suggest that the rearrangements leading to the sea urchin gene order were many and complex.
Unusual Gene Order and Organization of the Sea Urchin HoxCluster
DOE Office of Scientific and Technical Information (OSTI.GOV)
Richardson, Paul M.; Lucas, Susan; Cameron, R. Andrew
2005-05-10
The highly consistent gene order and axial colinear expression patterns found in vertebrate hox gene clusters are less well conserved across the rest of bilaterians. We report the first deuterostome instance of an intact hox cluster with a unique gene order where the paralog groups are not expressed in a sequential manner. The finished sequence from BAC clones from the genome of the sea urchin, Strongylocentrotus purpuratus, reveals a gene order wherein the anterior genes (Hox1, Hox2 and Hox3) lie nearest the posterior genes in the cluster such that the most 3' gene is Hox5. (The gene order is 5'-Hox1, 2, 3, 11/13c, 11/13b, 11/13a, 9/10, 8, 7, 6, 5-3'). The finished sequence result is corroborated by restriction mapping evidence and BAC-end scaffold analyses. Comparisons with a putative ancestral deuterostome Hox gene cluster suggest that the rearrangements leading to the sea urchin gene order were many and complex.
Nicoludis, John M; Lau, Sze-Yi; Schärfe, Charlotta P I; Marks, Debora S; Weihofen, Wilhelm A; Gaudet, Rachelle
2015-11-03
Clustered protocadherin (Pcdh) proteins mediate dendritic self-avoidance in neurons via specific homophilic interactions in their extracellular cadherin (EC) domains. We determined crystal structures of EC1-EC3, containing the homophilic specificity-determining region, of two mouse clustered Pcdh isoforms (PcdhγA1 and PcdhγC3) to investigate the nature of the homophilic interaction. Within the crystal lattices, we observe antiparallel interfaces consistent with a role in trans cell-cell contact. Antiparallel dimerization is supported by evolutionary correlations. Two interfaces, located primarily on EC2-EC3, involve distinctive clustered Pcdh structure and sequence motifs, lack predicted glycosylation sites, and contain residues highly conserved in orthologs but not paralogs, pointing toward their biological significance as homophilic interaction interfaces. These two interfaces are similar yet distinct, reflecting a possible difference in interaction architecture between clustered Pcdh subfamilies. These structures initiate a molecular understanding of clustered Pcdh assemblies that are required to produce functional neuronal networks. Copyright © 2015 Elsevier Ltd. All rights reserved.
Swanson, D S; Pan, X; Musser, J M
1996-01-01
Mycobacterium scrofulaceum is most commonly recovered from children with cervical lymphadenitis, although it also accounts for approximately 2% of the mycobacterial infections in AIDS patients. Species assignment of M. scrofulaceum isolated by conventional techniques can be difficult and time-consuming. To develop a strategy for rapid species assignment of these organisms, a 360-bp region of the gene (hsp65) encoding a 65-kDa heat shock protein in 37 isolates from diverse sources was sequenced. Eight hsp65 alleles were identified, and these sequences formed phylogenetic clusters and lineages largely distinct from other Mycobacterium species. There was incomplete correlation between serovar designation and hsp65 allele assignment. The hsp65 data correlated strongly with the results of sequence analysis of the gene coding for 16S rRNA. Automated DNA sequencing of a 360-bp region of the hsp65 gene provides a rapid and unambiguous method for species assignment of these acid-fast organisms for diagnostic purposes. PMID:8940463
Suckau, Detlev; Resemann, Anja
2009-12-01
The ability to match Top-Down protein sequencing (TDS) results by MALDI-TOF to protein sequences by classical protein database searching was evaluated in this work. These analyses yielded the protein identity, the simultaneous assignment of the N- and C-termini, and protein sequences of up to 70 residues from either terminus. In combination with de novo sequencing using the MALDI-TDS data, even fusion proteins were assigned and the detailed sequence around the fusion site was elucidated. MALDI-TDS allowed protein sequences to be matched quickly and efficiently, and recombinant protein structures, in particular protein termini, to be validated at the level of undigested proteins.
Ochoa, David; García-Gutiérrez, Ponciano; Juan, David; Valencia, Alfonso; Pazos, Florencio
2013-01-27
A widespread family of methods for studying and predicting protein interactions using sequence information is based on co-evolution, quantified as similarity of phylogenetic trees. Part of the co-evolution observed between interacting proteins could be due to co-adaptation caused by inter-protein contacts. In this case, the co-evolution is expected to be more evident when evaluated on the surface of the proteins or the internal layers close to it. In this work we study the effect of incorporating information on predicted solvent accessibility to three methods for predicting protein interactions based on similarity of phylogenetic trees. We evaluate the performance of these methods in predicting different types of protein associations when trees based on positions with different characteristics of predicted accessibility are used as input. We found that predicted accessibility improves the results of two recent versions of the mirrortree methodology in predicting direct binary physical interactions, while it neither improves these methods, nor the original mirrortree method, in predicting other types of interactions. That improvement comes at no cost in terms of applicability since accessibility can be predicted for any sequence. We also found that predictions of protein-protein interactions are improved when multiple sequence alignments with a richer representation of sequences (including paralogs) are incorporated in the accessibility prediction.
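The idea of quantifying co-evolution as similarity of phylogenetic trees can be illustrated with a short Python sketch in the spirit of the original mirrortree approach: the pairwise distances between the same set of species in two protein families are compared with a Pearson correlation. The function names and the nested-dictionary distance format are assumptions made for this example, and the sketch does not include the accessibility filtering of alignment positions that the abstract evaluates.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence

DistanceMatrix = Dict[str, Dict[str, float]]


def pairwise_distances(dist: DistanceMatrix, species: Sequence[str]) -> List[float]:
    """Flatten the distances for every species pair, in a fixed order."""
    return [dist[a][b] for a, b in combinations(species, 2)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant distances")
    return cov / math.sqrt(var_x * var_y)


def tree_similarity(
    dist_a: DistanceMatrix, dist_b: DistanceMatrix, species: Sequence[str]
) -> float:
    """Correlation between the inter-species distances of two families."""
    return pearson(
        pairwise_distances(dist_a, species),
        pairwise_distances(dist_b, species),
    )
```

A score near 1 indicates that the two families have accumulated changes in a similar pattern across species, which such methods take as evidence of possible interaction; it is not by itself proof of a physical contact.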
Jeon, Amy Hye Won; Böhm, Christopher; Chen, Fusheng; Huo, Hairu; Ruan, Xueying; Ren, Carl He; Ho, Keith; Qamar, Seema; Mathews, Paul M.; Fraser, Paul E.; Mount, Howard T. J.; St George-Hyslop, Peter; Schmitt-Ulms, Gerold
2013-01-01
γ-Secretase plays a pivotal role in the production of neurotoxic amyloid β-peptides (Aβ) in Alzheimer disease (AD) and consists of a heterotetrameric core complex that includes the aspartyl intramembrane protease presenilin (PS). The human genome codes for two presenilin paralogs. To understand the causes for distinct phenotypes of PS paralog-deficient mice and elucidate whether PS mutations associated with early-onset AD affect the molecular environment of mature γ-secretase complexes, quantitative interactome comparisons were undertaken. Brains of mice engineered to express wild-type or mutant PS1, or HEK293 cells stably expressing PS paralogs with N-terminal tandem-affinity purification tags served as biological source materials. The analyses revealed novel interactions of the γ-secretase core complex with a molecular machinery that targets and fuses synaptic vesicles to cellular membranes and with the H+-transporting lysosomal ATPase macrocomplex but uncovered no differences in the interactomes of wild-type and mutant PS1. The catenin/cadherin network was almost exclusively found associated with PS1. Another intramembrane protease, signal peptide peptidase, predominantly co-purified with PS2-containing γ-secretase complexes and was observed to influence Aβ production. PMID:23589300
2016-01-01
Background Metabarcoding is becoming a common tool used to assess and compare diversity of organisms in environmental samples. Identification of OTUs is one of the critical steps in the process and several taxonomy assignment methods have been proposed to accomplish this task. This publication evaluates the quality of reference datasets, along with several alignment and phylogeny inference methods used in one of the taxonomy assignment methods, called the tree-based approach. This approach assigns anonymous OTUs to taxonomic categories based on the relative placements of OTUs and reference sequences on the cladogram and the support that these placements receive. New information In the tree-based taxonomy assignment approach, reliable identification of anonymous OTUs is based on their placement in monophyletic and highly supported clades together with identified reference taxa. Therefore, it requires a high-quality reference dataset. Resolution of phylogenetic trees is strongly affected by the presence of erroneous sequences as well as by the alignment and phylogeny inference methods used in the process. Two preparation steps are essential for the successful application of the tree-based taxonomy assignment approach. Curated collections of genetic information do include erroneous sequences. These sequences have a detrimental effect on the resolution of cladograms used in the tree-based approach. They must be identified and excluded from the reference dataset beforehand. Various combinations of multiple sequence alignment and phylogeny inference methods provide cladograms with different topology and bootstrap support. These combinations of methods need to be tested in order to determine the one that gives the highest resolution for the particular reference dataset. Completing the above-mentioned preparation steps is expected to decrease the number of unassigned OTUs and thus improve the results of the tree-based taxonomy assignment approach. PMID:27932919
Holovachov, Oleksandr
2016-01-01
Metabarcoding is becoming a common tool used to assess and compare diversity of organisms in environmental samples. Identification of OTUs is one of the critical steps in the process and several taxonomy assignment methods have been proposed to accomplish this task. This publication evaluates the quality of reference datasets, along with several alignment and phylogeny inference methods used in one of the taxonomy assignment methods, called the tree-based approach. This approach assigns anonymous OTUs to taxonomic categories based on the relative placements of OTUs and reference sequences on the cladogram and the support that these placements receive. In the tree-based taxonomy assignment approach, reliable identification of anonymous OTUs is based on their placement in monophyletic and highly supported clades together with identified reference taxa. Therefore, it requires a high-quality reference dataset. Resolution of phylogenetic trees is strongly affected by the presence of erroneous sequences as well as by the alignment and phylogeny inference methods used in the process. Two preparation steps are essential for the successful application of the tree-based taxonomy assignment approach. Curated collections of genetic information do include erroneous sequences. These sequences have a detrimental effect on the resolution of cladograms used in the tree-based approach. They must be identified and excluded from the reference dataset beforehand. Various combinations of multiple sequence alignment and phylogeny inference methods provide cladograms with different topology and bootstrap support. These combinations of methods need to be tested in order to determine the one that gives the highest resolution for the particular reference dataset. Completing the above-mentioned preparation steps is expected to decrease the number of unassigned OTUs and thus improve the results of the tree-based taxonomy assignment approach.
Makarova, Kira S; Sorokin, Alexander V; Novichkov, Pavel S; Wolf, Yuri I; Koonin, Eugene V
2007-11-27
An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt at such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs but also represents a challenge because of error amplification. One of the practical strategies involves construction of refined COGs for phylogenetically compact subsets of genomes. New Archaeal Clusters of Orthologous Genes (arCOGs) were constructed for 41 archaeal genomes (13 Crenarchaeota, 27 Euryarchaeota and one Nanoarchaeon) using an improved procedure that employs a similarity tree between smaller, group-specific clusters, semi-automatically partitions orthology domains in multidomain proteins, and uses profile searches for identification of remote orthologs. The annotation of arCOGs is a consensus between three assignments based on the COGs, the CDD database, and the annotations of homologs in the NR database. The 7538 arCOGs, on average, cover approximately 88% of the genes in a genome compared to approximately 76% coverage in COGs. The finer granularity of ortholog identification in the arCOGs is apparent from the fact that 4538 arCOGs correspond to 2362 COGs; approximately 40% of the arCOGs are new. The archaeal gene core (protein-coding genes found in all 41 genomes) consists of 166 arCOGs. The arCOGs were used to reconstruct gene loss and gene gain events during archaeal evolution and gene sets of ancestral forms. The Last Archaeal Common Ancestor (LACA) is conservatively estimated to possess 996 genes compared to 1245 and 1335 genes for the last common ancestors of Crenarchaeota and Euryarchaeota, respectively. 
It is inferred that LACA was a chemoautotrophic hyperthermophile that, in addition to the core archaeal functions, encoded more idiosyncratic systems, e.g., the CASS systems of antivirus defense and some toxin-antitoxin systems. The arCOGs provide a convenient, flexible framework for functional annotation of archaeal genomes, comparative genomics and evolutionary reconstructions. Genomic reconstructions suggest that the last common ancestor of archaea might have been (nearly) as advanced as the modern archaeal hyperthermophiles. ArCOGs and related information are available at: ftp://ftp.ncbi.nih.gov/pub/koonin/arCOGs/.
Müller, Jonas E. N.; Kupper, Christiane E.; Schneider, Olha; Vorholt, Julia A.; Ellingsen, Trond E.; Brautaset, Trygve
2013-01-01
Bacillus methanolicus can utilize methanol as the sole carbon source for growth and it encodes an NAD+-dependent methanol dehydrogenase (Mdh), catalyzing the oxidation of methanol to formaldehyde. Recently, the genomes of the B. methanolicus strains MGA3 (ATCC53907) and PB1 (NCIMB13113) were sequenced and found to harbor three different putative Mdh encoding genes, each belonging to the type III Fe-NAD+-dependent alcohol dehydrogenases. In each strain, two of these genes are encoded on the chromosome and one on a plasmid; only one chromosomal act gene encoding the previously described activator protein ACT was found. The six Mdhs and the ACT proteins were produced recombinantly in Escherichia coli, purified, and characterized. All Mdhs required NAD+ as cosubstrate, were catalytically stimulated by ACT, exhibited a broad and different substrate specificity range and displayed both dehydrogenase and reductase activities. All Mdhs catalyzed the oxidation of methanol; however the catalytic activity for methanol was considerably lower than for most other alcohols tested, suggesting that these enzymes represent a novel class of alcohol dehydrogenases. The kinetic constants for the Mdhs were comparable when acting as pure enzymes, but together with ACT the differences were more pronounced. Quantitative PCR experiments revealed major differences with respect to transcriptional regulation of the paralogous genes. Taken together our data indicate that the repertoire of methanol oxidizing enzymes in thermotolerant bacilli is larger than expected with complex mechanisms involved in their regulation. PMID:23527128
Krog, Anne; Heggeset, Tonje M B; Müller, Jonas E N; Kupper, Christiane E; Schneider, Olha; Vorholt, Julia A; Ellingsen, Trond E; Brautaset, Trygve
2013-01-01
Bacillus methanolicus can utilize methanol as the sole carbon source for growth and it encodes an NAD(+)-dependent methanol dehydrogenase (Mdh), catalyzing the oxidation of methanol to formaldehyde. Recently, the genomes of the B. methanolicus strains MGA3 (ATCC53907) and PB1 (NCIMB13113) were sequenced and found to harbor three different putative Mdh encoding genes, each belonging to the type III Fe-NAD(+)-dependent alcohol dehydrogenases. In each strain, two of these genes are encoded on the chromosome and one on a plasmid; only one chromosomal act gene encoding the previously described activator protein ACT was found. The six Mdhs and the ACT proteins were produced recombinantly in Escherichia coli, purified, and characterized. All Mdhs required NAD(+) as cosubstrate, were catalytically stimulated by ACT, exhibited a broad and different substrate specificity range and displayed both dehydrogenase and reductase activities. All Mdhs catalyzed the oxidation of methanol; however the catalytic activity for methanol was considerably lower than for most other alcohols tested, suggesting that these enzymes represent a novel class of alcohol dehydrogenases. The kinetic constants for the Mdhs were comparable when acting as pure enzymes, but together with ACT the differences were more pronounced. Quantitative PCR experiments revealed major differences with respect to transcriptional regulation of the paralogous genes. Taken together our data indicate that the repertoire of methanol oxidizing enzymes in thermotolerant bacilli is larger than expected with complex mechanisms involved in their regulation.
Boorse, Graham C; Crespi, Erica J; Dautzenberg, Frank M; Denver, Robert J
2005-11-01
Several corticotropin-releasing factor (CRF) family genes have been identified in vertebrates. Mammals have four paralogous genes that encode CRF or the urocortins 1, 2, and 3. In teleost fishes, a CRF, urotensin I (a fish ortholog of mammalian urocortin 1) and urocortin 3 have been identified, suggesting that at least three of the four mammalian lineages arose in a common ancestor of modern bony fishes and tetrapods. Here we report the isolation of genes orthologous to mammalian urocortin 1 and urocortin 3 from the South African clawed frog, Xenopus laevis. We characterize the pharmacology of the frog peptides and show that X. laevis urocortin 1 binds to and activates the frog CRF1 and CRF2 receptors at picomolar concentrations. Similar to mammals, frog urocortin 3 is selective for the CRF2 receptor. Only frog urocortin 1 binds to the CRF-binding protein, although with significantly lower affinity than frog CRF. Both urocortin genes are expressed in brain, pituitary, heart, and kidney of juvenile frogs; urocortin 1 is also expressed in skin. We also identified novel urocortin sequences in the genomes of pufferfish, zebrafish, chicken, and dog. Phylogenetic analysis supports the view that four paralogous lineages of CRF-like peptides arose before the divergence of the actinopterygian and sarcopterygian fishes. Our findings show that the functional relationships among CRF ligands and binding proteins, and their anorexigenic actions mediated by the CRF2 receptor, arose early in vertebrate evolution.
Collaborative Learning through Formative Peer Review with Technology
ERIC Educational Resources Information Center
Eaton, Carrie Diaz; Wade, Stephanie
2014-01-01
This paper describes a collaboration between a mathematician and a compositionist who developed a sequence of collaborative writing assignments for calculus. This sequence of developmentally appropriate assignments presents peer review as a collaborative process that promotes reflection, deepens understanding, and improves exposition. First, we…
Dores, Robert M
2016-01-01
The evolution of the melanocortin receptors (MCRs) is closely associated with the evolution of the melanocortin-2 receptor accessory proteins (MRAPs). Recent annotation of the elephant shark genome project revealed the sequence of a putative MRAP1 ortholog. The presence of this sequence in the genome of a cartilaginous fish raises the possibility that the mrap1 and mrap2 genes in the genomes of gnathostome vertebrates were the result of the chordate 2R genome duplication event. The presence of a putative MRAP1 ortholog in a cartilaginous fish genome is perplexing. Recent studies on melanocortin-2 receptor (MC2R) in the genomes of the elephant shark and the Japanese stingray indicate that these MC2R orthologs can be functionally expressed in CHO cells without co-expression of an exogenous mrap1 cDNA. The novel ligand selectivity of these cartilaginous fish MC2R orthologs is discussed. Finally, the origin of the mc2r and mc5r genes is reevaluated. The distinctive primary sequence conservation of MC2R and MC5R is discussed in light of the physiological roles of these two MCR paralogs.
Sequencing the Black Aspergilli species complex
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kuo, Alan; Salamov, Asaf; Zhou, Kemin
2011-03-11
The ~15 members of the Aspergillus section Nigri species complex (the "Black Aspergilli") are significant as platforms for bioenergy and bioindustrial technology, as members of soil microbial communities and players in the global carbon cycle, and as food processing and spoilage agents and agricultural toxigens. Despite their utility and ubiquity, the morphological and metabolic distinctiveness of the complex's members, and thus their taxonomy, is poorly defined. We are using short read pyrosequencing technology (Roche/454 and Illumina/Solexa) to rapidly scale up genomic and transcriptomic analysis of this species complex. To date we predict 11197 genes in Aspergillus niger, 11624 genes in A. carbonarius, and 10845 genes in A. aculeatus. A. aculeatus is our most recent genome, and was assembled primarily from 454-sequenced reads and annotated with the aid of >2 million 454 ESTs and >300 million Solexa ESTs. To most effectively deploy these very large numbers of ESTs we developed 2 novel methods for clustering the ESTs into assemblies. We have also developed a pipeline to propose orthologies and paralogies among genes in the species complex. In the near future we will apply these methods to additional species of Black Aspergilli that are currently in our sequencing pipeline.
MinION™ nanopore sequencing of environmental metagenomes: a synthetic approach
Watson, Mick; Minot, Samuel S.; Rivera, Maria C.; Franklin, Rima B.
2017-01-01
Background: Environmental metagenomic analysis is typically accomplished by assigning taxonomy and/or function from whole genome sequencing or 16S amplicon sequences. Both of these approaches are limited, however, by read length, among other technical and biological factors. A nanopore-based sequencing platform, MinION™, produces reads that are ≥1 × 10^4 bp in length, potentially providing for more precise assignment, thereby alleviating some of the limitations inherent in determining metagenome composition from short reads. We tested the ability of sequence data produced by MinION (R7.3 flow cells) to correctly assign taxonomy in single bacterial species runs and in three types of low-complexity synthetic communities: a mixture of DNA using equal mass from four species, a community with one relatively rare (1%) and three abundant (33% each) components, and a mixture of genomic DNA from 20 bacterial strains of staggered representation. Taxonomic composition of the low-complexity communities was assessed by analyzing the MinION sequence data with three different bioinformatic approaches: Kraken, MG-RAST, and One Codex. Results: Long read sequences generated from libraries prepared from single strains using the version 5 kit and chemistry, run on the original MinION device, yielded as few as 224 to as many as 3497 bidirectional high-quality (2D) reads with an average overall study length of 6000 bp. For the single-strain analyses, assignment of reads to the correct genus by different methods ranged from 53.1% to 99.5%, assignment to the correct species ranged from 23.9% to 99.5%, and the majority of misassigned reads were to closely related organisms. A synthetic metagenome sequenced with the same setup yielded 714 high quality 2D reads of approximately 5500 bp that were up to 98% correctly assigned to the species level.
Synthetic metagenome MinION libraries generated using version 6 kit and chemistry yielded from 899 to 3497 2D reads with lengths averaging 5700 bp with up to 98% assignment accuracy at the species level. The observed community proportions for “equal” and “rare” synthetic libraries were close to the known proportions, deviating from 0.1% to 10% across all tests. For a 20-species mock community with staggered contributions, a sequencing run detected all but 3 species (each included at <0.05% of DNA in the total mixture), 91% of reads were assigned to the correct species, 93% of reads were assigned to the correct genus, and >99% of reads were assigned to the correct family. Conclusions: At the current level of output and sequence quality (just under 4 × 10^3 2D reads for a synthetic metagenome), MinION sequencing followed by Kraken or One Codex analysis has the potential to provide rapid and accurate metagenomic analysis where the consortium is comprised of a limited number of taxa. Important considerations noted in this study included: high sensitivity of the MinION platform to the quality of input DNA, high variability of sequencing results across libraries and flow cells, and relatively small numbers of 2D reads per analysis limit. Together, these limited detection of very rare components of the microbial consortia, and would likely limit the utility of MinION for the sequencing of high-complexity metagenomic communities where thousands of taxa are expected. Furthermore, the limitations of the currently available data analysis tools suggest there is considerable room for improvement in the analytical approaches for the characterization of microbial communities using long reads.
Nevertheless, the fact that the accurate taxonomic assignment of high-quality reads generated by MinION is approaching 99.5% and, in most cases, the inferred community structure mirrors the known proportions of a synthetic mixture warrants further exploration of practical application to environmental metagenomics as the platform continues to develop and improve. With further improvement in sequence throughput and error rate reduction, this platform shows great promise for precise real-time analysis of the composition and structure of more complex microbial communities. PMID:28327976
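The comparison of observed and known community proportions reported in this abstract can be illustrated with a short calculation. The following is only a minimal sketch: the function name `proportion_deviation`, the species labels, and the read counts are hypothetical and are not taken from the study, and the deviation is assumed here to be an absolute difference in percentage points (the abstract does not say whether its 0.1% to 10% range is absolute or relative).

```python
def proportion_deviation(read_counts, expected_pct):
    """Return {taxon: (observed_pct, abs_deviation_in_percentage_points)}.

    read_counts  : reads assigned to each taxon by a classifier
    expected_pct : known composition of the synthetic community, in percent
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no assigned reads")
    result = {}
    for taxon, expected in expected_pct.items():
        # Taxa that received no reads count as 0% observed.
        observed = 100 * read_counts.get(taxon, 0) / total
        result[taxon] = (observed, abs(observed - expected))
    return result


# Hypothetical counts for a "rare" style community (1% + 3 x 33%);
# these numbers are illustrative only, not data from the paper.
counts = {"species_A": 10, "species_B": 330, "species_C": 340, "species_D": 320}
expected = {"species_A": 1.0, "species_B": 33.0, "species_C": 33.0, "species_D": 33.0}

for taxon, (obs, dev) in proportion_deviation(counts, expected).items():
    print(f"{taxon}: observed {obs:.1f}%, deviation {dev:.1f} points")
```

Working in percentage points keeps the rare component comparable with the abundant ones; a relative measure would instead inflate small absolute errors on the 1% taxon.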
MinION™ nanopore sequencing of environmental metagenomes: a synthetic approach.
Brown, Bonnie L; Watson, Mick; Minot, Samuel S; Rivera, Maria C; Franklin, Rima B
2017-03-01
Environmental metagenomic analysis is typically accomplished by assigning taxonomy and/or function from whole genome sequencing or 16S amplicon sequences. Both of these approaches are limited, however, by read length, among other technical and biological factors. A nanopore-based sequencing platform, MinION™, produces reads that are ≥1 × 10^4 bp in length, potentially providing for more precise assignment, thereby alleviating some of the limitations inherent in determining metagenome composition from short reads. We tested the ability of sequence data produced by MinION (R7.3 flow cells) to correctly assign taxonomy in single bacterial species runs and in three types of low-complexity synthetic communities: a mixture of DNA using equal mass from four species, a community with one relatively rare (1%) and three abundant (33% each) components, and a mixture of genomic DNA from 20 bacterial strains of staggered representation. Taxonomic composition of the low-complexity communities was assessed by analyzing the MinION sequence data with three different bioinformatic approaches: Kraken, MG-RAST, and One Codex. Results: Long read sequences generated from libraries prepared from single strains using the version 5 kit and chemistry, run on the original MinION device, yielded as few as 224 to as many as 3497 bidirectional high-quality (2D) reads with an average overall study length of 6000 bp. For the single-strain analyses, assignment of reads to the correct genus by different methods ranged from 53.1% to 99.5%, assignment to the correct species ranged from 23.9% to 99.5%, and the majority of misassigned reads were to closely related organisms. A synthetic metagenome sequenced with the same setup yielded 714 high quality 2D reads of approximately 5500 bp that were up to 98% correctly assigned to the species level.
Synthetic metagenome MinION libraries generated using version 6 kit and chemistry yielded from 899 to 3497 2D reads with lengths averaging 5700 bp with up to 98% assignment accuracy at the species level. The observed community proportions for “equal” and “rare” synthetic libraries were close to the known proportions, deviating from 0.1% to 10% across all tests. For a 20-species mock community with staggered contributions, a sequencing run detected all but 3 species (each included at <0.05% of DNA in the total mixture), 91% of reads were assigned to the correct species, 93% of reads were assigned to the correct genus, and >99% of reads were assigned to the correct family. Conclusions: At the current level of output and sequence quality (just under 4 × 10^3 2D reads for a synthetic metagenome), MinION sequencing followed by Kraken or One Codex analysis has the potential to provide rapid and accurate metagenomic analysis where the consortium is comprised of a limited number of taxa. Important considerations noted in this study included: high sensitivity of the MinION platform to the quality of input DNA, high variability of sequencing results across libraries and flow cells, and relatively small numbers of 2D reads per analysis limit. Together, these limited detection of very rare components of the microbial consortia, and would likely limit the utility of MinION for the sequencing of high-complexity metagenomic communities where thousands of taxa are expected. Furthermore, the limitations of the currently available data analysis tools suggest there is considerable room for improvement in the analytical approaches for the characterization of microbial communities using long reads.
Nevertheless, the fact that the accurate taxonomic assignment of high-quality reads generated by MinION is approaching 99.5% and, in most cases, the inferred community structure mirrors the known proportions of a synthetic mixture warrants further exploration of practical application to environmental metagenomics as the platform continues to develop and improve. With further improvement in sequence throughput and error rate reduction, this platform shows great promise for precise real-time analysis of the composition and structure of more complex microbial communities. © The Author 2017. Published by Oxford University Press.
Bhuiyan, Sharmin Siddique; Kinoshita, Shigeharu; Wongwarangkana, Chaninya; Asaduzzaman, Md; Asakawa, Shuichi; Watabe, Shugo
2013-07-06
A novel sarcomeric myosin heavy chain gene, MYH14, was identified following the completion of the human genome project. MYH14 contains an intronic microRNA, miR-499, which is expressed in a slow/cardiac muscle specific manner along with its host gene; it plays a key role in muscle fiber-type specification in mammals. Interestingly, teleost fish genomes contain multiple MYH14 and miR-499 paralogs. However, the evolutionary history of MYH14 and miR-499 has not been studied in detail. In the present study, we identified MYH14/miR-499 loci on various teleost fish genomes and examined their evolutionary history by sequence and expression analyses. Synteny and phylogenetic analyses depict the evolutionary history of MYH14/miR-499 loci where teleost specific duplication and several subsequent rounds of species-specific gene loss events took place. Interestingly, miR-499 was not located in the MYH14 introns of certain teleost fish. An MYH14 paralog, lacking miR-499, exhibited an accelerated rate of evolution compared with those containing miR-499, suggesting a putative functional relationship between MYH14 and miR-499. In medaka, Oryzias latipes, miR-499 is present even though MYH14 is completely absent from the genome. Furthermore, in situ hybridization and small RNA sequencing showed that miR-499 was expressed in the notochord at the medaka embryonic stage and in slow/cardiac muscle at the larval and adult stages. Comparing the flanking sequences of MYH14/miR-499 loci between torafugu Takifugu rubripes, zebrafish Danio rerio, and medaka revealed some highly conserved regions, suggesting that cis-regulatory elements have been functionally conserved in medaka miR-499 despite the loss of its host gene. This study reveals the evolutionary history of the MYH14/miR-499 locus in teleost fish, indicating divergent distribution and expression of MYH14 and miR-499 genes in different teleost fish lineages. We also found that medaka miR-499 was even expressed in the absence of its host gene.
To our knowledge, this is the first report that shows the conversion of intronic into non-intronic miRNA during the evolution of a teleost fish lineage.
Rice, Danny W; Palmer, Jeffrey D
2006-01-01
Background Horizontal gene transfer (HGT) to the plant mitochondrial genome has recently been shown to occur at a surprisingly high rate; however, little evidence has been found for HGT to the plastid genome, despite extensive sequencing. In this study, we analyzed all genes from sequenced plastid genomes to unearth any neglected cases of HGT and to obtain a measure of the overall extent of HGT to the plastid. Results Although several genes gave strongly supported conflicting trees under certain conditions, we are confident of HGT in only a single case beyond the rubisco HGT already reported. Most of the conflicts involved near neighbors connected by long branches (e.g. red algae and their secondary hosts), where phylogenetic methods are prone to mislead. However, three genes – clpP, ycf2, and rpl36 – provided strong support for taxa moving far from their organismal position. Further taxon sampling of clpP and ycf2 resulted in rejection of HGT due to long-branch attraction and a serious error in the published plastid genome sequence of Oenothera elata, respectively. A single new case, a bacterial rpl36 gene transferred into the ancestor of the cryptophyte and haptophyte plastids, appears to be a true HGT event. Interestingly, this rpl36 gene is a distantly related paralog of the rpl36 type found in other plastids and most eubacteria. Moreover, the transferred gene has physically replaced the native rpl36 gene, yet flanking genes and intergenic regions show no sign of HGT. This suggests that gene replacement somehow occurred by recombination at the very ends of rpl36, without the level and length of similarity normally expected to support recombination. Conclusion The rpl36 HGT discovered in this study is of considerable interest in terms of both molecular mechanism and phylogeny. 
The plastid acquisition of a bacterial rpl36 gene via HGT provides the first strong evidence for a sister-group relationship between haptophyte and cryptophyte plastids to the exclusion of heterokont and alveolate plastids. Moreover, the bacterial gene has replaced the native plastid rpl36 gene by an uncertain mechanism that appears inconsistent with existing models for the recombinational basis of gene conversion. PMID:16956407
Derivational Suffixes as Cues to Stress Position in Reading Greek
ERIC Educational Resources Information Center
Grimani, Aikaterini; Protopapas, Athanassios
2017-01-01
Background: In languages with lexical stress, reading aloud must include stress assignment. Stress information sources across languages include word-final letter sequences. Here, we examine whether such sequences account for stress assignment in Greek and whether this is attributable to absolute rules involving accenting morphemes or to…
Amaral, Ian P G; Johnston, Ian A
2012-01-01
To identify circadian patterns of gene expression in skeletal muscle, adult male zebrafish were acclimated for 2 wk to a 12:12-h light-dark photoperiod and then exposed to continuous darkness for 86 h with ad libitum feeding. The increase in gut food content associated with the subjective light period was much diminished by the third cycle, enabling feeding and circadian rhythms to be distinguished. Expression of zebrafish paralogs of mammalian transcriptional activators of the circadian mechanism (bmal1, clock1, and rora) followed a rhythmic pattern with a ∼24-h periodicity. Peak expression of rora paralogs occurred at the beginning of the subjective light period [Zeitgeber time (ZT)07 and ZT02 for roraa and rorab], whereas the highest expression of bmal1 and clock paralogs occurred 12 h later (ZT13-15 and ZT16 for bmal and clock paralogs). Expression of the transcriptional repressors cry1a, per1a/1b, per2, per3, nr1d2a/2b, and nr1d1 also followed a circadian pattern with peak expression at ZT0-02. Expression of the two paralogs of cry2 occurred in phase with clock1a/1b. Duplicated genes had a high correlation of expression except for paralogs of clock1, nr1d2, and per1, with cry1b showing no circadian pattern. The highest expression difference was 9.2-fold for the activator bmal1b and 51.7-fold for the repressor per1a. Out of 32 candidate clock-controlled genes, only myf6, igfbp3, igfbp5b, and hsf2 showed circadian expression patterns. Igfbp3, igfbp5b, and myf6 were expressed in phase with clock1a/1b and had an average of twofold change in expression from peak to trough, whereas hsf2 transcripts were expressed in phase with cry1a and had a 7.2-fold-change in expression. The changes in expression of clock and clock-controlled genes observed during continuous darkness were also observed at similar ZTs in fish exposed to a normal photoperiod in a separate control experiment. The role of circadian clocks in regulating muscle maintenance and growth is discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Holowiecki, Andrew
While heme is an important cofactor for numerous proteins, it is highly toxic in its unbound form and can perpetuate the formation of reactive oxygen species. Heme oxygenase enzymes (HMOX1 and HMOX2) degrade heme into biliverdin and carbon monoxide, with biliverdin subsequently being converted to bilirubin by biliverdin reductase (BVRa or BVRb). As a result of the teleost-specific genome duplication event, zebrafish have paralogs of hmox1 (hmox1a and hmox1b) and hmox2 (hmox2a and hmox2b). Expression of all four hmox paralogs and two bvr isoforms was measured in adult tissues (gill, brain and liver) and sexually dimorphic differences were observed, most notably in the basal expression of hmox1a, hmox2a, hmox2b and bvrb in liver samples. hmox1a, hmox2a and hmox2b were significantly induced in male liver tissues in response to 96 h cadmium exposure (20 μM). hmox2a and hmox2b were significantly induced in male brain samples, but only hmox2a was significantly reduced in male gill samples in response to the 96 h cadmium exposure. hmox paralogs displayed significantly different levels of basal expression in most adult tissues, as well as during zebrafish development (24 to 120 hpf). Furthermore, hmox1a, hmox1b and bvrb were significantly induced in zebrafish eleutheroembryos in response to multiple pro-oxidants (cadmium, hemin and tert-butylhydroquinone). Knockdown of Nrf2a, a transcriptional regulator of hmox1a, was demonstrated to inhibit the Cd-mediated induction of hmox1b and bvrb. These results demonstrate distinct mechanisms of hmox and bvr transcriptional regulation in zebrafish, providing initial evidence of the partitioning of function of the hmox paralogs.
Highlights: • hmox1a, hmox2a, hmox2b and bvrb are sexually dimorphic in expression. • hmox paralogs were induced in adult tissues by cadmium exposure. • hmox1a, hmox1b and bvrb were induced by multiple pro-oxidants in zebrafish embryos. • Differential expression of zebrafish hmox paralogs suggests partitioning of function. • Nrf2a mediates the induction of hmox1b and bvrb by cadmium in zebrafish embryos.
Biedler, James K; Tu, Zhijian
2010-07-08
The maternal zygotic transition marks the time at which transcription from the zygotic genome is initiated and a subset of maternal RNAs are progressively degraded in the developing embryo. A number of early zygotic genes have been identified in Drosophila melanogaster and comparisons to sequenced mosquito genomes suggest that some of these early zygotic genes such as bottleneck are fast-evolving or subject to turnover in dipteran insects. One objective of this study is to identify early zygotic genes from the yellow fever mosquito Aedes aegypti to study their evolution. We are also interested in obtaining early zygotic promoters that will direct transgene expression in the early embryo as part of a Medea gene drive system. Two novel early zygotic kinesin light chain genes we call AaKLC2.1 and AaKLC2.2 were identified by transcriptome sequencing of Aedes aegypti embryos at various time points. These two genes have 98% nucleotide and amino acid identity in their coding regions and show transcription confined to the early zygotic stage according to gene-specific RT-PCR analysis. These AaKLC2 genes have a paralogous gene (AaKLC1) in Ae. aegypti. Phylogenetic inference shows that an ortholog to the AaKLC2 genes is only found in the sequenced genome of Culex quinquefasciatus. In contrast, AaKLC1 gene orthologs are found in all three sequenced mosquito species including Anopheles gambiae. There is only one KLC gene in D. melanogaster and other sequenced holometabolous insects that appears to be similar to AaKLC1. Unlike AaKLC2, AaKLC1 is expressed in all life stages and tissues tested, which is consistent with the expression pattern of the An. gambiae and D. melanogaster KLC genes. Phylogenetic inference also suggests that AaKLC2 genes and their likely C. quinquefasciatus ortholog are fast-evolving genes relative to the highly conserved AaKLC1-like paralogs. 
Embryonic injection of a luciferase reporter under the control of a 1 kb fragment upstream of the AaKLC2.1 start codon shows promoter activity at least as early as 3 hours in the developing Ae. aegypti embryo. The AaKLC2.1 promoter activity reached ~1600 fold over the negative control at 5 hr after egg deposition. Transcriptome profiling by use of high throughput sequencing technologies has proven to be a valuable method for the identification and discovery of early and transient zygotic genes. The evolutionary investigation of the KLC gene family reveals that duplication is a source for the evolution of new genes that play a role in the dynamic process of early embryonic development. AaKLC2.1 may provide a promoter for early zygotic-specific transgene expression, which is a key component of the Medea gene drive system.
Flexible taxonomic assignment of ambiguous sequencing reads
2011-01-01
Background To characterize the diversity of bacterial populations in metagenomic studies, sequencing reads need to be accurately assigned to taxonomic units in a given reference taxonomy. Reads that cannot be reliably assigned to a unique leaf in the taxonomy (ambiguous reads) are typically assigned to the lowest common ancestor of the set of species that match it. This introduces a potentially severe error in the estimation of bacteria present in the sample due to false positives, since all species in the subtree rooted at the ancestor are implicitly assigned to the read even though many of them may not match it. Results We present a method that maps each read to a node in the taxonomy that minimizes a penalty score while balancing the relevance of precision and recall in the assignment through a parameter q. This mapping can be obtained in time linear in the number of matching sequences, because LCA queries to the reference taxonomy take constant time. When applied to six different metagenomic datasets, our algorithm produces different taxonomic distributions depending on whether coverage or precision is maximized. Including information on the quality of the reads reduces the number of unassigned reads but increases the number of ambiguous reads, stressing the relevance of our method. Finally, two measures of performance are described and results with a set of artificially generated datasets are discussed. Conclusions The assignment strategy of sequencing reads introduced in this paper is a versatile and quick method to study bacterial communities. The bacterial composition of the analyzed samples can vary significantly depending on how ambiguous reads are assigned, which is governed by the value of the q parameter. Validation of our results in an artificial dataset confirms that a combination of values of q produces the most accurate results. PMID:21211059
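The idea of choosing a taxonomy node by trading off false positives against false negatives can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the penalty is assumed to be q × (false negatives) + (1 − q) × (false positives), the taxonomy is a hypothetical child-to-parent dictionary, and candidates are scanned naively rather than with the constant-time LCA queries the abstract mentions.

```python
from collections import defaultdict


def leaves_under(parent):
    """Map every node of the taxonomy to the set of leaves in its subtree."""
    children = defaultdict(set)
    for child, par in parent.items():
        if par is not None:
            children[par].add(child)
    leaves = {}

    def collect(node):
        if node not in children:          # a node without children is a leaf
            leaves[node] = {node}
        else:
            leaves[node] = set()
            for c in children[node]:
                leaves[node] |= collect(c)
        return leaves[node]

    for node, par in parent.items():
        if par is None:                   # start from each root
            collect(node)
    return leaves


def ancestors(node, parent):
    """Path from a node up to the root, including the node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def assign_read(matches, parent, q):
    """Pick the node minimizing q*FN + (1-q)*FP (assumed penalty form).

    FP = leaves below the node that the read does not match,
    FN = matching leaves that are not below the node.
    Ties keep the first candidate encountered.
    """
    leaves = leaves_under(parent)
    matches = set(matches)
    candidates = []
    for m in sorted(matches):
        for a in ancestors(m, parent):
            if a not in candidates:
                candidates.append(a)
    best_score, best_node = None, None
    for node in candidates:
        fp = len(leaves[node] - matches)
        fn = len(matches - leaves[node])
        score = q * fn + (1 - q) * fp
        if best_score is None or score < best_score:
            best_score, best_node = score, node
    return best_node


# Hypothetical toy taxonomy: child -> parent (None marks the root).
taxonomy = {
    "root": None,
    "genusA": "root", "genusB": "root",
    "a1": "genusA", "a2": "genusA", "a3": "genusA",
    "b1": "genusB",
}

# A read matching a1 and a2: a high q favours recall (the genus node),
# a low q favours precision (a single matching leaf).
print(assign_read({"a1", "a2"}, taxonomy, q=0.9))
print(assign_read({"a1", "a2"}, taxonomy, q=0.1))
```

With this penalty, plain LCA assignment corresponds to weighting false negatives so heavily that no matching leaf may be excluded, which is why lowering q pulls assignments toward the leaves.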
Evolution and Expression of Tissue Globins in Ray-Finned Fishes.
Gallagher, Michael D; Macqueen, Daniel J
2017-01-01
The globin gene family encodes oxygen-binding hemeproteins conserved across the major branches of multicellular life. The origins and evolutionary histories of complete globin repertoires have been established for many vertebrates, but there remain major knowledge gaps for ray-finned fish. Therefore, we used phylogenetic, comparative genomic and gene expression analyses to discover and characterize canonical “non-blood” globin family members (i.e., myoglobin, cytoglobin, neuroglobin, globin-X, and globin-Y) across multiple ray-finned fish lineages, revealing novel gene duplicates (paralogs) conserved from whole genome duplication (WGD) and small-scale duplication events. Our key findings were that: (1) globin-X paralogs in teleosts have been retained from the teleost-specific WGD, (2) functional paralogs of cytoglobin, neuroglobin, and globin-X, but not myoglobin, have been conserved from the salmonid-specific WGD, (3) triplicate lineage-specific myoglobin paralogs are conserved in arowanas (Osteoglossiformes), which arose by tandem duplication and diverged under positive selection, (4) globin-Y is retained in multiple early branching fish lineages that diverged before teleosts, and (5) marked variation in tissue-specific expression of globin gene repertoires exists across ray-finned fish evolution, including several previously uncharacterized sites of expression. In this respect, our data provide an interesting link between myoglobin expression and the evolution of air breathing in teleosts. Together, our findings demonstrate great unrecognized diversity in the repertoire and expression of non-blood globins that has arisen during ray-finned fish evolution.
Sharma, Bharti; Guo, Chunce; Kong, Hongzhi; Kramer, Elena M
2011-08-01
• The petals of the lower eudicot family Ranunculaceae are thought to have been derived many times independently from stamens. However, investigation of the genetic basis of their identity has suggested an alternative hypothesis: that they share a commonly inherited petal identity program. This theory is based on the fact that an ancient paralogous lineage of APETALA3 (AP3) in the Ranunculaceae appears to have a conserved, petal-specific expression pattern. • Here, we have used a combination of approaches, including RNAi, comparative gene expression and molecular evolutionary studies, to understand the function of this petal-specific AP3 lineage. • Functional analysis of the Aquilegia locus AqAP3-3 has demonstrated that the paralog is required for petal identity with little contribution to the identity of the other floral organs. Expanded expression studies and analyses of molecular evolutionary patterns provide further evidence that orthologs of AqAP3-3 are primarily expressed in petals and are under higher purifying selection across the family than the other AP3 paralogs. • Taken together, these findings suggest that the AqAP3-3 lineage underwent progressive subfunctionalization within the order Ranunculales, ultimately yielding a specific role in petal identity that has probably been conserved, in stark contrast with the multiple independent origins predicted by botanical theories. © 2011 The Authors. New Phytologist © 2011 New Phytologist Trust.
Jabbour, Florian; Cossard, Guillaume; Le Guilloux, Martine; Sannier, Julie; Nadot, Sophie; Damerval, Catherine
2014-01-01
Floral bilateral symmetry (zygomorphy) has evolved several times independently in angiosperms from radially symmetrical (actinomorphic) ancestral states. Homologs of the Antirrhinum majus Cycloidea gene (Cyc) have been shown to control floral symmetry in diverse groups in core eudicots. In the basal eudicot family Ranunculaceae, there is a single evolutionary transition from actinomorphy to zygomorphy in the stem lineage of the tribe Delphinieae. We characterized Cyc homologs in 18 genera of Ranunculaceae, including the four genera of Delphinieae, in a sampling that represents the floral morphological diversity of this tribe, and reconstructed the evolutionary history of this gene family in Ranunculaceae. Within each of the two RanaCyL (Ranunculaceae Cycloidea-like) lineages previously identified, an additional duplication possibly predating the emergence of the Delphinieae was found, resulting in up to four gene copies in zygomorphic species. Expression analyses indicate that the RanaCyL paralogs are expressed early in floral buds and that the duration of their expression varies between species and paralog class. At most one RanaCyL paralog was expressed during the late stages of floral development in the actinomorphic species studied whereas all paralogs from the zygomorphic species were expressed, composing a species-specific identity code for perianth organs. The contrasting asymmetric patterns of expression observed in the two zygomorphic species are discussed in relation to their distinct perianth architecture. PMID:24752428
Gain-of-function mutations in beet DODA2 identify key residues for betalain pigment evolution.
Bean, Alexander; Sunnadeniya, Rasika; Akhavan, Neda; Campbell, Annabelle; Brown, Matthew; Lloyd, Alan
2018-05-13
The key enzymatic step in betalain biosynthesis involves conversion of l-3,4-dihydroxyphenylalanine (l-DOPA) to betalamic acid. One class of enzymes capable of this is 3,4-dihydroxyphenylalanine 4,5-dioxygenase (DODA). In betalain-producing species, multiple paralogs of this gene are maintained. This study demonstrates which paralogs function in the betalain pathway and determines the residue changes required to evolve a betalain-nonfunctional DODA into a betalain-functional DODA. Functionalities of two pairs of DODAs were tested by expression in beets, Arabidopsis and yeast, and gene silencing was performed by virus-induced gene silencing. Site-directed mutagenesis identified amino acid residues essential for betalamic acid production. Beta vulgaris and Mirabilis jalapa both possess a DODA1 lineage that functions in the betalain pathway and at least one other lineage, DODA2, that does not. Site-directed mutagenesis resulted in betalain biosynthesis by a previously nonfunctional DODA, revealing key residues required for evolution of the betalain pathway. Divergent functionality of DODA paralogs, one clade involved in betalain biosynthesis but others not, is present in various Caryophyllales species. A minimum of seven amino acid residue changes conferred betalain enzymatic activity to a betalain-nonfunctional DODA paralog, providing insight into the evolution of the betalain pigment pathway in plants. © 2018 The Authors. New Phytologist © 2018 New Phytologist Trust.
Evolutionary acquisition of cysteines determines FOXO paralog-specific redox signaling.
Putker, Marrit; Vos, Harmjan R; van Dorenmalen, Kim; de Ruiter, Hesther; Duran, Ana G; Snel, Berend; Burgering, Boudewijn M T; Vermeulen, Michiel; Dansen, Tobias B
2015-01-01
Reduction-oxidation (redox) signaling, the translation of an oxidative intracellular environment into a cellular response, is mediated by the reversible oxidation of specific cysteine thiols. The latter can result in disulfide formation between protein hetero- or homodimers that alter protein function until the local cellular redox environment has returned to the basal state. We have previously shown that this mechanism promotes the nuclear localization and activity of the Forkhead Box O4 (FOXO4) transcription factor. In this study, we sought to investigate whether redox signaling differentially controls the human FOXO3 and FOXO4 paralogs. We present evidence that FOXO3 and FOXO4 have acquired paralog-specific cysteines throughout vertebrate evolution. Using a proteome-wide screen, we identified previously unknown redox-dependent FOXO3 interaction partners. The nuclear import receptors Importin-7 (IPO7) and Importin-8 (IPO8) form a disulfide-dependent heterodimer with FOXO3, which is required for its reactive oxygen species-induced nuclear translocation. FOXO4 does not interact with IPO7 or IPO8. IPO7 and IPO8 control the nuclear import of FOXO3, but not FOXO4, in a redox-sensitive and disulfide-dependent manner. Our findings suggest that evolutionary acquisition of cysteines has contributed to regulatory divergence of FOXO paralogs, and that phylogenetic analysis can aid in the identification of cysteines involved in redox signaling.
A Teaching-Learning Sequence about Weather Map Reading
ERIC Educational Resources Information Center
Mandrikas, Achilleas; Stavrou, Dimitrios; Skordoulis, Constantine
2017-01-01
In this paper a teaching-learning sequence (TLS) introducing pre-service elementary teachers (PET) to weather map reading, with emphasis on wind assignment, is presented. The TLS includes activities about recognition of wind symbols, assignment of wind direction and wind speed on a weather map and identification of wind characteristics in a…
Wagner, Florence F; Benajiba, Lina; Campbell, Arthur J; Weïwer, Michel; Sacher, Joshua R; Gale, Jennifer P; Ross, Linda; Puissant, Alexandre; Alexe, Gabriela; Conway, Amy; Back, Morgan; Pikman, Yana; Galinsky, Ilene; DeAngelo, Daniel J; Stone, Richard M; Kaya, Taner; Shi, Xi; Robers, Matthew B; Machleidt, Thomas; Wilkinson, Jennifer; Hermine, Olivier; Kung, Andrew; Stein, Adam J; Lakshminarasimhan, Damodharan; Hemann, Michael T; Scolnick, Edward; Zhang, Yan-Ling; Pan, Jen Q; Stegmaier, Kimberly; Holson, Edward B
2018-03-07
Glycogen synthase kinase 3 (GSK3), a key regulatory kinase in the wingless-type MMTV integration site family (WNT) pathway, is a therapeutic target of interest in many diseases. Although dual GSK3α/β inhibitors have entered clinical trials, none has successfully translated to clinical application. Mechanism-based toxicities, driven in part by the inhibition of both GSK3 paralogs and subsequent β-catenin stabilization, are a concern in the translation of this target class because mutations and overexpression of β-catenin are associated with many cancers. Knockdown of GSK3α or GSK3β individually does not increase β-catenin and offers a conceptual resolution to targeting GSK3: paralog-selective inhibition. However, inadequate chemical tools exist. The design of selective adenosine triphosphate (ATP)-competitive inhibitors poses a drug discovery challenge due to the high homology (95% identity and 100% similarity) in this binding domain. Taking advantage of an Asp133→Glu196 "switch" in their kinase hinge, we present a rational design strategy toward the discovery of paralog-selective GSK3 inhibitors. These GSK3α- and GSK3β-selective inhibitors provide insights into GSK3 targeting in acute myeloid leukemia (AML), where GSK3α was identified as a therapeutic target using genetic approaches. The GSK3α-selective compound BRD0705 inhibits kinase function and does not stabilize β-catenin, mitigating potential neoplastic concerns. BRD0705 induces myeloid differentiation and impairs colony formation in AML cells, with no apparent effect on normal hematopoietic cells. Moreover, BRD0705 impairs leukemia initiation and prolongs survival in AML mouse models. These studies demonstrate feasibility of paralog-selective GSK3α inhibition, offering a promising therapeutic approach in AML. Copyright © 2018 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
McIntyre, Chloe L.; Knowles, Nick J.
2013-01-01
Human rhinoviruses (HRVs) frequently cause mild upper respiratory tract infections and more severe disease manifestations such as bronchiolitis and asthma exacerbations. HRV is classified into three species within the genus Enterovirus of the family Picornaviridae. HRV species A and B contain 75 and 25 serotypes identified by cross-neutralization assays, although the use of such assays for routine HRV typing is hampered by the large number of serotypes, replacement of virus isolation by molecular methods in HRV diagnosis and the poor or absent replication of HRV species C in cell culture. To address these problems, we propose an alternative, genotypic classification of HRV-based genetic relatedness analogous to that used for enteroviruses. Nucleotide distances between 384 complete VP1 sequences of currently assigned HRV (sero)types identified divergence thresholds of 13, 12 and 13 % for species A, B and C, respectively, that divided inter- and intra-type comparisons. These were paralleled by 10, 9.5 and 10 % thresholds in the larger dataset of >3800 VP4 region sequences. Assignments based on VP1 sequences led to minor revisions of existing type designations (such as the reclassification of serotype pairs, e.g. A8/A95 and A29/A44, as single serotypes) and the designation of new HRV types A101–106, B101–103 and C34–C51. A protocol for assignment and numbering of new HRV types using VP1 sequences and the restriction of VP4 sequence comparisons to type identification and provisional type assignments is proposed. Genotypic assignment and identification of HRV types will be of considerable value in the future investigation of type-associated differences in disease outcomes, transmission and epidemiology. PMID:23677786
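The threshold-based type assignment described in this abstract can be illustrated with a minimal Python sketch. It assumes an uncorrected pairwise distance (p-distance) over pre-aligned VP1 sequences and a strict "below threshold" rule; the abstract only says "nucleotide distances", so both choices, and the names `p_distance`, `VP1_THRESHOLDS` and `same_type`, are illustrative assumptions rather than the authors' protocol.

```python
# Assumed VP1 divergence thresholds per HRV species, taken from the abstract
# (13 %, 12 % and 13 % for species A, B and C).
VP1_THRESHOLDS = {"A": 0.13, "B": 0.12, "C": 0.13}


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (x, y)
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x != "-" and y != "-"
    ]
    if not pairs:
        raise ValueError("no comparable (ungapped) positions")
    return sum(x != y for x, y in pairs) / len(pairs)


def same_type(seq_a: str, seq_b: str, species: str) -> bool:
    """True if the pair falls below the species' VP1 threshold (assumed rule)."""
    return p_distance(seq_a, seq_b) < VP1_THRESHOLDS[species]
```

For example, two aligned 10-nucleotide fragments differing at one site have a p-distance of 0.1 and would be grouped as one type under any of the three thresholds; real VP1 comparisons would use full-length alignments.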
Choosing and Using Introns in Molecular Phylogenetics
Creer, Simon
2007-01-01
Introns are now commonly used in molecular phylogenetics in an attempt to recover gene trees that are concordant with species trees, but there are a range of genomic, logistical and analytical considerations that are infrequently discussed in empirical studies that utilize intron data. This review outlines expedient approaches for locus selection, overcoming paralogy problems, recombination detection methods and the identification and incorporation of length variant heterozygotes (LVHs) in molecular systematics. A range of parsimony and Bayesian analytical approaches are also described in order to highlight the methods that can currently be employed to align sequences and treat indels in subsequent analyses. By covering the main points associated with the generation and analysis of intron data, this review aims to provide a comprehensive introduction to using introns (or any non-coding nuclear data partition) in contemporary phylogenetics. PMID:19461984
Diversity of human copy number variation and multicopy genes.
Sudmant, Peter H; Kitzman, Jacob O; Antonacci, Francesca; Alkan, Can; Malig, Maika; Tsalenko, Anya; Sampas, Nick; Bruhn, Laurakay; Shendure, Jay; Eichler, Evan E
2010-10-29
Copy number variants affect both disease and normal phenotypic variation, but those lying within heavily duplicated, highly identical sequence have been difficult to assay. By analyzing short-read mapping depth for 159 human genomes, we demonstrated accurate estimation of absolute copy number for duplications as small as 1.9 kilobase pairs, ranging from 0 to 48 copies. We identified 4.1 million "singly unique nucleotide" positions informative in distinguishing specific copies and used them to genotype the copy and content of specific paralogs within highly duplicated gene families. These data identify human-specific expansions in genes associated with brain development, reveal extensive population genetic diversity, and detect signatures consistent with gene conversion in the human species. Our approach makes ~1000 genes accessible to genetic studies of disease association.
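The idea of estimating absolute copy number from short-read mapping depth can be sketched as follows. This is a simplified illustration only: it assumes that depth scales linearly with copy number and that a set of control regions known to be diploid is available for calibration. The function name and parameters are hypothetical, and the study's actual calibration procedure is not described in the abstract.

```python
from statistics import mean


def estimate_copy_number(region_depth: float, diploid_control_depths: list[float]) -> int:
    """Estimate absolute copy number of a region from read depth.

    Assumes depth is proportional to copy number and that the control
    regions are present at exactly two copies (hypothetical calibration).
    """
    if not diploid_control_depths:
        raise ValueError("at least one control region is required")
    per_copy_depth = mean(diploid_control_depths) / 2.0
    if per_copy_depth <= 0:
        raise ValueError("control depth must be positive")
    return round(region_depth / per_copy_depth)
```

With control regions averaging 30x coverage, a region covered at 90x would be called at six copies and an uncovered region at zero copies under these assumptions.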
Sequence-Level Mechanisms of Human Epigenome Evolution
Prendergast, James G.D.; Chambers, Emily V.; Semple, Colin A.M.
2014-01-01
DNA methylation and chromatin states play key roles in development and disease. However, the extent of recent evolutionary divergence in the human epigenome and the influential factors that have shaped it are poorly understood. To determine the links between genome sequence and human epigenome evolution, we examined the divergence of DNA methylation and chromatin states following segmental duplication events in the human lineage. Chromatin and DNA methylation states were found to have been generally well conserved following a duplication event, with the evolution of the epigenome largely uncoupled from the total number of genetic changes in the surrounding DNA sequence. However, the epigenome at tissue-specific, distal regulatory regions was observed to be unusually prone to diverge following duplication, with particular sequence differences, altering known sequence motifs, found to be associated with divergence in patterns of DNA methylation and chromatin. Alu elements were found to have played a particularly prominent role in shaping human epigenome evolution, and we show that human-specific AluY insertion events are strongly linked to the evolution of the DNA methylation landscape and gene expression levels, including at key neurological genes in the human brain. Studying paralogous regions within the same sample enables the study of the links between genome and epigenome evolution while controlling for biological and technical variation. We show DNA methylation and chromatin divergence between duplicated regions are linked to the divergence of particular genetic motifs, with Alu elements having played a disproportionate role in the evolution of the epigenome in the human lineage. PMID:24966180
Molecular determinants of origin discrimination by Orc1 initiators in archaea.
Dueber, Erin C; Costa, Alessandro; Corn, Jacob E; Bell, Stephen D; Berger, James M
2011-05-01
Unlike bacteria, many eukaryotes initiate DNA replication from genomic sites that lack apparent sequence conservation. These loci are identified and bound by the origin recognition complex (ORC), and subsequently activated by a cascade of events that includes recruitment of an additional factor, Cdc6. Archaeal organisms generally possess one or more Orc1/Cdc6 homologs, belonging to the Initiator clade of ATPases associated with various cellular activities (AAA(+)) superfamily; however, these proteins recognize specific sequences within replication origins. Atomic resolution studies have shown that archaeal Orc1 proteins contact double-stranded DNA through an N-terminal AAA(+) domain and a C-terminal winged-helix domain (WHD), but use remarkably few base-specific contacts. To investigate the biochemical effects of these associations, we mutated the DNA-interacting elements of the Orc1-1 and Orc1-3 paralogs from the archaeon Sulfolobus solfataricus, and tested their effect on origin binding and deformation. We find that the AAA(+) domain has an unpredicted role in controlling the sequence selectivity of DNA binding, despite an absence of base-specific contacts to this region. Our results show that both the WHD and ATPase region influence origin recognition by Orc1/Cdc6, and suggest that not only DNA sequence, but also local DNA structure help define archaeal initiator binding sites. © The Author(s) 2011. Published by Oxford University Press.
Miller, John J; Delwiche, Charles F
2015-06-01
Emiliania huxleyi is a haptophyte alga of uncertain phylogenetic affinity containing a secondarily derived, chlorophyll c containing plastid. We sought to characterize its relationships with other taxa by quantifying the bipartitions in which it was included from a group of single protein phylogenetic trees in a way that allowed for variation in taxonomic content and accounted for paralogous sequences. The largest number of sequences supported a phylogenetic relationship of E. huxleyi with the stramenopiles, in particular Aureococcus anophagefferens. Far fewer nuclear sequences gave strong support to the placement of this coccolithophorid with the cryptophyte, Guillardia theta. The majority of the sequences that did support this relationship did not have plastid related functions. These results suggest that the haptophytes may be more closely allied with the heterokonts than with the cryptophytes. Another small set of genes associated E. huxleyi with the Viridiplantae with high support. While these genes could have been acquired with a plastid, the lack of plastid related functions among the proteins for which they code and the lack of other organisms with chlorophyll c containing plastids within these bipartitions suggests other explanations may be possible. This study also identified several genes that may have been transferred from the haptophyte lineage to the dinoflagellates Karenia brevis and Karlodinium veneficum as a result of their haptophyte derived plastid, including some with non-photosynthetic functions. Published by Elsevier B.V.
Tolson, D A; Nicholson, N H
1998-01-01
The determination of DNA sequences by partial exonuclease digestion followed by Matrix-Assisted Laser Desorption Time of Flight Mass Spectrometry (MALDI-TOF) is a well established method. When the same procedure is applied to RNA, difficulties arise due to the small (1 Da) mass difference between the nucleotides U and C, which makes unambiguous assignment difficult using a MALDI-TOF instrument. Here we report our experiences with sequence specific endonucleases and chemical methods followed by MALDI-TOF to resolve these sequence ambiguities. We have found chemical methods superior to endonucleases both in terms of correct specificity and extent of sequence coverage. This methodology can be used in combination with exonuclease digestion to rapidly assign RNA sequences. PMID:9421498
Guard, Jean; Sanchez-Ingunza, Roxana; Morales, Cesar; Stewart, Tod; Liljebjelke, Karen; Kessel, JoAnn; Ingram, Kim; Jones, Deana; Jackson, Charlene; Fedorka-Cray, Paula; Frye, Jonathan; Gast, Richard; Hinton, Arthur
2012-01-01
Two DNA-based methods were compared for the ability to assign serotype to 139 isolates of Salmonella enterica ssp. I. Intergenic sequence ribotyping (ISR) evaluated single nucleotide polymorphisms occurring in a 5S ribosomal gene region and flanking sequences bordering the gene dkgB. A DNA microarray hybridization method that assessed the presence and the absence of sets of genes was the second method. Serotype was assigned for 128 (92.1%) of submissions by the two DNA methods. ISR detected mixtures of serotypes within single colonies and it cost substantially less than Kauffmann–White serotyping and DNA microarray hybridization. Decreasing the cost of serotyping S. enterica while maintaining reliability may encourage routine testing and research. PMID:22998607
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ream, Thomas S.; Haag, J. R.; Wierzbicki, A. T.
2009-01-30
In addition to RNA polymerases I, II, and III, the essential RNA polymerases present in all eukaryotes, plants have two additional nuclear RNA polymerases, abbreviated as Pol IV and Pol V, that play nonredundant roles in siRNA-directed DNA methylation and gene silencing. We show that Arabidopsis Pol IV and Pol V are composed of subunits that are paralogous or identical to the 12 subunits of Pol II. Four subunits of Pol IV are distinct from their Pol II paralogs, six subunits of Pol V are distinct from their Pol II paralogs, and four subunits differ between Pol IV and Pol V. Importantly, the subunit differences occur in key positions relative to the template entry and RNA exit paths. Our findings support the hypothesis that Pol IV and Pol V are Pol II-like enzymes that evolved specialized roles in the production of noncoding transcripts for RNA silencing and genome defense.
A theory of utility conditionals: Paralogical reasoning from decision-theoretic leakage.
Bonnefon, Jean-François
2009-10-01
Many "if p, then q" conditionals have decision-theoretic features, such as antecedents or consequents that relate to the utility functions of various agents. These decision-theoretic features leak into reasoning processes, resulting in various paralogical conclusions. The theory of utility conditionals offers a unified account of the various forms that this phenomenon can take. The theory is built on 2 main components: (1) a representational tool (the utility grid), which summarizes in compact form the decision-theoretic features of a conditional, and (2) a set of folk axioms of decision, which reflect reasoners' beliefs about the way most agents make their decisions. Applying the folk axioms to the utility grid of a conditional allows for the systematic prediction of the paralogical conclusions invited by the utility grid's decision-theoretic features. The theory of utility conditionals significantly extends the scope of current theories of conditional inference and moves reasoning research toward a greater integration with decision-making research.
Human HOX Proteins Use Diverse and Context-Dependent Motifs to Interact with TALE Class Cofactors.
Dard, Amélie; Reboulet, Jonathan; Jia, Yunlong; Bleicher, Françoise; Duffraisse, Marilyne; Vanaker, Jean-Marc; Forcet, Christelle; Merabet, Samir
2018-03-13
HOX proteins achieve numerous functions by interacting with the TALE class PBX and MEIS cofactors. In contrast to this established partnership in development and disease, how HOX proteins could interact with PBX and MEIS remains unclear. Here, we present a systematic analysis of HOX/PBX/MEIS interaction properties, scanning all paralog groups with human and mouse HOX proteins in vitro and in live cells. We demonstrate that a previously characterized HOX protein motif known to be critical for HOX-PBX interactions becomes dispensable in the presence of MEIS in all except the two most anterior paralog groups. We further identify paralog-specific TALE-binding sites that are used in a highly context-dependent manner. One of these binding sites is involved in the proliferative activity of HOXA7 in breast cancer cells. Together these findings reveal an extraordinary level of interaction flexibility between HOX proteins and their major class of developmental cofactors. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.
A survey of sRNA families in α-proteobacteria
del Val, Coral; Romero-Zaliz, Rocío; Torres-Quesada, Omar; Peregrina, Alexandra; Toro, Nicolás; Jiménez-Zurdo, Jose I
2012-01-01
We have performed a computational comparative analysis of six small non-coding RNA (sRNA) families in α-proteobacteria. Members of these families were first identified in the intergenic regions of the nitrogen-fixing endosymbiont S. meliloti by a combined bioinformatics screen followed by experimental verification. Consensus secondary structures inferred from covariance models for each sRNA family revealed, in some cases, conserved motifs putatively relevant to the function of trans-encoded base-pairing sRNAs, i.e., Hfq-binding signatures and exposed anti-Shine-Dalgarno sequences. Two particular family models, namely αr15 and αr35, shared sub-structural modules with the Rfam model suhB (RF00519) and the uncharacterized sRNA family αr35b, respectively. A third sRNA family, termed αr45, has homology to the cis-acting regulatory element speF (RF00518). However, new experimental data further confirmed that the S. meliloti αr45 representative is an Hfq-binding sRNA processed from or expressed independently of speF, thus refining the Rfam speF model annotation. All six families have members in phylogenetically related plant-interacting bacteria and animal pathogens of the order Rhizobiales, some occurring with high levels of paralogy in individual genomes. In silico and experimental evidence predicts differential regulation of paralogous sRNAs in S. meliloti 1021. The distribution patterns of these sRNA families suggest major contributions of vertical inheritance and extensive ancestral duplication events to the evolution of sRNAs in plant-interacting bacteria. PMID:22418845
Evolution and Classification of Myosins, a Paneukaryotic Whole-Genome Approach
Sebé-Pedrós, Arnau; Grau-Bové, Xavier; Richards, Thomas A.; Ruiz-Trillo, Iñaki
2014-01-01
Myosins are key components of the eukaryotic cytoskeleton, providing motility for a broad diversity of cargoes. Therefore, understanding the origin and evolutionary history of myosin classes is crucial to address the evolution of eukaryote cell biology. Here, we revise the classification of myosins using an updated taxon sampling that includes newly or recently sequenced genomes and transcriptomes from key taxa. We performed a survey of eukaryotic genomes and phylogenetic analyses of the myosin gene family, reconstructing the myosin toolkit at different key nodes in the eukaryotic tree of life. We also identified the phylogenetic distribution of myosin diversity in terms of number of genes, associated protein domains and number of classes in each taxa. Our analyses show that new classes (i.e., paralogs) and domain architectures were continuously generated throughout eukaryote evolution, with a significant expansion of myosin abundance and domain architectural diversity at the stem of Holozoa, predating the origin of animal multicellularity. Indeed, single-celled holozoans have the most complex myosin complement among eukaryotes, with paralogs of most myosins previously considered animal specific. We recover a dynamic evolutionary history, with several lineage-specific expansions (e.g., the myosin III-like gene family diversification in choanoflagellates), convergence in protein domain architectures (e.g., fungal and animal chitin synthase myosins), and important secondary losses. Overall, our evolutionary scheme demonstrates that the ancestral eukaryote likely had a complex myosin repertoire that included six genes with different protein domain architectures. Finally, we provide an integrative and robust classification, useful for future genomic and functional studies on this crucial eukaryotic gene family. PMID:24443438
Gao, Xiang; Lin, Huaiying; Revanna, Kashi; Dong, Qunfeng
2017-05-10
Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unreliable. The unreliable results are due to the limitations in the existing methods which either lack solid probabilistic-based criteria to evaluate the confidence of their taxonomic assignments, or use nucleotide k-mer frequency as the proxy for sequence similarity measurement. We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species to the phylum levels based on the lowest common ancestors of multiple database hits for each query sequence, and further classification reliabilities are evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based upon the degree of sequence similarity of the database hit to the query sequence. Our method does not need any training datasets specific for different taxonomic groups. Instead only a reference database is required for aligning to the query sequences, making our method easily applicable for different regions of the 16S rRNA gene or other phylogenetic marker genes. Reliable species-level classification for 16S rRNA or other phylogenetic marker genes is critical for microbiome research. Our software shows significantly higher classification accuracy than the existing tools and we provide probabilistic-based confidence scores to evaluate the reliability of our taxonomic classification assignments based on multiple database matches to query sequences. 
Despite its higher computational costs, our method is still suitable for analyzing large-scale microbiome datasets for practical purposes. Furthermore, our method can be applied for taxonomic classification of any phylogenetic marker gene sequences. Our software, called BLCA, is freely available at https://github.com/qunfengdong/BLCA.
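The weighting idea in this abstract, where each database hit contributes to the taxonomic call in proportion to a similarity-derived weight, can be illustrated with a small Python sketch. This is not the BLCA implementation: the weights are assumed to be supplied by the caller as a stand-in for the posterior probabilities, bootstrap confidence scoring is omitted, and the names `RANKS` and `weighted_rank_support` are hypothetical.

```python
from collections import defaultdict

# Ranks are reported from phylum down to species for readability.
RANKS = ["phylum", "class", "order", "family", "genus", "species"]


def weighted_rank_support(hits):
    """Summarise weighted support for the best taxon at every rank.

    hits: list of (lineage, weight) pairs, where lineage maps each rank in
    RANKS to a taxon name and weight is a non-negative number standing in
    for the similarity-based posterior probability of that hit.
    Returns {rank: (best_taxon, normalised_support)}.
    """
    total = sum(weight for _, weight in hits)
    if total <= 0:
        raise ValueError("total hit weight must be positive")
    result = {}
    for rank in RANKS:
        support = defaultdict(float)
        for lineage, weight in hits:
            support[lineage[rank]] += weight / total
        best = max(support, key=support.get)
        result[rank] = (best, support[best])
    return result
```

When two hits from different species of the same genus carry weights 3 and 1, the sketch reports full support at genus level and 0.75 for the better-matching species, which mirrors how confidence can drop at finer ranks.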
Evolution and Expression of Tissue Globins in Ray-Finned Fishes
Gallagher, Michael D.
2017-01-01
The globin gene family encodes oxygen-binding hemeproteins conserved across the major branches of multicellular life. The origins and evolutionary histories of complete globin repertoires have been established for many vertebrates, but there remain major knowledge gaps for ray-finned fish. Therefore, we used phylogenetic, comparative genomic and gene expression analyses to discover and characterize canonical “non-blood” globin family members (i.e., myoglobin, cytoglobin, neuroglobin, globin-X, and globin-Y) across multiple ray-finned fish lineages, revealing novel gene duplicates (paralogs) conserved from whole genome duplication (WGD) and small-scale duplication events. Our key findings were that: (1) globin-X paralogs in teleosts have been retained from the teleost-specific WGD, (2) functional paralogs of cytoglobin, neuroglobin, and globin-X, but not myoglobin, have been conserved from the salmonid-specific WGD, (3) triplicate lineage-specific myoglobin paralogs are conserved in arowanas (Osteoglossiformes), which arose by tandem duplication and diverged under positive selection, (4) globin-Y is retained in multiple early branching fish lineages that diverged before teleosts, and (5) marked variation in tissue-specific expression of globin gene repertoires exists across ray-finned fish evolution, including several previously uncharacterized sites of expression. In this respect, our data provide an interesting link between myoglobin expression and the evolution of air breathing in teleosts. Together, our findings demonstrate great, previously unrecognized diversity in the repertoire and expression of non-blood globins that has arisen during ray-finned fish evolution. PMID:28173090
Transcriptome Analysis of Sarracenia, an Insectivorous Plant
Srivastava, Anuj; Rogers, Willie L.; Breton, Catherine M.; Cai, Liming; Malmberg, Russell L.
2011-01-01
Sarracenia species (pitcher plants) are carnivorous plants which obtain a portion of their nutrients from insects captured in the pitchers. To investigate these plants, we sequenced the transcriptome of two species, Sarracenia psittacina and Sarracenia purpurea, using Roche 454 pyrosequencing technology. We obtained 46 275 and 36 681 contigs by de novo assembly methods for S. psittacina and S. purpurea, respectively, and further identified 16 163 orthologous contigs between them. Estimation of synonymous substitution rates between orthologous and paralogous contigs indicates the events of genome duplication and speciation within the Sarracenia genus both occurred ∼2 million years ago. The ratios of synonymous and non-synonymous substitution rates indicated that 491 contigs have been under positive selection (Ka/Ks > 1). Significant proportions of these contigs were involved in functions related to binding activity. We also found that the greatest sequence similarity for both of these species was to Vitis vinifera, which is most consistent with a non-current classification of the order Ericales as an asterid. This study has provided new insights into pitcher plants and will contribute greatly to future research on this genus and its distinctive ecological adaptations. PMID:21676972
Transcriptome analysis of sarracenia, an insectivorous plant.
Srivastava, Anuj; Rogers, Willie L; Breton, Catherine M; Cai, Liming; Malmberg, Russell L
2011-08-01
Sarracenia species (pitcher plants) are carnivorous plants which obtain a portion of their nutrients from insects captured in the pitchers. To investigate these plants, we sequenced the transcriptome of two species, Sarracenia psittacina and Sarracenia purpurea, using Roche 454 pyrosequencing technology. We obtained 46 275 and 36 681 contigs by de novo assembly methods for S. psittacina and S. purpurea, respectively, and further identified 16 163 orthologous contigs between them. Estimation of synonymous substitution rates between orthologous and paralogous contigs indicates the events of genome duplication and speciation within the Sarracenia genus both occurred ∼2 million years ago. The ratios of synonymous and non-synonymous substitution rates indicated that 491 contigs have been under positive selection (K(a)/K(s) > 1). Significant proportions of these contigs were involved in functions related to binding activity. We also found that the greatest sequence similarity for both of these species was to Vitis vinifera, which is most consistent with a non-current classification of the order Ericales as an asterid. This study has provided new insights into pitcher plants and will contribute greatly to future research on this genus and its distinctive ecological adaptations.
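The positive-selection criterion quoted above (Ka/Ks > 1) amounts to a simple filter over per-contig substitution rates. The Python sketch below assumes the Ka and Ks values have already been estimated by an external tool, and it skips contigs with Ks of zero because the ratio is undefined there; the function name and input layout are hypothetical.

```python
def positively_selected(rates, threshold=1.0):
    """Return contig IDs whose Ka/Ks ratio exceeds the threshold.

    rates: dict mapping contig ID -> (ka, ks), the non-synonymous and
    synonymous substitution rates. Contigs with ks <= 0 are skipped
    because the ratio cannot be computed for them.
    """
    selected = []
    for contig_id, (ka, ks) in rates.items():
        if ks <= 0:
            continue
        if ka / ks > threshold:
            selected.append(contig_id)
    return sorted(selected)
```

A contig with Ka = 0.3 and Ks = 0.1 would pass the filter, while one with Ka = 0.05 and Ks = 0.2 would not.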
Lovejoy, David A; Pavlović, Téa
2015-11-01
In humans, the teneurin gene family consists of four highly conserved paralogous genes that are the result of early vertebrate gene duplications arising from a gene introduced into multicellular organisms from a bacterial ancestor. In vertebrates and humans, the teneurins have become integrated into a number of critical physiological systems including several aspects of reproductive physiology. Structurally complex, these genes possess a sequence in their terminal exon that encodes for a bioactive peptide sequence termed the 'teneurin C-terminal associated peptide' (TCAP). The teneurin/TCAP protein forms an intercellular adhesive unit with its receptor, latrophilin, an Adhesion family G-protein coupled receptor. It is present in numerous cell types and has been implicated in gamete migration and gonadal morphology. Moreover, TCAP is highly effective at reducing the corticotropin-releasing factor (CRF) stress response. As a result, TCAP may also play a role in regulating the stress-associated inhibition of reproduction. In addition, the teneurins and TCAP have been implicated in tumorigenesis associated with reproductive tissues. Therefore, the teneurin/TCAP system may offer clinicians a novel biomarker system upon which to diagnose some reproductive pathologies.
Park, Eonyoung; Maquat, Lynne E.
2013-01-01
Staufen1 (STAU1)-mediated mRNA decay (SMD) is an mRNA degradation process in mammalian cells that is mediated by the binding of STAU1 to a STAU1-binding site (SBS) within the 3'-untranslated region (3'UTR) of target mRNAs. During SMD, STAU1, a double-stranded (ds) RNA-binding protein, recognizes dsRNA structures formed either by intramolecular base-pairing of 3'UTR sequences or by intermolecular base-pairing of 3'UTR sequences with a long noncoding RNA (lncRNA) via partially complementary Alu elements. Recently, STAU2, a paralog of STAU1, has also been reported to mediate SMD. Both STAU1 and STAU2 interact directly with the ATP-dependent RNA helicase UPF1, a key SMD factor, enhancing its helicase activity to promote effective SMD. Moreover, STAU1 and STAU2 form homodimeric and heterodimeric interactions via domain-swapping. Since both SMD and the mechanistically related nonsense-mediated mRNA decay (NMD) employ UPF1, SMD and NMD are competitive pathways. Competition contributes to cellular differentiation processes, such as myogenesis and adipogenesis, placing SMD at the heart of various physiologically important mechanisms. PMID:23681777
A draft annotation and overview of the human genome
Wright, Fred A; Lemon, William J; Zhao, Wei D; Sears, Russell; Zhuo, Degen; Wang, Jian-Ping; Yang, Hee-Yung; Baer, Troy; Stredney, Don; Spitzner, Joe; Stutz, Al; Krahe, Ralf; Yuan, Bo
2001-01-01
Background The recent draft assembly of the human genome provides a unified basis for describing genomic structure and function. The draft is sufficiently accurate to provide useful annotation, enabling direct observations of previously inferred biological phenomena. Results We report here a functionally annotated human gene index placed directly on the genome. The index is based on the integration of public transcript, protein, and mapping information, supplemented with computational prediction. We describe numerous global features of the genome and examine the relationship of various genetic maps with the assembly. In addition, initial sequence analysis reveals highly ordered chromosomal landscapes associated with paralogous gene clusters and distinct functional compartments. Finally, these annotation data were synthesized to produce observations of gene density and number that accord well with historical estimates. Such a global approach had previously been described only for chromosomes 21 and 22, which together account for 2.2% of the genome. Conclusions We estimate that the genome contains 65,000-75,000 transcriptional units, with exon sequences comprising 4%. The creation of a comprehensive gene index requires the synthesis of all available computational and experimental evidence. PMID:11516338
Rogers, Julia M; Bulyk, Martha L
2018-04-25
Sequence-specific transcription factors (TFs) bind short DNA sequences in the genome to regulate the expression of target genes. In the last decade, numerous technical advances have enabled the determination of the DNA-binding specificities of many of these factors. Large-scale screens of many TFs enabled the creation of databases of TF DNA-binding specificities, typically represented as position weight matrices (PWMs). Although great progress has been made in determining and predicting binding specificities systematically, there are still many surprises to be found when studying a particular TF's interactions with DNA in detail. Paralogous TFs' binding specificities can differ in subtle ways, in a manner that is not immediately apparent from looking at their PWMs. These differences affect gene regulatory outputs and enable TFs to rewire transcriptional networks over evolutionary time. This review discusses recent observations made in the study of TF-DNA interactions that highlight the importance of continued in-depth analysis of TF-DNA interactions and their inherent complexity. This article is categorized under: Biological Mechanisms > Regulatory Biology. © 2018 Wiley Periodicals, Inc.
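As a minimal illustration of how a position weight matrix summarizes a binding specificity, the sketch below builds a log-odds PWM from a handful of aligned sites and scans a sequence for the best-scoring window. The toy sites, the pseudocount and the uniform background are assumptions made for illustration and are not taken from the review.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM (one dict per position) from equal-length aligned sites."""
    pwm = []
    for pos in range(len(sites[0])):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm

def score(pwm, seq):
    """Sum the per-position log-odds for a sequence as long as the PWM."""
    return sum(col[base] for col, base in zip(pwm, seq))

def best_window(pwm, seq):
    """Return (score, start) of the highest-scoring window in seq."""
    width = len(pwm)
    return max((score(pwm, seq[i:i + width]), i) for i in range(len(seq) - width + 1))

# Hypothetical aligned binding sites, for illustration only.
SITES = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
pwm = build_pwm(SITES)
```

Because each position is scored independently, two paralogous TFs with near-identical PWMs can still differ in ways this additive model cannot express, which is the limitation the abstract points to.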
BiDiBlast: comparative genomics pipeline for the PC.
de Almeida, João M G C F
2010-06-01
Bi-directional BLAST is a simple approach to detect, annotate, and analyze candidate orthologous or paralogous sequences in a single go. This procedure is usually confined to the realm of customized Perl scripts, usually tuned for UNIX-like environments. Porting those scripts to other operating systems involves refactoring them, and also the installation of the Perl programming environment with the required libraries. To overcome these limitations, a data pipeline was implemented in Java. This application submits two batches of sequences to local versions of the NCBI BLAST tool, manages result lists, and refines both bi-directional and simple hits. GO Slim terms are attached to hits, several statistics are derived, and molecular evolution rates are estimated through PAML. The results are written to a set of delimited text tables intended for further analysis. The provided graphic user interface allows a friendly interaction with this application, which is documented and available to download at http://moodle.fct.unl.pt/course/view.php?id=2079 or https://sourceforge.net/projects/bidiblast/ under the GNU GPL license. Copyright 2010 Beijing Genomics Institute. Published by Elsevier Ltd. All rights reserved.
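The bi-directional (reciprocal) best-hit idea described above can be sketched in a few lines. This is not the BiDiBlast implementation, which is written in Java; it is a minimal sketch that assumes the hits have already been parsed into hypothetical `(query, subject, bitscore)` tuples, for example from BLAST tabular output.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bitscore) tuples (assumed format).
    """
    best = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit in both searches."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)

# Toy hit lists, for illustration only.
A_VS_B = [("a1", "b1", 200), ("a1", "b2", 150), ("a2", "b2", 180), ("a3", "b1", 90)]
B_VS_A = [("b1", "a1", 195), ("b2", "a2", 170), ("b2", "a1", 140)]
pairs = reciprocal_best_hits(A_VS_B, B_VS_A)
```

In this toy data `a3` has `b1` as its best hit, but `b1` prefers `a1`, so `a3` is a simple (one-directional) hit and is excluded from the reciprocal pairs.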
Hellman, Maarit; Piirainen, Henni; Jaakola, Veli-Pekka; Permi, Perttu
2014-01-01
NMR spectroscopy is by far the most versatile and information-rich technique to study intrinsically disordered proteins (IDPs). While NMR is able to offer residue-level information on structure and dynamics, assignment of chemical shift resonances in IDPs is not a straightforward process. Consequently, numerous pulse sequences and assignment protocols have been developed during the past several years, targeted especially for the assignment of IDPs, including experiments that employ H(N), H(α) or (13)C detection combined with two to six indirectly detected dimensions. Here we propose two new HN-detection based pulse sequences, (HCA)CON(CAN)H and (HCA)N(CA)CO(N)H, that provide correlations with (1)H(N)(i - 1), (13)C'(i - 1) and (15)N(i), and (1)H(N)(i + 1), (13)C'(i) and (15)N(i) frequencies, respectively. Most importantly, they offer sequential links across the proline bridges and enable filling the single proline gaps during the assignment. We show that the novel experiments can efficiently complement the information available from existing HNCO and intraresidual i(HCA)CO(CA)NH pulse sequences, and their concomitant usage enabled >95 % assignment of backbone resonances in the cytoplasmic tail of the adenosine receptor A2A in comparison to 73 % complete assignment using the HNCO/i(HCA)CO(CA)NH data alone.
Diepeveen, Eveline T; Kim, Fabienne D; Salzburger, Walter
2013-07-17
Gen(om)e duplication events are hypothesized as key mechanisms underlying the origin of phenotypic diversity and evolutionary innovation. The diverse and species-rich lineage of teleost fishes is a renowned example of this scenario, because of the fish-specific genome duplication. Gene families, generated by this and other gene duplication events, have been previously found to play a role in the evolution and development of innovations in cichlid fishes - a prime model system to study the genetic basis of rapid speciation, adaptation and evolutionary innovation. The distal-less homeobox genes are particularly interesting candidate genes for evolutionary novelties, such as the pharyngeal jaw apparatus and the anal fin egg-spots. Here we study the dlx repertoire in 23 East African cichlid fishes to determine the rate of evolution and the signatures of selection pressure. Four intact dlx clusters were retrieved from cichlid draft genomes. Phylogenetic analyses of these eight dlx loci in ten teleost species, followed by an in-depth analysis of 23 East African cichlid species, show that there is disparity in the rates of evolution of the dlx paralogs. Dlx3a and dlx4b are the fastest evolving dlx genes, while dlx1a and dlx6a evolved more slowly. Subsequent analyses of the nonsynonymous-synonymous substitution rate ratios indicate that dlx3b, dlx4a and dlx5a evolved under purifying selection, while signs of positive selection were found for dlx1a, dlx2a, dlx3a and dlx4b. Our results indicate that the dlx repertoire of teleost fishes, and cichlid fishes in particular, is shaped by differential selection pressures and rates of evolution after gene duplication. Although the divergence of the dlx paralogs is a putative sign of new or altered functions, comparisons with available expression patterns indicate that the three dlx loci under strong purifying selection, dlx3b, dlx4a and dlx5a, are transcribed at high levels in the cichlids' pharyngeal jaw and anal fin. 
The dlx paralogs emerge as excellent candidate genes for the development of evolutionary innovations in cichlids, although further functional analyses are necessary to elucidate their respective contribution.
Fetterman, Christina D; Rannala, Bruce; Walter, Michael A
2008-09-24
Members of the forkhead gene family act as transcription regulators in biological processes including development and metabolism. The evolution of forkhead genes has not been widely examined and selection pressures at the molecular level influencing subfamily evolution and differentiation have not been explored. Here, in silico methods were used to examine selection pressures acting on the coding sequence of five multi-species FOX protein subfamily clusters; FoxA, FoxD, FoxI, FoxO and FoxP. Application of site models, which estimate overall selection pressures on individual codons throughout the phylogeny, showed that the amino acid changes observed were either neutral or under negative selection. Branch-site models, which allow estimated selection pressures along specified lineages to vary as compared to the remaining phylogeny, identified positive selection along branches leading to the FoxA3 and Protostomia clades in the FoxA cluster and the branch leading to the FoxO3 clade in the FoxO cluster. Residues that may differentiate paralogs were identified in the FoxA and FoxO clusters and residues that differentiate orthologs were identified in the FoxA cluster. Neutral amino acid changes were identified in the forkhead domain of the FoxA, FoxD and FoxP clusters while positive selection was identified in the forkhead domain of the Protostomia lineage of the FoxA cluster. A series of residues under strong negative selection adjacent to the N- and C-termini of the forkhead domain were identified in all clusters analyzed suggesting a new method for refinement of domain boundaries. Extrapolation of domains among cluster members in conjunction with selection pressure information allowed prediction of residue function in the FoxA, FoxO and FoxP clusters and exclusion of known domain function in residues of the FoxA and FoxI clusters. 
Consideration of selection pressures observed in conjunction with known functional information allowed prediction of residue function and refinement of domain boundaries. Identification of residues that differentiate orthologs and paralogs provided insight into the development and functional consequences of paralogs and forkhead subfamily composition differences among species. Overall we found that after gene duplication of forkhead family members, rapid differentiation and subsequent fixation of amino acid changes through negative selection has occurred.
Fragment assignment in the cloud with eXpress-D
2013-01-01
Background Probabilistic assignment of ambiguously mapped fragments produced by high-throughput sequencing experiments has been demonstrated to greatly improve accuracy in the analysis of RNA-Seq and ChIP-Seq, and is an essential step in many other sequence census experiments. A maximum likelihood method using the expectation-maximization (EM) algorithm for optimization is commonly used to solve this problem. However, batch EM-based approaches do not scale well with the size of sequencing datasets, which have been increasing dramatically over the past few years. Thus, current approaches to fragment assignment rely on heuristics or approximations for tractability. Results We present an implementation of a distributed EM solution to the fragment assignment problem using Spark, a data analytics framework that can scale by leveraging compute clusters within datacenters–“the cloud”. We demonstrate that our implementation easily scales to billions of sequenced fragments, while providing the exact maximum likelihood assignment of ambiguous fragments. The accuracy of the method is shown to be an improvement over the most widely used tools available and can be run in a constant amount of time when cluster resources are scaled linearly with the amount of input data. Conclusions The cloud offers one solution for the difficulties faced in the analysis of massive high-throughput sequencing data, which continue to grow rapidly. Researchers in bioinformatics must follow developments in distributed systems–such as new frameworks like Spark–for ways to port existing methods to the cloud and help them scale to the datasets of the future. Our software, eXpress-D, is freely available at: http://github.com/adarob/express-d. PMID:24314033
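To make the batch EM idea concrete, the following is a minimal single-machine sketch, not the eXpress-D or Spark implementation. It assumes a deliberately simplified model in which each fragment is equally compatible with every target it maps to (no target lengths, error models or fragment length distributions); the function name and the toy input are hypothetical.

```python
def em_fragment_assignment(compat, n_targets, n_iter=200):
    """Estimate relative target abundances from ambiguously mapped fragments.

    compat[f] lists the target indices fragment f maps to (assumed input format).
    Returns a list theta of abundances that sums to 1.
    """
    theta = [1.0 / n_targets] * n_targets
    for _ in range(n_iter):
        counts = [0.0] * n_targets
        # E-step: split each fragment among its targets in proportion to theta.
        for targets in compat:
            denom = sum(theta[t] for t in targets)
            for t in targets:
                counts[t] += theta[t] / denom
        # M-step: renormalize expected counts into new abundances.
        total = sum(counts)
        theta = [c / total for c in counts]
    return theta

# Toy data: three fragments unique to target 0, one ambiguous, one unique to target 1.
COMPAT = [[0], [0], [0], [0, 1], [1]]
theta = em_fragment_assignment(COMPAT, n_targets=2)
```

Each iteration touches every fragment once, which is why the batch form becomes costly as datasets grow; the E-step is a sum over independent fragments, so it is the natural part to distribute across a cluster.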
Dores, Robert M.
2016-01-01
The evolution of the melanocortin receptors (MCRs) is closely associated with the evolution of the melanocortin-2 receptor accessory proteins (MRAPs). Recent annotation of the elephant shark genome project revealed the sequence of a putative MRAP1 ortholog. The presence of this sequence in the genome of a cartilaginous fish raises the possibility that the mrap1 and mrap2 genes in the genomes of gnathostome vertebrates were the result of the chordate 2R genome duplication event. The presence of a putative MRAP1 ortholog in a cartilaginous fish genome is perplexing. Recent studies on melanocortin-2 receptor (MC2R) in the genomes of the elephant shark and the Japanese stingray indicate that these MC2R orthologs can be functionally expressed in CHO cells without co-expression of an exogenous mrap1 cDNA. The novel ligand selectivity of these cartilaginous fish MC2R orthologs is discussed. Finally, the origin of the mc2r and mc5r genes is reevaluated. The distinctive primary sequence conservation of MC2R and MC5R is discussed in light of the physiological roles of these two MCR paralogs. PMID:27445982
Pydiura, N A; Bayer, G Ya; Galinousky, D V; Yemets, A I; Pirko, Ya V; Podvitski, T A; Anisimova, N V; Khotyleva, L V; Kilchevsky, A V; Blume, Ya B
2015-01-01
A bioinformatic search for sequences encoding cellulose synthase genes in the flax genome, and their comparison to dicot orthologs, was carried out. The analysis revealed 32 cellulose synthase gene candidates, 16 of which are highly likely to encode cellulose synthases, and the remaining 16 to encode cellulose synthase-like proteins (Csl). Phylogenetic analysis of gene products of cellulose synthase genes allowed 6 groups of cellulose synthase genes of different classes to be distinguished: CesA1/10, CesA3, CesA4, CesA5/6/2/9, CesA7 and CesA8. Paralogous sequences within classes CesA1/10 and CesA5/6/2/9, which are associated with primary cell wall formation, are characterized by a greater similarity within these classes than orthologous sequences, whereas the genes controlling the biosynthesis of secondary cell wall cellulose form distinct clades: CesA4, CesA7, and CesA8. The analysis of 16 identified flax cellulose synthase gene candidates shows the presence of at least 12 different cellulose synthase gene variants in the flax genome, which are represented in all six clades of cellulose synthase genes. Thus, at this point genes of all ten known cellulose synthase classes are identified in the flax genome, but their correct classification requires additional research.
A framework genetic map for Miscanthus sinensis from RNAseq-based markers shows recent tetraploidy
2012-01-01
Background Miscanthus (subtribe Saccharinae, tribe Andropogoneae, family Poaceae) is a genus of temperate perennial C4 grasses whose high biomass production makes it, along with its close relatives sugarcane and sorghum, attractive as a biofuel feedstock. The base chromosome number of Miscanthus (x = 19) is different from that of other Saccharinae and approximately twice that of the related Sorghum bicolor (x = 10), suggesting large-scale duplications may have occurred in recent ancestors of Miscanthus. Owing to the complexity of the Miscanthus genome and the complications of self-incompatibility, a complete genetic map with a high density of markers has not yet been developed. Results We used deep transcriptome sequencing (RNAseq) from two M. sinensis accessions to define 1536 single nucleotide variants (SNVs) for a GoldenGate™ genotyping array, and found that simple sequence repeat (SSR) markers defined in sugarcane are often informative in M. sinensis. A total of 658 SNP and 210 SSR markers were validated via segregation in a full sibling F1 mapping population. Using 221 progeny from this mapping population, we constructed a genetic map for M. sinensis that resolves into 19 linkage groups, the haploid chromosome number expected from cytological evidence. Comparative genomic analysis documents a genome-wide duplication in Miscanthus relative to Sorghum bicolor, with subsequent insertional fusion of a pair of chromosomes. The utility of the map is confirmed by the identification of two paralogous C4-pyruvate, phosphate dikinase (C4-PPDK) loci in Miscanthus, at positions syntenic to the single orthologous gene in Sorghum. Conclusions The genus Miscanthus experienced an ancestral tetraploidy and chromosome fusion prior to its diversification, but after its divergence from the closely related sugarcane clade. 
The recent timing of this tetraploidy complicates discovery and mapping of genetic markers for Miscanthus species, since alleles and fixed differences between paralogs are comparable. These difficulties can be overcome by careful analysis of segregation patterns in a mapping population and genotyping of doubled haploids. The genetic map for Miscanthus will be useful in biological discovery and breeding efforts to improve this emerging biofuel crop, and also provide a valuable resource for understanding genomic responses to tetraploidy and chromosome fusion. PMID:22524439
Faës, Pascal; Deleu, Carole; Aïnouche, Abdelkader; Le Cahérec, Françoise; Montes, Emilie; Clouet, Vanessa; Gouraud, Anne-Marie; Albert, Benjamin; Orsel, Mathilde; Lassalle, Gilles; Leport, Laurent; Bouchereau, Alain; Niogret, Marie-Françoise
2015-02-01
Six BnaProDH1 and two BnaProDH2 genes were identified in the Brassica napus genome. The BnaProDH1 genes are mainly expressed in pollen and root organs while BnaProDH2 gene expression is associated with leaf vascular tissues at senescence. Proline dehydrogenase (ProDH) catalyzes the first step in the catabolism of proline. The ProDH gene family in oilseed rape (Brassica napus) was characterized and compared to other Brassicaceae ProDH sequences to establish the phylogenetic relationships between genes. Six BnaProDH1 genes and two BnaProDH2 genes were identified in the B. napus genome. Expression of the three paralogous pairs of BnaProDH1 genes and the two homoeologous BnaProDH2 genes was measured by real-time quantitative RT-PCR in plants at vegetative and reproductive stages. The BnaProDH2 genes are specifically expressed in vasculature in an age-dependent manner, while BnaProDH1 genes are strongly expressed in pollen grains and roots. Compared to the abundant expression of BnaProDH1, the overall expression of BnaProDH2 is low except in roots and senescent leaves. The BnaProDH1 paralogs showed different levels of expression with BnaA&C.ProDH1.a the most strongly expressed and BnaA&C.ProDH1.c the least. The promoters of two BnaProDH1 and two BnaProDH2 genes were fused with the uidA reporter gene (GUS) to characterize organ and tissue expression profiles in transformed B. napus plants. The transformants with promoters from different genes showed contrasting patterns of GUS activity, which corresponded to the spatial expression of their respective transcripts. ProDHs probably have non-redundant functions in different organs and at different phenological stages. In terms of molecular evolution, all BnaProDH sequences appear to have undergone strong purifying selection and some copies are becoming subfunctionalized. This detailed description of oilseed rape ProDH genes provides new elements to investigate the function of proline metabolism in plant development.
A universal genomic coordinate translator for comparative genomics
2014-01-01
Background Genomic duplications constitute major events in the evolution of species, allowing paralogous copies of genes to take on fine-tuned biological roles. Unambiguously identifying the orthology relationship between copies across multiple genomes can be resolved by synteny, i.e. the conserved order of genomic sequences. However, a comprehensive analysis of duplication events and their contributions to evolution would require all-to-all genome alignments, which increases at N² with the number of available genomes, N. Results Here, we introduce Kraken, software that omits the all-to-all requirement by recursively traversing a graph of pairwise alignments and dynamically re-computing orthology. Kraken scales linearly with the number of targeted genomes, N, which allows for including large numbers of genomes in analyses. We first evaluated the method on the set of 12 Drosophila genomes, finding that orthologous correspondence computed indirectly through a graph of multiple synteny maps comes at minimal cost in terms of sensitivity, but reduces overall computational runtime by an order of magnitude. We then used the method on three well-annotated mammalian genomes, human, mouse, and rat, and show that up to 93% of protein coding transcripts have unambiguous pairwise orthologous relationships across the genomes. On a nucleotide level, 70 to 83% of exons match exactly at both splice junctions, and up to 97% on at least one junction. We last applied Kraken to an RNA-sequencing dataset from multiple vertebrates and diverse tissues, where we confirmed that brain-specific gene family members, i.e. one-to-many or many-to-many homologs, are more highly correlated across species than single-copy (i.e. one-to-one homologous) genes. 
Not limited to protein coding genes, Kraken also identifies thousands of newly identified transcribed loci, likely non-coding RNAs that are consistently transcribed in human, chimpanzee and gorilla, and maintain significant correlation of expression levels across species. Conclusions Kraken is a computational genome coordinate translator that facilitates cross-species comparisons, distinguishes orthologs from paralogs, and does not require costly all-to-all whole genome mappings. Kraken is freely available under LGPL from http://github.com/nedaz/kraken. PMID:24976580
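The idea of translating coordinates indirectly through a graph of pairwise maps, rather than through a direct alignment for every genome pair, can be sketched as a breadth-first search. This is an illustrative toy, not Kraken's algorithm: it assumes hypothetical same-strand collinear blocks given as `(source_start, source_end, target_start)` with exclusive ends, and it ignores strand, gaps and the re-computation of orthology that the abstract describes.

```python
from collections import deque

def translate(maps, src, dst, pos):
    """Translate a position from genome src to genome dst via pairwise maps.

    maps: dict mapping (source, target) -> list of (s_start, s_end, t_start) blocks
          (assumed format). Returns the translated position, or None if no chain
          of pairwise maps covers the position.
    """
    queue = deque([(src, pos)])
    seen = {src}
    while queue:
        genome, p = queue.popleft()
        if genome == dst:
            return p
        for (a, b), blocks in maps.items():
            if a != genome or b in seen:
                continue
            for s_start, s_end, t_start in blocks:
                if s_start <= p < s_end:
                    # Carry the offset within the block over to the next genome.
                    seen.add(b)
                    queue.append((b, t_start + (p - s_start)))
                    break
    return None

# Toy maps: human->mouse and mouse->rat exist, human->rat does not.
MAPS = {
    ("human", "mouse"): [(100, 200, 1000)],
    ("mouse", "rat"): [(1000, 1100, 5000)],
}
rat_pos = translate(MAPS, "human", "rat", 150)
```

With N genomes, a connected graph needs as few as N - 1 pairwise maps instead of the N(N - 1)/2 required for all-to-all alignment, which is the scaling argument made above.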
A universal genomic coordinate translator for comparative genomics.
Zamani, Neda; Sundström, Görel; Meadows, Jennifer R S; Höppner, Marc P; Dainat, Jacques; Lantz, Henrik; Haas, Brian J; Grabherr, Manfred G
2014-06-30
Genomic duplications constitute major events in the evolution of species, allowing paralogous copies of genes to take on fine-tuned biological roles. Unambiguously identifying the orthology relationship between copies across multiple genomes can be resolved by synteny, i.e. the conserved order of genomic sequences. However, a comprehensive analysis of duplication events and their contributions to evolution would require all-to-all genome alignments, which increases at N² with the number of available genomes, N. Here, we introduce Kraken, software that omits the all-to-all requirement by recursively traversing a graph of pairwise alignments and dynamically re-computing orthology. Kraken scales linearly with the number of targeted genomes, N, which allows for including large numbers of genomes in analyses. We first evaluated the method on the set of 12 Drosophila genomes, finding that orthologous correspondence computed indirectly through a graph of multiple synteny maps comes at minimal cost in terms of sensitivity, but reduces overall computational runtime by an order of magnitude. We then used the method on three well-annotated mammalian genomes, human, mouse, and rat, and show that up to 93% of protein coding transcripts have unambiguous pairwise orthologous relationships across the genomes. On a nucleotide level, 70 to 83% of exons match exactly at both splice junctions, and up to 97% on at least one junction. We last applied Kraken to an RNA-sequencing dataset from multiple vertebrates and diverse tissues, where we confirmed that brain-specific gene family members, i.e. one-to-many or many-to-many homologs, are more highly correlated across species than single-copy (i.e. one-to-one homologous) genes. 
Not limited to protein coding genes, Kraken also identifies thousands of newly identified transcribed loci, likely non-coding RNAs that are consistently transcribed in human, chimpanzee and gorilla, and maintain significant correlation of expression levels across species. Kraken is a computational genome coordinate translator that facilitates cross-species comparisons, distinguishes orthologs from paralogs, and does not require costly all-to-all whole genome mappings. Kraken is freely available under LGPL from http://github.com/nedaz/kraken.
ERIC Educational Resources Information Center
Portland Project Committee, OR.
This teacher's guide includes parts one and two of the four-part third year Portland Project, a three-year integrated secondary science curriculum sequence. The Harvard Project Physics textbook is used for reading assignments for part one. Assignments relate to waves, light, electricity, magnetic fields, Faraday and the electrical age,…
Molecular evolution of the polyamine oxidase gene family in Metazoa
2012-01-01
Background Polyamine oxidase enzymes catalyze the oxidation of polyamines and acetylpolyamines. Since polyamines are basic regulators of cell growth and proliferation, their homeostasis is crucial for cell life. Members of the polyamine oxidase gene family have been identified in a wide variety of animals, including vertebrates, arthropods, nematodes, placozoa, as well as in plants and fungi. Polyamine oxidases (PAOs) from yeast can oxidize spermine, N1-acetylspermine, and N1-acetylspermidine; in vertebrates, however, two different enzymes, namely spermine oxidase (SMO) and acetylpolyamine oxidase (APAO), specifically catalyze the oxidation of spermine, and N1-acetylspermine/N1-acetylspermidine, respectively. Little is known about the molecular evolutionary history of these enzymes. However, since the yeast PAO is able to catalyze the oxidation of both acetylated and non-acetylated polyamines, and in vertebrates these functions are addressed by two specialized polyamine oxidase subfamilies (APAO and SMO), it can be hypothesized that the former enzyme is the ancestral reference from which the latter were derived. Results We analysed 36 SMO, 26 APAO, and 14 PAO homologue protein sequences from 54 taxa including various vertebrates and invertebrates. The analysis of the full-length sequences and the principal domains of vertebrate and invertebrate PAOs yielded consensus primary protein sequences for vertebrate SMOs and APAOs, and invertebrate PAOs. This analysis, coupled to molecular modeling techniques, also unveiled sequence regions that confer specific structural and functional properties, including substrate specificity, on the different PAO subfamilies. Molecular phylogenetic trees revealed a basal position of all the invertebrate PAO enzymes relative to vertebrate SMOs and APAOs. PAOs from insects constitute a monophyletic clade. 
Two PAO variants sampled in the amphioxus are basal to the dichotomy between two well supported monophyletic clades including, respectively, all the SMOs and APAOs from vertebrates. The two vertebrate monophyletic clades clustered strictly mirroring the organismal phylogeny of fishes, amphibians, reptiles, birds, and mammals. Evidence from comparative genomic analysis, structural evolution and functional divergence in a phylogenetic framework across Metazoa suggested an evolutionary scenario where the ancestor PAO coding sequence, present in invertebrates as an orthologous gene, has been duplicated in the vertebrate branch to originate the paralogous SMO and APAO genes. A further genome evolution event concerns the SMO gene of placental, but not marsupial and monotreme, mammals which increased its functional variation following an alternative splicing (AS) mechanism. Conclusions In this study the explicit integration in a phylogenomic framework of phylogenetic tree construction, structure prediction, and biochemical function data/prediction allowed us to infer the molecular evolutionary history of the PAO gene family and to disambiguate paralogous genes related by a duplication event (SMO and APAO) and orthologous genes related by speciation events (PAOs, SMOs/APAOs). Further, while in vertebrates experimental data corroborate SMO and APAO molecular function predictions, in invertebrates the finding of a supported phylogenetic cluster of insect PAOs and the co-occurrence of two PAO variants in the amphioxus underline the need for future structure-function studies. PMID:22716069
Molecular evolution of the polyamine oxidase gene family in Metazoa.
Polticelli, Fabio; Salvi, Daniele; Mariottini, Paolo; Amendola, Roberto; Cervelli, Manuela
2012-06-20
Polyamine oxidase enzymes catalyze the oxidation of polyamines and acetylpolyamines. Since polyamines are basic regulators of cell growth and proliferation, their homeostasis is crucial for cell life. Members of the polyamine oxidase gene family have been identified in a wide variety of animals, including vertebrates, arthropods, nematodes, placozoa, as well as in plants and fungi. Polyamine oxidases (PAOs) from yeast can oxidize spermine, N1-acetylspermine, and N1-acetylspermidine; in vertebrates, however, two different enzymes, namely spermine oxidase (SMO) and acetylpolyamine oxidase (APAO), specifically catalyze the oxidation of spermine, and N1-acetylspermine/N1-acetylspermidine, respectively. Little is known about the molecular evolutionary history of these enzymes. However, since the yeast PAO is able to catalyze the oxidation of both acetylated and non-acetylated polyamines, and in vertebrates these functions are addressed by two specialized polyamine oxidase subfamilies (APAO and SMO), it can be hypothesized that the former enzyme is the ancestral reference from which the latter were derived. We analysed 36 SMO, 26 APAO, and 14 PAO homologue protein sequences from 54 taxa including various vertebrates and invertebrates. The analysis of the full-length sequences and the principal domains of vertebrate and invertebrate PAOs yielded consensus primary protein sequences for vertebrate SMOs and APAOs, and invertebrate PAOs. This analysis, coupled to molecular modeling techniques, also unveiled sequence regions that confer specific structural and functional properties, including substrate specificity, on the different PAO subfamilies. Molecular phylogenetic trees revealed a basal position of all the invertebrate PAO enzymes relative to vertebrate SMOs and APAOs. PAOs from insects constitute a monophyletic clade. 
Two PAO variants sampled in the amphioxus are basal to the dichotomy between two well supported monophyletic clades including, respectively, all the SMOs and APAOs from vertebrates. The two vertebrate monophyletic clades clustered strictly mirroring the organismal phylogeny of fishes, amphibians, reptiles, birds, and mammals. Evidence from comparative genomic analysis, structural evolution and functional divergence in a phylogenetic framework across Metazoa suggested an evolutionary scenario where the ancestor PAO coding sequence, present in invertebrates as an orthologous gene, has been duplicated in the vertebrate branch to originate the paralogous SMO and APAO genes. A further genome evolution event concerns the SMO gene of placental, but not marsupial and monotreme, mammals which increased its functional variation following an alternative splicing (AS) mechanism. In this study the explicit integration in a phylogenomic framework of phylogenetic tree construction, structure prediction, and biochemical function data/prediction allowed us to infer the molecular evolutionary history of the PAO gene family and to disambiguate paralogous genes related by a duplication event (SMO and APAO) and orthologous genes related by speciation events (PAOs, SMOs/APAOs). Further, while in vertebrates experimental data corroborate SMO and APAO molecular function predictions, in invertebrates the finding of a supported phylogenetic cluster of insect PAOs and the co-occurrence of two PAO variants in the amphioxus underline the need for future structure-function studies.
2011-01-01
Background The inorganic (Pi) phosphate transporter (PiT) family comprises known and putative Na+- or H+-dependent Pi-transporting proteins with representatives from all kingdoms. The mammalian members are placed in the outer cell membranes and suggested to supply cells with Pi to maintain house-keeping functions. Alignment of protein sequences representing PiT family members from all kingdoms reveals the presence of conserved amino acids and that bacterial phosphate permeases and putative phosphate permeases from archaea lack substantial parts of the protein sequence when compared to the mammalian PiT family members. Besides being Na+-dependent Pi (NaPi) transporters, the mammalian PiT paralogs, PiT1 and PiT2, also are receptors for gamma-retroviruses. We have here exploited the dual-function of PiT1 and PiT2 to study the structure-function relationship of PiT proteins. Results We show that the human PiT2 histidine, H502, and the human PiT1 glutamate, E70, - both conserved in eukaryotic PiT family members - are critical for Pi transport function. Noticeably, human PiT2 H502 is located in the C-terminal PiT family signature sequence, and human PiT1 E70 is located in ProDom domains characteristic for all PiT family members. A human PiT2 truncation mutant, which consists of the predicted 10 transmembrane (TM) domain backbone without a large intracellular domain (human PiT2ΔR254-V483), was found to be a fully functional Pi transporter. Further truncation of the human PiT2 protein by additional removal of two predicted TM domains together with the large intracellular domain created a mutant that resembles a bacterial phosphate permease and an archaeal putative phosphate permease. This human PiT2 truncation mutant (human PiT2ΔL183-V483) did also support Pi transport albeit at very low levels. Conclusions The results suggest that the overall structure of the Pi-transporting unit of the PiT family proteins has remained unchanged during evolution. 
Moreover, in combination, our studies of the gene structure of the human PiT1 and PiT2 genes (SLC20A1 and SLC20A2, respectively) and alignment of protein sequences of PiT family members from all kingdoms, along with the studies of the dual functions of the human PiT paralogs show that these proteins are excellent as models for studying the evolution of a protein's structure-function relationship. PMID:21586110
Bøttger, Pernille; Pedersen, Lene
2011-05-17
The inorganic (Pi) phosphate transporter (PiT) family comprises known and putative Na(+)- or H(+)-dependent Pi-transporting proteins with representatives from all kingdoms. The mammalian members are placed in the outer cell membranes and suggested to supply cells with Pi to maintain house-keeping functions. Alignment of protein sequences representing PiT family members from all kingdoms reveals the presence of conserved amino acids and that bacterial phosphate permeases and putative phosphate permeases from archaea lack substantial parts of the protein sequence when compared to the mammalian PiT family members. Besides being Na(+)-dependent P(i) (NaP(i)) transporters, the mammalian PiT paralogs, PiT1 and PiT2, also are receptors for gamma-retroviruses. We have here exploited the dual-function of PiT1 and PiT2 to study the structure-function relationship of PiT proteins. We show that the human PiT2 histidine, H(502), and the human PiT1 glutamate, E(70),--both conserved in eukaryotic PiT family members--are critical for P(i) transport function. Noticeably, human PiT2 H(502) is located in the C-terminal PiT family signature sequence, and human PiT1 E(70) is located in ProDom domains characteristic for all PiT family members. A human PiT2 truncation mutant, which consists of the predicted 10 transmembrane (TM) domain backbone without a large intracellular domain (human PiT2ΔR(254)-V(483)), was found to be a fully functional P(i) transporter. Further truncation of the human PiT2 protein by additional removal of two predicted TM domains together with the large intracellular domain created a mutant that resembles a bacterial phosphate permease and an archaeal putative phosphate permease. This human PiT2 truncation mutant (human PiT2ΔL(183)-V(483)) did also support P(i) transport albeit at very low levels. The results suggest that the overall structure of the P(i)-transporting unit of the PiT family proteins has remained unchanged during evolution. 
Moreover, in combination, our studies of the gene structure of the human PiT1 and PiT2 genes (SLC20A1 and SLC20A2, respectively) and alignment of protein sequences of PiT family members from all kingdoms, along with the studies of the dual functions of the human PiT paralogs show that these proteins are excellent as models for studying the evolution of a protein's structure-function relationship. © 2011 Bøttger and Pedersen; licensee BioMed Central Ltd.
Kröber, Magdalena; Bekel, Thomas; Diaz, Naryttza N; Goesmann, Alexander; Jaenicke, Sebastian; Krause, Lutz; Miller, Dimitri; Runte, Kai J; Viehöver, Prisca; Pühler, Alfred; Schlüter, Andreas
2009-06-01
The phylogenetic structure of the microbial community residing in a fermentation sample from a production-scale biogas plant fed with maize silage, green rye and liquid manure was analysed by an integrated approach using clone library sequences and metagenome sequence data obtained by 454-pyrosequencing. Sequencing of 109 clones from a bacterial and an archaeal 16S-rDNA amplicon library revealed that the obtained nucleotide sequences are similar but not identical to 16S-rDNA database sequences derived from different anaerobic environments including digestors and bioreactors. Most of the bacterial 16S-rDNA sequences could be assigned to the phylum Firmicutes with the most abundant class Clostridia and to the class Bacteroidetes, whereas most archaeal 16S-rDNA sequences cluster close to the methanogen Methanoculleus bourgensis. Further sequences of the archaeal library most probably represent so far non-characterised species within the genus Methanoculleus. A similar result was derived from phylogenetic analysis of mcrA clone sequences. The mcrA gene encodes the alpha-subunit of methyl-coenzyme-M reductase, which is involved in the final step of methanogenesis. BLASTn analysis applying stringent settings resulted in assignment of 16S-rDNA metagenome sequence reads to 62 16S-rDNA amplicon sequences, thus enabling frequency of abundance estimations for 16S-rDNA clone library sequences. Ribosomal Database Project (RDP) Classifier processing of metagenome 16S-rDNA reads revealed abundance of the phyla Firmicutes, Bacteroidetes and Euryarchaeota and the orders Clostridiales, Bacteroidales and Methanomicrobiales. Moreover, a large fraction of 16S-rDNA metagenome reads could not be assigned to lower taxonomic ranks, demonstrating that numerous microorganisms in the analysed fermentation sample of the biogas plant are still unclassified or unknown.
Application of a fast sorting algorithm to the assignment of mass spectrometric cross-linking data.
Petrotchenko, Evgeniy V; Borchers, Christoph H
2014-09-01
Cross-linking combined with MS involves enzymatic digestion of cross-linked proteins and identifying cross-linked peptides. Assignment of cross-linked peptide masses requires a search of all possible binary combinations of peptides from the cross-linked proteins' sequences, which becomes impractical with increasing complexity of the protein system and/or if digestion enzyme specificity is relaxed. Here, we describe the application of a fast sorting algorithm to search large sequence databases for cross-linked peptide assignments based on mass. This same algorithm has been used previously for assigning disulfide-bridged peptides (Choi et al., ), but has not previously been applied to cross-linking studies. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
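The idea of replacing an all-pairs search with a sort-based lookup can be pictured with the minimal sketch below. It is not the authors' implementation: it assumes that a cross-linked mass equals the sum of two peptide masses plus a fixed linker mass, and the function name `find_crosslink_pairs`, the tolerance, and the toy masses are hypothetical. Sorting the peptide masses once lets each candidate partner be located by binary search instead of testing every binary combination.

```python
from bisect import bisect_left, bisect_right


def find_crosslink_pairs(peptide_masses, observed_mass, linker_mass, tol=0.01):
    """Return pairs (m1, m2), m1 <= m2, with m1 + m2 + linker_mass ~= observed_mass.

    Assumption (not from the abstract): the cross-link mass is simply additive.
    """
    masses = sorted(peptide_masses)          # sort once, O(n log n)
    target = observed_mass - linker_mass     # required sum of the two peptides
    pairs = []
    for i, m1 in enumerate(masses):
        if m1 > target / 2 + tol:
            break                            # any partner >= m1 would overshoot
        # Binary search for partners in [target - m1 - tol, target - m1 + tol],
        # restricted to indices >= i so each pair is reported once.
        lo = bisect_left(masses, target - m1 - tol, i)
        hi = bisect_right(masses, target - m1 + tol)
        for j in range(lo, hi):
            pairs.append((m1, masses[j]))
    return pairs


if __name__ == "__main__":
    toy_masses = [500.0, 300.0, 700.0, 1000.0]   # hypothetical peptide masses
    print(find_crosslink_pairs(toy_masses, observed_mass=1338.0, linker_mass=138.0))
```

With n candidate peptides, this performs roughly n binary searches per observed mass rather than examining on the order of n² pairs, which is the kind of saving the abstract attributes to sorting.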
Lappin, Fiona M; Shaw, Rebecca L; Macqueen, Daniel J
2016-12-01
High-throughput sequencing has revolutionised comparative and evolutionary genome biology. It has now become relatively commonplace to generate multiple genomes and/or transcriptomes to characterize the evolution of large taxonomic groups of interest. Nevertheless, such efforts may be unsuited to some research questions or remain beyond the scope of some research groups. Here we show that targeted high-throughput sequencing offers a viable alternative to study genome evolution across a vertebrate family of great scientific interest. Specifically, we exploited sequence capture and Illumina sequencing to characterize the evolution of key components from the insulin-like growth factor (IGF) signalling axis of salmonid fish at unprecedented phylogenetic resolution. The IGF axis represents a central governor of vertebrate growth and its core components were expanded by whole genome duplication in the salmonid ancestor ~95Ma. Using RNA baits synthesised to genes encoding the complete family of IGF binding proteins (IGFBP) and an IGF hormone (IGF2), we captured, sequenced and assembled orthologous and paralogous exons from species representing all ten salmonid genera. This approach generated 299 novel sequences, most as complete or near-complete protein-coding sequences. Phylogenetic analyses confirmed congruent evolutionary histories for all nineteen recognized salmonid IGFBP family members and identified novel salmonid-specific IGF2 paralogues. Moreover, we reconstructed the evolution of duplicated IGF axis paralogues across a replete salmonid phylogeny, revealing complex historic selection regimes - both ancestral to salmonids and lineage-restricted - that frequently involved asymmetric paralogue divergence under positive and/or relaxed purifying selection. Our findings add to an emerging literature highlighting diverse applications for targeted sequencing in comparative-evolutionary genomics. 
We also set out a viable approach to obtain large sets of nuclear genes for any member of the salmonid family, which should enable insights into the evolutionary role of whole genome duplication before additional nuclear genome sequences become available. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
The practical evaluation of DNA barcode efficacy.
Spouge, John L; Mariño-Ramírez, Leonardo
2012-01-01
This chapter describes a workflow for measuring the efficacy of a barcode in identifying species. First, assemble individual sequence databases corresponding to each barcode marker. A controlled collection of taxonomic data is preferable to GenBank data, because GenBank data can be problematic, particularly when comparing barcodes based on more than one marker. To ensure proper controls when evaluating species identification, specimens not having a sequence in every marker database should be discarded. Second, select a computer algorithm for assigning species to barcode sequences. No algorithm has yet improved notably on assigning a specimen to the species of its nearest neighbor within a barcode database. Because global sequence alignments (e.g., with the Needleman-Wunsch algorithm, or some related algorithm) examine entire barcode sequences, they generally produce better species assignments than local sequence alignments (e.g., with BLAST). No neighboring method (e.g., global sequence similarity, global sequence distance, or evolutionary distance based on a global alignment) has yet shown a notable superiority in identifying species. Finally, "the probability of correct identification" (PCI) provides an appropriate measurement of barcode efficacy. The overall PCI for a data set is the average of the species PCIs, taken over all species in the data set. This chapter states explicitly how to calculate PCI, how to estimate its statistical sampling error, and how to use data on PCR failure to set limits on how much improvements in PCR technology can improve species identification.
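The nearest-neighbor assignment and the averaging step described above can be sketched as follows. This is only an illustration under stated assumptions, not the chapter's procedure: sequences are assumed to be pre-aligned and of equal length so that a Hamming distance can stand in for a global-alignment distance, a specimen counts as correctly identified only when all of its nearest neighbors belong to its own species (the tie rule is an assumption), and the names `hamming_distance` and `overall_pci` are hypothetical.

```python
from collections import defaultdict


def hamming_distance(a, b):
    """Mismatch count between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def overall_pci(specimens, distance=hamming_distance):
    """specimens: list of (species, sequence) tuples, at least two entries.

    Returns (overall PCI, per-species PCI), where the overall value is the
    unweighted average of the species PCIs.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for i, (species, seq) in enumerate(specimens):
        # Leave-one-out: compare the specimen with every other specimen.
        others = [(distance(seq, s), sp)
                  for j, (sp, s) in enumerate(specimens) if j != i]
        best = min(d for d, _ in others)
        nearest_species = {sp for d, sp in others if d == best}
        total[species] += 1
        if nearest_species == {species}:     # unambiguous and correct
            correct[species] += 1
    species_pci = {sp: correct[sp] / total[sp] for sp in total}
    return sum(species_pci.values()) / len(species_pci), species_pci


if __name__ == "__main__":
    toy = [("A", "AAAA"), ("A", "AAAT"), ("B", "GGGG"),
           ("B", "GGGC"), ("C", "AATT")]    # toy data, not real barcodes
    print(overall_pci(toy))
```

Averaging over species rather than over specimens keeps heavily sampled species from dominating the score, which matches the definition of the overall PCI given in the abstract.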
Identification of Mycobacterium spp. of veterinary importance using rpoB gene sequencing
2011-01-01
Background Studies conducted on Mycobacterium spp. isolated from human patients indicate that sequencing of a 711 bp portion of the rpoB gene can be useful in assigning a species identity, particularly for members of the Mycobacterium avium complex (MAC). Given that MAC are important pathogens in livestock, companion animals, and zoo/exotic animals, we were interested in evaluating the use of rpoB sequencing for identification of Mycobacterium isolates of veterinary origin. Results A total of 386 isolates, collected over 2008 - June 2011 from 378 animals (amphibians, reptiles, birds, and mammals) underwent PCR and sequencing of a ~ 711 bp portion of the rpoB gene; 310 isolates (80%) were identified to the species level based on similarity at ≥ 98% with a reference sequence. The remaining 76 isolates (20%) displayed < 98% similarity with reference sequences and were assigned to a clade based on their location in a neighbor-joining tree containing reference sequences. For a subset of 236 isolates that received both 16S rRNA and rpoB sequencing, 167 (70%) displayed a similar species/clade assignation for both sequencing methods. For the remaining 69 isolates, species/clade identities were different with each sequencing method. Mycobacterium avium subsp. hominissuis was the species most frequently isolated from specimens from pigs, cervids, companion animals, cattle, and exotic/zoo animals. Conclusions rpoB sequencing proved useful in identifying Mycobacterium isolates of veterinary origin to clade, species, or subspecies levels, particularly for assemblages (such as the MAC) where 16S rRNA sequencing alone is not adequate to demarcate these taxa. rpoB sequencing can represent a cost-effective identification tool suitable for routine use in the veterinary diagnostic laboratory. PMID:22118247
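The ≥ 98% similarity rule in the Results can be expressed as a short decision function. The sketch below is hypothetical and much simpler than a diagnostic workflow: it assumes the query and reference rpoB fragments are already aligned to equal length, it uses plain percent identity as the similarity measure, and the names `percent_identity` and `identify_isolate` and the toy reference sequences are made up. Isolates below the threshold are only flagged for clade-level placement; building the neighbor-joining tree is outside the scope of the example.

```python
def percent_identity(a, b):
    """Percent of matching positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def identify_isolate(query, references, threshold=98.0):
    """Return (best reference name, percent identity, assignment level).

    The level is "species" when identity >= threshold, otherwise "clade",
    meaning the isolate would need tree-based placement instead.
    """
    best_name, best_pid = max(
        ((name, percent_identity(query, seq)) for name, seq in references.items()),
        key=lambda item: item[1],
    )
    level = "species" if best_pid >= threshold else "clade"
    return best_name, best_pid, level


if __name__ == "__main__":
    # Toy 100-base references; real rpoB fragments are about 711 bp.
    refs = {"ref_a": "A" * 100, "ref_b": "A" * 50 + "C" * 50}
    print(identify_isolate("A" * 99 + "G", refs))
    print(identify_isolate("A" * 95 + "G" * 5, refs))
```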
Domain mapping of the Rad51 paralog protein complexes
Miller, Kristi A.; Sawicka, Dorota; Barsky, Daniel; Albala, Joanna S.
2004-01-01
The five human Rad51 paralogs are suggested to play an important role in the maintenance of genome stability through their function in DNA double-strand break repair. These proteins have been found to form two distinct complexes in vivo, Rad51B–Rad51C–Rad51D–Xrcc2 (BCDX2) and Rad51C–Xrcc3 (CX3). Based on the recent Pyrococcus furiosus Rad51 structure, we have used homology modeling to design deletion mutants of the Rad51 paralogs. The models of the human Rad51B, Rad51C, Xrcc3 and murine Rad51D (mRad51D) proteins reveal distinct N-terminal and C-terminal domains connected by a linker region. Using yeast two-hybrid and co-immunoprecipitation techniques, we have demonstrated that a fragment of Rad51B containing amino acid residues 1–75 interacts with the C-terminus and linker of Rad51C, residues 79–376, and this region of Rad51C also interacts with mRad51D and Xrcc3. We have also determined that the N-terminal domain of mRad51D, residues 4–77, binds to Xrcc2 while the C-terminal domain of mRad51D, residues 77–328, binds Rad51C. By this, we have identified the binding domains of the BCDX2 and CX3 complexes to further characterize the interaction of these proteins and propose a scheme for the three-dimensional architecture of the BCDX2 and CX3 paralog complexes. PMID:14704354
Panina, Ekaterina M; Mironov, Andrey A; Gelfand, Mikhail S
2003-08-19
Zinc is an important component of many proteins, but in large concentrations it is poisonous to the cell. Thus its transport is regulated by zinc repressors ZUR of proteobacteria and Gram-positive bacteria from the Bacillus group and AdcR of bacteria from the Streptococcus group. Comparative computational analysis allowed us to identify binding signals of ZUR repressors GAAATGTTATANTATAACATTTC for gamma-proteobacteria, GTAATGTAATAACATTAC for the Agrobacterium group, GATATGTTATAACATATC for the Rhodococcus group, TAAATCGTAATNATTACGATTTA for Gram-positive bacteria, and TTAACYRGTTAA of the streptococcal AdcR repressor. In addition to known transporters and their paralogs, zinc regulons were predicted to contain a candidate component of the ATP binding cassette, zinT (b1995 in Escherichia coli and yrpE in Bacillus subtilis). Candidate AdcR-binding sites were identified upstream of genes encoding pneumococcal histidine triad (PHT) proteins from a number of pathogenic streptococci. Protein functional analysis of this family suggests that PHT proteins are involved in the invasion process. Finally, repression by zinc was predicted for genes encoding a variety of paralogs of ribosomal proteins. The original copies of all these proteins contain zinc-ribbon motifs and thus likely bind zinc, whereas these motifs are destroyed in zinc-regulated paralogs. We suggest that the induction of these paralogs in conditions of zinc starvation leads to their incorporation in a fraction of ribosomes instead of the original ribosomal proteins; the latter are then degraded with subsequent release of some zinc for the utilization by other proteins. Thus we predict a mechanism for maintaining zinc availability for essential enzymes.
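The binding signals quoted above use degenerate IUPAC nucleotide codes (N, Y, R). As a rough illustration of how such a consensus can be searched for in a sequence, the sketch below translates a signal into a regular expression and reports every exact match. It is an assumption-laden simplification: the abstract does not describe its search procedure, exact consensus matching is stricter than the scoring a comparative study would normally use, and the function names and test sequences are hypothetical.

```python
import re

# Standard IUPAC codes needed for the signals quoted in the abstract.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
    "N": "[ACGT]",  # any base
}


def motif_to_regex(motif):
    """Compile an IUPAC consensus into a regex; the lookahead allows overlapping hits."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def find_sites(motif, sequence):
    """Return 0-based start positions of exact consensus matches."""
    return [m.start() for m in motif_to_regex(motif).finditer(sequence.upper())]


if __name__ == "__main__":
    adcr_signal = "TTAACYRGTTAA"               # AdcR signal from the abstract
    upstream = "GGG" + "TTAACTAGTTAA" + "CC"   # made-up upstream region
    print(find_sites(adcr_signal, upstream))
```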
Chung, George; Rose, Ann M.; Petalcorin, Mark I.R.; Martin, Julie S.; Kessler, Zebulin; Sanchez-Pulido, Luis; Ponting, Chris P.; Yanowitz, Judith L.; Boulton, Simon J.
2015-01-01
The Caenorhabditis elegans gene rec-1 was the first genetic locus identified in metazoa to affect the distribution of meiotic crossovers along the chromosome. We report that rec-1 encodes a distant paralog of HIM-5, which was discovered by whole-genome sequencing and confirmed by multiple genome-edited alleles. REC-1 is phosphorylated by cyclin-dependent kinase (CDK) in vitro, and mutation of the CDK consensus sites in REC-1 compromises meiotic crossover distribution in vivo. Unexpectedly, rec-1; him-5 double mutants are synthetic-lethal due to a defect in meiotic double-strand break formation. Thus, we uncovered an unexpected robustness to meiotic DSB formation and crossover positioning that is executed by HIM-5 and REC-1 and regulated by phosphorylation. PMID:26385965
Sammler, Svenja; Ketmaier, Valerio; Havenstein, Katja; Tiedemann, Ralph
2013-12-01
Philippine hornbills of the genera Aceros and Penelopides (Bucerotidae) are known to possess a large tandemly duplicated fragment in their mitochondrial genome, whose paralogous parts largely evolve in concert. In the present study, we surveyed the two distinguishable duplicated control regions in several individuals of the Luzon Tarictic Hornbill Penelopides manillae, compared their characteristics within and across individuals, and report on an intraspecific mitochondrial gene rearrangement found in a single specimen, i.e., an interchange between the two control regions. To our knowledge, this is the first observation of two distinct mitochondrial genome rearrangements within a bird species. We briefly discuss a possible evolutionary mechanism responsible for this pattern, and highlight potential implications for the application of control region sequences as a marker in population genetics and phylogeography.
Genes of the antioxidant system of the honey bee: annotation and phylogeny.
Corona, M; Robinson, G E
2006-10-01
Antioxidant enzymes perform a variety of vital functions including the reduction of life-shortening oxidative damage. We used the honey bee genome sequence to identify the major components of the honey bee antioxidant system. A comparative analysis of honey bee with Drosophila melanogaster and Anopheles gambiae shows that although the basic components of the antioxidant system are conserved, there are important species differences in the number of paralogs. These include the duplication of thioredoxin reductase and the expansion of the thioredoxin family in fly; lack of expansion of the Theta, Delta and Omega GST classes in bee and no expansion of the Sigma class in dipteran species. The differential expansion of antioxidant gene families among honey bees and dipteran species might reflect the marked differences in life history and ecological niches between social and solitary species.
USDA-ARS's Scientific Manuscript database
Next-generation sequencing technologies were used to rapidly and efficiently sequence the genome of the domestic turkey (Meleagris gallopavo). The current genome assembly (~1.1 Gb) includes 917 Mb of sequence assigned to chromosomes. Innate heterozygosity of the sequenced bird allowed discovery of...
Hindt, Maria N; Akmakjian, Garo Z; Pivarski, Kara L; Punshon, Tracy; Baxter, Ivan; Salt, David E; Guerinot, Mary Lou
2017-07-19
Iron (Fe) is required for plant health, but it can also be toxic when present in excess. Therefore, Fe levels must be tightly controlled. The Arabidopsis thaliana E3 ligase BRUTUS (BTS) is involved in the negative regulation of the Fe deficiency response and we show here that the two A. thaliana BTS paralogs, BTS LIKE1 (BTSL1) and BTS LIKE2 (BTSL2) encode proteins that act redundantly as negative regulators of the Fe deficiency response. Loss of both of these E3 ligases enhances tolerance to Fe deficiency. We further generated a triple mutant with loss of both BTS paralogs and a partial loss of BTS expression that exhibits even greater tolerance to Fe-deficient conditions and increased Fe accumulation without any resulting Fe toxicity effects. Finally, we identified a mutant carrying a novel missense mutation of BTS that exhibits an Fe deficiency response in the root when grown under both Fe-deficient and Fe-sufficient conditions, leading to Fe toxicity when plants are grown under Fe-sufficient conditions.
Liang, Liang; Reinick, Christina; Angleson, Joseph K; Dores, Robert M
2013-01-15
There is general agreement that the presence of five melanocortin receptor genes in tetrapods is the result of two genome duplications that occurred prior to the emergence of the gnathostomes, and at least one local gene duplication that occurred early in the radiation of the ancestral gnathostomes. Hence, it is assumed that representatives from the extant classes of gnathostomes (i.e., Chondrichthyes, Actinopterygii, Sarcopterygii) should also have five paralogous melanocortin genes. Current studies on cartilaginous fishes indicate that while there is evidence for five paralogous melanocortin receptor genes in this class, to date all five paralogs have not been detected in the genome of a single species. This mini-review will discuss the ligand selectivity properties of the melanocortin-3 receptor of the elephant shark (subclass Holocephali) and the ligand selectivity properties of the melanocortin-3 receptor, melanocortin-4 receptor, and the melanocortin-5 receptor of the dogfish (subclass Elasmobranchii). The potential relationship of these melanocortin receptors to the hypothalamus/pituitary/interrenal axis will be discussed. Copyright © 2012 Elsevier Inc. All rights reserved.
The multi-replication protein A (RPA) system--a new perspective.
Sakaguchi, Kengo; Ishibashi, Toyotaka; Uchiyama, Yukinobu; Iwabata, Kazuki
2009-02-01
Replication protein A (RPA) complex has been shown, using both in vivo and in vitro approaches, to be required for most aspects of eukaryotic DNA metabolism: replication, repair, telomere maintenance and homologous recombination. Here, we review recent data concerning the function and biological importance of the multi-RPA complex. There are distinct complexes of RPA found in the biological kingdoms, although for a long time only one type of RPA complex was believed to be present in eukaryotes. Each complex probably serves a different role. In higher plants, three distinct large and medium subunits are present, but only one species of the smallest subunit. Each of these protein subunits forms stable complexes with its respective partners, so that the complexes themselves are paralogous. Humans possess two paralogs and one analog of RPA. The multi-RPA system can be regarded as universal in eukaryotes. Among eukaryotic kingdoms, paralogs, orthologs, analogs and heterologs of many DNA synthesis-related factors, including RPA, are ubiquitous. Convergent evolution seems to be ubiquitous in these processes. Using recent findings, we review the composition and biological functions of RPA complexes.
Gambogic acid identifies an isoform-specific druggable pocket in the middle domain of Hsp90β
Yim, Kendrick H.; Prince, Thomas L.; Qu, Shiwei; Bai, Fang; Jennings, Patricia A.; Onuchic, José N.; Theodorakis, Emmanuel A.; Neckers, Leonard
2016-01-01
Because of their importance in maintaining protein homeostasis, molecular chaperones, including heat-shock protein 90 (Hsp90), represent attractive drug targets. Although a number of Hsp90 inhibitors are in preclinical/clinical development, none strongly differentiate between constitutively expressed Hsp90β and stress-induced Hsp90α, the two cytosolic paralogs of this molecular chaperone. Thus, the importance of inhibiting one or the other paralog in different disease states remains unknown. We show that the natural product, gambogic acid (GBA), binds selectively to a site in the middle domain of Hsp90β, identifying GBA as an Hsp90β-specific Hsp90 inhibitor. Furthermore, using computational and medicinal chemistry, we identified a GBA analog, referred to as DAP-19, which binds potently and selectively to Hsp90β. Because of its unprecedented selectivity for Hsp90β among all Hsp90 paralogs, GBA thus provides a new chemical tool to study the unique biological role of this abundantly expressed molecular chaperone in health and disease. PMID:27466407
Wei, Chaoling; Yang, Hua; Wang, Songbo; Zhao, Jian; Liu, Chun; Gao, Liping; Xia, Enhua; Lu, Ying; Tai, Yuling; She, Guangbiao; Sun, Jun; Cao, Haisheng; Tong, Wei; Gao, Qiang; Li, Yeyun; Deng, Weiwei; Jiang, Xiaolan; Wang, Wenzhao; Chen, Qi; Zhang, Shihua; Li, Haijing; Wu, Junlan; Wang, Ping; Li, Penghui; Shi, Chengying; Zheng, Fengya; Jian, Jianbo; Huang, Bei; Shan, Dai; Shi, Mingming; Fang, Congbing; Yue, Yi; Li, Fangdong; Li, Daxiang; Wei, Shu; Han, Bin; Jiang, Changjun; Yin, Ye; Xia, Tao; Zhang, Zhengzhu; Bennetzen, Jeffrey L; Zhao, Shancen; Wan, Xiaochun
2018-05-01
Tea, one of the world's most important beverage crops, provides numerous secondary metabolites that account for its rich taste and health benefits. Here we present a high-quality sequence of the genome of tea, Camellia sinensis var. sinensis (CSS), using both Illumina and PacBio sequencing technologies. At least 64% of the 3.1-Gb genome assembly consists of repetitive sequences, and the rest yields 33,932 high-confidence predictions of encoded proteins. Divergence between two major lineages, CSS and Camellia sinensis var. assamica (CSA), is calculated to ∼0.38 to 1.54 million years ago (Mya). Analysis of genic collinearity reveals that the tea genome is the product of two rounds of whole-genome duplications (WGDs) that occurred ∼30 to 40 and ∼90 to 100 Mya. We provide evidence that these WGD events, and subsequent paralogous duplications, had major impacts on the copy numbers of secondary metabolite genes, particularly genes critical to producing three key quality compounds: catechins, theanine, and caffeine. Analyses of transcriptome and phytochemistry data show that amplification and transcriptional divergence of genes encoding a large acyltransferase family and leucoanthocyanidin reductases are associated with the characteristic young leaf accumulation of monomeric galloylated catechins in tea, while functional divergence of a single member of the glutamine synthetase gene family yielded theanine synthetase. This genome sequence will facilitate understanding of tea genome evolution and tea metabolite pathways, and will promote germplasm utilization for breeding improved tea varieties. Copyright © 2018 the Author(s). Published by PNAS.
Nikolaidis, Nikolas; Nei, Masatoshi
2004-03-01
We have identified the Hsp70 gene superfamily of the nematode Caenorhabditis briggsae and investigated the evolution of these genes in comparison with Hsp70 genes from C. elegans, Drosophila, and yeast. The Hsp70 genes are classified into three monophyletic groups according to their subcellular localization, namely, cytoplasm (CYT), endoplasmic reticulum (ER), and mitochondria (MT). The Hsp110 genes can be classified into the polyphyletic CYT group and the monophyletic ER group. The different Hsp70 and Hsp110 groups appeared to evolve following the model of divergent evolution. This model can also explain the evolution of the ER and MT genes. On the other hand, the CYT genes are divided into heat-inducible and constitutively expressed genes. The constitutively expressed genes have evolved more or less following the birth-and-death process, and the rates of gene birth and gene death are different between the two nematode species. By contrast, some heat-inducible genes show an intraspecies phylogenetic clustering. This suggests that they are subject to sequence homogenization resulting from gene conversion-like events. In addition, the heat-inducible genes show high levels of sequence conservation in both intra-species and inter-species comparisons, and in most cases, amino acid sequence similarity is higher than nucleotide sequence similarity. This indicates that purifying selection also plays an important role in maintaining high sequence similarity among paralogous Hsp70 genes. Therefore, we suggest that the CYT heat-inducible genes have been subjected to a combination of purifying selection, birth-and-death process, and gene conversion-like events.
Wei, Chaoling; Yang, Hua; Wang, Songbo; Zhao, Jian; Liu, Chun; Gao, Liping; Xia, Enhua; Lu, Ying; Tai, Yuling; She, Guangbiao; Sun, Jun; Cao, Haisheng; Tong, Wei; Gao, Qiang; Li, Yeyun; Deng, Weiwei; Jiang, Xiaolan; Wang, Wenzhao; Chen, Qi; Zhang, Shihua; Li, Haijing; Wu, Junlan; Wang, Ping; Li, Penghui; Shi, Chengying; Zheng, Fengya; Jian, Jianbo; Huang, Bei; Shan, Dai; Shi, Mingming; Fang, Congbing; Yue, Yi; Li, Fangdong; Li, Daxiang; Wei, Shu; Han, Bin; Jiang, Changjun; Yin, Ye; Xia, Tao; Zhang, Zhengzhu; Bennetzen, Jeffrey L.; Zhao, Shancen; Wan, Xiaochun
2018-01-01
Tea, one of the world’s most important beverage crops, provides numerous secondary metabolites that account for its rich taste and health benefits. Here we present a high-quality sequence of the genome of tea, Camellia sinensis var. sinensis (CSS), using both Illumina and PacBio sequencing technologies. At least 64% of the 3.1-Gb genome assembly consists of repetitive sequences, and the rest yields 33,932 high-confidence predictions of encoded proteins. Divergence between two major lineages, CSS and Camellia sinensis var. assamica (CSA), is calculated to ∼0.38 to 1.54 million years ago (Mya). Analysis of genic collinearity reveals that the tea genome is the product of two rounds of whole-genome duplications (WGDs) that occurred ∼30 to 40 and ∼90 to 100 Mya. We provide evidence that these WGD events, and subsequent paralogous duplications, had major impacts on the copy numbers of secondary metabolite genes, particularly genes critical to producing three key quality compounds: catechins, theanine, and caffeine. Analyses of transcriptome and phytochemistry data show that amplification and transcriptional divergence of genes encoding a large acyltransferase family and leucoanthocyanidin reductases are associated with the characteristic young leaf accumulation of monomeric galloylated catechins in tea, while functional divergence of a single member of the glutamine synthetase gene family yielded theanine synthetase. This genome sequence will facilitate understanding of tea genome evolution and tea metabolite pathways, and will promote germplasm utilization for breeding improved tea varieties. PMID:29678829
Development of a set of SNP markers present in expressed genes of the apple.
Chagné, David; Gasic, Ksenija; Crowhurst, Ross N; Han, Yuepeng; Bassett, Heather C; Bowatte, Deepa R; Lawrence, Timothy J; Rikkerink, Erik H A; Gardiner, Susan E; Korban, Schuyler S
2008-11-01
Molecular markers associated with gene coding regions are useful tools for bridging functional and structural genomics. Due to their high abundance in plant genomes, single nucleotide polymorphisms (SNPs) are present within virtually all genomic regions, including most coding sequences. The objective of this study was to develop a set of SNPs for the apple by taking advantage of the wealth of genomics resources available for the apple, including a large collection of expressed sequence tags (ESTs). Using bioinformatics tools, a search for SNPs within an EST database of approximately 350,000 sequences developed from a variety of apple accessions was conducted. This resulted in the identification of a total of 71,482 putative SNPs. As the apple genome is reported to be an ancient polyploid, attempts were made to verify whether those SNPs detected in silico were attributable either to allelic polymorphisms or to gene duplication or paralogous or homeologous sequence variations. To this end, a set of 464 PCR primer pairs was designed, PCR amplification was performed using two subsets of plants, and the PCR products were sequenced. The SNPs retrieved from these sequences were then mapped onto apple genetic maps, including a newly constructed map of a Royal Gala x A689-24 cross and a Malling 9 x Robusta 5 map, using a bin mapping strategy. The SNP genotyping was performed using the high-resolution melting (HRM) technique. A total of 93 new markers containing 210 coding SNPs were successfully mapped. This new set of SNP markers for the apple offers new opportunities for understanding the genetic control of important horticultural traits using quantitative trait loci (QTL) or linkage disequilibrium analysis. These also serve as useful markers for aligning physical and genetic maps, and as potential transferable markers across the Rosaceae family.
Hohenlohe, Paul A.; Day, Mitch D.; Amish, Stephen J.; Miller, Michael R.; Kamps-Hughes, Nick; Boyer, Matthew C.; Muhlfeld, Clint C.; Allendorf, Fred W.; Johnson, Eric A.; Luikart, Gordon
2013-01-01
Rapid and inexpensive methods for genomewide single nucleotide polymorphism (SNP) discovery and genotyping are urgently needed for population management and conservation. In hybridized populations, genomic techniques that can identify and genotype thousands of species-diagnostic markers would allow precise estimates of population- and individual-level admixture as well as identification of 'super invasive' alleles, which show elevated rates of introgression above the genomewide background (likely due to natural selection). Techniques like restriction-site-associated DNA (RAD) sequencing can discover and genotype large numbers of SNPs, but they have been limited by the length of continuous sequence data they produce with Illumina short-read sequencing. We present a novel approach, overlapping paired-end RAD sequencing, to generate RAD contigs of >300–400 bp. These contigs provide sufficient flanking sequence for design of high-throughput SNP genotyping arrays and strict filtering to identify duplicate paralogous loci. We applied this approach in five populations of native westslope cutthroat trout that previously showed varying (low) levels of admixture from introduced rainbow trout (RBT). We produced 77 141 RAD contigs and used these data to filter and genotype 3180 previously identified species-diagnostic SNP loci. Our population-level and individual-level estimates of admixture were generally consistent with previous microsatellite-based estimates from the same individuals. However, we observed slightly lower admixture estimates from genomewide markers, which might result from natural selection against certain genome regions, different genomic locations for microsatellites vs. RAD-derived SNPs and/or sampling error from the small number of microsatellite loci (n = 7). We also identified candidate adaptive super invasive alleles from RBT that had excessively high admixture proportions in hybridized cutthroat trout populations.
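The admixture estimates described in the abstract above rest on species-diagnostic SNPs, where every allele can be attributed to one parental species. The following is a minimal sketch of one simple way such an individual-level estimate could be computed; the abstract does not state the estimator used, so the allele-proportion formula, the genotype encoding, and the function name `admixture_proportion` are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def admixture_proportion(rbt_allele_counts: Sequence[Optional[int]]) -> Optional[float]:
    """Fraction of alleles attributed to rainbow trout (RBT) in one diploid fish.

    Each entry is the number of RBT alleles (0, 1 or 2) at one
    species-diagnostic locus; None marks a locus with no genotype call.
    This simple proportion is an assumed estimator, not the study's method.
    """
    called = [count for count in rbt_allele_counts if count is not None]
    if not called:
        return None  # no genotyped loci, so no estimate is possible
    if any(count not in (0, 1, 2) for count in called):
        raise ValueError("diploid genotypes must carry 0, 1 or 2 RBT alleles")
    # Two alleles are observed at every genotyped locus.
    return sum(called) / (2 * len(called))


# Hypothetical individual: five diagnostic loci, one of them uncalled.
example = [0, 0, 1, 2, None]
print(admixture_proportion(example))
```

Averaging the per-individual values over a sample would give a population-level figure under the same assumptions; missing genotypes are dropped rather than imputed.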
PrionHome: A Database of Prions and Other Sequences Relevant to Prion Phenomena
Harbi, Djamel; Parthiban, Marimuthu; Gendoo, Deena M. A.; Ehsani, Sepehr; Kumar, Manish; Schmitt-Ulms, Gerold; Sowdhamini, Ramanathan; Harrison, Paul M.
2012-01-01
Prions are units of propagation of an altered state of a protein or proteins; prions can propagate from organism to organism, through cooption of other protein copies. Prions contain no necessary nucleic acids, and are important both as pathogenic agents and as a potential force in epigenetic phenomena. The original prions were derived from a misfolded form of the mammalian Prion Protein PrP. Infection by these prions causes neurodegenerative diseases. Other prions cause non-Mendelian inheritance in budding yeast, and sometimes act as diseases of yeast. We report the bioinformatic construction of the PrionHome, a database of >2000 prion-related sequences. The data was collated from various public and private resources and filtered for redundancy. The data was then processed according to a transparent classification system of prionogenic sequences (i.e., sequences that can make prions), prionoids (i.e., proteins that propagate like prions between individual cells), and other prion-related phenomena. There are eight PrionHome classifications for sequences. The first four classifications are derived from experimental observations: prionogenic sequences, prionoids, other prion-related phenomena, and prion interactors. The second four classifications are derived from sequence analysis: orthologs, paralogs, pseudogenes, and candidate-prionogenic sequences. Database entries list: supporting information for PrionHome classifications, prion-determinant areas (where relevant), and disordered and compositionally-biased regions. Also included are literature references for the PrionHome classifications, transcripts and genomic coordinates, and structural data (including comparative models made for the PrionHome from manually curated alignments). We provide database usage examples for both vertebrate and fungal prion contexts. Using the database data, we have performed a detailed analysis of the compositional biases in known budding-yeast prionogenic sequences, showing that the only abundant bias pattern is for asparagine bias with subsidiary serine bias. We anticipate that this database will be a useful experimental aid and reference resource. It is freely available at: http://libaio.biol.mcgill.ca/prion. PMID:22363733
Genomic Sequence around Butterfly Wing Development Genes: Annotation and Comparative Analysis
Conceição, Inês C.; Long, Anthony D.; Gruber, Jonathan D.; Beldade, Patrícia
2011-01-01
Background Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. Methodology/Principal Findings We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). Conclusions The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high conservation of non-coding sequence around the genes wingless and Ecdysone receptor, both involved in multiple developmental processes including wing pattern formation. PMID:21909358
Zerbe, Philipp; Chiang, Angela; Yuen, Macaire; Hamberger, Björn; Hamberger, Britta; Draper, Jason A.; Britton, Robert; Bohlmann, Jörg
2012-01-01
The labdanoid diterpene alcohol cis-abienol is a major component of the aromatic oleoresin of balsam fir (Abies balsamea) and serves as a valuable bioproduct material for the fragrance industry. Using high-throughput 454 transcriptome sequencing and metabolite profiling of balsam fir bark tissue, we identified candidate diterpene synthase sequences for full-length cDNA cloning and functional characterization. We discovered a bifunctional class I/II cis-abienol synthase (AbCAS), along with the paralogous levopimaradiene/abietadiene synthase and isopimaradiene synthase, all of which are members of the gymnosperm-specific TPS-d subfamily. The AbCAS-catalyzed formation of cis-abienol proceeds via cyclization and hydroxylation at carbon C-8 of a postulated carbocation intermediate in the class II active site, followed by cleavage of the diphosphate group and termination of the reaction sequence without further cyclization in the class I active site. This reaction mechanism is distinct from that of synthases of the isopimaradiene- or levopimaradiene/abietadiene synthase type, which employ deprotonation reactions in the class II active site and secondary cyclizations in the class I active site, leading to tricyclic diterpenes. Comparative homology modeling suggested the active site residues Asp-348, Leu-617, Phe-696, and Gly-723 as potentially important for the specificity of AbCAS. As a class I/II bifunctional enzyme, AbCAS is a promising target for metabolic engineering of cis-abienol production. PMID:22337889
Shiller, Jason; Van de Wouw, Angela P.; Taranto, Adam P.; Bowen, Joanna K.; Dubois, David; Robinson, Andrew; Deng, Cecilia H.; Plummer, Kim M.
2015-01-01
Venturia inaequalis and V. pirina are Dothideomycete fungi that cause apple scab and pear scab disease, respectively. Whole genome sequencing of V. inaequalis and V. pirina isolates has revealed predicted proteins with sequence similarity to AvrLm6, a Leptosphaeria maculans effector that triggers a resistance response in Brassica napus and B. juncea carrying the resistance gene, Rlm6. AvrLm6-like genes are present as large families (>15 members) in all sequenced strains of V. inaequalis and V. pirina, while in L. maculans, only AvrLm6 and a single paralog have been identified. The Venturia AvrLm6-like genes are located in gene-poor regions of the genomes, and mostly in close proximity to transposable elements, which may explain the expansion of these gene families. An AvrLm6-like gene from V. inaequalis with the highest sequence identity to AvrLm6 was unable to trigger a resistance response in Rlm6-carrying B. juncea. RNA-seq and qRT-PCR gene expression analyses of in planta- and in vitro-grown V. inaequalis have revealed that many of the AvrLm6-like genes are expressed during infection. An AvrLm6 homolog from V. inaequalis that is up-regulated during infection was shown (using an eYFP-fusion protein construct) to be localized to the sub-cuticular stroma during biotrophic infection of apple hypocotyls. PMID:26635823
DNA barcode data accurately assign higher spider taxa
Coddington, Jonathan A.; Agnarsson, Ingi; Cheng, Ren-Chung; Čandek, Klemen; Driskell, Amy; Frick, Holger; Gregorič, Matjaž; Kostanjšek, Rok; Kropf, Christian; Kweskin, Matthew; Lokovšek, Tjaša; Pipan, Miha; Vidergar, Nina
2016-01-01
The use of unique DNA sequences as a method for taxonomic identification is no longer fundamentally controversial, even though debate continues on the best markers, methods, and technology to use. Although both existing databanks such as GenBank and BOLD, as well as reference taxonomies, are imperfect, in best case scenarios “barcodes” (whether single or multiple, organelle or nuclear, loci) clearly are an increasingly fast and inexpensive method of identification, especially as compared to manual identification of unknowns by increasingly rare expert taxonomists. Because most species on Earth are undescribed, a complete reference database at the species level is impractical in the near term. The question therefore arises whether unidentified species can, using DNA barcodes, be accurately assigned to more inclusive groups such as genera and families, taxonomic ranks of putatively monophyletic groups for which the global inventory is more complete and stable. We used a carefully chosen test library of CO1 sequences from 49 families, 313 genera, and 816 species of spiders to assess the accuracy of genus and family-level assignment. We used BLAST queries of each sequence against the entire library and retrieved the top ten hits. The percent sequence identity was reported from these hits (PIdent, range 75–100%). Accurate assignment of higher taxa (PIdent above which errors totaled less than 5%) occurred for genera at PIdent values >95 and families at PIdent values ≥ 91, suggesting these as heuristic thresholds for accurate generic and familial identifications in spiders. Accuracy of identification increases with numbers of species/genus and genera/family in the library; above five genera per family and fifteen species per genus all higher taxon assignments were correct. We propose that using percent sequence identity between conventional barcode sequences may be a feasible and reasonably accurate method to identify animals to family/genus. However, the quality of the underlying database impacts accuracy of results; many outliers in our dataset could be attributed to taxonomic and/or sequencing errors in BOLD and GenBank. It seems that an accurate and complete reference library of families and genera of life could provide accurate higher level taxonomic identifications cheaply and accessibly, within years rather than decades. PMID:27547527
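The heuristic thresholds reported in the abstract above (genus at PIdent > 95, family at PIdent ≥ 91) can be read as a small decision rule applied to BLAST hits. The sketch below is one possible reading, not the authors' pipeline: the `Hit` record, the choice to use only the single best hit, and the function name `assign_higher_taxon` are assumptions, and the example taxa and identities are illustrative values rather than data from the study.

```python
from typing import List, NamedTuple, Optional, Tuple

# Thresholds quoted in the abstract: genera at PIdent > 95, families at PIdent >= 91.
GENUS_THRESHOLD = 95.0   # strict inequality
FAMILY_THRESHOLD = 91.0  # inclusive


class Hit(NamedTuple):
    """One BLAST hit, reduced to the fields this sketch needs (hypothetical layout)."""
    family: str
    genus: str
    pident: float  # percent sequence identity, 0-100


def assign_higher_taxon(hits: List[Hit]) -> Tuple[str, Optional[str]]:
    """Return (rank, name) for the most specific rank the best hit supports."""
    if not hits:
        return ("unassigned", None)
    best = max(hits, key=lambda hit: hit.pident)
    if best.pident > GENUS_THRESHOLD:
        return ("genus", best.genus)
    if best.pident >= FAMILY_THRESHOLD:
        return ("family", best.family)
    return ("unassigned", None)


# Illustrative hits only; the identities are invented for the example.
hits = [Hit("Araneidae", "Araneus", 93.2), Hit("Araneidae", "Argiope", 90.1)]
print(assign_higher_taxon(hits))
```

Using only the top hit keeps the rule simple; a consensus over the top ten hits, which the study retrieved, would be a natural alternative but is not specified in the abstract.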
Finn, Roderick Nigel; Chauvigné, François; Hlidberg, Jón Baldur; Cutler, Christopher P.; Cerdà, Joan
2014-01-01
A major physiological barrier for aquatic organisms adapting to terrestrial life is desiccation in the aerial environment. This barrier was nevertheless overcome by the Devonian ancestors of extant Tetrapoda, but the origin of specific molecular mechanisms that solved this water problem remains largely unknown. Here we show that an ancient aquaporin gene cluster evolved specifically in the sarcopterygian lineage, and subsequently diverged into paralogous forms of AQP2, -5, or -6 to mediate water conservation in extant Tetrapoda. To determine the origin of these apomorphic genomic traits, we combined aquaporin sequencing from jawless and jawed vertebrates with broad taxon assembly of >2,000 transcripts amongst 131 deuterostome genomes and developed a model based upon Bayesian inference that traces their convergent roots to stem subfamilies in basal Metazoa and Prokaryota. This approach uncovered an unexpected diversity of aquaporins in every lineage investigated, and revealed that the vertebrate superfamily consists of 17 classes of aquaporins (Aqp0 - Aqp16). The oldest orthologs associated with water conservation in modern Tetrapoda are traced to a cluster of three aqp2-like genes in Actinistia that likely arose >500 Ma through duplication of an aqp0-like gene present in a jawless ancestor. In sea lamprey, we show that aqp0 first arose in a protocluster comprised of a novel aqp14 paralog and a fused aqp01 gene. To corroborate these findings, we conducted phylogenetic analyses of five syntenic nuclear receptor subfamilies, which, together with observations of extensive genome rearrangements, support the coincident loss of ancestral aqp2-like orthologs in Actinopterygii. We thus conclude that the divergence of sarcopterygian-specific aquaporin gene clusters was permissive for the evolution of water conservation mechanisms that facilitated tetrapod terrestrial adaptation. PMID:25426855
Elusive Origins of the Extra Genes in Aspergillus oryzae
Khaldi, Nora; Wolfe, Kenneth H.
2008-01-01
The genome sequence of Aspergillus oryzae revealed unexpectedly that this species has approximately 20% more genes than its congeneric species A. nidulans and A. fumigatus. Where did these extra genes come from? Here, we evaluate several possible causes of the elevated gene number. Many gene families are expanded in A. oryzae relative to A. nidulans and A. fumigatus, but we find no evidence of ancient whole-genome duplication or other segmental duplications, either in A. oryzae or in the common ancestor of the genus Aspergillus. We show that the presence of divergent pairs of paralogs is a feature peculiar to A. oryzae and is not shared with A. nidulans or A. fumigatus. In phylogenetic trees that include paralog pairs from A. oryzae, we frequently find that one of the genes in a pair from A. oryzae has the expected orthologous relationship with A. nidulans, A. fumigatus and other species in the subphylum Eurotiomycetes, whereas the other A. oryzae gene falls outside this clade but still within the Ascomycota. We identified 456 such gene pairs in A. oryzae. Further phylogenetic analysis did not however indicate a single consistent evolutionary origin for the divergent members of these pairs. Approximately one-third of them showed phylogenies that are suggestive of horizontal gene transfer (HGT) from Sordariomycete species, and these genes are closer together in the A. oryzae genome than expected by chance, but no unique Sordariomycete donor species was identifiable. The postulated HGTs from Sordariomycetes still leave the majority of extra A. oryzae genes unaccounted for. One possible explanation for our observations is that A. oryzae might have been the recipient of many separate HGT events from diverse donors. PMID:18725939
Biewer, M; Lechner, S; Hasselmann, M
2016-01-01
Studying the fate of duplicated genes provides informative insight into the evolutionary plasticity of biological pathways to which they belong. In the paralogous sex-determining genes complementary sex determiner (csd) and feminizer (fem) of honey bee species (genus Apis), only heterozygous csd initiates female development. Here, the full-length coding sequences of the genes csd and fem of the phylogenetically basal dwarf honey bee Apis florea are characterized. Compared with other Apis species, remarkable evolutionary changes in the formation and localization of a protein-interacting (coiled-coil) motif and in the amino acids coding for the csd characteristic hypervariable region (HVR) are observed. Furthermore, functionally different csd alleles were isolated as genomic fragments from a random population sample. In the predicted potential specifying domain (PSD), a high ratio of πN/πS=1.6 indicated positive selection, whereas signs of balancing selection, commonly found in other Apis species, are missing. Low nucleotide diversity on synonymous and genome-wide, non-coding sites as well as site frequency analyses indicated a strong impact of genetic drift in A. florea, likely linked to its biology. Along the evolutionary trajectory of ~30 million years of csd evolution, episodic diversifying selection seems to have acted differently among distinct Apis branches. Consistently low amino-acid differences within the PSD among pairs of functional heterozygous csd alleles indicate that the HVR is the most important region for determining allele specificity. We propose that in the early history of the lineage-specific fem duplication giving rise to csd in Apis, A. florea csd stands as a remarkable example for the plasticity of initial sex-determining signals.
Giardia intestinalis incorporates heme into cytosolic cytochrome b₅.
Pyrih, Jan; Harant, Karel; Martincová, Eva; Sutak, Robert; Lesuisse, Emmanuel; Hrdý, Ivan; Tachezy, Jan
2014-02-01
The anaerobic intestinal pathogen Giardia intestinalis does not possess enzymes for heme synthesis, and it also lacks the typical set of hemoproteins that are involved in mitochondrial respiration and cellular oxygen stress management. Nevertheless, G. intestinalis may require heme for the function of particular hemoproteins, such as cytochrome b5 (cytb5). We have analyzed the sequences of eukaryotic cytb5 proteins and identified three distinct cytb5 groups: group I, which consists of C-tail membrane-anchored cytb5 proteins; group II, which includes soluble cytb5 proteins; and group III, which comprises the fungal cytb5 proteins. The majority of eukaryotes possess both group I and II cytb5 proteins, whereas three Giardia paralogs belong to group II. We have identified a fourth Giardia cytb5 paralog (gCYTb5-IV) that is rather divergent and possesses an unusual 134-residue N-terminal extension. Recombinant Giardia cytb5 proteins, including gCYTb5-IV, were expressed in Escherichia coli and exhibited characteristic UV-visible spectra that corresponded to heme-loaded cytb5 proteins. The expression of the recombinant gCYTb5-IV in G. intestinalis resulted in the increased import of extracellular heme and its incorporation into the protein, whereas this effect was not observed when gCYTb5-IV containing a mutated heme-binding site was expressed. The electrons for Giardia cytb5 proteins may be provided by the NADPH-dependent Tah18-like oxidoreductase GiOR-1. Therefore, GiOR-1 and cytb5 may constitute a novel redox system in G. intestinalis. To our knowledge, G. intestinalis is the first anaerobic eukaryote in which the presence of heme has been directly demonstrated.
Sibout, Richard; Eudes, Aymerick; Pollet, Brigitte; Goujon, Thomas; Mila, Isabelle; Granier, Fabienne; Séguin, Armand; Lapierre, Catherine; Jouanin, Lise
2003-06-01
Studying Arabidopsis mutants of the phenylpropanoid pathway has unraveled several biosynthetic steps of monolignol synthesis. Most of the genes leading to monolignol synthesis have been characterized recently in this herbaceous plant, except those encoding cinnamyl alcohol dehydrogenase (CAD). We have used the complete sequencing of the Arabidopsis genome to highlight a new view of the complete CAD gene family. Among nine AtCAD genes, we have identified the two distinct paralogs AtCAD-C and AtCAD-D, which share 75% identity and are likely to be involved in lignin biosynthesis in other plants. Northern, semiquantitative restriction fragment-length polymorphism-reverse transcriptase-polymerase chain reaction and western analysis revealed that AtCAD-C and AtCAD-D mRNA and protein ratios were organ dependent. Promoter activities of both genes are high in fibers and in xylem bundles. However, AtCAD-C displayed a larger range of sites of expression than AtCAD-D. Arabidopsis null mutants (Atcad-D and Atcad-C) corresponding to both genes were isolated. CAD activities were drastically reduced in both mutants, with a higher impact on sinapyl alcohol dehydrogenase activity (6% and 38% of residual sinapyl alcohol dehydrogenase activities for Atcad-D and Atcad-C, respectively). Only Atcad-D showed a slight reduction in Klason lignin content and displayed modifications of lignin structure with a significantly reduced proportion of conventional S lignin units in both stems and roots, together with the incorporation of sinapaldehyde structures ether linked at Cβ. These results argue for a substantial role of AtCAD-D in lignification, and more specifically in the biosynthesis of sinapyl alcohol, the precursor of S lignin units.
Sibout, Richard; Eudes, Aymerick; Pollet, Brigitte; Goujon, Thomas; Mila, Isabelle; Granier, Fabienne; Séguin, Armand; Lapierre, Catherine; Jouanin, Lise
2003-01-01
Studying Arabidopsis mutants of the phenylpropanoid pathway has unraveled several biosynthetic steps of monolignol synthesis. Most of the genes leading to monolignol synthesis have been characterized recently in this herbaceous plant, except those encoding cinnamyl alcohol dehydrogenase (CAD). We have used the complete sequencing of the Arabidopsis genome to highlight a new view of the complete CAD gene family. Among nine AtCAD genes, we have identified the two distinct paralogs AtCAD-C and AtCAD-D, which share 75% identity and are likely to be involved in lignin biosynthesis in other plants. Northern, semiquantitative restriction fragment-length polymorphism-reverse transcriptase-polymerase chain reaction and western analysis revealed that AtCAD-C and AtCAD-D mRNA and protein ratios were organ dependent. Promoter activities of both genes are high in fibers and in xylem bundles. However, AtCAD-C displayed a larger range of sites of expression than AtCAD-D. Arabidopsis null mutants (Atcad-D and Atcad-C) corresponding to both genes were isolated. CAD activities were drastically reduced in both mutants, with a higher impact on sinapyl alcohol dehydrogenase activity (6% and 38% of residual sinapyl alcohol dehydrogenase activities for Atcad-D and Atcad-C, respectively). Only Atcad-D showed a slight reduction in Klason lignin content and displayed modifications of lignin structure with a significantly reduced proportion of conventional S lignin units in both stems and roots, together with the incorporation of sinapaldehyde structures ether linked at Cβ. These results argue for a substantial role of AtCAD-D in lignification, and more specifically in the biosynthesis of sinapyl alcohol, the precursor of S lignin units. PMID:12805615
Conserved structure and expression of hsp70 paralogs in teleost fishes.
Metzger, David C H; Hemmer-Hansen, Jakob; Schulte, Patricia M
2016-06-01
The cytosolic 70 kDa heat shock proteins (Hsp70s) are widely used as biomarkers of environmental stress in ecological and toxicological studies in fish. Here we analyze teleost genome sequences to show that two genes encoding inducible hsp70s (hsp70-1 and hsp70-2) are likely present in all teleost fish. Phylogenetic and synteny analyses indicate that hsp70-1 and hsp70-2 are distinct paralogs that originated prior to the diversification of the teleosts. The promoters of both genes contain a TATA box and conserved heat shock elements (HSEs), but unlike mammalian HSP70s, both genes contain an intron in the 5' UTR. The hsp70-2 gene has undergone tandem duplication in several species. In addition, many other teleost genome assemblies have multiple copies of hsp70-2 present on separate, small, genomic scaffolds. To verify that these represent poorly assembled tandem duplicates, we cloned the genomic region surrounding hsp70-2 in Fundulus heteroclitus and showed that the hsp70-2 gene copies that are on separate scaffolds in the genome assembly are arranged as tandem duplicates. Real-time quantitative PCR of F. heteroclitus genomic DNA indicates that four copies of the hsp70-2 gene are likely present in the F. heteroclitus genome. Comparison of expression patterns in F. heteroclitus and Gasterosteus aculeatus demonstrates that hsp70-2 has a higher fold increase than hsp70-1 following heat shock in gill but not in muscle tissue, revealing a conserved difference in expression patterns between isoforms and tissues. These data indicate that ecological and toxicological studies using hsp70 as a biomarker in teleosts should take this complexity into account. Copyright © 2016 Elsevier Inc. All rights reserved.
Costa, Fernanda C; Saito, Angela; Gonçalves, Kaliandra A; Vidigal, Pedro M; Meirelles, Gabriela V; Bressan, Gustavo C; Kobarg, Jörg
2014-12-01
Ki-1/57 (HABP4) and CGI-55 (SERBP1) are regulatory proteins and paralogs with 40.7% amino acid sequence identity and 67.4% similarity. Functionally, they have been implicated in the regulation of gene expression on both the transcriptional and mRNA metabolism levels. A link with tumorigenesis is suggested, since both paralogs show altered expression levels in tumor cells and the Ki-1/57 gene is found in a region of chromosome 9q that represents a haplotype for familial colon cancer. However, the target genes regulated by Ki-1/57 and CGI-55 are unknown. Here, we analyzed the alterations of the global transcriptome profile after Ki-1/57 or CGI-55 overexpression in HEK293T cells by DNA microchip technology. We were able to identify 363 or 190 down-regulated and 50 or 27 up-regulated genes for Ki-1/57 and CGI-55, respectively, of which 20 were shared between both proteins. Expression levels of selected genes were confirmed by qRT-PCR both after protein overexpression and siRNA knockdown. The majority of the genes with altered expression were associated with proliferation, apoptosis and cell cycle control processes, prompting us to further explore these contexts experimentally. We observed that overexpression of Ki-1/57 or CGI-55 results in reduced cell proliferation, mainly due to a G1 phase arrest, whereas siRNA knockdown of CGI-55 caused an increase in proliferation. In the case of Ki-1/57 overexpression, we found protection from apoptosis after treatment with the ER-stress inducer thapsigargin. Together, our data give important new insights that may help to explain these proteins' putative involvement in tumorigenic events. Copyright © 2014 Elsevier B.V. All rights reserved.
AutoFACT: An Automatic Functional Annotation and Classification Tool
Koski, Liisa B; Gray, Michael W; Lang, B Franz; Burger, Gertraud
2005-01-01
Background Assignment of function to new molecular sequence data is an essential step in genomics projects. The usual process involves similarity searches of a given sequence against one or more databases, an arduous process for large datasets. Results We present AutoFACT, a fully automated and customizable annotation tool that assigns biologically informative functions to a sequence. Key features of this tool are that it (1) analyzes nucleotide and protein sequence data; (2) determines the most informative functional description by combining multiple BLAST reports from several user-selected databases; (3) assigns putative metabolic pathways, functional classes, enzyme classes, GeneOntology terms and locus names; and (4) generates output in HTML, text and GFF formats for the user's convenience. We have compared AutoFACT to four well-established annotation pipelines. The error rate of functional annotation is estimated to be only between 1–2%. Comparison of AutoFACT to the traditional top-BLAST-hit annotation method shows that our procedure increases the number of functionally informative annotations by approximately 50%. Conclusion AutoFACT will serve as a useful annotation tool for smaller sequencing groups lacking dedicated bioinformatics staff. It is implemented in PERL and runs on LINUX/UNIX platforms. AutoFACT is available at . PMID:15960857
GASP: Gapped Ancestral Sequence Prediction for proteins
Edwards, Richard J; Shields, Denis C
2004-01-01
Background The prediction of ancestral protein sequences from multiple sequence alignments is useful for many bioinformatics analyses. Predicting ancestral sequences is not a simple procedure and relies on accurate alignments and phylogenies. Several algorithms exist based on Maximum Parsimony or Maximum Likelihood methods but many current implementations are unable to process residues with gaps, which may represent insertion/deletion (indel) events or sequence fragments. Results Here we present a new algorithm, GASP (Gapped Ancestral Sequence Prediction), for predicting ancestral sequences from phylogenetic trees and the corresponding multiple sequence alignments. Alignments may be of any size and contain gaps. GASP first assigns the positions of gaps in the phylogeny before using a likelihood-based approach centred on amino acid substitution matrices to assign ancestral amino acids. Important outgroup information is used by first working down from the tips of the tree to the root, using descendant data only to assign probabilities, and then working back up from the root to the tips using descendant and outgroup data to make predictions. GASP was tested on a number of simulated datasets based on real phylogenies. Prediction accuracy for ungapped data was similar to three alternative algorithms tested, with GASP performing better in some cases and worse in others. Adding simple insertions and deletions to the simulated data did not have a detrimental effect on GASP accuracy. Conclusions GASP (Gapped Ancestral Sequence Prediction) will predict ancestral sequences from multiple protein alignments of any size. Although not as accurate in all cases as some of the more sophisticated maximum likelihood approaches, it can process a wide range of input phylogenies and will predict ancestral sequences for gapped and ungapped residues alike. PMID:15350199
Aiewsakun, Pakorn; Simmonds, Peter
2018-02-20
The International Committee on Taxonomy of Viruses (ICTV) classifies viruses into families, genera and species and provides a regulated system for their nomenclature that is universally used in virus descriptions. Virus taxonomic assignments have traditionally been based upon virus phenotypic properties such as host range, virion morphology and replication mechanisms, particularly at family level. However, gene sequence comparisons provide a clearer guide to their evolutionary relationships and provide the only information that may guide the incorporation of viruses detected in environmental (metagenomic) studies that lack any phenotypic data. The current study sought to determine whether the existing virus taxonomy could be reproduced by examination of genetic relationships through the extraction of protein-coding gene signatures and genome organisational features. We found large-scale consistency between genetic relationships and taxonomic assignments for viruses of all genome configurations and genome sizes. The analysis pipeline that we have called 'Genome Relationships Applied to Virus Taxonomy' (GRAViTy) was highly effective at reproducing the current assignments of viruses at family level as well as inter-family groupings into orders. Its ability to correctly differentiate assigned viruses from unassigned viruses, and classify them into the correct taxonomic group, was evaluated by threefold cross-validation technique. This predicted family membership of eukaryotic viruses with close to 100% accuracy and specificity potentially enabling the algorithm to predict assignments for the vast corpus of metagenomic sequences consistently with ICTV taxonomy rules. In an evaluation run of GRAViTy, over one half (460/921) of (near)-complete genome sequences from several large published metagenomic eukaryotic virus datasets were assigned to 127 novel family-level groupings. 
If corroborated by other analysis methods, these would potentially more than double the number of eukaryotic virus families in the ICTV taxonomy. A rapid and objective means to explore metagenomic viral diversity and make informed recommendations for their assignments at each taxonomic layer is essential. GRAViTy provides one means to make rule-based assignments at family and order levels in a manner that preserves the integrity and underlying organisational principles of the current ICTV taxonomy framework. Such methods are increasingly required as the vast virosphere is explored.
Taxonomic and functional assignment of cloned sequences from high Andean forest soil metagenome.
Montaña, José Salvador; Jiménez, Diego Javier; Hernández, Mónica; Angel, Tatiana; Baena, Sandra
2012-02-01
Total metagenomic DNA was isolated from high Andean forest soil and subjected to taxonomical and functional composition analyses by means of clone library generation and sequencing. The obtained yield of 1.7 μg of DNA/g of soil was used to construct a metagenomic library of approximately 20,000 clones (in the plasmid p-Bluescript II SK+) with an average insert size of 4 Kb, covering 80 Mb of the total metagenomic DNA. Metagenomic sequences near the plasmid cloning site were sequenced and then trimmed and assembled, obtaining 299 reads and 31 contigs (0.3 Mb). Taxonomic assignment of total sequences was performed by BLASTX, resulting in 68.8, 44.8 and 24.5% classification into taxonomic groups using the metagenomic RAST server v2.0, WebCARMA v1.0 online system and MetaGenome Analyzer v3.8 software, respectively. Most clone sequences were classified as Bacteria belonging to the phyla Actinobacteria, Proteobacteria and Acidobacteria. Among the most represented orders were Actinomycetales (34% average), Rhizobiales, Burkholderiales and Myxococcales, with a greater number of sequences in the genera Mycobacterium (7% average), Frankia, Streptomyces and Bradyrhizobium. The vast majority of sequences were associated with the metabolism of carbohydrates, proteins, lipids and catalytic functions, such as phosphatases, glycosyltransferases, dehydrogenases, methyltransferases, dehydratases and epoxide hydrolases. In this study we compared different methods of taxonomic and functional assignment of metagenomic clone sequences to evaluate microbial diversity in an unexplored soil ecosystem, searching for putative enzymes of biotechnological interest and generating important information for further functional screening of clone libraries.
Lamoliatte, Frederic; Bonneil, Eric; Durette, Chantal; Caron-Lizotte, Olivier; Wildemann, Dirk; Zerweck, Johannes; Wenshuk, Holger; Thibault, Pierre
2013-01-01
Protein modification by small ubiquitin-like modifier (SUMO) modulates the activities of numerous proteins involved in different cellular functions such as gene transcription, cell cycle, and DNA repair. Comprehensive identification of SUMOylated sites is a prerequisite to determine how SUMOylation regulates protein function. However, mapping SUMOylated Lys residues by mass spectrometry (MS) is challenging because of the dynamic nature of this modification, the existence of three functionally distinct human SUMO paralogs, and the large SUMO chain remnant that remains attached to tryptic peptides. To overcome these problems, we created HEK293 cell lines that stably express functional SUMO paralogs with an N-terminal His6-tag and an Arg residue near the C terminus that leave a short five amino acid SUMO remnant upon tryptic digestion. We determined the fragmentation patterns of our short SUMO remnant peptides by collisional activation and electron transfer dissociation using synthetic peptide libraries. Activation using higher energy collisional dissociation on the LTQ-Orbitrap Elite identified SUMO paralog-specific fragment ions and neutral losses of the SUMO remnant with high mass accuracy (< 5 ppm). We exploited these features to detect SUMO modified tryptic peptides in complex cell extracts by correlating mass measurements of precursor and fragment ions using a data independent acquisition method. We also generated bioinformatics tools to retrieve MS/MS spectra containing characteristic fragment ions to aid the identification of SUMOylated peptides by conventional Mascot database searches. In HEK293 cell extracts, this MS approach uncovered low abundance SUMOylated peptides and 37 SUMO3-modified Lys residues in target proteins, most of which were previously unknown. Interestingly, we identified mixed SUMO-ubiquitin chains with ubiquitylated SUMO proteins (K20 and K32) and SUMOylated ubiquitin (K63), suggesting a complex crosstalk between these two modifications. 
PMID:23750026
Dussert, Stéphane; Guerin, Chloé; Andersson, Mariette; Joët, Thierry; Tranbarger, Timothy J.; Pizot, Maxime; Sarah, Gautier; Omore, Alphonse; Durand-Gasselin, Tristan; Morcillo, Fabienne
2013-01-01
Oil palm (Elaeis guineensis) produces two oils of major economic importance, commonly referred to as palm oil and palm kernel oil, extracted from the mesocarp and the endosperm, respectively. While lauric acid predominates in endosperm oil, the major fatty acids (FAs) of mesocarp oil are palmitic and oleic acids. The oil palm embryo also stores oil, which contains a significant proportion of linoleic acid. In addition, the three tissues display high variation for oil content at maturity. To gain insight into the mechanisms that govern such differences in oil content and FA composition, tissue transcriptome and lipid composition were compared during development. The contribution of the cytosolic and plastidial glycolytic routes differed markedly between the mesocarp and seed tissues, but transcriptional patterns of genes involved in the conversion of sucrose to pyruvate were not related to variations for oil content. Accumulation of lauric acid relied on the dramatic up-regulation of a specialized acyl-acyl carrier protein thioesterase paralog and the concerted recruitment of specific isoforms of triacylglycerol assembly enzymes. Three paralogs of the WRINKLED1 (WRI1) transcription factor were identified, of which EgWRI1-1 and EgWRI1-2 were massively transcribed during oil deposition in the mesocarp and the endosperm, respectively. None of the three WRI1 paralogs were detected in the embryo. The transcription level of FA synthesis genes correlated with the amount of WRI1 transcripts and oil content. Changes in triacylglycerol content and FA composition of Nicotiana benthamiana leaves infiltrated with various combinations of WRI1 and FatB paralogs from oil palm validated functions inferred from transcriptome analysis. PMID:23735505
Storz, Jay F.; Natarajan, Chandrasekhar; Cheviron, Zachary A.; Hoffmann, Federico G.; Kelly, John K.
2012-01-01
Spatially varying selection on a given polymorphism is expected to produce a localized peak in the between-population component of nucleotide diversity, and theory suggests that the chromosomal extent of elevated differentiation may be enhanced in cases where tandemly linked genes contribute to fitness variation. An intriguing example is provided by the tandemly duplicated β-globin genes of deer mice (Peromyscus maniculatus), which contribute to adaptive differentiation in blood–oxygen affinity between high- and low-altitude populations. Remarkably, the two β-globin genes segregate the same pair of functionally distinct alleles due to a history of interparalog gene conversion and alleles of the same functional type are in perfect coupling-phase linkage disequilibrium (LD). Here we report a multilocus analysis of nucleotide polymorphism and LD in highland and lowland mice with different genetic backgrounds at the β-globin genes. The analysis of haplotype structure revealed a paradoxical pattern whereby perfect LD between the two β-globin paralogs (which are separated by 16.2 kb) is maintained in spite of the fact that LD within both paralogs decays to background levels over physical distances of less than 1 kb. The survey of nucleotide polymorphism revealed that elevated levels of altitudinal differentiation at each of the β-globin genes drop away quite rapidly in the external flanking regions (upstream of the 5′ paralog and downstream of the 3′ paralog), but the level of differentiation remains unexpectedly high across the intergenic region. Observed patterns of diversity and haplotype structure are difficult to reconcile with expectations of a two-locus selection model with multiplicative fitness. PMID:22042573
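As an illustration of the linkage disequilibrium (LD) measure discussed in the abstract above, the following is a minimal sketch of the standard r² statistic for two biallelic sites, computed from phased haplotypes. The function name `ld_r2` and the 0/1 allele coding are assumptions made for this example; the abstract does not state which LD statistic or software the authors used.

```python
from collections import Counter


def ld_r2(haplotypes):
    """Return r^2 between two biallelic sites.

    `haplotypes` is a list of (allele_at_site_A, allele_at_site_B) pairs,
    with alleles coded 0 or 1 (hypothetical coding for this sketch).
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at site A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at site B
    p_ab = counts[(1, 1)] / n                 # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                      # raw disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom
```

With this definition, a sample in which the two sites always carry matching alleles gives r² = 1 (perfect coupling-phase LD), while a sample with all four haplotypes at equal frequency gives r² = 0.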
Zeng, Jia; Hannenhalli, Sridhar
2013-01-01
Gene duplication, followed by functional evolution of duplicate genes, is a primary engine of evolutionary innovation. In turn, gene expression evolution is a critical component of overall functional evolution of paralogs. Inferring evolutionary history of gene expression among paralogs is therefore a problem of considerable interest. It also presents significant challenges. The standard approaches of evolutionary reconstruction assume that at an internal node of the duplication tree, the two duplicates evolve independently. However, because of various selection pressures, functional evolution of the two paralogs may be coupled. The coupling of paralog evolution corresponds to three major fates of gene duplicates: subfunctionalization (SF), conserved function (CF) or neofunctionalization (NF). Quantitative analysis of these fates is of great interest and clearly influences evolutionary inference of expression. These two interrelated problems of inferring gene expression and evolutionary fates of gene duplicates have not been studied together previously and motivate the present study. Here we propose a novel probabilistic framework and algorithm to simultaneously infer (i) ancestral gene expression and (ii) the likely fate (SF, NF, CF) at each duplication event during the evolution of a gene family. Using tissue-specific gene expression data, we develop a nonparametric belief propagation (NBP) algorithm to predict the ancestral expression level as a proxy for function, and describe a novel probabilistic model that relates the predicted and known expression levels to the possible evolutionary fates. We validate our model using simulation and then apply it to a genome-wide set of gene duplicates in human. Our results suggest that SF tends to be more frequent at the earlier stage of gene family expansion, while NF occurs more frequently later on.
Baskar, Venkidasamy; Park, Se Won
2015-07-01
Glucosinolates (GSL) are one of the major secondary metabolites of the Brassicaceae family. In the present study, we aim at characterizing the multiple paralogs of aliphatic GSL regulators, such as BrMYB28 and BrMYB29 genes in Brassica rapa ssp. pekinensis, by quantitative real-time PCR (qRT-PCR) analysis in different tissues and at various developmental stages. An overlapping gene expression pattern between the BrMYBs as well as their downstream genes (DSGs) was found at different developmental stages. Among the BrMYB28 and BrMYB29 paralogous genes, the BrMYB28.3 and BrMYB29.1 genes were dominantly expressed in most of the developmental stages, compared to the other paralogs of the BrMYB genes. Furthermore, the differential expression pattern of the BrMYBs was observed under various stress treatments. Interestingly, BrMYB28.2 showed the least expression in most developmental stages, while its expression was remarkably high in different stress conditions. More specifically, the BrMYB28.2, BrMYB28.3, and BrMYB29.1 genes were highly responsive to various abiotic and biotic stresses, further indicating their possible role in stress tolerance. Moreover, the in silico cis motif analysis in the upstream regulatory regions of BrMYBs showed the presence of various putative stress-specific motifs, which further indicated their responsiveness to biotic and abiotic stresses. These observations suggest that the dominantly expressed BrMYBs, both in different developmental stages and under various stress treatments (BrMYB28.3 and BrMYB29.1), may be potential candidate genes for altering the GSL level through genetic modification studies in B. rapa ssp. pekinensis. Copyright © 2015. Published by Elsevier SAS.
Requirement of zebrafish pcdh10a and pcdh10b in melanocyte precursor migration.
Williams, Jason S; Hsu, Jessica Y; Rossi, Christy Cortez; Artinger, Kristin Bruk
2018-03-29
Melanocytes derive from neural crest cells, which are a highly migratory population of cells that play an important role in pigmentation of the skin and epidermal appendages. In most vertebrates, melanocyte precursor cells migrate solely along the dorsolateral pathway to populate the skin. However, zebrafish melanocyte precursors also migrate along the ventromedial pathway, en route to the yolk, where they interact with other neural crest derivative populations. Here, we demonstrate the requirement for zebrafish paralogs pcdh10a and pcdh10b in zebrafish melanocyte precursor migration. pcdh10a and pcdh10b are expressed in a subset of melanocyte precursor and somatic cells, respectively, and knockdown and TALEN-mediated gene disruption of pcdh10a results in aberrant migration of melanocyte precursors, resulting in fully melanized melanocytes that differentiate precociously in the ventromedial pathway. Live cell imaging analysis demonstrates that loss of pcdh10a results in a reduction of directed cell migration of melanocyte precursors, caused by both increased adhesion and a loss of cell-cell contact with other migratory neural crest cells. Also, we determined that the paralog pcdh10b is upregulated and can compensate for the genetic loss of pcdh10a. Disruption of pcdh10b alone by CRISPR mutagenesis results in somite defects, while the loss of both paralogs results in enhanced migratory melanocyte precursor phenotype and embryonic lethality. These results reveal a novel role for pcdh10a and pcdh10b in zebrafish melanocyte precursor migration and suggest that pcdh10 paralogs potentially interact for proper transient migration along the ventromedial pathway. Copyright © 2018 Elsevier Inc. All rights reserved.
McGlothlin, Joel W; Chuckalovcak, John P; Janes, Daniel E; Edwards, Scott V; Feldman, Chris R; Brodie, Edmund D; Pfrender, Michael E; Brodie, Edmund D
2014-11-01
Members of a gene family expressed in a single species often experience common selection pressures. Consequently, the molecular basis of complex adaptations may be expected to involve parallel evolutionary changes in multiple paralogs. Here, we use bacterial artificial chromosome library scans to investigate the evolution of the voltage-gated sodium channel (Nav) family in the garter snake Thamnophis sirtalis, a predator of highly toxic Taricha newts. Newts possess tetrodotoxin (TTX), which blocks Nav's, arresting action potentials in nerves and muscle. Some Thamnophis populations have evolved resistance to extremely high levels of TTX. Previous work has identified amino acid sites in the skeletal muscle sodium channel Nav1.4 that confer resistance to TTX and vary across populations. We identify parallel evolution of TTX resistance in two additional Nav paralogs, Nav1.6 and 1.7, which are known to be expressed in the peripheral nervous system and should thus be exposed to ingested TTX. Each paralog contains at least one TTX-resistant substitution identical to a substitution previously identified in Nav1.4. These sites are fixed across populations, suggesting that the resistant peripheral nerves antedate resistant muscle. In contrast, three sodium channels expressed solely in the central nervous system (Nav1.1-1.3) showed no evidence of TTX resistance, consistent with protection from toxins by the blood-brain barrier. We also report the exon-intron structure of six Nav paralogs, the first such analysis for snake genes. Our results demonstrate that the molecular basis of adaptation may be both repeatable across members of a gene family and predictable based on functional considerations. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Ai, Chenbing; Liang, Yuting; Miao, Bo; Chen, Miao; Zeng, Weimin; Qiu, Guanzhou
2018-07-01
Iron-oxidizing Acidithiobacillus spp. are applied worldwide in the biomining industry to extract metals from sulfide minerals. They derive energy for survival through Fe2+ oxidation and generate Fe3+ for the dissolution of sulfide minerals. However, the molecular mechanisms of their iron oxidation remain elusive. A novel two-cytochrome-encoding gene cluster (named the tce gene cluster) encoding a high-molecular-weight cytochrome c (AFE_1428) and a c4-type cytochrome c552 (AFE_1429) in A. ferrooxidans ATCC 23270 was first identified in this study. Bioinformatic analysis together with a transcriptional study showed that AFE_1428 and AFE_1429 are the corresponding paralogs of Cyc2 (AFE_3153) and Cyc1 (AFE_3152), which are encoded by the extensively studied rus operon and have been shown to be involved in ferrous iron oxidation. Both AFE_1428 and AFE_1429 contain a signal peptide and the classic heme-binding motif(s), as do their corresponding paralogs. The modeled structure of AFE_1429 showed high resemblance to Cyc1. Like their corresponding paralogs, AFE_1428 and AFE_1429 were preferentially transcribed in the presence of ferrous iron as the sole energy source as compared with sulfur. The tce gene cluster is highly conserved in the genomes of four phylogenetically related A. ferrooxidans strains that were originally isolated from sites separated by large geographical distances, which further implies the importance of this gene cluster. Collectively, AFE_1428 and AFE_1429 are involved in Fe2+ oxidation, like their corresponding paralogs, by integrating with the metalloproteins encoded by the rus operon. This study provides novel insights into the Fe2+ oxidation mechanism in Fe2+-oxidizing A. ferrooxidans ssp.
Both mechanism and age of duplications contribute to biased gene retention patterns in plants.
Rody, Hugo V S; Baute, Gregory J; Rieseberg, Loren H; Oliveira, Luiz O
2017-01-06
All extant seed plants are successful paleopolyploids, whose genomes carry duplicate genes that have survived repeated episodes of diploidization. However, the survival of gene duplicates is biased with respect to gene function and mechanism of duplication. Transcription factors, in particular, are reported to be preferentially retained following whole-genome duplications (WGDs), but disproportionately lost when duplicated by tandem events. An explanation for this pattern is provided by the Gene Balance Hypothesis (GBH), which posits that duplicates of highly connected genes are retained following WGDs to maintain optimal stoichiometry among gene products; but such connected gene duplicates are disfavored following tandem duplications. We used genomic data from 25 taxonomically diverse plant species to investigate the roles of duplication mechanism, gene function, and age of duplication in the retention of duplicate genes. Enrichment analyses were conducted to identify Gene Ontology (GO) functional categories that were overrepresented in either WGD or tandem duplications, or across ranges of divergence times. Tandem paralogs were much younger, on average, than WGD paralogs and the most frequently overrepresented GO categories were not shared between tandem and WGD paralogs. Transcription factors were overrepresented among ancient paralogs regardless of mechanism of origin or presence of a WGD. Also, in many cases, there was no bias toward transcription factor retention following recent WGDs. Both the fixation and the retention of duplicated genes in plant genomes are context-dependent events. The strong bias toward ancient transcription factor duplicates can be reconciled with the GBH if selection for optimal stoichiometry among gene products is strongest following the earliest polyploidization events and becomes increasingly relaxed as gene families expand.
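The enrichment analyses mentioned in the abstract above can be illustrated with a one-sided hypergeometric test, a common choice for asking whether a functional category is overrepresented in a gene subset. This is only a sketch under that assumption: the abstract does not say which test or software was used, and the function name `enrichment_pvalue` and its parameter names are hypothetical.

```python
from math import comb


def enrichment_pvalue(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    N: total genes in the background set
    K: background genes annotated with the category
    n: genes in the subset of interest (e.g. tandem duplicates)
    k: subset genes annotated with the category
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probability mass for every outcome at least as extreme as k.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

For example, with a background of 10 genes of which 5 carry the annotation, drawing a subset of 5 genes that all carry it has probability 1/252 under the null. In a real analysis the resulting p-values would normally be corrected for the number of categories tested.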
Pydiura, Nikolay; Pirko, Yaroslav; Galinousky, Dmitry; Postovoitova, Anastasiia; Yemets, Alla; Kilchevsky, Aleksandr; Blume, Yaroslav
2018-06-08
Flax (Linum usitatissimum L.) is a valuable food and fiber crop cultivated for its quality fiber and seed oil. α-, β-, γ-tubulins and actins are the main structural proteins of the cytoskeleton. α- and γ-tubulin and actin genes have not been characterized yet in the flax genome. In this study, we have identified 6 α-tubulin genes, 13 β-tubulin genes, 2 γ-tubulin genes, and 15 actin genes in the flax genome and analysed the phylogenetic relationships between flax and A. thaliana tubulin and actin genes. Six α-tubulin genes are represented by 3 paralogous pairs; among 13 β-tubulin genes, 7 different isotypes can be distinguished, 6 of which are encoded by two paralogous genes each. γ-tubulin is represented by a paralogous pair of genes, one of which may not be functional. Fifteen actin genes represent 7 paralogous pairs - 7 actin isotypes and a sequentially duplicated copy of one of the genes of one of the isotypes. Exon-intron structure analysis has shown intron length polymorphism within the β-tubulin genes and intron number variation among the α-tubulin genes: 3 or 4 introns are found in two or four genes, respectively. Intron positioning occurs at conserved sites, as observed in numerous other plant species. Flax actin genes show both intron length polymorphisms and variation in the number of introns, which may be 2 or 3. These data will be useful to support further studies on the specificity, functioning, regulation and evolution of the flax cytoskeleton proteins.
Siegel, Nicol; Hoegg, Simone; Salzburger, Walter; Braasch, Ingo; Meyer, Axel
2007-01-01
Background The evolutionary lineage leading to the teleost fish underwent a whole genome duplication termed FSGD or 3R in addition to two prior genome duplications that took place earlier during vertebrate evolution (termed 1R and 2R). Resulting from the FSGD, additional copies of genes are present in fish, compared to tetrapods whose lineage did not experience the 3R genome duplication. Interestingly, we find that ParaHox genes do not differ in number in extant teleost fishes, despite their additional genome duplication, from the genomic situation in mammals, but they are distributed over twice as many paralogous regions in fish genomes. Results We determined the DNA sequence of the entire ParaHox C1 paralogon in the East African cichlid fish Astatotilapia burtoni, and compared it to orthologous regions in other vertebrate genomes as well as to the paralogous vertebrate ParaHox D paralogons. Evolutionary relationships among genes from these four chromosomal regions were studied with several phylogenetic algorithms. We provide evidence that the genes of the ParaHox C paralogous cluster are duplicated in teleosts, just as it had been shown previously for the D paralogon genes. Overall, however, synteny and cluster integrity seem to be less conserved in ParaHox gene clusters than in Hox gene clusters. Comparative analyses of non-coding sequences uncovered conserved, possibly co-regulatory elements, which are likely to contain promoter motifs of the genes belonging to the ParaHox paralogons. Conclusion There seems to be strong stabilizing selection for gene order as well as gene orientation in the ParaHox C paralogon, since with a few exceptions, only the lengths of the introns and intergenic regions differ between the distantly related species examined. 
The high degree of evolutionary conservation of this gene cluster's architecture in particular – but possibly clusters of genes more generally – might be linked to the presence of promoter, enhancer or inhibitor motifs that serve to regulate more than just one gene. Therefore, deletions, inversions or relocations of individual genes could destroy the regulation of the clustered genes in this region. The existence of such a regulation network might explain the evolutionary conservation of gene order and orientation over the course of hundreds of millions of years of vertebrate evolution. Another possible explanation for the highly conserved gene order might be the existence of a regulator not located immediately next to its corresponding gene but further away since a relocation or inversion would possibly interrupt this interaction. Different ParaHox clusters were found to have experienced differential gene loss in teleosts. Yet the complete set of these homeobox genes was maintained, albeit distributed over almost twice the number of chromosomes. Selection due to dosage effects and/or stoichiometric disturbance might act more strongly to maintain a modal number of homeobox genes (and possibly transcription factors more generally) per genome, yet permit the accumulation of other (non regulatory) genes associated with these homeobox gene clusters. PMID:17822543
Gadala-Maria, Daniel; Yaari, Gur; Uduman, Mohamed; Kleinstein, Steven H
2015-02-24
Individual variation in germline and expressed B-cell immunoglobulin (Ig) repertoires has been associated with aging, disease susceptibility, and differential response to infection and vaccination. Repertoire properties can now be studied at large-scale through next-generation sequencing of rearranged Ig genes. Accurate analysis of these repertoire-sequencing (Rep-Seq) data requires identifying the germline variable (V), diversity (D), and joining (J) gene segments used by each Ig sequence. Current V(D)J assignment methods work by aligning sequences to a database of known germline V(D)J segment alleles. However, existing databases are likely to be incomplete and novel polymorphisms are hard to differentiate from the frequent occurrence of somatic hypermutations in Ig sequences. Here we develop a Tool for Ig Genotype Elucidation via Rep-Seq (TIgGER). TIgGER analyzes mutation patterns in Rep-Seq data to identify novel V segment alleles, and also constructs a personalized germline database containing the specific set of alleles carried by a subject. This information is then used to improve the initial V segment assignments from existing tools, like IMGT/HighV-QUEST. The application of TIgGER to Rep-Seq data from seven subjects identified 11 novel V segment alleles, including at least one in every subject examined. These novel alleles constituted 13% of the total number of unique alleles in these subjects, and impacted 3% of V(D)J segment assignments. These results reinforce the highly polymorphic nature of human Ig V genes, and suggest that many novel alleles remain to be discovered. The integration of TIgGER into Rep-Seq processing pipelines will increase the accuracy of V segment assignments, thus improving B-cell repertoire analyses.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wemmer, D.E.; Kumar, N.V.; Metrione, R.M.
Toxin II from Radianthus paumotensis (RpII) has been investigated by high-resolution NMR and chemical sequencing methods. Resonance assignments have been obtained for this protein by the sequential approach. NMR assignments could not be made consistent with the previously reported primary sequence for this protein, and chemical methods have been used to determine a sequence with which the NMR data are consistent. Analysis of the 2D NOE spectra shows that the protein secondary structure is comprised of two sequences of β-sheet, probably joined into a distorted continuous sheet, connected by turns and extended loops, without any regular α-helical segments. The residues previously implicated in activity in this class of proteins, D8 and R13, occur in a loop region.
A proteomic analysis of leaf sheaths from rice.
Shen, Shihua; Matsubae, Masami; Takao, Toshifumi; Tanaka, Naoki; Komatsu, Setsuko
2002-10-01
The proteins extracted from the leaf sheaths of rice seedlings were separated by 2-D PAGE, and analyzed by Edman sequencing and mass spectrometry, followed by database searching. Image analysis revealed 352 protein spots on 2-D PAGE after staining with Coomassie Brilliant Blue. The amino acid sequences of 44 of 84 proteins were determined; for 31 of these proteins, a clear function could be assigned, whereas for 12 proteins, no function could be assigned. Forty proteins did not yield amino acid sequence information, because they were N-terminally blocked, or the obtained sequences were too short and/or did not give unambiguous results. Fifty-nine proteins were analyzed by mass spectrometry; all of these proteins were identified by matching to the protein database. The amino acid sequences of 19 of 27 proteins analyzed by mass spectrometry were similar to the results of Edman sequencing. These results suggest that 2-D PAGE combined with Edman sequencing and mass spectrometry analysis can be effectively used to identify plant proteins.
Zuriaga, María Angeles; Mas-Coma, Santiago; Bargues, María Dolores
2015-05-01
A pseudogene, designated as "ps(5.8S+ITS-2)", paralogous to the 5.8S gene and internal transcribed spacer (ITS)-2 of the nuclear ribosomal DNA (rDNA), has been recently found in many triatomine species distributed throughout North America, Central America and northern South America. Among characteristics used as criteria for pseudogene verification, secondary structures and free energy are highlighted, showing a lower fit between minimum free energy, partition function and centroid structures, although in given cases the fit only appeared to be slightly lower. The unique characteristics of "ps(5.8S+ITS-2)" as a processed or retrotransposed pseudogenic unit of the ghost type are reviewed, with emphasis on its potential functionality compared to the functionality of genes and spacers of the normal rDNA operon. Besides the technical problem of the risk for erroneous sequence results, the usefulness of "ps(5.8S+ITS-2)" for specimen classification, phylogenetic analyses and systematic/taxonomic studies should be highlighted, based on consistency and retention index values, which in pseudogenic sequence trees were higher than in functional sequence trees. Additionally, intraindividual, interpopulational and interspecific differences in pseudogene amount and the fact that it is a pseudogene in the nuclear rDNA suggest a potential relationship with fitness, behaviour and adaptability of triatomine vectors and consequently its potential utility in Chagas disease epidemiology and control.
Marck, Christian; Kachouri-Lafond, Rym; Lafontaine, Ingrid; Westhof, Eric; Dujon, Bernard; Grosjean, Henri
2006-01-01
We present the first comprehensive analysis of RNA polymerase III (Pol III) transcribed genes in ten yeast genomes. This set includes all tRNA genes (tDNA) and genes coding for SNR6 (U6), SNR52, SCR1 and RPR1 RNA in the nine hemiascomycetes Saccharomyces cerevisiae, Saccharomyces castellii, Candida glabrata, Kluyveromyces waltii, Kluyveromyces lactis, Eremothecium gossypii, Debaryomyces hansenii, Candida albicans, Yarrowia lipolytica and the archiascomycete Schizosaccharomyces pombe. We systematically analysed sequence specificities of tRNA genes, polymorphism, variability of introns, gene redundancy and gene clustering. Analysis of decoding strategies showed that yeasts close to S.cerevisiae use bacterial decoding rules to read the Leu CUN and Arg CGN codons, in contrast to all other known Eukaryotes. In D.hansenii and C.albicans, we identified a novel tDNA-Leu (AAG), reading the Leu CUU/CUC/CUA codons with an unusual G at position 32. A systematic ‘p-distance tree’ using the 60 variable positions of the tRNA molecule revealed that most tDNAs cluster into amino acid-specific sub-trees, suggesting that, within hemiascomycetes, orthologous tDNAs are more closely related than paralogs. We finally determined the bipartite A- and B-box sequences recognized by TFIIIC. These minimal sequences are nearly conserved throughout hemiascomycetes and were satisfactorily retrieved at appropriate locations in other Pol III genes. PMID:16600899
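The p-distance underlying such a tree is simply the proportion of aligned positions at which two sequences differ. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `p_distance` and the choice to skip gapped positions are assumptions made here for illustration.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Assumption for this sketch: positions where either sequence has a
    gap ('-') are excluded from the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only the columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    mismatches = sum(a != b for a, b in pairs)
    return mismatches / len(pairs)


if __name__ == "__main__":
    # Toy aligned fragments, invented for the example.
    print(p_distance("ACGU", "ACGA"))
```

A matrix of such pairwise distances over the variable positions can then be passed to any distance-based tree-building method.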
Evolution of the Structure and Chromosomal Distribution of Histidine Biosynthetic Genes
NASA Astrophysics Data System (ADS)
Fani, Renato; Mori, Elena; Tamburini, Elena; Lazcano, Antonio
1998-10-01
A database of more than 100 histidine biosynthetic genes from different organisms belonging to the three primary domains has been analyzed, including those found in the now completely sequenced genomes of Haemophilus influenzae, Mycoplasma genitalium, Synechocystis sp., Methanococcus jannaschii, and Saccharomyces cerevisiae. The ubiquity of his genes suggests that it is a highly conserved pathway that was probably already present in the last common ancestor of all extant life. The chromosomal distribution of the his genes shows that the enterobacterial histidine operon structure is not the only possible organization, and that there is a diversity of gene arrays for the his pathway. Analysis of the available sequences shows that gene fusions (like those involved in the origin of the Escherichia coli and Salmonella typhimurium hisIE and hisB gene structures) are not universal. In contrast, the elongation event that led to the extant hisA gene from two homologous ancestral modules, as well as the subsequent paralogous duplication that originated hisF, appear to be irreversible and are conserved in all known organisms. The available evidence supports the hypothesis that histidine biosynthesis was assembled by a gene recruitment process.
Park, Eonyoung; Maquat, Lynne E
2013-01-01
Staufen1 (STAU1)-mediated mRNA decay (SMD) is an mRNA degradation process in mammalian cells that is mediated by the binding of STAU1 to a STAU1-binding site (SBS) within the 3'-untranslated region (3'-UTR) of target mRNAs. During SMD, STAU1, a double-stranded (ds) RNA-binding protein, recognizes dsRNA structures formed either by intramolecular base pairing of 3'-UTR sequences or by intermolecular base pairing of 3'-UTR sequences with a long-noncoding RNA (lncRNA) via partially complementary Alu elements. Recently, STAU2, a paralog of STAU1, has also been reported to mediate SMD. Both STAU1 and STAU2 interact directly with the ATP-dependent RNA helicase UPF1, a key SMD factor, enhancing its helicase activity to promote effective SMD. Moreover, STAU1 and STAU2 form homodimeric and heterodimeric interactions via domain-swapping. Because both SMD and the mechanistically related nonsense-mediated mRNA decay (NMD) employ UPF1, SMD and NMD are competitive pathways. Competition contributes to cellular differentiation processes, such as myogenesis and adipogenesis, placing SMD at the heart of various physiologically important mechanisms. Copyright © 2013 John Wiley & Sons, Ltd.
PGDD: a database of gene and genome duplication in plants
Lee, Tae-Ho; Tang, Haibao; Wang, Xiyin; Paterson, Andrew H.
2013-01-01
Genome duplication (GD) has permanently shaped the architecture and function of many higher eukaryotic genomes. The angiosperms (flowering plants) are outstanding models in which to elucidate consequences of GD for higher eukaryotes, owing to their propensity for chromosomal duplication or even triplication in a few cases. Duplicated genome structures often require both intra- and inter-genome alignments to unravel their evolutionary history, also providing the means to deduce both obvious and otherwise-cryptic orthology, paralogy and other relationships among genes. The burgeoning sets of angiosperm genome sequences provide the foundation for a host of investigations into the functional and evolutionary consequences of gene and GD. To provide genome alignments from a single resource based on uniform standards that have been validated by empirical studies, we built the Plant Genome Duplication Database (PGDD; freely available at http://chibba.agtec.uga.edu/duplication/), a web service providing synteny information in terms of colinearity between chromosomes. At present, PGDD contains data for 26 plants including bryophytes and chlorophyta, as well as angiosperms with draft genome sequences. In addition to the inclusion of new genomes as they become available, we are preparing new functions to enhance PGDD. PMID:23180799
Amino acid selective unlabeling for sequence specific resonance assignments in proteins
Krishnarjuna, B.; Jaipuria, Garima; Thakur, Anushikha
2010-01-01
Sequence specific resonance assignment constitutes an important step towards high-resolution structure determination of proteins by NMR and is aided by selective identification and assignment of amino acid types. The traditional approach to selective labeling yields only the chemical shifts of the particular amino acid being selected and does not help in establishing a link between adjacent residues along the polypeptide chain, which is important for sequential assignments. An alternative approach is the method of amino acid selective ‘unlabeling’ or reverse labeling, which involves selective unlabeling of specific amino acid types against a uniformly 13C/15N labeled background. Based on this method, we present a novel approach for sequential assignments in proteins. The method involves a new NMR experiment named, {12COi–15Ni+1}-filtered HSQC, which aids in linking the 1HN/15N resonances of the selectively unlabeled residue, i, and its C-terminal neighbor, i + 1, in HN-detected double and triple resonance spectra. This leads to the assignment of a tri-peptide segment from the knowledge of the amino acid types of residues: i − 1, i and i + 1, thereby speeding up the sequential assignment process. The method has the advantage of being relatively inexpensive, applicable to 2H labeled protein and can be coupled with cell-free synthesis and/or automated assignment approaches. A detailed survey involving unlabeling of different amino acid types individually or in pairs reveals that the proposed approach is also robust to misincorporation of 14N at undesired sites. Taken together, this study represents the first application of selective unlabeling for sequence specific resonance assignments and opens up new avenues to using this methodology in protein structural studies. Electronic supplementary material The online version of this article (doi:10.1007/s10858-010-9459-z) contains supplementary material, which is available to authorized users. PMID:21153044
First observation of rotational structures in Re 168
Hartley, D. J.; Janssens, R. V. F.; Riedinger, L. L.; ...
2016-11-30
We assigned first rotational sequences to the odd-odd nucleus 168Re. Coincidence relationships of these structures with rhenium x rays confirm the isotopic assignment, while arguments based on the γ-ray multiplicity (K-fold) distributions observed with the new bands lead to the mass assignment. Configurations for the two bands were determined through analysis of the rotational alignments of the structures and a comparison of the experimental B(M1)/B(E2) ratios with theory. Tentative spin assignments are proposed for the πh 11/2νi 13/2 band, based on energy level systematics for other known sequences in neighboring odd-odd rhenium nuclei, as well as on systematics seen for the signature inversion feature that is well known in this region. Furthermore, the spin assignment for the πh 11/2ν(h 9/2/f 7/2) structure provides additional validation of the proposed spins and configurations for isomers in the 176Au → 172Ir → 168Re α-decay chain.
The Brassica oleracea genome reveals the asymmetrical evolution of polyploid genomes
Liu, Shengyi; Liu, Yumei; Yang, Xinhua; Tong, Chaobo; Edwards, David; Parkin, Isobel A. P.; Zhao, Meixia; Ma, Jianxin; Yu, Jingyin; Huang, Shunmou; Wang, Xiyin; Wang, Junyi; Lu, Kun; Fang, Zhiyuan; Bancroft, Ian; Yang, Tae-Jin; Hu, Qiong; Wang, Xinfa; Yue, Zhen; Li, Haojie; Yang, Linfeng; Wu, Jian; Zhou, Qing; Wang, Wanxin; King, Graham J; Pires, J. Chris; Lu, Changxin; Wu, Zhangyan; Sampath, Perumal; Wang, Zhuo; Guo, Hui; Pan, Shengkai; Yang, Limei; Min, Jiumeng; Zhang, Dong; Jin, Dianchuan; Li, Wanshun; Belcram, Harry; Tu, Jinxing; Guan, Mei; Qi, Cunkou; Du, Dezhi; Li, Jiana; Jiang, Liangcai; Batley, Jacqueline; Sharpe, Andrew G; Park, Beom-Seok; Ruperao, Pradeep; Cheng, Feng; Waminal, Nomar Espinosa; Huang, Yin; Dong, Caihua; Wang, Li; Li, Jingping; Hu, Zhiyong; Zhuang, Mu; Huang, Yi; Huang, Junyan; Shi, Jiaqin; Mei, Desheng; Liu, Jing; Lee, Tae-Ho; Wang, Jinpeng; Jin, Huizhe; Li, Zaiyun; Li, Xun; Zhang, Jiefu; Xiao, Lu; Zhou, Yongming; Liu, Zhongsong; Liu, Xuequn; Qin, Rui; Tang, Xu; Liu, Wenbin; Wang, Yupeng; Zhang, Yangyong; Lee, Jonghoon; Kim, Hyun Hee; Denoeud, France; Xu, Xun; Liang, Xinming; Hua, Wei; Wang, Xiaowu; Wang, Jun; Chalhoub, Boulos; Paterson, Andrew H
2014-01-01
Polyploidization has provided much genetic variation for plant adaptive evolution, but the mechanisms by which the molecular evolution of polyploid genomes establishes genetic architecture underlying species differentiation are unclear. Brassica is an ideal model to increase knowledge of polyploid evolution. Here we describe a draft genome sequence of Brassica oleracea, comparing it with that of its sister species B. rapa to reveal numerous chromosome rearrangements and asymmetrical gene loss in duplicated genomic blocks, asymmetrical amplification of transposable elements, differential gene co-retention for specific pathways and variation in gene expression, including alternative splicing, among a large number of paralogous and orthologous genes. Genes related to the production of anticancer phytochemicals and morphological variations illustrate consequences of genome duplication and gene divergence, imparting biochemical and morphological variation to B. oleracea. This study provides insights into Brassica genome evolution and will underpin research into the many important crops in this genus. PMID:24852848
Krak, Karol; Alvarez, Inés; Caklová, Petra; Costa, Andrea; Chrtek, Jindrich; Fehrer, Judith
2012-02-01
The development of three low-copy nuclear markers for low taxonomic level phylogenies in Asteraceae with emphasis on the subtribe Hieraciinae is reported. Marker candidates were selected by comparing a Lactuca complementary DNA (cDNA) library with public DNA sequence databases. Interspecific variation and phylogenetic signal of the selected genes were investigated for diploid taxa from the subtribe Hieraciinae and compared to a reference phylogeny. Their ability to cross-amplify was assessed for other Asteraceae tribes. All three markers had higher variation (2.1-4.5 times) than the internal transcribed spacer (ITS) in Hieraciinae. Cross-amplification was successful in at least seven other tribes of the Asteraceae. Only three cases indicating the presence of paralogs or pseudogenes were detected. The results demonstrate the potential of these markers for phylogeny reconstruction in the Hieraciinae as well as in other Asteraceae tribes, especially for very closely related species.
Chung, George; Rose, Ann M; Petalcorin, Mark I R; Martin, Julie S; Kessler, Zebulin; Sanchez-Pulido, Luis; Ponting, Chris P; Yanowitz, Judith L; Boulton, Simon J
2015-09-15
The Caenorhabditis elegans gene rec-1 was the first genetic locus identified in metazoa to affect the distribution of meiotic crossovers along the chromosome. We report that rec-1 encodes a distant paralog of HIM-5, which was discovered by whole-genome sequencing and confirmed by multiple genome-edited alleles. REC-1 is phosphorylated by cyclin-dependent kinase (CDK) in vitro, and mutation of the CDK consensus sites in REC-1 compromises meiotic crossover distribution in vivo. Unexpectedly, rec-1; him-5 double mutants are synthetic-lethal due to a defect in meiotic double-strand break formation. Thus, we uncovered an unexpected robustness to meiotic DSB formation and crossover positioning that is executed by HIM-5 and REC-1 and regulated by phosphorylation. © 2015 Chung et al.; Published by Cold Spring Harbor Laboratory Press.
Do molecules matter more than morphology? Promises and pitfalls in parasites.
Perkins, S L; Martinsen, E S; Falk, B G
2011-11-01
Systematics involves resolving both the taxonomy and phylogenetic placement of organisms. We review the advantages and disadvantages of the two kinds of information commonly used for such inferences--morphological and molecular data--as applied to the systematics of metazoan parasites generally, with special attention to the malaria parasites. The problems that potentially confound the use of morphology in parasites include challenges to consistent specimen preservation, plasticity of features depending on hosts or other environmental factors, and morphological convergence. Molecular characters such as DNA sequences present an alternative data source and are particularly useful when not all the parasite's life stages are present or when parasitaemia is low. Nonetheless, molecular data can bring challenges that include troublesome DNA isolation, paralogous gene copies, difficulty in developing molecular markers, and preferential amplification in mixed species infections. Given the differential benefits and shortcomings of both molecular and morphological characters, both should be implemented in parasite taxonomy and phylogenetics.
Cornman, R. Scott; Otto, Clint R. V.; Iwanowicz, Deborah; Pettis, Jeffery S.
2015-01-01
Identifying plant taxa that honey bees (Apis mellifera) forage upon is of great apicultural interest, but traditional methods are labor intensive and may lack resolution. Here we evaluate a high-throughput genetic barcoding approach to characterize trap-collected pollen from multiple North Dakota apiaries across multiple years. We used the Illumina MiSeq platform to generate sequence scaffolds from non-overlapping 300-bp paired-end sequencing reads of the ribosomal internal transcribed spacers (ITS). Full-length sequence scaffolds represented ~530 bp of ITS sequence after adapter trimming, drawn from the 5’ of ITS1 and the 3’ of ITS2, while skipping the uninformative 5.8S region. Operational taxonomic units (OTUs) were picked from scaffolds clustered at 97% identity, searched by BLAST against the nt database, and given taxonomic assignments using the paired-read lowest common ancestor approach. Taxonomic assignments and quantitative patterns were consistent with known plant distributions, phenology, and observational reports of pollen foraging, but revealed an unexpected contribution from non-crop graminoids and wetland plants. The mean number of plant species assignments per sample was 23.0 (+/- 5.5) and the mean species diversity (effective number of equally abundant species) was 3.3 (+/- 1.2). Bray-Curtis similarities showed good agreement among samples from the same apiary and sampling date. Rarefaction plots indicated that fewer than 50,000 reads are typically needed to characterize pollen samples of this complexity. Our results show that a pre-compiled, curated reference database is not essential for genus-level assignments, but species-level assignments are hindered by database gaps, reference length variation, and probable errors in the taxonomic assignment, requiring post-hoc evaluation. 
Although the effective per-sample yield achieved using custom MiSeq amplicon primers was less than the machine maximum, primarily due to lower “read2” quality, further protocol optimization and/or a modest reduction in multiplex scale should offset this difficulty. As small quantities of pollen are sufficient for amplification, our approach might be extendable to other questions or species for which large pollen samples are not available. PMID:26700168
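The two summary statistics quoted above can be illustrated with a short sketch. This is not the authors' code: the abstract does not state which diversity order was used, so the example assumes the effective number of species is the exponential of Shannon entropy, and the function names and toy read counts are hypothetical.

```python
import math


def bray_curtis_similarity(x: dict, y: dict) -> float:
    """Bray-Curtis similarity between two samples of taxon -> read count."""
    taxa = set(x) | set(y)
    # Sum of the smaller count for each taxon (the shared abundance).
    shared = sum(min(x.get(t, 0), y.get(t, 0)) for t in taxa)
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return 2 * shared / total


def effective_species(counts: dict) -> float:
    """Effective number of equally abundant species.

    Assumption: computed as exp(Shannon entropy), i.e. a Hill number
    of order 1.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample is empty")
    props = [c / total for c in counts.values() if c > 0]
    shannon = -sum(p * math.log(p) for p in props)
    return math.exp(shannon)


if __name__ == "__main__":
    # Invented read counts per plant taxon for two pollen samples.
    sample_a = {"Melilotus": 2, "Sonchus": 2}
    sample_b = {"Melilotus": 2, "Brassica": 2}
    print(bray_curtis_similarity(sample_a, sample_b))
    print(effective_species(sample_a))
```

A sample dominated by a few taxa yields an effective number well below its raw species count, which is consistent with the gap between the reported means of 23.0 assignments and 3.3 effective species.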
A teaching-learning sequence about weather map reading
NASA Astrophysics Data System (ADS)
Mandrikas, Achilleas; Stavrou, Dimitrios; Skordoulis, Constantine
2017-07-01
In this paper a teaching-learning sequence (TLS) introducing pre-service elementary teachers (PET) to weather map reading, with emphasis on wind assignment, is presented. The TLS includes activities about recognition of wind symbols, assignment of wind direction and wind speed on a weather map and identification of wind characteristics in a weather forecast. Sixty PET capabilities and difficulties in understanding weather maps were investigated, using inquiry-based learning activities. The results show that most PET became more capable of reading weather maps and assigning wind direction and speed on them. Our results also show that PET could be guided to understand meteorology concepts useful in everyday life and in teaching their future students.
Conn, Caitlin E.; Nelson, David C.
2016-01-01
The α/β-hydrolases KAI2 and D14 are paralogous receptors for karrikins and strigolactones, two classes of plant growth regulators with butenolide moieties. KAI2 and D14 act in parallel signaling pathways that share a requirement for the F-box protein MAX2, but produce distinct growth responses by regulating different members of the SMAX1-LIKE/D53 family. kai2 and max2 mutants share seed germination, seedling growth, leaf shape, and petiole orientation phenotypes that are not found in d14 or SL-deficient mutants. This implies that KAI2 recognizes an unknown, endogenous signal, herein termed KAI2 ligand (KL). Recent studies of ligand-specificity among KAI2 paralogs in basal land plants and root parasitic plants suggest that karrikin and strigolactone perception may be evolutionary adaptations of KL receptors. Here we demonstrate that evolutionarily conserved KAI2c genes from two parasite species rescue multiple phenotypes of the Arabidopsis kai2 mutant, unlike karrikin-, and strigolactone-specific KAI2 paralogs. We hypothesize that KAI2c proteins recognize KL, which could be an undiscovered hormone. PMID:26779242
NASA Astrophysics Data System (ADS)
Rasmussen, S. R.; Füchtbauer, W.; Novero, M.; Volpe, V.; Malkov, N.; Genre, A.; Bonfante, P.; Stougaard, J.; Radutoiu, S.
2016-07-01
Functional divergence of paralogs following gene duplication is one of the mechanisms leading to evolution of novel pathways and traits. Here we show that divergence of Lys11 and Nfr5 LysM receptor kinase paralogs of Lotus japonicus has affected their specificity for lipochitooligosaccharides (LCOs) decorations, while the innate capacity to recognize and induce a downstream signalling after perception of rhizobial LCOs (Nod factors) was maintained. Regardless of this conserved ability, Lys11 was found neither expressed, nor essential during nitrogen-fixing symbiosis, providing an explanation for the determinant role of Nfr5 gene during Lotus-rhizobia interaction. Lys11 was expressed in root cortex cells associated with intraradical colonizing arbuscular mycorrhizal fungi. Detailed analyses of lys11 single and nfr1nfr5lys11 triple mutants revealed a functional arbuscular mycorrhizal symbiosis, indicating that Lys11 alone, or its possible shared function with the Nod factor receptors is not essential for the presymbiotic phases of AM symbiosis. Hence, both subfunctionalization and specialization appear to have shaped the function of these paralogs where Lys11 acts as an AM-inducible gene, possibly to fine-tune later stages of this interaction.
Immune-Related Functions of the Hivep Gene Family in East African Cichlid Fishes
Diepeveen, Eveline T.; Roth, Olivia; Salzburger, Walter
2013-01-01
Immune-related genes are often characterized by adaptive protein evolution. Selection on immune genes can be particularly strong when hosts encounter novel parasites, for instance, after the colonization of a new habitat or upon the exploitation of vacant ecological niches in an adaptive radiation. We examined a set of new candidate immune genes in East African cichlid fishes. More specifically, we studied the signatures of selection in five paralogs of the human immunodeficiency virus type I enhancer-binding protein (Hivep) gene family, tested their involvement in the immune defense, and related our results to explosive speciation and adaptive radiation events in cichlids. We found signatures of long-term positive selection in four Hivep paralogs and lineage-specific positive selection in Hivep3b in two radiating cichlid lineages. Exposure of the cichlid Astatotilapia burtoni to a vaccination with Vibrio anguillarum bacteria resulted in a positive correlation between immune response parameters and expression levels of three Hivep loci. This work provides the first evidence for a role of Hivep paralogs in teleost immune defense and links the signatures of positive selection to host–pathogen interactions within an adaptive radiation. PMID:24142922
Adomako-Ankomah, Yaw; English, Elizabeth D.; Danielson, Jeffrey J.; Pernas, Lena F.; Parker, Michelle L.; Boulanger, Martin J.; Dubey, Jitender P.; Boyle, Jon P.
2016-01-01
In Toxoplasma gondii, an intracellular parasite of humans and other animals, host mitochondrial association (HMA) is driven by a gene family that encodes multiple mitochondrial association factor 1 (MAF1) proteins. However, the importance of MAF1 gene duplication in the evolution of HMA is not understood, nor is the impact of HMA on parasite biology. Here we used within- and between-species comparative analysis to determine that the MAF1 locus is duplicated in T. gondii and its nearest extant relative Hammondia hammondi, but not another close relative, Neospora caninum. Using cross-species complementation, we determined that the MAF1 locus harbors multiple distinct paralogs that differ in their ability to mediate HMA, and that only T. gondii and H. hammondi harbor HMA+ paralogs. Additionally, we found that exogenous expression of an HMA+ paralog in T. gondii strains that do not normally exhibit HMA provides a competitive advantage over their wild-type counterparts during a mouse infection. These data indicate that HMA likely evolved by neofunctionalization of a duplicate MAF1 copy in the common ancestor of T. gondii and H. hammondi, and that the neofunctionalized gene duplicate is selectively advantageous. PMID:26920761
Pagnuco, Inti Anabela; Revuelta, María Victoria; Bondino, Hernán Gabriel; Brun, Marcel; Ten Have, Arjen
2018-01-01
Protein superfamilies can be divided into subfamilies of proteins with different functional characteristics. Their sequences can be classified hierarchically, which is part of sequence function assignation. Typically, there are no clear subfamily hallmarks that would allow pattern-based function assignation by which this task is mostly achieved based on the similarity principle. This is hampered by the lack of a score cut-off that is both sensitive and specific. HMMER Cut-off Threshold Tool (HMMERCTTER) adds a reliable cut-off threshold to the popular HMMER. Using a high quality superfamily phylogeny, it clusters a set of training sequences such that the cluster-specific HMMER profiles show cluster or subfamily member detection with 100% precision and recall (P&R), thereby generating a specific threshold as inclusion cut-off. Profiles and thresholds are then used as classifiers to screen a target dataset. Iterative inclusion of novel sequences to groups and the corresponding HMMER profiles results in high sensitivity while specificity is maintained by imposing 100% P&R self detection. In three presented case studies of protein superfamilies, classification of large datasets with 100% precision was achieved with over 95% recall. Limits and caveats are presented and explained. HMMERCTTER is a promising protein superfamily sequence classifier provided high quality training datasets are used. It provides a decision support system that aids in the difficult task of sequence function assignation in the twilight zone of sequence similarity. All relevant data and source codes are available from the Github repository at the following URL: https://github.com/BBCMdP/HMMERCTTER.
Pagnuco, Inti Anabela; Revuelta, María Victoria; Bondino, Hernán Gabriel; Brun, Marcel
2018-01-01
Background Protein superfamilies can be divided into subfamilies of proteins with different functional characteristics. Their sequences can be classified hierarchically, which is part of sequence function assignation. Typically, there are no clear subfamily hallmarks that would allow pattern-based function assignation by which this task is mostly achieved based on the similarity principle. This is hampered by the lack of a score cut-off that is both sensitive and specific. Results HMMER Cut-off Threshold Tool (HMMERCTTER) adds a reliable cut-off threshold to the popular HMMER. Using a high quality superfamily phylogeny, it clusters a set of training sequences such that the cluster-specific HMMER profiles show cluster or subfamily member detection with 100% precision and recall (P&R), thereby generating a specific threshold as inclusion cut-off. Profiles and thresholds are then used as classifiers to screen a target dataset. Iterative inclusion of novel sequences to groups and the corresponding HMMER profiles results in high sensitivity while specificity is maintained by imposing 100% P&R self detection. In three presented case studies of protein superfamilies, classification of large datasets with 100% precision was achieved with over 95% recall. Limits and caveats are presented and explained. Conclusions HMMERCTTER is a promising protein superfamily sequence classifier provided high quality training datasets are used. It provides a decision support system that aids in the difficult task of sequence function assignation in the twilight zone of sequence similarity. All relevant data and source codes are available from the Github repository at the following URL: https://github.com/BBCMdP/HMMERCTTER. PMID:29579071
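The cut-off idea described in the abstract above, a threshold at which a cluster-specific profile detects its own members with 100% precision and recall, can be illustrated with a minimal sketch. This is not HMMERCTTER's code: the function names `perfect_cutoff` and `precision_recall` and the example scores are hypothetical, and the sketch assumes each sequence has already been scored against one profile (for example with HMMER) and that higher scores mean better matches.

```python
from typing import Optional, Sequence, Tuple


def perfect_cutoff(member_scores: Sequence[float],
                   other_scores: Sequence[float]) -> Optional[float]:
    """Return an inclusion cut-off that keeps every member and rejects every
    non-member, or None when the two score sets overlap (hypothetical helper)."""
    if not member_scores:
        return None
    lowest_member = min(member_scores)
    highest_other = max(other_scores, default=float("-inf"))
    # A perfect cut-off exists only if all members outscore all non-members.
    if lowest_member > highest_other:
        return lowest_member  # include sequences scoring >= this value
    return None


def precision_recall(member_scores: Sequence[float],
                     other_scores: Sequence[float],
                     cutoff: float) -> Tuple[float, float]:
    """Precision and recall of 'score >= cutoff' as a membership classifier."""
    tp = sum(s >= cutoff for s in member_scores)
    fp = sum(s >= cutoff for s in other_scores)
    fn = len(member_scores) - tp
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical profile scores for one cluster and for all other sequences.
members = [120.0, 95.5, 110.2]
others = [40.1, 88.0]
cutoff = perfect_cutoff(members, others)
```

With these made-up scores the members and non-members are separable, so a cut-off exists; when the score ranges overlap, the function returns `None`, which corresponds to a cluster for which no threshold gives perfect self-detection.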
A meta-analysis of bacterial diversity in the feces of cattle
USDA-ARS's Scientific Manuscript database
In this study, we conducted a meta-analysis on 16S rRNA gene sequences of bovine fecal origin that are publicly available in the RDP database. A total of 13663 sequences including 603 isolate sequences were identified in the RDP database (Release 11, Update 1), where 13447 sequences were assigned t...
Brennan, Orla M.; Deasy, Emily C.; Rossney, Angela S.; Kinnevey, Peter M.; Ehricht, Ralf; Monecke, Stefan; Coleman, David C.
2012-01-01
One hundred seventy-five isolates representative of methicillin-resistant Staphylococcus aureus (MRSA) clones that predominated in Irish hospitals between 1971 and 2004 and that previously underwent multilocus sequence typing (MLST) and staphylococcal cassette chromosome mec (SCCmec) typing were characterized by spa typing (175 isolates) and DNA microarray profiling (107 isolates). The isolates belonged to 26 sequence type (ST)-SCCmec types and subtypes and 35 spa types. The array assigned all isolates to the correct MLST clonal complex (CC), and 94% (100/107) were assigned an ST, with 98% (98/100) correlating with MLST. The array assigned all isolates to the correct SCCmec type, but subtyping of only some SCCmec elements was possible. Additional SCCmec/SCC genes or DNA sequence variation not detected by SCCmec typing was detected by array profiling, including the SCC-fusidic acid resistance determinant Q6GD50/fusC. Novel SCCmec/SCC composite islands (CIs) were detected among CC8 isolates and comprised SCCmec IIA-IIE, IVE, IVF, or IVg and a ccrAB4-SCC element with 99% DNA sequence identity to SCCM1 from ST8/t024-MRSA, SCCmec VIII, and SCC-CI in Staphylococcus epidermidis. The array showed that the majority of isolates harbored one or more superantigen (94%; 100/107) and immune evasion cluster (91%; 97/107) genes. Apart from fusidic acid and trimethoprim resistance, the correlation between isolate antimicrobial resistance phenotype and the presence of specific resistance genes was ≥97%. Array profiling allowed high-throughput, accurate assignment of MRSA to CCs/STs and SCCmec types and provided further evidence of the diversity of SCCmec/SCC. In most cases, array profiling can accurately predict the resistance phenotype of an isolate. PMID:22869569
Yasukochi, Yoshiki; Satta, Yoko
2015-03-25
The human cytochrome P450 (CYP) 2D6 gene is a member of the CYP2D gene subfamily, along with the CYP2D7P and CYP2D8P pseudogenes. Although the CYP2D6 enzyme has been studied extensively because of its clinical importance, the evolution of the CYP2D subfamily has not yet been fully understood. Therefore, the goal of this study was to reveal the evolutionary process of the human drug metabolic system. Here, we investigate molecular evolution of the CYP2D subfamily in primates by comparing 14 CYP2D sequences from humans to New World monkey genomes. Window analysis and statistical tests revealed that entire genomic sequences of paralogous genes were extensively homogenized by gene conversion during molecular evolution of CYP2D genes in primates. A neighbor-joining tree based on genomic sequences at the nonsubstrate recognition sites showed that CYP2D6 and CYP2D8 genes were clustered together due to gene conversion. In contrast, a phylogenetic tree using amino acid sequences at substrate recognition sites did not cluster the CYP2D6 and CYP2D8 genes, suggesting that the functional constraint on substrate specificity is one of the causes for purifying selection at the substrate recognition sites. Our results suggest that the CYP2D gene subfamily in primates has evolved to maintain the regioselectivity for a substrate hydroxylation activity between individual enzymes, even though extensive gene conversion has occurred across CYP2D coding sequences. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Linking the potato genome to the conserved ortholog set (COS) markers
2013-01-01
Background Conserved ortholog set (COS) markers are an important functional genomics resource that has greatly improved orthology detection in Asterid species. A comprehensive list of these markers is available at Sol Genomics Network (http://solgenomics.net/) and many of these have been placed on the genetic maps of a number of solanaceous species. Results We amplified over 300 COS markers from eight potato accessions involving two diploid landraces of Solanum tuberosum Andigenum group (formerly classified as S. goniocalyx, S. phureja), and a dihaploid clone derived from a modern tetraploid cultivar of S. tuberosum and the wild species S. berthaultii, S. chomatophilum, and S. paucissectum. By BLASTn (Basic Local Alignment Search Tool of the NCBI, National Center for Biotechnology Information) algorithm we mapped the DNA sequences of these markers into the potato genome sequence. Additionally, we mapped a subset of these markers genetically in potato and present a comparison between the physical and genetic locations of these markers in potato and in comparison with the genetic location in tomato. We found that most of the COS markers are single-copy in the reference genome of potato and that the genetic location in tomato and physical location in potato sequence are mostly in agreement. However, we did find some COS markers that are present in multiple copies and those that map in unexpected locations. Sequence comparisons between species show that some of these markers may be paralogs. Conclusions The sequence-based physical map becomes helpful in identification of markers for traits of interest thereby reducing the number of markers to be tested for applications like marker assisted selection, diversity, and phylogenetic studies. PMID:23758607
Population Genomics of Paramecium Species.
Johri, Parul; Krenek, Sascha; Marinov, Georgi K; Doak, Thomas G; Berendonk, Thomas U; Lynch, Michael
2017-05-01
Population-genomic analyses are essential to understanding factors shaping genomic variation and lineage-specific sequence constraints. The dearth of such analyses for unicellular eukaryotes prompted us to assess genomic variation in Paramecium, one of the most well-studied ciliate genera. The Paramecium aurelia complex consists of ∼15 morphologically indistinguishable species that diverged subsequent to two rounds of whole-genome duplications (WGDs, as long as 320 MYA) and possess extremely streamlined genomes. We examine patterns of both nuclear and mitochondrial polymorphism, by sequencing whole genomes of 10-13 worldwide isolates of each of three species belonging to the P. aurelia complex: P. tetraurelia, P. biaurelia, P. sexaurelia, as well as two outgroup species that do not share the WGDs: P. caudatum and P. multimicronucleatum. An apparent absence of global geographic population structure suggests continuous or recent dispersal of Paramecium over long distances. Intergenic regions are highly constrained relative to coding sequences, especially in P. caudatum and P. multimicronucleatum that have shorter intergenic distances. Sequence diversity and divergence are reduced up to ∼100-150 bp both upstream and downstream of genes, suggesting strong constraints imposed by the presence of densely packed regulatory modules. In addition, comparison of sequence variation at non-synonymous and synonymous sites suggests similar recent selective pressures on paralogs within and orthologs across the deeply diverging species. This study presents the first genome-wide population-genomic analysis in ciliates and provides a valuable resource for future studies in evolutionary and functional genetics in Paramecium. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Yasukochi, Yoshiki; Satta, Yoko
2015-01-01
The human cytochrome P450 (CYP) 2D6 gene is a member of the CYP2D gene subfamily, along with the CYP2D7P and CYP2D8P pseudogenes. Although the CYP2D6 enzyme has been studied extensively because of its clinical importance, the evolution of the CYP2D subfamily has not yet been fully understood. Therefore, the goal of this study was to reveal the evolutionary process of the human drug metabolic system. Here, we investigate molecular evolution of the CYP2D subfamily in primates by comparing 14 CYP2D sequences from humans to New World monkey genomes. Window analysis and statistical tests revealed that entire genomic sequences of paralogous genes were extensively homogenized by gene conversion during molecular evolution of CYP2D genes in primates. A neighbor-joining tree based on genomic sequences at the nonsubstrate recognition sites showed that CYP2D6 and CYP2D8 genes were clustered together due to gene conversion. In contrast, a phylogenetic tree using amino acid sequences at substrate recognition sites did not cluster the CYP2D6 and CYP2D8 genes, suggesting that the functional constraint on substrate specificity is one of the causes for purifying selection at the substrate recognition sites. Our results suggest that the CYP2D gene subfamily in primates has evolved to maintain the regioselectivity for a substrate hydroxylation activity between individual enzymes, even though extensive gene conversion has occurred across CYP2D coding sequences. PMID:25808902
A Nomadic Subtelomeric Disease Resistance Gene Cluster in Common Bean
David, Perrine; Chen, Nicolas W.G.; Pedrosa-Harand, Andrea; Thareau, Vincent; Sévignac, Mireille; Cannon, Steven B.; Debouck, Daniel; Langin, Thierry; Geffroy, Valérie
2009-01-01
The B4 resistance (R) gene cluster is one of the largest clusters known in common bean (Phaseolus vulgaris [Pv]). It is located in a peculiar genomic environment in the subtelomeric region of the short arm of chromosome 4, adjacent to two heterochromatic blocks (knobs). We sequenced 650 kb spanning this locus and annotated 97 genes, 26 of which correspond to Coiled-Coil-Nucleotide-Binding-Site-Leucine-Rich-Repeat (CNL). Conserved microsynteny was observed between the Pv B4 locus and corresponding regions of Medicago truncatula and Lotus japonicus in chromosomes Mt6 and Lj2, respectively. The notable exception was the CNL sequences, which were completely absent in these regions. The origin of the Pv B4-CNL sequences was investigated through phylogenetic analysis, which reveals that, in the Pv genome, paralogous CNL genes are shared among nonhomologous chromosomes (4 and 11). Together, our results suggest that Pv B4-CNL was derived from CNL sequences from another cluster, the Co-2 cluster, through an ectopic recombination event. Integration of the soybean (Glycine max) genome data enables us to date more precisely this event and also to infer that a single CNL moved from the Co-2 to the B4 cluster. Moreover, we identified a new 528-bp satellite repeat, referred to as khipu, specific to the Phaseolus genus, present both between B4-CNL sequences and in the two knobs identified at the B4 R gene cluster. The khipu repeat is present on most chromosomal termini, indicating the existence of frequent ectopic recombination events in Pv subtelomeric regions. Our results highlight the importance of ectopic recombination in R gene evolution. PMID:19776165
Vartanian, Jean-Pierre; Wain-Hobson, Simon
2002-05-28
Nuclear mtDNA sequences (numts) are a widespread family of paralogs evolving as pseudogenes in chromosomal DNA [Zhang, D. E. & Hewitt, G. M. (1996) TREE 11, 247-251 and Bensasson, D., Zhang, D., Hartl, D. L. & Hewitt, G. M. (2001) TREE 16, 314-321]. When trying to identify the species origin of an unknown DNA sample by way of an mtDNA locus, PCR may amplify both mtDNA and numts. Indeed, occasionally numts dominate confounding attempts at species identification [Bensasson, D., Zhang, D. X. & Hewitt, G. M. (2000) Mol. Biol. Evol. 17, 406-415; Wallace, D. C., et al. (1997) Proc. Natl. Acad. Sci. USA 94, 14900-14905]. Rhesus and cynomolgus macaque mtDNA haplotypes were identified in a study of oral polio vaccine samples dating from the late 1950s [Blancou, P., et al. (2001) Nature (London) 410, 1045-1046]. They were accompanied by a number of putative numts. To confirm that these putative numts were of macaque origin, a library of numts corresponding to a small segment of 12S rDNA locus has been made by using DNA from a Chinese rhesus macaque. A broad distribution was found with up to 30% sequence variation. Phylogenetic analysis showed that the evolutionary trajectories of numts and bona fide mtDNA haplotypes do not overlap with the signal exception of the host species; mtDNA fragments are continually crossing over into the germ line. In the case of divergent mtDNA sequences from old oral polio vaccine samples [Blancou, P., et al. (2001) Nature (London) 410, 1045-1046], all were closely related to numts in the Chinese macaque library.
Roy, M; Lee, R W; Kaarsholm, N C; Thøgersen, H; Brange, J; Dunn, M F
1990-06-12
The aromatic region of the 1H-FT-NMR spectrum of the biologically fully-potent, monomeric human insulin mutant, B9 Ser→Asp, B27 Thr→Glu has been investigated in D2O. At 1 to 5 mM concentrations, this mutant insulin is monomeric above pH 7.5. Coupling and amino acid classification of all aromatic signals is established via a combination of homonuclear one- and two-dimensional methods, including COSY, multiple quantum filters, selective spin decoupling and pH titrations. By comparisons with other insulin mutants and with chemically modified native insulins, all resonances in the aromatic region are given sequence-specific assignments without any reliance on the various crystal structures reported for insulin. These comparisons also give the sequence-specific assignments of most of the aromatic resonances of the mutant insulins B16 Tyr→Glu, B27 Thr→Glu and B25 Phe→Asp and the chemically modified species des-(B23-B30) insulin and monoiodo-Tyr A14 insulin. Chemical dispersion of the assigned resonances, ring current perturbations and comparisons at high pH have made possible the assignment of the aromatic resonances of human insulin, and these studies indicate that the major structural features of the human insulin monomer (including those critical to biological function) are also present in the monomeric mutant.
Biochemical and Genetic Analysis of the Chlamydia GroEL Chaperonins
Illingworth, Melissa; Hooppaw, Anna J.; Ruan, Lu
2017-01-01
ABSTRACT Chaperonins are essential for cellular growth under normal and stressful conditions and consequently represent one of the most conserved and ancient protein classes. The paradigm Escherichia coli chaperonin, EcGroEL, and its cochaperonin, EcGroES, assist in the folding of proteins via an ATP-dependent mechanism. In addition to the presence of groEL and groES homologs, groEL paralogs are found in many bacteria, including pathogens, and have evolved poorly understood species-specific functions. Chlamydia spp., which are obligate intracellular bacteria, have reduced genomes that nonetheless contain three groEL genes, Chlamydia groEL (ChgroEL), ChgroEL2, and ChgroEL3. We hypothesized that ChGroEL is the bona fide chaperonin and that the paralogs perform novel Chlamydia-specific functions. To test our hypothesis, we investigated the biochemical properties of ChGroEL and its cochaperonin, ChGroES, and queried the in vivo essentiality of the three ChgroEL genes through targeted mutagenesis in Chlamydia trachomatis. ChGroEL hydrolyzed ATP at a rate 25% of that of EcGroEL and bound with high affinity to ChGroES, and the ChGroEL-ChGroES complex could refold malate dehydrogenase (MDH). The chlamydial ChGroEL was selective for its cognate cochaperonin, ChGroES, while EcGroEL could function with both EcGroES and ChGroES. A P35T ChGroES mutant (ChGroESP35T) reduced ChGroEL-ChGroES interactions and MDH folding activities but was tolerated by EcGroEL. Both ChGroEL-ChGroES and EcGroEL-ChGroESP35T could complement an EcGroEL-EcGroES mutant. Finally, we successfully inactivated both paralogs but not ChgroEL, leading to minor growth defects in cell culture that were not exacerbated by heat stress. Collectively, our results support novel functions for the paralogs and solidify ChGroEL as a bona fide chaperonin that is biochemically distinct from EcGroEL. 
IMPORTANCE Chlamydia is an important cause of human diseases, including pneumonia, sexually transmitted infections, and trachoma. The chlamydial chaperonin ChGroEL and chaperonin paralog ChGroEL2 have been associated with survival under stress conditions, and ChGroEL is linked with immunopathology elicited by chlamydial infections. However, their exact roles in bacterial survival and disease remain unclear. Our results further substantiate the hypotheses that ChGroEL is the primary chlamydial chaperonin and that the paralogs play specialized roles during infection. Furthermore, ChGroEL and the mitochondrial GroEL only functioned with their cochaperonin, in contrast to the promiscuous nature of GroEL from E. coli and Helicobacter pylori, which might indicate a divergent evolution of GroEL during the transition from a free-living organism to an obligate intracellular lifestyle. PMID:28396349
Multiple isoforms for the catalytic subunit of PKA in the basal fungal lineage Mucor circinelloides.
Fernández Núñez, Lucas; Ocampo, Josefina; Gottlieb, Alexandra M; Rossi, Silvia; Moreno, Silvia
2016-12-01
Protein kinase A (PKA) activity is involved in dimorphism of the basal fungal lineage Mucor. From the recently sequenced genome of Mucor circinelloides we could predict ten catalytic subunits of PKA. From sequence alignment and structural prediction we conclude that the catalytic core of the isoforms is conserved, and the difference between them resides in their amino termini. This high number of isoforms is maintained in the subdivision Mucoromycotina. Each paralogue, when compared to the ones from other fungi, is more homologous to one of its orthologs than to its paralogs. All of these fungal isoforms cannot be included in the class I or II in which fungal protein kinases have been classified. mRNA levels for each isoform were measured during aerobic and anaerobic growth. The expression of each isoform is differential and associated with a particular growth stage. We reanalyzed the sequence of PKAC (GI 20218944), the only cloned sequence available until now for a catalytic subunit of M. circinelloides. PKAC cannot be classified as a PKA because of its difference in the conserved C-tail; it shares with PKB a conserved C2 domain in the N-terminus. No catalytic activity could be measured for this protein nor predicted bioinformatically. It can thus be classified as a pseudokinase. Its importance cannot be underestimated since it is expressed at the mRNA level in different stages of growth, and its deletion is lethal. Copyright © 2016 British Mycological Society. Published by Elsevier Ltd. All rights reserved.
Phylogenetic distribution of plant snoRNA families.
Patra Bhattacharya, Deblina; Canzler, Sebastian; Kehr, Stephanie; Hertel, Jana; Grosse, Ivo; Stadler, Peter F
2016-11-24
Small nucleolar RNAs (snoRNAs) are one of the most ancient families amongst non-protein-coding RNAs. They are ubiquitous in Archaea and Eukarya but absent in bacteria. Their main function is to target chemical modifications of ribosomal RNAs. They fall into two classes, box C/D snoRNAs and box H/ACA snoRNAs, which are clearly distinguished by conserved sequence motifs and the type of chemical modification that they govern. Similarly to microRNAs, snoRNAs appear in distinct families of homologs that affect homologous targets. In animals, snoRNAs and their evolution have been studied in much detail. In plants, however, their evolution has attracted comparably little attention. In order to chart the phylogenetic distribution of individual snoRNA families in plants, we applied a sophisticated approach for identifying homologs of known plant snoRNAs across the plant kingdom. In response to the relatively fast evolution of snoRNAs, information on conserved sequence boxes, target sequences, and secondary structure is combined to identify additional snoRNAs. We identified 296 families of snoRNAs in 24 species and traced their evolution throughout the plant kingdom. Many of the plant snoRNA families comprise paralogs. We also found that targets are well-conserved for most snoRNA families. The sequence conservation of snoRNAs is sufficient to establish homologies between phyla. The degree of this conservation tapers off, however, between land plants and algae. Plant snoRNAs are frequently organized in highly conserved spatial clusters. As a resource for further investigations we provide carefully curated and annotated alignments for each snoRNA family under investigation.
Monarez, Roberto R.; Macdonald, Clinton C.; Dass, Brinda
2006-01-01
CstF-64 (cleavage stimulation factor-64), a major regulatory protein of polyadenylation, is absent during male meiosis. Therefore a paralogous variant, τCstF-64 is expressed in male germ cells to maintain normal spermatogenesis. Based on sequence differences between τCstF-64 and CstF-64, and on the high incidence of alternative polyadenylation in testes, we hypothesized that the RBDs (RNA-binding domains) of τCstF-64 and CstF-64 have different affinities for RNA elements. We quantified Kd values of CstF-64 and τCstF-64 RBDs for various ribopolymers using an RNA cross-linking assay. The two RBDs had similar affinities for poly(G)18, poly(A)18 or poly(C)18, with affinity for poly(C)18 being the lowest. However, CstF-64 had a higher affinity for poly(U)18 than τCstF-64, whereas it had a lower affinity for poly(GU)9. Changing Pro-41 to a serine residue in the CstF-64 RBD did not affect its affinity for poly(U)18, but changes in amino acids downstream of the C-terminal α-helical region decreased affinity towards poly(U)18. Thus we show that the two CstF-64 paralogues differ in their affinities for specific RNA sequences, and that the region C-terminal to the RBD is important in RNA sequence recognition. This supports the hypothesis that τCstF-64 promotes germ-cell-specific patterns of polyadenylation by binding to different downstream sequence elements. PMID:17029590
Matrix metalloproteinases outside vertebrates.
Marino-Puertas, Laura; Goulas, Theodoros; Gomis-Rüth, F Xavier
2017-11-01
The matrix metalloproteinase (MMP) family belongs to the metzincin clan of zinc-dependent metallopeptidases. Due to their enormous implications in physiology and disease, MMPs have mainly been studied in vertebrates. They are engaged in extracellular protein processing and degradation, and present extensive paralogy, with 23 forms in humans. One characteristic of MMPs is a ~165-residue catalytic domain (CD), which has been structurally studied for 14 MMPs from human, mouse, rat, pig and the oral-microbiome bacterium Tannerella forsythia. These studies revealed close overall coincidence and characteristic structural features, which distinguish MMPs from other metzincins and give rise to a sequence pattern for their identification. Here, we reviewed the literature available on MMPs outside vertebrates and performed database searches for potential MMP CDs in invertebrates, plants, fungi, viruses, protists, archaea and bacteria. These and previous results revealed that MMPs are widely present in several copies in Eumetazoa and higher plants (Tracheophyta), but have just token presence in eukaryotic algae. A few dozen sequences were found in Ascomycota (within fungi) and in double-stranded DNA viruses infecting invertebrates (within viruses). In contrast, a few hundred sequences were found in archaea and >1000 in bacteria, with several copies for some species. Most of the archaeal and bacterial phyla containing potential MMPs are present in human oral and gut microbiomes. Overall, MMP-like sequences are present across all kingdoms of life, but their asymmetric distribution contradicts the vertical descent model from a eubacterial or archaeal ancestor. This article is part of a Special Issue entitled: Matrix Metalloproteinases edited by Rafael Fridman. Copyright © 2017 Elsevier B.V. All rights reserved.
Knief, Claudia
2015-01-01
Methane-oxidizing bacteria are characterized by their capability to grow on methane as sole source of carbon and energy. Cultivation-dependent and -independent methods have revealed that this functional guild of bacteria comprises a substantial diversity of organisms. In particular the use of cultivation-independent methods targeting a subunit of the particulate methane monooxygenase (pmoA) as functional marker for the detection of aerobic methanotrophs has resulted in thousands of sequences representing “unknown methanotrophic bacteria.” This limits data interpretation due to restricted information about these uncultured methanotrophs. A few groups of uncultivated methanotrophs are assumed to play important roles in methane oxidation in specific habitats, while the biology behind other sequence clusters remains still largely unknown. The discovery of evolutionary related monooxygenases in non-methanotrophic bacteria and of pmoA paralogs in methanotrophs requires that sequence clusters of uncultivated organisms have to be interpreted with care. This review article describes the present diversity of cultivated and uncultivated aerobic methanotrophic bacteria based on pmoA gene sequence diversity. It summarizes current knowledge about cultivated and major clusters of uncultivated methanotrophic bacteria and evaluates habitat specificity of these bacteria at different levels of taxonomic resolution. Habitat specificity exists for diverse lineages and at different taxonomic levels. Methanotrophic genera such as Methylocystis and Methylocaldum are identified as generalists, but they harbor habitat specific methanotrophs at species level. This finding implies that future studies should consider these diverging preferences at different taxonomic levels when analyzing methanotrophic communities. PMID:26696968
Yang, Wan-Shan; Hsu, Hung-Wei; Campbell, Mel; Cheng, Chia-Yang; Chang, Pei-Ching
2015-01-01
SUMOylation is associated with epigenetic regulation of chromatin structure and transcription. Epigenetic modifications of herpesviral genomes accompany the transcriptional switch of latent and lytic genes during the virus life cycle. Here, we report a genome-wide comparison of SUMO paralog modification on the KSHV genome. Using chromatin immunoprecipitation in conjunction with high-throughput sequencing, our study revealed highly distinct landscape changes of SUMO paralog genomic modifications associated with KSHV reactivation. A rapid and widespread deposition of SUMO-2/3, compared with SUMO-1, modification across the KSHV genome upon reactivation was observed. Interestingly, SUMO-2/3 enrichment was inversely correlated with H3K9me3 mark after reactivation, indicating that SUMO-2/3 may be responsible for regulating the expression of viral genes located in low heterochromatin regions during viral reactivation. RNA-sequencing analysis showed that the SUMO-2/3 enrichment pattern positively correlated with KSHV gene expression profiles. Activation of KSHV lytic genes located in regions with high SUMO-2/3 enrichment was enhanced by SUMO-2/3 knockdown. These findings suggest that SUMO-2/3 viral chromatin modification contributes to the diminution of viral gene expression during reactivation. Our previous study identified a SUMO-2/3-specific viral E3 ligase, K-bZIP, suggesting a potential role of this enzyme in regulating SUMO-2/3 enrichment and viral gene repression. Consistent with this prediction, higher K-bZIP binding on SUMO-2/3 enrichment region during reactivation was observed. Moreover, a K-bZIP SUMO E3 ligase dead mutant, K-bZIP-L75A, in the viral context, showed no SUMO-2/3 enrichment on viral chromatin and higher expression of viral genes located in SUMO-2/3 enriched regions during reactivation. Importantly, virus production significantly increased in both SUMO-2/3 knockdown and KSHV K-bZIP-L75A mutant cells. 
These results indicate that SUMO-2/3 modification of viral chromatin may function to counteract KSHV reactivation. As induction of herpesvirus reactivation may activate cellular antiviral regimes, our results suggest that development of viral SUMO E3 ligase specific inhibitors may be an avenue for anti-virus therapy. PMID:26197391
Zuriaga, Elena; Romero, Carlos; Blanca, Jose Miguel; Badenes, Maria Luisa
2018-01-27
Plum pox virus (PPV), causing Sharka disease, is one of the main limiting factors for Prunus production worldwide. In apricot (Prunus armeniaca L.) the major PPV resistance locus (PPVres), comprising ~196 kb, has been mapped to the upper part of linkage group 1. Within the PPVres, 68 genomic variants linked in coupling to PPV resistance were identified within 23 predicted transcripts according to peach genome annotation. Taking into account the predicted functions inferred from sequence homology, some members of a cluster of meprin and TRAF-C homology domain (MATHd)-containing genes were proposed as PPV resistance candidate genes. Here, we have characterized the global apricot transcriptome response to PPV-D infection, identifying six PPVres locus genes (ParP-1 to ParP-6) differentially expressed in resistant/susceptible cultivars. Two of them (ParP-3 and ParP-4), which encode MATHd proteins, appear clearly down-regulated in resistant cultivars, as confirmed by qRT-PCR. Concurrently, variant calling was performed using whole-genome sequencing data of 24 apricot cultivars (10 PPV-resistant and 14 PPV-susceptible) and 2 wild relatives (PPV-susceptible). ParP-3 and ParP-4, named as Prunus armeniaca PPVres MATHd-containing genes (ParPMC), are the only 2 genes having allelic variants linked in coupling to PPV resistance. ParPMC1 has 1 nsSNP, while ParPMC2 has 15 variants, including a 5-bp deletion within the second exon that produces a frameshift mutation. ParPMC1 and ParPMC2 are adjacent and highly homologous (87.5% identity), suggesting they are paralogs that originated from a tandem duplication. Cultivars carrying the ParPMC2 resistant (mutated) allele show lack of expression in both ParPMC2 and especially ParPMC1. Accordingly, we hypothesize that ParPMC2 is a pseudogene that mediates down-regulation of its functional paralog ParPMC1 by silencing.
As a whole, the results strongly support ParPMC1 and/or ParPMC2 as host susceptibility genes required for PPV infection, whose silencing may confer the PPV resistance trait. This finding may facilitate resistance breeding by marker-assisted selection and pave the way for gene editing approaches in Prunus.
Partial DNA sequencing of Douglas-fir cDNAs used in RFLP mapping
K.D. Jermstad; D.L. Bassoni; C.S. Kinlaw; D.B. Neale
1998-01-01
DNA sequences from 87 Douglas-fir (Pseudotsuga menziesii [Mirb.] Franco) cDNA RFLP probes were determined. Sequences were submitted to the GenBank dbEST database and searched for similarity against nucleotide and protein databases using the BLASTn and BLASTx programs. Twenty-one sequences (24%) were assigned putative functions; 18 of which...
Self-organizing approach for meta-genomes.
Zhu, Jianfeng; Zheng, Wei-Mou
2014-12-01
We extend the self-organizing approach for annotation of a bacterial genome to analyze the raw sequencing data of the human gut metagenome without sequence assembling. The original approach divides the genomic sequence of a bacterium into non-overlapping segments of equal length and assigns to each segment one of seven 'phases', among which one is for the noncoding regions, three for the direct coding regions to indicate the three possible codon positions of the segment starting site, and three for the reverse coding regions. The noncoding phase and the six coding phases are described by two frequency tables of the 64 triplet types or 'codon usages'. A set of codon usages can be used to update the phase assignment and vice versa. An iteration after an initialization leads to a convergent phase assignment to give an annotation of the genome. In the extension of the approach to a metagenome, we consider a mixture model of a number of categories described by different codon usages. The Illumina Genome Analyzer sequencing data of the total DNA from faecal samples are then examined to understand the diversity of the human gut microbiome. Copyright © 2014 Elsevier Ltd. All rights reserved.
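The alternation described above (codon usages update the phase assignment, and the phase assignment updates the codon usages) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the add-one smoothing, the per-triplet mean log score, and the fixed iteration cap are all assumptions made for illustration, and the mixture-model extension to metagenomes is not covered. Input is assumed to be upper-case DNA.

```python
import math
from collections import Counter
from itertools import product

TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
TRIPLET_SET = set(TRIPLETS)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def triplets(seq, offset):
    """Non-overlapping triplets starting at `offset`; ambiguous triplets are skipped."""
    found = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return [t for t in found if t in TRIPLET_SET]


def read_in_phase(segment, phase):
    """Triplets of a segment under one of seven phases:
    0 = noncoding, 1-3 = direct coding frames, 4-6 = reverse coding frames."""
    if phase == 0:
        return triplets(segment, 0)
    if phase <= 3:
        return triplets(segment, phase - 1)
    return triplets(reverse_complement(segment), phase - 4)


def estimate(counts):
    """Frequency table over the 64 triplets (add-one smoothing is an assumption)."""
    total = sum(counts.values()) + len(TRIPLETS)
    return {t: (counts[t] + 1) / total for t in TRIPLETS}


def mean_log_score(trips, table):
    if not trips:
        return float("-inf")
    return sum(math.log(table[t]) for t in trips) / len(trips)


def assign_phases(segments, noncoding, coding):
    """Give each segment the phase whose table explains its triplets best."""
    phases = []
    for seg in segments:
        scores = [
            mean_log_score(read_in_phase(seg, k), noncoding if k == 0 else coding)
            for k in range(7)
        ]
        phases.append(scores.index(max(scores)))
    return phases


def update_tables(segments, phases):
    """Re-estimate the noncoding and coding tables from the current phases."""
    noncoding, coding = Counter(), Counter()
    for seg, k in zip(segments, phases):
        (noncoding if k == 0 else coding).update(read_in_phase(seg, k))
    return estimate(noncoding), estimate(coding)


def self_organize(segments, initial_phases, max_iter=50):
    """Alternate table updates and phase reassignment until nothing changes."""
    phases = list(initial_phases)
    for _ in range(max_iter):
        noncoding, coding = update_tables(segments, phases)
        new_phases = assign_phases(segments, noncoding, coding)
        if new_phases == phases:
            break
        phases = new_phases
    return phases
```

A call such as `self_organize(segments, initial_phases)` returns one phase label (0 to 6) per segment; how the initial phases are chosen is left open here, as the abstract does not specify it.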
Piao, Hailan; Froula, Jeff; Du, Changbin; Kim, Tae-Wan; Hawley, Erik R; Bauer, Stefan; Wang, Zhong; Ivanova, Nathalia; Clark, Douglas S; Klenk, Hans-Peter; Hess, Matthias
2014-08-01
Although recent nucleotide sequencing technologies have significantly enhanced our understanding of microbial genomes, the function of ∼35% of genes identified in a genome currently remains unknown. To improve the understanding of microbial genomes and consequently of microbial processes it will be crucial to assign a function to this "genomic dark matter." Due to the urgent need for additional carbohydrate-active enzymes for improved production of transportation fuels from lignocellulosic biomass, we screened the genomes of more than 5,500 microorganisms for hypothetical proteins that are located in the proximity of already known cellulases. We identified, synthesized and expressed a total of 17 putative cellulase genes with insufficient sequence similarity to currently known cellulases to be identified as such using traditional sequence annotation techniques that rely on significant sequence similarity. The recombinant proteins of the newly identified putative cellulases were subjected to enzymatic activity assays to verify their hydrolytic activity towards cellulose and lignocellulosic biomass. Eleven (65%) of the tested enzymes had significant activity towards at least one of the substrates. This high success rate highlights that a gene context-based approach can be used to assign function to genes that are otherwise categorized as "genomic dark matter" and to identify biomass-degrading enzymes that have little sequence similarity to already known cellulases. The ability to assign function to genes that have no related sequence representatives with functional annotation will be important to enhance our understanding of microbial processes and to identify microbial proteins for a wide range of applications. © 2014 Wiley Periodicals, Inc.
Kim, Seon-Hee; Bae, Young-An
2017-09-01
Tyrosinase provides an essential activity during egg production in diverse platyhelminths by mediating sclerotization of eggshells. In this study, we investigated the genomic and evolutionary features of tyrosinases in parasitic platyhelminths whose genomic information is available. A pair of paralogous tyrosinases was detected in most trematodes, whereas they were lost in cyclophyllidean cestodes. A pseudophyllidean cestode displaying egg biology similar to that of trematodes possessed an orthologous gene. Interestingly, one of the paralogous tyrosinases appeared to have been multiplied into three copies in Clonorchis sinensis and Opisthorchis viverrini. In addition, a fifth tyrosinase gene that was minimally transcribed through all developmental stages was further detected in these opisthorchiid genomes. Phylogenetic analyses demonstrated that the tyrosinase gene has undergone duplication at least three times in platyhelminths. The additional opisthorchiid gene arose from the first duplication. A paralogous copy generated from these gene duplications, except for the last one, seemed to be lost in the major neodermatan lineages. In C. sinensis, tyrosinase gene expression was initiated following sexual maturation and the levels were significantly enhanced by the presence of O2 and bile. Taken together, our data suggest that tyrosinase has evolved lineage-specifically across platyhelminths with respect to its copy number and induction mechanism.
Sehring, Ivonne M; Reiner, Christoph; Mansfeld, Jörg; Plattner, Helmut; Kissmehl, Roland
2007-01-01
To localize the different actin paralogs found in Paramecium and to disclose functional implications, we used overexpression of GFP-fusion proteins and antibody labeling, as well as gene silencing. Several isoforms are associated with food vacuoles of different stages. GFP-actin either forms a tail at the lee side of the organelle, or it is vesicle bound in a homogenous or in a speckled arrangement, thus reflecting an actin-based mosaic of the phagosome surface appropriate for association and/or dissociation of other vesicles upon travel through the cell. Several paralogs occur in cilia. A set of actins is found in the cell cortex where actin outlines the regular surface pattern. Labeling of defined structures of the oral cavity is due to other types of actin, whereas yet more types are distributed in a pattern suggesting association with the numerous Golgi fields. A substantial fraction of actins is associated with cytoskeletal elements that are known to be composed of other proteins. Silencing of the respective actin genes or gene subfamilies entails inhibitory effects on organelles compatible with localization studies. Knock down of the actin found in the cleavage furrow abolishes cell division, whereas silencing of other actin genes alters vitality, cell shape and swimming behavior.
Adomako-Ankomah, Yaw; English, Elizabeth D; Danielson, Jeffrey J; Pernas, Lena F; Parker, Michelle L; Boulanger, Martin J; Dubey, Jitender P; Boyle, Jon P
2016-05-01
In Toxoplasma gondii, an intracellular parasite of humans and other animals, host mitochondrial association (HMA) is driven by a gene family that encodes multiple mitochondrial association factor 1 (MAF1) proteins. However, the importance of MAF1 gene duplication in the evolution of HMA is not understood, nor is the impact of HMA on parasite biology. Here we used within- and between-species comparative analysis to determine that the MAF1 locus is duplicated in T. gondii and its nearest extant relative Hammondia hammondi, but not another close relative, Neospora caninum. Using cross-species complementation, we determined that the MAF1 locus harbors multiple distinct paralogs that differ in their ability to mediate HMA, and that only T. gondii and H. hammondi harbor HMA(+) paralogs. Additionally, we found that exogenous expression of an HMA(+) paralog in T. gondii strains that do not normally exhibit HMA provides a competitive advantage over their wild-type counterparts during a mouse infection. These data indicate that HMA likely evolved by neofunctionalization of a duplicate MAF1 copy in the common ancestor of T. gondii and H. hammondi, and that the neofunctionalized gene duplicate is selectively advantageous. Copyright © 2016 by the Genetics Society of America.
Baurens, Franc-Christophe; Bocs, Stéphanie; Rouard, Mathieu; Matsumoto, Takashi; Miller, Robert N G; Rodier-Goud, Marguerite; MBéguié-A-MBéguié, Didier; Yahiaoui, Nabila
2010-07-16
Comparative sequence analysis of complex loci such as resistance gene analog clusters allows estimating the degree of sequence conservation and mechanisms of divergence at the intraspecies level. In banana (Musa sp.), two diploid wild species, Musa acuminata (A genome) and Musa balbisiana (B genome), contribute to the polyploid genome of many cultivars. The M. balbisiana species is associated with vigour and tolerance to pests and disease, but little is known about the genome structure and haplotype diversity within this species. Here, we compare two genomic sequences of 253 and 223 kb corresponding to two haplotypes of the RGA08 resistance gene analog locus in M. balbisiana "Pisang Klutuk Wulung" (PKW). Sequence comparison revealed two regions of contrasting features. The first is a highly colinear gene-rich region where the two haplotypes diverge only by single nucleotide polymorphisms and two repetitive element insertions. The second corresponds to a large cluster of RGA08 genes, with 13 and 18 predicted RGA genes and pseudogenes spread over 131 and 152 kb respectively on each haplotype. The RGA08 cluster is enriched in repetitive element insertions and in duplicated non-coding intergenic sequences including low complexity regions, and shows structural variations between haplotypes. Although some allelic relationships are retained, a large diversity of RGA08 genes occurs in this single M. balbisiana genotype, with several RGA08 paralogs specific to each haplotype. The RGA08 gene family has evolved by mechanisms of unequal recombination, intragenic sequence exchange and diversifying selection. An unequal recombination event taking place between duplicated non-coding intergenic sequences resulted in a different RGA08 gene content between haplotypes, pointing to the role of such duplicated regions in the evolution of RGA clusters. Based on the synonymous substitution rate in coding sequences, we estimated a 1 million year divergence time for these M. balbisiana haplotypes.
A large RGA08 gene cluster identified in wild banana corresponds to a highly variable genomic region between haplotypes surrounded by conserved flanking regions. The high level of sequence identity (70 to 99%) of the genic and intergenic regions suggests a recent and rapid evolution of this cluster in M. balbisiana.
Classifying proteins into functional groups based on all-versus-all BLAST of 10 million proteins.
Kolker, Natali; Higdon, Roger; Broomall, William; Stanberry, Larissa; Welch, Dean; Lu, Wei; Haynes, Winston; Barga, Roger; Kolker, Eugene
2011-01-01
To address the monumental challenge of assigning function to millions of sequenced proteins, we completed the first-of-a-kind all-versus-all sequence alignment using BLAST for 9.9 million proteins in the UniRef100 database. Microsoft Windows Azure produced over 3 billion filtered records in 6 days using 475 eight-core virtual machines. Protein classification into functional groups was then performed using Hive and custom jars implemented on top of Apache Hadoop utilizing the MapReduce paradigm. First, using the Clusters of Orthologous Genes (COG) database, a length normalized bit score (LNBS) was determined to be the best similarity measure for classification of proteins. LNBS achieved sensitivity and specificity of 98% each. Second, out of 5.1 million bacterial proteins, about two-thirds were assigned to significantly extended COG groups, encompassing 30 times more assigned proteins. Third, the remaining proteins were classified into protein functional groups using an innovative implementation of a single-linkage algorithm on an in-house Hadoop compute cluster. This implementation significantly reduces the run time for nonindexed queries and optimizes efficient clustering on a large scale. The performance was also verified on Amazon Elastic MapReduce. This clustering assigned nearly 2 million proteins to approximately half a million different functional groups. A similar approach was applied to classify 2.8 million eukaryotic sequences, resulting in over 1 million proteins being assigned to existing KOG groups and the remainder clustered into 100,000 functional groups.
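The classification step can be sketched on a single machine as single-linkage clustering over thresholded similarity scores. The abstract does not define how the bit score is normalized, so the `lnbs` function below (bit score divided by the shorter sequence length) and the threshold are assumptions, and the protein identifiers and scores are invented toy values. The study itself ran on Hadoop/MapReduce; this union-find version only illustrates the clustering idea.

```python
def lnbs(bit_score, len_a, len_b):
    """Length normalized bit score; dividing by the shorter length is an assumption."""
    return bit_score / min(len_a, len_b)


def single_linkage(hits, lengths, threshold):
    """Group proteins connected by any chain of hits with LNBS >= threshold.

    hits:    iterable of (protein_a, protein_b, bit_score) from an all-vs-all search
    lengths: dict mapping protein id -> sequence length
    """
    parent = {p: p for p in lengths}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in hits:
        if lnbs(score, lengths[a], lengths[b]) >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for p in lengths:
        groups.setdefault(find(p), set()).add(p)
    return sorted(groups.values(), key=sorted)


# Toy data (hypothetical identifiers and scores).
lengths = {"P1": 100, "P2": 120, "P3": 90, "P4": 300}
hits = [("P1", "P2", 150.0), ("P2", "P3", 140.0), ("P3", "P4", 30.0)]
clusters = single_linkage(hits, lengths, threshold=1.0)
```

Here P1, P2 and P3 end up in one group because each consecutive pair passes the threshold, even though P1 and P3 were never compared directly; that chaining is the defining property of single linkage.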
Characterization of full-length sequenced cDNA inserts (FLIcs) from Atlantic salmon (Salmo salar)
Andreassen, Rune; Lunner, Sigbjørn; Høyheim, Bjørn
2009-01-01
Background Sequencing of the Atlantic salmon genome is now being planned by an international research consortium. Full-length sequenced inserts from cDNAs (FLIcs) are an important tool for correct annotation and clustering of the genomic sequence in any species. The large amount of highly similar duplicate sequences caused by the relatively recent genome duplication in the salmonid ancestor represents a particular challenge for the genome project. FLIcs will therefore be an extremely useful resource for the Atlantic salmon sequencing project. In addition to helping distinguish between duplicate genome regions and determine correct gene structures, FLIcs are an important resource for functional genomic studies and for investigation of regulatory elements controlling gene expression. In contrast to the large number of ESTs available, including the ESTs from 23 developmental and tissue specific cDNA libraries contributed by the Salmon Genome Project (SGP), the number of sequences where the full length of the cDNA insert has been determined has been small. Results High quality full-length insert sequences from 560 pre-smolt white muscle tissue specific cDNAs were generated, accession numbers [GenBank: BT043497 - BT044056]. Five hundred and ten (91%) of the transcripts were annotated using Gene Ontology (GO) terms and 440 of the FLIcs are likely to contain a complete coding sequence (cCDS). The sequence information was used to identify putative paralogs, to characterize salmon Kozak motifs and polyadenylation signal variation, and to identify motifs likely to be involved in the regulation of particular genes. Finally, conserved 7-mers in the 3'UTRs were identified, of which some were identical to miRNA target sequences. Conclusion This paper describes the first Atlantic salmon FLIcs from a tissue and developmental stage specific cDNA library. We have demonstrated that many FLIcs contained a complete coding sequence (cCDS).
This suggests that the remaining cDNA libraries generated by SGP represent a valuable cCDS FLIc source. The conservation of 7-mers in 3'UTRs indicates that these motifs are functionally important. Identity between some of these 7-mers and miRNA target sequences suggests that they are miRNA targets in Salmo salar transcripts as well. PMID:19878547
Trial to assess the utility of genetic sequencing to improve patient outcomes
A pilot trial to assess whether assigning treatment based on specific gene mutations can provide benefit to patients with metastatic solid tumors is being launched this month by the NCI. The Molecular Profiling based Assignment of Cancer Therapeutics, or
Due-Window Assignment Scheduling with Variable Job Processing Times
Wu, Yu-Bin
2015-01-01
We consider a common due-window assignment scheduling problem for jobs with variable processing times on a single machine, where the processing time of a job is a function of its position in a sequence (i.e., learning effect) or its starting time (i.e., deteriorating effect). The problem is to determine the optimal due-window and the processing sequence simultaneously so as to minimize a cost function that includes earliness, tardiness, the window location, the window size, and the weighted number of tardy jobs. We prove that the problem can be solved in polynomial time. PMID:25918745
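To make the cost function concrete, the following sketch evaluates one sequence against one common due-window. The abstract gives no formulas, so everything here is an assumption: the position-dependent learning model (actual time = base time × position^a with a ≤ 0), the weight names (`alpha`, `beta`, `gamma`, `delta`, `theta`), and a single tardy penalty shared by all jobs. The brute-force search is only for tiny examples and is not the polynomial-time method the paper proves to exist.

```python
from itertools import permutations


def completion_times(seq, base, a):
    """Completion times when the job in position r takes base[job] * r**a (assumed model)."""
    t, out = 0.0, []
    for r, job in enumerate(seq, start=1):
        t += base[job] * r ** a
        out.append(t)
    return out


def cost(seq, base, a, d1, d2, alpha, beta, gamma, delta, theta):
    """Window location + window size + earliness + tardiness + tardy-job penalties."""
    total = gamma * d1 + delta * (d2 - d1)
    for c in completion_times(seq, base, a):
        total += alpha * max(0.0, d1 - c)        # earliness before the window opens
        if c > d2:
            total += beta * (c - d2) + theta     # tardiness plus a per-job tardy penalty
    return total


def best_sequence(base, a, d1, d2, **weights):
    """Exhaustive search over sequences for a fixed window (small instances only)."""
    return min(permutations(base), key=lambda s: cost(s, base, a, d1, d2, **weights))


# Toy instance (hypothetical numbers); a = 0 switches the learning effect off.
base = {"A": 2, "B": 4}
weights = dict(alpha=1, beta=2, gamma=0.5, delta=0.25, theta=3)
order = best_sequence(base, 0, 3, 5, **weights)
```

With the window [3, 5], running B first lets it finish inside the window, so only A is tardy and no job is early, which is why that order is cheaper in this toy instance.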
MassSieve: Panning MS/MS peptide data for proteins
Slotta, Douglas J.; McFarland, Melinda A.; Markey, Sanford P.
2010-01-01
We present MassSieve, a Java-based platform for visualization and parsimony analysis of single and comparative LC-MS/MS database search engine results. The success of mass spectrometric peptide sequence assignment algorithms has led to the need for a tool to merge and evaluate the increasing data set sizes that result from LC-MS/MS-based shotgun proteomic experiments. MassSieve supports reports from multiple search engines with differing search characteristics, which can increase peptide sequence coverage and/or identify conflicting or ambiguous spectral assignments. PMID:20564260
Kollitz, Erin M.; Zhang, Guozhu; Hawkins, Mary Beth; Whitfield, G. Kerr; Reif, David M.; Kullman, Seth W.
2015-01-01
The vertebrate genome is a result of two rapid and successive rounds of whole genome duplication, referred to as 1R and 2R. Furthermore, teleost fish have undergone a third whole genome duplication (3R) specific to their lineage, resulting in the retention of multiple gene paralogs. The more recent 3R event in teleosts provides a unique opportunity to gain insight into how genes evolve through specific evolutionary processes. In this study we compare molecular activities of vitamin D receptors (VDR) from basal species that diverged at key points in vertebrate evolution in order to infer derived and ancestral VDR functions of teleost paralogs. Species include the sea lamprey (Petromyzon marinus), a 1R jawless fish; the little skate (Leucoraja erinacea), a cartilaginous fish that diverged after the 2R event; and the Senegal bichir (Polypterus senegalus), a primitive 2R ray-finned fish. Saturation binding assays and gel mobility shift assays demonstrate that high affinity ligand binding and classic DNA binding characteristics of VDR have been conserved across vertebrate evolution. Concentration response curves in transient transfection assays reveal EC50 values in the low nanomolar range; however, maximum transactivational efficacy varies significantly between receptor orthologs. Protein-protein interactions were investigated using co-transfection, mammalian 2-hybrid assays, and mutations of coregulator activation domains. We then combined these results with our previous study of VDR paralogs from 3R teleosts into a bioinformatics analysis. Our results suggest that 1,25D3 acts as a partial agonist in basal species. Furthermore, our bioinformatics analysis suggests that functional differences between VDR orthologs and paralogs are influenced by differential protein interactions with essential coregulator proteins.
We speculate that we may be observing a change in the pharmacodynamic relationship between VDR and 1,25D3 throughout vertebrate evolution that may have been driven by changes in protein-protein interactions between VDR and essential coregulators. PMID:25855982
Ritz, C M; Reiker, J; Charles, G; Hoxey, P; Hunt, D; Lowry, M; Stuppy, W; Taylor, N
2012-11-01
The cacti of tribe Tephrocacteae (Cactaceae-Opuntioideae) are adapted to diverse climatic conditions over a wide area of the southern Andes and adjacent lowlands. They exhibit a range of life forms from geophytes and cushion-plants to dwarf shrubs, shrubs or small trees. To confirm or challenge previous morphology-based classifications and molecular phylogenies, we sampled DNA sequences from the chloroplast trnK/matK region and the nuclear low copy gene phyC and compared the resulting phylogenies with previous data gathered from nuclear ribosomal DNA sequences. The chloroplast and nuclear low copy gene phylogenies presented here were mutually congruent and broadly coincident with the classification based on gross morphology and seed micro-morphology and anatomy. Reconstruction of hypothetical ancestral character states suggested that geophytes and cushion-forming species probably evolved several times from dwarf shrubby precursors. We also traced an increase of embryo size at the expense of the nucellus-derived storage tissue during the evolution of the Tephrocacteae, which is thought to be an evolutionary advantage because nutrients are then more rapidly accessible for the germinating embryo. In contrast to these highly concordant phylogenies, nuclear ribosomal DNA data sampled by a previous study yielded conflicting phylogenetic signals. Secondary structure predictions of ribosomal transcribed spacers suggested that this phylogeny is strongly influenced by the inclusion of paralogous sequences that probably arose by genome duplication during the evolution of this plant group. Copyright © 2012 Elsevier Inc. All rights reserved.
The chordate proteome history database.
Levasseur, Anthony; Paganini, Julien; Dainat, Jacques; Thompson, Julie D; Poch, Olivier; Pontarotti, Pierre; Gouret, Philippe
2012-01-01
The chordate proteome history database (http://ioda.univ-provence.fr) comprises some 20,000 evolutionary analyses of proteins from chordate species. Our main objective was to characterize and study the evolutionary histories of the chordate proteome, and in particular to detect genomic events and automatic functional searches. Firstly, phylogenetic analyses based on high quality multiple sequence alignments and a robust phylogenetic pipeline were performed for the whole protein and for each individual domain. Novel approaches were developed to identify orthologs/paralogs, and predict gene duplication/gain/loss events and the occurrence of new protein architectures (domain gains, losses and shuffling). These important genetic events were localized on the phylogenetic trees and on the genomic sequence. Secondly, the phylogenetic trees were enhanced by the creation of phylogroups, whereby groups of orthologous sequences created using OrthoMCL were corrected based on the phylogenetic trees; gene family size and gene gain/loss in a given lineage could be deduced from the phylogroups. For each ortholog group obtained from the phylogenetic or the phylogroup analysis, functional information and expression data can be retrieved. Database searches can be performed easily using biological objects (protein identifier, keyword or domain), but can also be based on events; e.g., domain exchange events can be retrieved. To our knowledge, this is the first database that links group clustering, phylogeny and automatic functional searches along with the detection of important events occurring during genome evolution, such as the appearance of a new domain architecture.
Livebearing or egg-laying mammals: 27 decisive nucleotides of FAM168.
Pramanik, Subrata; Kutzner, Arne; Heese, Klaus
2017-05-23
In the present study, we determine comprehensive molecular phylogenetic relationships of the novel myelin-associated neurite-outgrowth inhibitor (MANI) gene across the entire eukaryotic lineage. Combined computational genomic and proteomic sequence analyses revealed MANI as one of the two members of the novel family with sequence similarity 168 member (FAM168) genes, consisting of FAM168A and FAM168B, having distinct genetic differences that illustrate diversification in its biological function and genetic taxonomy across the phylogenetic tree. Phylogenetic analyses based on coding sequences of these FAM168 genes revealed that they are paralogs and that the earliest emergence of these genes occurred in jawed vertebrates such as Callorhinchus milii. Surprisingly, these two genes are absent in other chordates that have a notochord at some stage in their lives, such as branchiostoma and tunicates. In the context of phylogenetic relationships among eukaryotic species, our results demonstrate the presence of FAM168 orthologs in vertebrates ranging from Callorhinchus milii to Homo sapiens, displaying distinct taxonomic clusters, comprised of fish, amphibians, reptiles, birds, and mammals. Analyses of individual FAM168 exons in our sample provide new insights into the molecular relationships between FAM168A and FAM168B (MANI) on the one hand and livebearing and egg-laying mammals on the other hand, demonstrating that a distinctive intermediate exon 4, comprised of 27 nucleotides, appears suddenly only in FAM168A and there in the livebearing mammals only but is absent from all other species including the egg-laying mammals.
The house spider genome reveals an ancient whole-genome duplication during arachnid evolution.
Schwager, Evelyn E; Sharma, Prashant P; Clarke, Thomas; Leite, Daniel J; Wierschin, Torsten; Pechmann, Matthias; Akiyama-Oda, Yasuko; Esposito, Lauren; Bechsgaard, Jesper; Bilde, Trine; Buffry, Alexandra D; Chao, Hsu; Dinh, Huyen; Doddapaneni, HarshaVardhan; Dugan, Shannon; Eibner, Cornelius; Extavour, Cassandra G; Funch, Peter; Garb, Jessica; Gonzalez, Luis B; Gonzalez, Vanessa L; Griffiths-Jones, Sam; Han, Yi; Hayashi, Cheryl; Hilbrant, Maarten; Hughes, Daniel S T; Janssen, Ralf; Lee, Sandra L; Maeso, Ignacio; Murali, Shwetha C; Muzny, Donna M; Nunes da Fonseca, Rodrigo; Paese, Christian L B; Qu, Jiaxin; Ronshaugen, Matthew; Schomburg, Christoph; Schönauer, Anna; Stollewerk, Angelika; Torres-Oliva, Montserrat; Turetzek, Natascha; Vanthournout, Bram; Werren, John H; Wolff, Carsten; Worley, Kim C; Bucher, Gregor; Gibbs, Richard A; Coddington, Jonathan; Oda, Hiroki; Stanke, Mario; Ayoub, Nadia A; Prpic, Nikola-Michael; Flot, Jean-François; Posnien, Nico; Richards, Stephen; McGregor, Alistair P
2017-07-31
The duplication of genes can occur through various mechanisms and is thought to make a major contribution to the evolutionary diversification of organisms. There is increasing evidence for a large-scale duplication of genes in some chelicerate lineages including two rounds of whole genome duplication (WGD) in horseshoe crabs. To investigate this further, we sequenced and analyzed the genome of the common house spider Parasteatoda tepidariorum. We found pervasive duplication of both coding and non-coding genes in this spider, including two clusters of Hox genes. Analysis of synteny conservation across the P. tepidariorum genome suggests that there has been an ancient WGD in spiders. Comparison with the genomes of other chelicerates, including that of the newly sequenced bark scorpion Centruroides sculpturatus, suggests that this event occurred in the common ancestor of spiders and scorpions, and is probably independent of the WGDs in horseshoe crabs. Furthermore, characterization of the sequence and expression of the Hox paralogs in P. tepidariorum suggests that many have been subject to neo-functionalization and/or sub-functionalization since their duplication. Our results reveal that spiders and scorpions are likely the descendants of a polyploid ancestor that lived more than 450 MYA. Given the extensive morphological diversity and ecological adaptations found among these animals, rivaling those of vertebrates, our study of the ancient WGD event in Arachnopulmonata provides a new comparative platform to explore common and divergent evolutionary outcomes of polyploidization events across eukaryotes.
Nadjar-Boger, Elisabeth; Maccatrozzo, Lisa; Radaelli, Giuseppe; Funkenstein, Bruria
2013-02-01
Myostatin (MSTN) is a member of the transforming growth factor-β superfamily, known as a negative regulator of skeletal muscle development and growth in mammals. In contrast to mammals, fish possess at least two paralogs of MSTN: MSTN-1 and MSTN-2. Here we describe the cloning and sequence analysis of spliced and precursor (unspliced) transcripts as well as the 5' flanking region of MSTN-2 from the marine fish Umbrina cirrosa (ucMSTN-2). In silico analysis revealed numerous putative cis regulatory elements including several E-boxes known as binding sites for myogenic transcription factors. Transient transfection experiments using non-muscle and muscle cell lines showed high transcriptional activity in muscle cells and in differentiated neural cells, in accordance with our previous findings in MSTN-2 promoter from Sparus aurata. Comparative informatics analysis of MSTN-2 from several fish species revealed high conservation of the predicted amino acid sequence as well as the gene structure (exon length) although intron length varied between species. The proximal promoter of MSTN-2 gene was found to be conserved among Perciforms. In conclusion, this study reinforces our conclusion that MSTN-2 promoter is a very strong promoter, especially in muscle cells. In addition, we show that the MSTN-2 gene structure is highly conserved among fishes as is the predicted amino acid sequence of the peptide. Copyright © 2012 Elsevier Inc. All rights reserved.
Nock, Tanya G; Chand, Dhan; Lovejoy, David A
2011-04-01
The gonadotropin-releasing hormone (GnRH) and corticotropin-releasing family (CRF) are two neuropeptide families that are strongly conserved throughout evolution. Recently, the genome of the holocephalan Callorhinchus milii (elephant shark) has been sequenced. The phylogenetic position of C. milii, along with the relatively slow evolution of the cartilaginous fish, suggests that neuropeptides in this species may resemble the earliest gnathostome forms. The genome of the elephant shark was screened, in silico, using the various conserved motifs of both the vertebrate CRF paralogs and the insect diuretic hormone sequences to identify the structure of the C. milii CRF/DH-like peptides. A similar approach was taken to identify the GnRH peptides using conserved motifs in both vertebrate and invertebrate forms. Two CRF peptides, a urotensin-1 peptide and a urocortin 3 peptide were found in the genome. There was only about 50% sequence identity between the two CRF peptides, suggesting an early divergence. In addition, the urocortin 2 peptide seems to have been lost and was identified as a pseudogene in C. milii. In contrast to the number of CRF family peptides, only a GnRH-II preprohormone with the conserved mature decapeptide was found. This confirms early studies about the identity of GnRH in the Holocephali, and suggests that the Holocephali and Elasmobranchii differ with respect to GnRH structure and function. Copyright © 2011 Elsevier Inc. All rights reserved.
Re-refinement of the spliceosomal U4 snRNP core-domain structure
Li, Jade; Leung, Adelaine K.; Kondo, Yasushi; Oubridge, Chris; Nagai, Kiyoshi
2016-01-01
The core domain of small nuclear ribonucleoprotein (snRNP), comprised of a ring of seven paralogous proteins bound around a single-stranded RNA sequence, functions as the assembly nucleus in the maturation of U1, U2, U4 and U5 spliceosomal snRNPs. The structure of the human U4 snRNP core domain was initially solved at 3.6 Å resolution by experimental phasing using data with tetartohedral twinning. Molecular replacement from this model followed by density modification using untwinned data recently led to a structure of the minimal U1 snRNP at 3.3 Å resolution. With the latter structure providing a search model for molecular replacement, the U4 core-domain structure has now been re-refined. The U4 Sm site-sequence AAUUUUU has been shown to bind to the seven Sm proteins SmF–SmE–SmG–SmD3–SmB–SmD1–SmD2 in an identical manner as the U1 Sm-site sequence AAUUUGU, except in SmD1 where the bound U replaces G. The progression from the initial to the re-refined structure exemplifies a tortuous route to accuracy: where well diffracting crystals of complex assemblies are initially unavailable, the early model errors are rectified by exploiting preliminary interpretations in further experiments involving homologous structures. New insights are obtained from the more accurate model. PMID:26894541
Multigene knockout utilizing off-target mutations of the CRISPR/Cas9 system in rice.
Endo, Masaki; Mikami, Masafumi; Toki, Seiichi
2015-01-01
The clustered regularly interspaced short palindromic repeat (CRISPR)-associated endonuclease 9 (CRISPR/Cas9) system has been demonstrated to be a robust genome engineering tool in a variety of organisms including plants. However, it has been shown that the CRISPR/Cas9 system cleaves genomic DNA sequences containing mismatches to the guide RNA strand. We expected that this low specificity could be exploited to induce multihomeologous and multiparalogous gene knockouts. In the case of polyploid plants, simultaneous modification of multiple homeologous genes, i.e. genes with similar but not identical DNA sequences, is often needed to obtain a desired phenotype. Even in diploid plants, disruption of multiparalogous genes, which have functional redundancy, is often needed. To validate the applicability of the CRISPR/Cas9 system to target mutagenesis of paralogous genes in rice, we designed a single-guide RNA (sgRNA) that recognized 20 bp sequences of cyclin-dependent kinase B2 (CDKB2) as an on-target locus. These 20 bp possess similarity to other rice CDK genes (CDKA1, CDKA2 and CDKB1) with different numbers of mismatches. We analyzed mutations in these four CDK genes in plants regenerated from Cas9/sgRNA-transformed calli and revealed that single, double and triple mutants of CDKA2, CDKB1 and CDKB2 can be created by a single sgRNA. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.
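The idea of ranking paralogous loci by how many mismatches they carry against a single 20-nt guide can be illustrated with a minimal Python sketch. The sequences below are hypothetical placeholders, not the real rice CDK loci, and plain mismatch counting is an assumption made for illustration; it ignores PAM requirements and position-dependent effects.

```python
def count_mismatches(guide: str, site: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(a != b for a, b in zip(guide.upper(), site.upper()))


def rank_candidate_sites(guide: str, sites: dict) -> list:
    """Return (name, mismatches) pairs sorted from fewest to most mismatches."""
    scored = [(name, count_mismatches(guide, seq)) for name, seq in sites.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical 20-nt sequences for illustration only (not real CDK genes).
GUIDE = "ACGTACGTACGTACGTACGT"
SITES = {
    "on_target": "ACGTACGTACGTACGTACGT",  # identical to the guide
    "paralog_1": "ACGTACGTACCTACGTACGT",  # one substitution
    "paralog_2": "ACGAACGTACCTACGTACGA",  # three substitutions
}

if __name__ == "__main__":
    for name, mismatches in rank_candidate_sites(GUIDE, SITES):
        print(name, mismatches)
```

Loci with few mismatches would be the ones to screen first for off-target mutations in regenerated plants.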
Wheat CBF gene family: identification of polymorphisms in the CBF coding sequence.
Mohseni, Sara; Che, Hua; Djillali, Zakia; Dumont, Estelle; Nankeu, Joseph; Danyluk, Jean
2012-12-01
Expression of cold-regulated genes needed for protection against freezing stress is mediated, in part, by the CBF transcription factor family. Previous studies with temperate cereals suggested that the CBF gene family in wheat was large, and that CBF genes were at the base of an important low temperature tolerance trait. Therefore, the goal of our study was to identify the CBF repertoire in the freezing-tolerant hexaploid wheat cultivar Norstar, and then to examine if the coding region of CBF genes in two spring cultivars contain polymorphisms that could affect the protein sequence and structure. Our analyses reveal that hexaploid wheat contains a complex CBF family consisting of at least 65 CBF genes of which 60 are known to be expressed in the cultivar Norstar. They represent 27 paralogous genes with 1-3 homeologous copies for the A, B, and D genomes. The cultivar Norstar contains two pseudogenes and at least 24 additional proteins having sequences and (or) structures that deviate from the consensus in the conserved AP2 DNA-binding and (or) C-terminal activation-domains. This suggests that in cultivars such as Norstar, low temperature tolerance may be increased through breeding of additional optimal alleles. The examination of the CBF repertoire present in the two spring cultivars, Chinese Spring and Manitou, reveals that they have additional polymorphisms affecting conserved positions in these domains. Understanding the effects of these polymorphisms will provide additional information for the selection of optimum CBF alleles in Triticeae breeding programs.
Evolution of an Expanded Mannose Receptor Gene Family
Staines, Karen; Hunt, Lawrence G.; Young, John R.; Butter, Colin
2014-01-01
Sequences of peptides from a protein specifically immunoprecipitated by an antibody, KUL01, that recognises chicken macrophages, identified a homologue of the mammalian mannose receptor, MRC1, which we called MRC1L-B. Inspection of the genomic environment of the chicken gene revealed an array of five paralogous genes, MRC1L-A to MRC1L-E, located between conserved flanking genes found either side of the single MRC1 gene in mammals. Transcripts of all five genes were detected in RNA from a macrophage cell line and other RNAs, whose sequences allowed the precise definition of spliced exons, confirming or correcting existing bioinformatic annotation. The confirmed gene structures were used to locate orthologues of all five genes in the genomes of two other avian species and of the painted turtle, all with intact coding sequences. The lizard genome had only three genes, one orthologue of MRC1L-A and two orthologues of the MRC1L-B antigen gene resulting from a recent duplication. The Xenopus genome, like that of most mammals, had only a single MRC1-like gene at the corresponding locus. MRC1L-A and MRC1L-B genes had similar cytoplasmic regions that may be indicative of similar subcellular migration and functions. Cytoplasmic regions of the other three genes were very divergent, possibly indicating the evolution of a new functional repertoire for this family of molecules, which might include novel interactions with pathogens. PMID:25390371
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wittekind, M.; Klevit, R.E.; Reizer, J.
1990-08-07
On the basis of an analysis of two-dimensional ¹H NMR spectra, the complete sequence-specific ¹H NMR assignments are presented for the phosphocarrier protein HPr from the Gram-positive bacterium Bacillus subtilis. During the assignment procedure, extensive use was made of spectra obtained from point mutants of HPr in order to resolve spectral overlap and to provide verification of assignments. Regions of regular secondary structure were identified by characteristic patterns of sequential backbone proton NOEs and slowly exchanging amide protons. B. subtilis HPr contains four β-strands that form a single antiparallel β-sheet and two well-defined α-helices. There are two stretches of extended backbone structure, one of which contains the active site His15. The overall fold of the protein is very similar to that of Escherichia coli HPr determined by NMR studies.
Kim, Minseok; Morrison, Mark; Yu, Zhongtang
2011-09-01
Phylogenetic analysis was conducted to examine ruminal bacteria in two ruminal fractions (adherent fraction vs. liquid fraction) collected from cattle fed with two different diets: forage alone vs. forage plus concentrate. One hundred forty-four 16S rRNA gene (rrs) sequences were obtained from clone libraries constructed from the four samples. These rrs sequences were assigned to 116 different operational taxonomic units (OTUs) defined at 0.03 phylogenetic distance. Most of these OTUs could not be assigned to any known genus. The phylum Firmicutes was represented by approximately 70% of all the sequences. By comparing to the OTUs already documented in the rumen, 52 new OTUs were identified. UniFrac, SONS, and denaturing gradient gel electrophoresis analyses revealed difference in diversity between the two fractions and between the two diets. This study showed that rrs sequences recovered from small clone libraries can still help identify novel species-level OTUs.
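Grouping sequences into OTUs at a 0.03 distance cutoff can be sketched as follows. This is a simplified greedy, centroid-based illustration that assumes pre-aligned, equal-length sequences and uses the proportion of differing sites as the distance; the abstract does not say which clustering tool or algorithm the study used, so the functions here are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.03):
    """Assign each sequence to the first OTU whose centroid is within threshold.

    The first member of each OTU acts as its centroid (an assumption of this
    sketch; other linkage rules give different groupings).
    """
    centroids = []
    otus = []
    for seq in seqs:
        for i, centroid in enumerate(centroids):
            if p_distance(seq, centroid) <= threshold:
                otus[i].append(seq)
                break
        else:
            # No existing centroid is close enough: start a new OTU.
            centroids.append(seq)
            otus.append([seq])
    return otus


if __name__ == "__main__":
    a = "A" * 100
    b = "C" * 2 + "A" * 98    # distance 0.02 from a
    c = "C" * 10 + "A" * 90   # distance 0.10 from a
    print(len(greedy_otus([a, b, c])))
```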
2013-01-01
Background Besides the development of comprehensive tools for high-throughput 16S ribosomal RNA amplicon sequence analysis, there exists a growing need for protocols emphasizing alternative phylogenetic markers such as those representing eukaryotic organisms. Results Here we introduce CloVR-ITS, an automated pipeline for comparative analysis of internal transcribed spacer (ITS) pyrosequences amplified from metagenomic DNA isolates and representing fungal species. This pipeline performs a variety of steps similar to those commonly used for 16S rRNA amplicon sequence analysis, including preprocessing for quality, chimera detection, clustering of sequences into operational taxonomic units (OTUs), taxonomic assignment (at class, order, family, genus, and species levels) and statistical analysis of sample groups of interest based on user-provided information. Using ITS amplicon pyrosequencing data from a previous human gastric fluid study, we demonstrate the utility of CloVR-ITS for fungal microbiota analysis and provide runtime and cost examples, including analysis of extremely large datasets on the cloud. We show that the largest fractions of reads from the stomach fluid samples were assigned to Dothideomycetes, Saccharomycetes, Agaricomycetes and Sordariomycetes but that all samples were dominated by sequences that could not be taxonomically classified. Representatives of the Candida genus were identified in all samples, most notably C. quercitrusa, while sequence reads assigned to the Aspergillus genus were only identified in a subset of samples. CloVR-ITS is made available as a pre-installed, automated, and portable software pipeline for cloud-friendly execution as part of the CloVR virtual machine package (http://clovr.org). 
Conclusion The CloVR-ITS pipeline provides fungal microbiota analysis that can be complementary to bacterial 16S rRNA and total metagenome sequence analysis allowing for more comprehensive studies of environmental and host-associated microbial communities. PMID:24451270
Catania, Francesco; Lynch, Michael
2010-05-04
In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties to generate reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal genes expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes.
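Screening 3' UTRs for a predefined set of 8-mers amounts to a set-membership test over all overlapping windows. The Python sketch below illustrates that step only; the gene names, UTR sequences and candidate motif are hypothetical, and no conservation or statistical analysis is attempted.

```python
def kmers(seq: str, k: int = 8) -> set:
    """All overlapping k-mers in a sequence (empty set if the sequence is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def motif_presence(utrs: dict, motifs, k: int = 8) -> dict:
    """Map each gene to the sorted list of candidate motifs found in its 3' UTR."""
    wanted = {m.upper() for m in motifs}
    return {gene: sorted(kmers(seq, k) & wanted) for gene, seq in utrs.items()}


if __name__ == "__main__":
    # Hypothetical UTRs and candidate 8-mer, for illustration only.
    utrs = {"gene_1": "TTTTACGTACGTTTTT", "gene_2": "GGGGGGGGGG"}
    print(motif_presence(utrs, ["ACGTACGT"]))
```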
Wu, N; Qin, H; Wang, M; Bian, Y; Dong, B; Sun, G; Zhao, W; Chang, G; Xu, Q; Chen, G
2017-04-01
1. Endothelin receptor B subtype 2 (EDNRB2) is a paralog of EDNRB, which encodes a 7-transmembrane G-protein coupled receptor. Previous studies reported that EDNRB was essential for melanoblast migration in mammals and ducks. 2. Muscovy ducks have different plumage colour phenotypes. Variations in EDNRB2 coding sequences (CDSs) and mRNA expression levels were investigated in 4 different Muscovy duck plumage colour phenotypes, including black, black mutant, silver and white head. 3. The EDNRB2 gene from Muscovy duck was cloned; it had a length of 6435 bp and encoded 437 amino acids. The coding region was screened and potential single nucleotide polymorphisms were identified. Eight mutations were obtained, including one missense variant (c.64C > T) and 7 synonymous substitutions. The substitutions were associated with plumage colour phenotypes. 4. The EDNRB2 mRNA expression levels were compared between feather pulp from black birds and black mutant birds. The results indicated that EDNRB2 transcripts in feather pulp were significantly higher in black feathers than in white feathers. 5. The results determined the variation of EDNRB2 CDS and mRNA expression in Muscovy ducks of various plumage colours.
Rise, Matthew L.; von Schalburg, Kristian R.; Brown, Gordon D.; Mawer, Melanie A.; Devlin, Robert H.; Kuipers, Nathanael; Busby, Maura; Beetz-Sargent, Marianne; Alberto, Roberto; Gibbs, A. Ross; Hunt, Peter; Shukin, Robert; Zeznik, Jeffrey A.; Nelson, Colleen; Jones, Simon R.M.; Smailus, Duane E.; Jones, Steven J.M.; Schein, Jacqueline E.; Marra, Marco A.; Butterfield, Yaron S.N.; Stott, Jeff M.; Ng, Siemon H.S.; Davidson, William S.; Koop, Ben F.
2004-01-01
We report 80,388 ESTs from 23 Atlantic salmon (Salmo salar) cDNA libraries (61,819 ESTs), 6 rainbow trout (Oncorhynchus mykiss) cDNA libraries (14,544 ESTs), 2 chinook salmon (Oncorhynchus tshawytscha) cDNA libraries (1317 ESTs), 2 sockeye salmon (Oncorhynchus nerka) cDNA libraries (1243 ESTs), and 2 lake whitefish (Coregonus clupeaformis) cDNA libraries (1465 ESTs). The majority of these are 3′ sequences, allowing discrimination between paralogs arising from a recent genome duplication in the salmonid lineage. Sequence assembly reveals 28,710 different S. salar, 8981 O. mykiss, 1085 O. tshawytscha, 520 O. nerka, and 1176 C. clupeaformis putative transcripts. We annotate the submitted portion of our EST database by molecular function. Higher- and lower-molecular-weight fractions of libraries are shown to contain distinct gene sets, and higher rates of gene discovery are associated with higher-molecular weight libraries. Pyloric caecum library group annotations indicate this organ may function in redox control and as a barrier against systemic uptake of xenobiotics. A microarray is described, containing 7356 salmonid elements representing 3557 different cDNAs. Analyses of cross-species hybridizations to this cDNA microarray indicate that this resource may be used for studies involving all salmonids. PMID:14962987
Fushiki, Daisuke; Hamada, Yasuo; Yoshimura, Ryoichi; Endo, Yasuhisa
2010-04-01
All multi-cellular animals, including hydra, insects and vertebrates, develop gap junctions, which communicate directly with neighboring cells. Gap junctions consist of protein families called connexins in vertebrates and innexins in invertebrates. Connexins and innexins have no homology in their amino acid sequence, but both are thought to have some similar characteristics, such as a tetra-membrane-spanning structure, formation of a channel by a hexamer, and transmission of small molecules (e.g. ions) to neighboring cells. Pannexins were recently identified as homologs of innexins in vertebrate genomes. Although pannexins are thought to share the function of intercellular communication with connexins and innexins, there is little information about the relationship among these three protein families of gap junctions. We phylogenetically and bioinformatically examined these protein families and other tetra-membrane-spanning proteins using a database and three analytical software packages. The clades formed by the pannexin families do not follow the species classification but instead correspond to the paralogs of each pannexin member. Amino acid sequences of pannexins are closely related to those of innexins but less closely to those of connexins. These data suggest that innexins and pannexins have a common origin, but the relationship between innexins/pannexins and connexins is as slight as that with other tetra-membrane-spanning members.
Ahn, Jinwoo; Kim, Kwang Hyun; Park, Sanghui; Ahn, Young-Ho; Kim, Ha Young; Yoon, Hana; Lee, Ji Hyun; Bang, Duhee; Lee, Dong Hyeon
2016-09-27
UTX is a histone demethylase gene located on the X chromosome and is a frequently mutated gene in urothelial bladder cancer (UBC). UTY is a paralog of UTX located on the Y chromosome. We performed target capture sequencing on 128 genes in 40 non-metastatic UBC patients. UTX was the most frequently mutated gene (30%, 12/40). Of the genetic alterations identified, 75% were truncating mutations. UTY copy number loss was detected in 8 male patients (22.8%, 8/35). Of the 9 male patients with UTX mutations, 6 also had copy number loss (66.7%). To evaluate the functional roles of UTX and UTY in tumor progression, we designed UTX and UTY single knockout and UTX-UTY double knockout experiments using a CRISPR/Cas9 lentiviral system, and compared the proliferative capacities of two UBC cell lines in vitro. Single UTX or UTY knockout increased cell proliferation as compared to UTX-UTY wild-type cells. UTX-UTY double knockout cells exhibited greater proliferation than single knockout cells. These findings suggest both UTX and UTY function as dose-dependent suppressors of UBC development. While UTX escapes X chromosome inactivation in females, UTY may function as a male homologue of UTX, which could compensate for dosage imbalances.
Sander, Adam F.; Lavstsen, Thomas; Rask, Thomas S.; Lisby, Michael; Salanti, Ali; Fordyce, Sarah L.; Jespersen, Jakob S.; Carter, Richard; Deitsch, Kirk W.; Theander, Thor G.; Pedersen, Anders Gorm; Arnot, David E.
2014-01-01
Many bacterial, viral and parasitic pathogens undergo antigenic variation to counter host immune defense mechanisms. In Plasmodium falciparum, the most lethal of human malaria parasites, switching of var gene expression results in alternating expression of the adhesion proteins of the Plasmodium falciparum-erythrocyte membrane protein 1 class on the infected erythrocyte surface. Recombination clearly generates var diversity, but the nature and control of the genetic exchanges involved remain unclear. By experimental and bioinformatic identification of recombination events and genome-wide recombination hotspots in var genes, we show that during the parasite’s sexual stages, ectopic recombination between isogenous var paralogs occurs near low folding free energy DNA 50-mers and that these sequences are heavily concentrated at the boundaries of regions encoding individual Plasmodium falciparum-erythrocyte membrane protein 1 structural domains. The recombinogenic potential of these 50-mers is not parasite-specific because these sequences also induce recombination when transferred to the yeast Saccharomyces cerevisiae. Genetic cross data suggest that DNA secondary structures (DSS) act as inducers of recombination during DNA replication in P. falciparum sexual stages, and that these DSS-regulated genetic exchanges generate functional and diverse P. falciparum adhesion antigens. DSS-induced recombination may represent a common mechanism for optimizing the evolvability of virulence gene families in pathogens. PMID:24253306
Kuroda, Makoto; Yamashita, Atsushi; Hirakawa, Hideki; Kumano, Miyuki; Morikawa, Kazuya; Higashide, Masato; Maruyama, Atsushi; Inose, Yumiko; Matoba, Kimio; Toh, Hidehiro; Kuhara, Satoru; Hattori, Masahira; Ohta, Toshiko
2005-09-13
Staphylococcus saprophyticus is a uropathogenic Staphylococcus frequently isolated from young female outpatients presenting with uncomplicated urinary tract infections. We sequenced the whole genome of S. saprophyticus type strain ATCC 15305, which harbors a circular chromosome of 2,516,575 bp with 2,446 ORFs and two plasmids. Comparative genomic analyses with the strains of two other species, Staphylococcus aureus and Staphylococcus epidermidis, as well as experimental data, revealed the following characteristics of the S. saprophyticus genome. S. saprophyticus does not possess any virulence factors found in S. aureus, such as coagulase, enterotoxins, exoenzymes, and extracellular matrix-binding proteins, although it does have a remarkable paralog expansion of transport systems related to highly variable ion contents in the urinary environment. A further unique feature is that only a single ORF is predictable as a cell wall-anchored protein, and it shows positive hemagglutination and adherence to human bladder cells associated with initial colonization in the urinary tract. S. saprophyticus also shows significantly high urease activity. The uropathogenicity of S. saprophyticus can be attributed to genome features needed for its survival in the human urinary tract: a novel cell wall-anchored adhesin and redundant uro-adaptive transport systems, together with urease.
Cloud, Joann L; Harmsen, Dag; Iwen, Peter C; Dunn, James J; Hall, Gerri; Lasala, Paul Rocco; Hoggan, Karen; Wilson, Deborah; Woods, Gail L; Mellmann, Alexander
2010-04-01
Correct identification of nonfermenting Gram-negative bacilli (NFB) is crucial for patient management. We compared phenotypic identifications of 96 clinical NFB isolates with identifications obtained by 5' 16S rRNA gene sequencing. Sequencing identified 88 isolates (91.7%) with >99% similarity to a sequence from the assigned species; 61.5% of sequencing results were concordant with phenotypic results, indicating the usability of sequencing to identify NFB.
SPOCS: software for predicting and visualizing orthology/paralogy relationships among genomes.
Curtis, Darren S; Phillips, Aaron R; Callister, Stephen J; Conlan, Sean; McCue, Lee Ann
2013-10-15
At the rate that prokaryotic genomes can now be generated, comparative genomics studies require a flexible method for quickly and accurately predicting orthologs among the rapidly changing set of genomes available. SPOCS implements a graph-based ortholog prediction method to generate a simple tab-delimited table of orthologs and in addition, html files that provide a visualization of the predicted ortholog/paralog relationships to which gene/protein expression metadata may be overlaid. A SPOCS web application is freely available at http://cbb.pnnl.gov/portal/tools/spocs.html. Source code for Linux systems is also freely available under an open source license at http://cbb.pnnl.gov/portal/software/spocs.html; the Boost C++ libraries and BLAST are required.
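One common building block of graph-based ortholog prediction is the reciprocal best hit between two genomes, which becomes an edge in the orthology graph. The Python sketch below illustrates that general idea only; it is not taken from the SPOCS source code, and the gene identifiers and similarity scores are hypothetical.

```python
def best_hits(scores: dict) -> dict:
    """For each query gene, keep the subject gene with the highest score.

    `scores` maps (query, subject) pairs to a similarity score,
    for example a BLAST bit score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: dict, b_vs_a: dict) -> list:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


if __name__ == "__main__":
    # Hypothetical scores between genes of genome A and genome B.
    a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b2"): 90}
    b_vs_a = {("b1", "a1"): 198, ("b2", "a1"): 140, ("b2", "a2"): 95}
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy input a2's best hit is b2, but b2's best hit is a1, so only the a1/b1 pair is reported; a2 would be a candidate paralog rather than an ortholog.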
Merino, Emilio F; Fernandez-Becerra, Carmen; Madeira, Alda M B N; Machado, Ariane L; Durham, Alan; Gruber, Arthur; Hall, Neil; del Portillo, Hernando A
2003-07-21
Plasmodium vivax is the most widely distributed human malaria, responsible for 70-80 million clinical cases each year and large socio-economical burdens for countries such as Brazil where it is the most prevalent species. Unfortunately, due to the impossibility of growing this parasite in continuous in vitro culture, research on P. vivax remains largely neglected. A pilot survey of expressed sequence tags (ESTs) from the asexual blood stages of P. vivax was performed. To do so, 1,184 clones from a cDNA library constructed with parasites obtained from 10 different human patients in the Brazilian Amazon were sequenced. Sequences were automatically processed to remove contaminants and low quality reads. A total of 806 sequences with an average length of 586 bp met such criteria and their clustering revealed 666 distinct events. The consensus sequence of each cluster and the unique sequences of the singlets were used in similarity searches against different databases that included P. vivax, Plasmodium falciparum, Plasmodium yoelii, Plasmodium knowlesi, Apicomplexa and the GenBank non-redundant database. An E-value of <10^-30 was used to define a significant database match. ESTs were manually assigned a gene ontology (GO) terminology. A total of 769 ESTs could be assigned a putative identity based upon sequence similarity to known proteins in GenBank. Moreover, 292 ESTs were annotated and a GO terminology was assigned to 164 of them. These are the first ESTs reported for P. vivax and, as such, they represent a valuable resource to assist in the annotation of the P. vivax genome currently being sequenced. Moreover, since the GC-content of the P. vivax genome is strikingly different from that of P. falciparum, these ESTs will help in the validation of gene predictions for P. vivax and to create a gene index of this malaria parasite.
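Applying an E-value cutoff to similarity-search results is a simple filter followed by a best-hit selection per query. The sketch below assumes hits have already been parsed into (query, subject, E-value) records; the record type, function names and example identifiers are hypothetical and do not come from the study.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str     # EST cluster or singlet identifier
    subject: str   # database entry identifier
    evalue: float  # expectation value reported by the search tool


def significant_hits(hits, cutoff=1e-30):
    """Keep only hits whose E-value is strictly below the cutoff."""
    return [h for h in hits if h.evalue < cutoff]


def best_hit_per_query(hits, cutoff=1e-30):
    """For each query, return its significant hit with the lowest E-value."""
    best = {}
    for h in significant_hits(hits, cutoff):
        if h.query not in best or h.evalue < best[h.query].evalue:
            best[h.query] = h
    return best


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    hits = [
        Hit("est_001", "db_entry_A", 1e-45),
        Hit("est_001", "db_entry_B", 1e-60),
        Hit("est_002", "db_entry_C", 1e-5),
    ]
    print(best_hit_per_query(hits))
```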
Preparation of Term Papers Based upon a Research-Process Model.
ERIC Educational Resources Information Center
Feldmann, Rodney Mansfield; Schloman, Barbara Frick
1990-01-01
Described is an alternative method of term paper preparation which provides a step-by-step sequence of assignments and provides feedback to the students at all stages in the preparation of the report. An example of this model is provided including 13 sequential assignments. (CW)
Xu, Dong; Zhang, Yang
2013-01-01
Genome-wide protein structure prediction and structure-based function annotation have been a long-term goal in molecular biology but have not yet become possible due to difficulties in modeling distant-homology targets. We developed a hybrid pipeline combining ab initio folding and template-based modeling for genome-wide structure prediction applied to the Escherichia coli genome. The pipeline was tested on 43 known sequences, where QUARK-based ab initio folding simulation generated models with TM-score 17% higher than that by traditional comparative modeling methods. For 495 unknown hard sequences, 72 are predicted to have a correct fold (TM-score > 0.5) and 321 have a substantial portion of structure correctly modeled (TM-score > 0.35). 317 sequences can be reliably assigned to a SCOP fold family based on structural analogy to existing proteins in PDB. The presented results, as a case study of E. coli, represent promising progress towards genome-wide structure modeling and fold family assignment using state-of-the-art ab initio folding algorithms. PMID:23719418
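The two TM-score cutoffs quoted above (0.5 and 0.35) can be applied to a list of predicted scores with a short tally. This sketch assumes the reported counts are cumulative, so that models above 0.5 are also counted above 0.35; that reading, the function name and the example scores are assumptions, not details from the study.

```python
def tally_above(scores, thresholds=(0.5, 0.35)):
    """Count, for each threshold, how many TM-scores are strictly above it."""
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"TM-score out of range [0, 1]: {s}")
    return {t: sum(s > t for s in scores) for t in thresholds}


if __name__ == "__main__":
    # Hypothetical predicted TM-scores for four targets.
    print(tally_above([0.6, 0.4, 0.2, 0.51]))
```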
NASA Astrophysics Data System (ADS)
Yang, Peng; Peng, Yongfei; Ye, Bin; Miao, Lixin
2017-09-01
This article explores the integrated optimization problem of location assignment and sequencing in multi-shuttle automated storage/retrieval systems under the modified 2n-command cycle pattern. The decision of storage and retrieval (S/R) location assignment and S/R request sequencing are jointly considered. An integer quadratic programming model is formulated to describe this integrated optimization problem. The optimal travel cycles for multi-shuttle S/R machines can be obtained to process S/R requests in the storage and retrieval request order lists by solving the model. The small-sized instances are optimally solved using CPLEX. For large-sized problems, two tabu search algorithms are proposed, in which the first come, first served and nearest neighbour are used to generate initial solutions. Various numerical experiments are conducted to examine the heuristics' performance and the sensitivity of algorithm parameters. Furthermore, the experimental results are analysed from the viewpoint of practical application, and a parameter list for applying the proposed heuristics is recommended under different real-life scenarios.
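The two rules named above for generating initial solutions, first come first served and nearest neighbour, can be contrasted on a toy instance. This Python sketch is a simplification: it sequences single rack locations for one shuttle, measures travel with a Chebyshev metric (an assumption, on the basis that horizontal and vertical drives move simultaneously), and does not implement the paper's model or its tabu search.

```python
def chebyshev(a, b):
    """Travel cost between two rack positions (column, level)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def fcfs_order(locations):
    """First come, first served: visit locations in request order."""
    return list(locations)


def nearest_neighbour_order(locations, start=(0, 0)):
    """Repeatedly visit the closest unvisited location."""
    remaining = list(locations)
    order = []
    current = start
    while remaining:
        nxt = min(remaining, key=lambda p: chebyshev(current, p))
        remaining.remove(nxt)
        order.append(nxt)
        current = nxt
    return order


def route_length(order, start=(0, 0)):
    """Total travel from the I/O point through all locations and back."""
    total = 0
    current = start
    for p in order:
        total += chebyshev(current, p)
        current = p
    return total + chebyshev(current, start)


if __name__ == "__main__":
    requests = [(5, 5), (1, 1), (2, 2)]  # hypothetical rack positions
    for rule in (fcfs_order, nearest_neighbour_order):
        order = rule(requests)
        print(rule.__name__, order, route_length(order))
```

Either ordering could then serve as the starting point that a tabu search improves by swapping positions in the sequence.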
Lu, Ling; Li, Chunhua; Yuan, Jie; Lu, Teng; Okamoto, Hiroaki; Murphy, Donald G
2013-03-01
We characterized the full-length genomes of five distinct hepatitis C virus (HCV)-3 isolates. These represent the first complete genomes for subtypes 3g and 3h, the second such genomes for 3k and 3i, and of one novel variant presently not assigned to a subtype. Each genome was determined from 18-25 overlapping fragments. They had lengths of 9579-9660 nt and each contained a single ORF encoding 3020-3025 aa. They were isolated from five patients residing in Canada; four were of Asian origin and one was of Somali origin. Phylogenetic analysis using 64 partial NS5B sequences differentiated 10 assigned subtypes, 3a-3i and 3k, and two additional lineages within genotype 3. From the data of this study, HCV-3 full-length sequences are now available for six of the assigned subtypes and one unassigned. Our findings should add insights to HCV evolutionary studies and clinical applications.
2012-01-01
Background Cotton is the world’s most important natural textile fiber and a significant oilseed crop. Decoding cotton genomes will provide the ultimate reference and resource for research and utilization of the species. Integration of high-density genetic maps with genomic sequence information will largely accelerate the process of whole-genome assembly in cotton. Results In this paper, we update a high-density interspecific genetic linkage map of allotetraploid cultivated cotton. An additional 1,167 marker loci have been added to our previously published map of 2,247 loci. Three new marker types, InDel (insertion-deletion) and SNP (single nucleotide polymorphism) developed from gene information, and REMAP (retrotransposon-microsatellite amplified polymorphism), were used to increase map density. The updated map consists of 3,414 loci in 26 linkage groups covering 3,667.62 cM with an average inter-locus distance of 1.08 cM. Furthermore, genome-wide sequence analysis was finished using 3,324 informative sequence-based markers and publicly-available Gossypium DNA sequence information. A total of 413,113 EST and 195 BAC sequences were physically anchored and clustered by 3,324 sequence-based markers. Of these, 14,243 ESTs and 188 BACs from different species of Gossypium were clustered and specifically anchored to the high-density genetic map. A total of 2,748 candidate unigenes from 2,111 ESTs clusters and 63 BACs were mined for functional annotation and classification. The 337 ESTs/genes related to fiber quality traits were integrated with 132 previously reported cotton fiber quality quantitative trait loci, which demonstrated the important roles in fiber quality of these genes. Higher-level sequence conservation between different cotton species and between the A- and D-subgenomes in tetraploid cotton was found, indicating a common evolutionary origin for orthologous and paralogous loci in Gossypium. 
Conclusion This study will serve as a valuable genomic resource for tetraploid cotton genome assembly, for cloning genes related to superior agronomic traits, and for further comparative genomic analyses in Gossypium. PMID:23046547
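The map statistics quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch assumes the average inter-locus distance is the total map length divided by the number of intervals, taken as loci minus linkage groups; the abstract does not state the formula, but this reading reproduces the reported 1.08 cM.

```python
# Figures taken from the abstract.
PREVIOUS_LOCI = 2247
ADDED_LOCI = 1167
LINKAGE_GROUPS = 26
MAP_LENGTH_CM = 3667.62

total_loci = PREVIOUS_LOCI + ADDED_LOCI

# Assumption: each linkage group with n loci contributes n - 1 intervals.
intervals = total_loci - LINKAGE_GROUPS
average_interval_cm = MAP_LENGTH_CM / intervals

if __name__ == "__main__":
    print(total_loci)
    print(round(average_interval_cm, 2))
```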
Barling, Adam; Swaminathan, Kankshita; Mitros, Therese; James, Brandon T; Morris, Juliette; Ngamboma, Ornella; Hall, Megan C; Kirkpatrick, Jessica; Alabady, Magdy; Spence, Ashley K; Hudson, Matthew E; Rokhsar, Daniel S; Moose, Stephen P
2013-12-09
The Miscanthus genus of perennial C4 grasses contains promising biofuel crops for temperate climates. However, few genomic resources exist for Miscanthus, which limits understanding of its interesting biology and future genetic improvement. A comprehensive catalog of expressed sequences was generated from a variety of Miscanthus species and tissue types, with an emphasis on characterizing gene expression changes in spring compared to fall rhizomes. Illumina short read sequencing technology was used to produce transcriptome sequences from different tissues and organs during distinct developmental stages for multiple Miscanthus species, including Miscanthus sinensis, Miscanthus sacchariflorus, and their interspecific hybrid Miscanthus × giganteus. More than fifty billion base-pairs of Miscanthus transcript sequence were produced. Overall, 26,230 Sorghum gene models (i.e., ~ 96% of predicted Sorghum genes) had at least five Miscanthus reads mapped to them, suggesting that a large portion of the Miscanthus transcriptome is represented in this dataset. The Miscanthus × giganteus data was used to identify genes preferentially expressed in a single tissue, such as the spring rhizome, using Sorghum bicolor as a reference. Quantitative real-time PCR was used to verify examples of preferential expression predicted via RNA-Seq. Contiguous consensus transcript sequences were assembled for each species and annotated using InterProScan. Sequences from the assembled transcriptome were used to amplify genomic segments from a doubled haploid Miscanthus sinensis and from Miscanthus × giganteus to further disentangle the allelic and paralogous variations in genes. This large expressed sequence tag collection creates a valuable resource for the study of Miscanthus biology by providing detailed gene sequence information and tissue preferred expression patterns.
We have successfully generated a database of transcriptome assemblies and demonstrated its use in the study of genes of interest. Analysis of gene expression profiles revealed biological pathways that exhibit altered regulation in spring compared to fall rhizomes, which are consistent with their different physiological functions. The expression profiles of the subterranean rhizome provides a better understanding of the biological activities of the underground stem structures that are essentials for perenniality and the storage or remobilization of carbon and nutrient resources.
Howe, Daniel K; Gaji, Rajshekhar Y; Mroz-Barrett, Meaghan; Gubbels, Marc-Jan; Striepen, Boris; Stamper, Shelby
2005-02-01
Sarcocystis neurona is a member of the Apicomplexa that causes myelitis and encephalitis in horses but normally cycles between the opossum and small mammals. Analysis of an S. neurona expressed sequence tag (EST) database revealed four paralogous proteins that exhibit clear homology to the family of surface antigens (SAGs) and SAG-related sequences of Toxoplasma gondii. The primary peptide sequences of the S. neurona proteins are consistent with the two-domain structure that has been described for the T. gondii SAGs, and each was predicted to have an amino-terminal signal peptide and a carboxyl-terminal glycolipid anchor addition site, suggesting surface localization. All four proteins were confirmed to be membrane associated and displayed on the surface of S. neurona merozoites. Due to their surface localization and homology to T. gondii surface antigens, these S. neurona proteins were designated SnSAG1, SnSAG2, SnSAG3, and SnSAG4. Consistent with their homology, the SnSAGs elicited a robust immune response in infected and immunized animals, and their conserved structure further suggests that the SnSAGs similarly serve as adhesins for attachment to host cells. Whether the S. neurona SAG family is as extensive as the T. gondii SAG family remains unresolved, but it is probable that additional SnSAGs will be revealed as more S. neurona ESTs are generated. The existence of an SnSAG family in S. neurona indicates that expression of multiple related surface antigens is not unique to the ubiquitous organism T. gondii. Instead, the SAG gene family is a common trait that presumably has an essential, conserved function(s).
Comparative transgenic analysis of enhancers from the human SHOX and mouse Shox2 genomic regions.
Rosin, Jessica M; Abassah-Oppong, Samuel; Cobb, John
2013-08-01
Disruption of presumptive enhancers downstream of the human SHOX gene (hSHOX) is a frequent cause of the zeugopodal limb defects characteristic of Léri-Weill dyschondrosteosis (LWD). The closely related mouse Shox2 gene (mShox2) is also required for limb development, but in the more proximal stylopodium. In this study, we used transgenic mice in a comparative approach to characterize enhancer sequences in the hSHOX and mShox2 genomic regions. Among conserved noncoding elements (CNEs) that function as enhancers in vertebrate genomes, those that are maintained near paralogous genes are of particular interest given their ancient origins. Therefore, we first analyzed the regulatory potential of a genomic region containing one such duplicated CNE (dCNE) downstream of mShox2 and hSHOX. We identified a strong limb enhancer directly adjacent to the mShox2 dCNE that recapitulates the expression pattern of the endogenous gene. Interestingly, this enhancer requires sequences only conserved in the mammalian lineage in order to drive strong limb expression, whereas the more deeply conserved sequences of the dCNE function as a neural enhancer. Similarly, we found that a conserved element downstream of hSHOX (CNE9) also functions as a neural enhancer in transgenic mice. However, when the CNE9 transgenic construct was enlarged to include adjacent, non-conserved sequences frequently deleted in LWD patients, the transgene drove expression in the zeugopodium of the limbs. Therefore, both hSHOX and mShox2 limb enhancers are coupled to distinct neural enhancers. This is the first report demonstrating the activity of cis-regulatory elements from the hSHOX and mShox2 genomic regions in mammalian embryos.
Mitochondrial DNA of Vitis vinifera and the issue of rampant horizontal gene transfer.
Goremykin, Vadim V; Salamini, Francesco; Velasco, Riccardo; Viola, Roberto
2009-01-01
The mitochondrial genome of grape (Vitis vinifera), the largest organelle genome sequenced so far, is presented. The genome is 773,279 nt long and has the highest coding capacity among known angiosperm mitochondrial DNAs (mtDNAs). The proportion of promiscuous DNA of plastid origin in the genome is also the largest ever reported for an angiosperm mtDNA, both in absolute and relative terms. In all, 42.4% of the chloroplast genome of Vitis has been incorporated into its mitochondrial genome. In order to test if horizontal gene transfer (HGT) has also contributed to the gene content of the grape mtDNA, we built phylogenetic trees with the coding sequences of mitochondrial genes of grape and their homologs from plant mitochondrial genomes. Many incongruent gene tree topologies were obtained. However, the extent of incongruence between these gene trees is not significantly greater than that observed among optimal trees for chloroplast genes, the common ancestry of which has never been in doubt. In both cases, we attribute this incongruence to artifacts of tree reconstruction, insufficient numbers of characters, and gene paralogy. This finding leads us to question the recent phylogenetic interpretation of Bergthorsson et al. (2003, 2004) and Richardson and Palmer (2007) that rampant HGT into the mtDNA of Amborella best explains phylogenetic incongruence between mitochondrial gene trees for angiosperms. The only evidence for HGT into the Vitis mtDNA found involves fragments of two coding sequences stemming from two closteroviruses that cause the leaf roll disease of this plant. We also report that analysis of sequences shared by both chloroplast and mitochondrial genomes provides evidence for a previously unknown gene transfer route from the mitochondrion to the chloroplast.
Comparative Genomics of Carp Herpesviruses
Kurobe, Tomofumi; Gatherer, Derek; Cunningham, Charles; Korf, Ian; Fukuda, Hideo; Hedrick, Ronald P.; Waltzek, Thomas B.
2013-01-01
Three alloherpesviruses are known to cause disease in cyprinid fish: cyprinid herpesviruses 1 and 3 (CyHV1 and CyHV3) in common carp and koi and cyprinid herpesvirus 2 (CyHV2) in goldfish. We have determined the genome sequences of CyHV1 and CyHV2 and compared them with the published CyHV3 sequence. The CyHV1 and CyHV2 genomes are 291,144 and 290,304 bp, respectively, in size, and thus the CyHV3 genome, at 295,146 bp, remains the largest recorded among the herpesviruses. Each of the three genomes consists of a unique region flanked at each terminus by a sizeable direct repeat. The CyHV1, CyHV2, and CyHV3 genomes are predicted to contain 137, 150, and 155 unique, functional protein-coding genes, respectively, of which six, four, and eight, respectively, are duplicated in the terminal repeat. The three viruses share 120 orthologous genes in a largely colinear arrangement, of which up to 55 are also conserved in the other member of the genus Cyprinivirus, anguillid herpesvirus 1. Twelve genes are conserved convincingly in all sequenced alloherpesviruses, and two others are conserved marginally. The reference CyHV3 strain has been reported to contain five fragmented genes that are presumably nonfunctional. The CyHV2 strain has two fragmented genes, and the CyHV1 strain has none. CyHV1, CyHV2, and CyHV3 have five, six, and five families of paralogous genes, respectively. One family unique to CyHV1 is related to cellular JUNB, which encodes a transcription factor involved in oncogenesis. To our knowledge, this is the first time that JUNB-related sequences have been reported in a herpesvirus. PMID:23269803
Hanukoglu, Israel; Hanukoglu, Aaron
2016-01-01
The epithelial sodium channel (ENaC) is composed of three homologous subunits and allows the flow of Na+ ions across high resistance epithelia, maintaining body salt and water homeostasis. ENaC dependent reabsorption of Na+ in the kidney tubules regulates extracellular fluid (ECF) volume and blood pressure by modulating osmolarity. In multi-ciliated cells, ENaC is located in cilia and plays an essential role in the regulation of epithelial surface liquid volume necessary for cilial transport of mucus and gametes in the respiratory and reproductive tracts respectively. The subunits that form ENaC (named as alpha, beta, gamma and delta, encoded by genes SCNN1A, SCNN1B, SCNN1G, and SCNN1D) are members of the ENaC/Degenerin superfamily. The earliest appearance of ENaC orthologs is in the genomes of the most ancient vertebrate taxon, Cyclostomata (jawless vertebrates) including lampreys, followed by earliest representatives of Gnathostomata (jawed vertebrates) including cartilaginous sharks. Among Euteleostomi (bony vertebrates), the Actinopterygii (ray-finned fishes) branch has lost ENaC genes. Yet, most animals in the Sarcopterygii (lobe-finned fish) branch including Tetrapoda, amphibians and amniotes (lizards, crocodiles, birds, and mammals), have four ENaC paralogs. We compared the sequences of ENaC orthologs from 20 species and established criteria for the identification of ENaC orthologs and paralogs, and their distinction from other members of the ENaC/Degenerin superfamily, especially ASIC family. Differences between ENaCs and ASICs are summarized in view of their physiological functions and tissue distributions. Structural motifs that are conserved throughout vertebrate ENaCs are highlighted. We also present a comparative overview of the genotype-phenotype relationships in inherited diseases associated with ENaC mutations, including multisystem pseudohypoaldosteronism (PHA1B), Liddle syndrome, cystic fibrosis-like disease and essential hypertension. PMID:26772908
Zhang, Zhongbao; Li, Xianglong; Zhang, Chun; Zou, Huawen; Wu, Zhongyi
2016-09-16
NUCLEAR FACTOR-Y (NF-Y) has been shown to play an important role in growth, development, and response to environmental stress. An NF-Y complex, which consists of three subunits, NF-YA, NF-YB, and NF-YC, binds to CCAAT sequences in a promoter to control the expression of target genes. Although NF-Y proteins have been reported in Arabidopsis and rice, a comprehensive and systematic analysis of ZmNF-Y genes has not yet been performed. To examine the functions of ZmNF-Y genes in this family, we isolated and characterized 50 ZmNF-Y (14 ZmNF-YA, 18 ZmNF-YB, and 18 ZmNF-YC) genes in an analysis of the maize genome. The 50 ZmNF-Y genes were distributed on all 10 maize chromosomes, and 12 paralogs were identified. Multiple alignments showed that maize ZmNF-Y family proteins had conserved regions and relatively variable N-terminal or C-terminal domains. The comparative syntenic map illustrated 40 paralogous NF-Y gene pairs among the 10 maize chromosomes. Microarray data showed that the ZmNF-Y genes had tissue-specific expression patterns in various maize developmental stages and in response to biotic and abiotic stresses. The results suggested that ZmNF-YB2, 4, 8, 10, 13, and 16 and ZmNF-YC6, 8, and 15 were induced, while ZmNF-YA1, 3, 4, 6, 7, 10, 12, and 13, ZmNF-YB15, and ZmNF-YC3 and 9 were suppressed by drought stress. ZmNF-YA3, ZmNF-YA8 and ZmNF-YA12 were upregulated after infection by the three pathogens, while ZmNF-YA1 and ZmNF-YB2 were suppressed. These results indicate that the ZmNF-Ys may have significant roles in the response to abiotic and biotic stresses. Copyright © 2016 Elsevier Inc. All rights reserved.
Le, My Thanh; van Veldhuizen, Mart; Porcelli, Ida; Bongaerts, Roy J.; Gaskin, Duncan J. H.; Pearson, Bruce M.; van Vliet, Arnoud H. M.
2015-01-01
Assembly of flagella requires strict hierarchical and temporal control via flagellar sigma and anti-sigma factors, regulatory proteins and the assembly complex itself, but to date non-coding RNAs (ncRNAs) have not been described to regulate genes directly involved in flagellar assembly. In this study we have investigated the possible role of two ncRNA paralogs (CjNC1, CjNC4) in flagellar assembly and gene regulation of the diarrhoeal pathogen Campylobacter jejuni. CjNC1 and CjNC4 are 37/44 nt identical and predicted to target the 5' untranslated region (5' UTR) of genes transcribed from the flagellar sigma factor σ54. Orthologs of the σ54-dependent 5' UTRs and ncRNAs are present in the genomes of other thermophilic Campylobacter species, and transcription of CjNC1 and CjNC4 is dependent on the flagellar sigma factor σ28. Surprisingly, inactivation and overexpression of CjNC1 and CjNC4 did not affect growth, motility or flagella-associated phenotypes such as autoagglutination. However, CjNC1 and CjNC4 were able to mediate sequence-dependent, but Hfq-independent, partial repression of fluorescence of predicted target 5' UTRs in an Escherichia coli-based GFP reporter gene system. This hints towards a subtle role for the CjNC1 and CjNC4 ncRNAs in post-transcriptional gene regulation in thermophilic Campylobacter species, and suggests that the currently used phenotypic methodologies are insufficiently sensitive to detect such subtle phenotypes. The lack of a role of Hfq in the E. coli GFP-based system indicates that the CjNC1 and CjNC4 ncRNAs may mediate post-transcriptional gene regulation in ways that do not conform to the paradigms obtained from the Enterobacteriaceae. PMID:26512728
Dysregulation of the mitogen granulin in human cancer through the miR-15/107 microRNA gene group
Wang, Wang-Xia; Kyprianou, Natasha; Wang, Xiaowei; Nelson, Peter T.
2010-01-01
Granulin (GRN) is a potent mitogen and growth factor implicated in many human cancers, but its regulation is poorly understood. Recent findings indicate that GRN is regulated strongly by the microRNA miR-107, which functionally overlaps with miR-15, miR-16, and miR-195 due to a common 5' sequence critical for target specificity. In this study, we queried whether miR-107 and paralogs regulated GRN in human cancers. In cultured cells, anti-Argonaute RIP-ChIP experiments indicate that GRN mRNA is directly targeted by numerous miR-15/107 miRNAs. We further tested this association in human tumors. MiR-15 and miR-16 are known to be downregulated in chronic lymphocytic leukemia (CLL). Using pre-existing microarray datasets, we found that GRN expression is higher in CLL relative to non-neoplastic lymphocytes (P<0.00001). By contrast, other prospective miR-15/miR-16 targets in the dataset (BCL-2 and cyclin D1) were not up-regulated in CLL. Unlike in CLL, GRN was not up-regulated in chronic myelogenous leukemia (CML) where miR-107 paralogs are not known to be dysregulated. Prior studies have shown that GRN is also up-regulated, and miR-107 down-regulated, in prostate carcinoma. Our results indicate that multiple members of the miR-107 gene group indeed repress GRN protein levels when transfected into prostate cancer cells. At least a dozen distinct types of cancer have the pattern of increased GRN and decreased miR-107 expression. These findings indicate for the first time that the mitogen and growth factor GRN is dysregulated via the miR-15/107 gene group in multiple human cancers, which may provide a potential common therapeutic target. PMID:20884628
Phan, Isabelle Q. H.; Scheib, Holger; Subramanian, Sandhya; Edwards, Thomas E.; Lehman, Stephanie S.; Piitulainen, Hanna; Sayeedur Rahman, M.; Rennoll-Bankert, Kristen E.; Staker, Bart L.; Taira, Suvi; Stacy, Robin; Myler, Peter J.; Azad, Abdu F.
2015-01-01
Prokaryotes use type IV secretion systems (T4SSs) to translocate substrates (e.g., nucleoprotein, DNA, and protein) and/or elaborate surface structures (i.e., pili or adhesins). Bacterial genomes may encode multiple T4SSs, e.g., there are three functionally divergent T4SSs in some Bartonella species (vir, vbh, and trw). In a unique case, most rickettsial species encode a T4SS (rvh) enriched with gene duplication. Within single genomes, the evolutionary and functional implications of cross-system interchangeability of analogous T4SS protein components remain poorly understood. To lend insight into cross-system interchangeability, we analyzed the VirB8 family of T4SS channel proteins. Crystal structures of three VirB8 and two TrwG Bartonella proteins revealed highly conserved C-terminal periplasmic domain folds and dimerization interfaces, despite tremendous sequence divergence. This implies remarkable structural constraints for VirB8 components in the assembly of a functional T4SS. VirB8/TrwG heterodimers, determined via bacterial two-hybrid assays and molecular modeling, indicate that differential expression of trw and vir systems is the likely barrier to VirB8-TrwG interchangeability. We also determined the crystal structure of Rickettsia typhi RvhB8-II and modeled its coexpressed divergent paralog RvhB8-I. Remarkably, while RvhB8-I dimerizes and is structurally similar to other VirB8 proteins, the RvhB8-II dimer interface deviates substantially from other VirB8 structures, potentially preventing RvhB8-I/RvhB8-II heterodimerization. For the rvh T4SS, the evolution of divergent VirB8 paralogs implies a functional diversification that is unknown in other T4SSs. Collectively, our data identify two different constraints (spatiotemporal for Bartonella trw and vir T4SSs and structural for rvh T4SSs) that mediate the functionality of multiple divergent T4SSs within a single bacterium. PMID:26646013
The Use of a Sequenced Questioning Paradigm to Facilitate Associative Fluency in Preschoolers.
ERIC Educational Resources Information Center
Pellegrini, A. D.; Greene, Helen
The extent to which free play versus sequenced questioning conditions facilitates preschoolers' associative fluency was investigated in this study. Twenty-four children (12 boys and 12 girls, with a mean age of 50.7 months) were randomly assigned to one of three conditions: free play, sequenced questioning, and control. In the sequenced…
USDA-ARS's Scientific Manuscript database
The concept of utilizing putative and unique gene sequences for the design of species specific probes was tested. The abundance profile of assigned functions within the Lactobacillus plantarum genome was used for the identification of the putative and unique gene sequence, csh. The targeted gene (cs...
USDA-ARS's Scientific Manuscript database
The Kauffman White (KW) serotyping method requires more than 250 antisera to characterize more than 2,500 Salmonella serovars. The complexity of serotyping could be overcome using molecular methods. In this study, a dkgB-linked intergenic sequence ribotyping (ISR) method that generates sequence occu...
Assessing the Impact of Sequencing Practicums for Welding in Agricultural Mechanics
ERIC Educational Resources Information Center
Rose, Malcolm; Pate, Michael L.; Lawver, Rebecca G.; Warnick, Brian K.; Dai, Xin
2015-01-01
This study examined the impact of sequencing practicums for welding on students' ability to perform a 1F (flat position-fillet lap joint) weld on low-carbon steel. Participants were randomly assigned a specific practice sequence of welding for using gas metal arc welding (GMAW) and shielded metal arc welding (SMAW). A total of 71 participants…
Genomic Sequence of the WHO International Standard for Hepatitis A Virus RNA.
Jenkins, Adrian; Minhas, Rehan; Morris, Clare; Berry, Neil
2018-05-10
The World Health Organization (WHO) international standard for hepatitis A virus (HAV) RNA nucleic acid assays was characterized by complete genome sequencing. The entire coding sequence and noncoding regions were assigned HAV genotype IB. This information will aid the design, development, and evaluation of HAV RNA amplification assays. Copyright © 2018 Jenkins et al.
Genomic Analysis of Complex Microbial Communities in Wounds
2009-07-01
Actinobacteria — were the most commonly misclassified [25]. The 16S sequences used in the current study were all greater than or equal to 200 bases...with most (89.1%) of the sequences falling into Firmicutes, Proteobacteria, and Actinobacteria phyla. High percentages of the Firmicutes and... Actinobacteria sequences were successfully assigned to the genus level, 88.0% and 82.3%, respectively; however, only 53.0% of the Proteobacteria sequences
Ma, Jianmin; Eisenhaber, Frank; Maurer-Stroh, Sebastian
2013-12-01
Beta lactams comprise the largest and still most effective group of antibiotics, but bacteria can gain resistance through different beta lactamases that can degrade these antibiotics. We developed a user-friendly tree building web server that allows users to assign beta lactamase sequences to their respective molecular classes and subclasses. Further clinically relevant information includes whether the gene is typically chromosomal or transferable through plasmids, as well as a list of the antibiotics which the most closely related reference sequences are known to target and cause resistance against. This web server can automatically build three phylogenetic trees: the first tree with closely related sequences from a Tachyon search against the NCBI nr database, the second tree with curated reference beta lactamase sequences, and the third tree built specifically from substrate binding pocket residues of the curated reference beta lactamase sequences. We show that the latter is better suited to recover antibiotic substrate assignments through nearest neighbor annotation transfer. The users can also choose to build a structural model for the query sequence and view the binding pocket residues of their query relative to other beta lactamases in the sequence alignment as well as in the 3D structure relative to bound antibiotics. This web server is freely available at http://blac.bii.a-star.edu.sg/.
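The nearest neighbor annotation transfer mentioned in this abstract can be illustrated with a minimal sketch. It is not the web server's implementation: the reference names, pocket residue strings, and class labels below are invented placeholders, and plain positional identity stands in for whatever tree-based distance the server actually uses.

```python
# Toy reference set: (name, binding-pocket residues, annotation).
# All three entries are hypothetical placeholders, not real beta lactamases.
REFERENCES = [
    ("ref_1", "STFKSDN", "class_X"),
    ("ref_2", "STFKAEN", "class_X"),
    ("ref_3", "HHDHCKH", "class_Y"),
]


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("pocket strings must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def transfer_annotation(query_pocket: str):
    """Return (reference name, transferred annotation, identity) of the
    reference whose pocket residues are most similar to the query."""
    best = max(REFERENCES, key=lambda ref: identity(query_pocket, ref[1]))
    return best[0], best[2], identity(query_pocket, best[1])


if __name__ == "__main__":
    print(transfer_annotation("STFRSDN"))
```

Restricting the comparison to pocket positions is the design point the abstract highlights; a full-length comparison would only change the strings passed to `identity`.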
Scheuch, Matthias; Höper, Dirk; Beer, Martin
2015-03-03
Fuelled by the advent and subsequent development of next generation sequencing technologies, metagenomics became a powerful tool for the analysis of microbial communities both scientifically and diagnostically. The biggest challenge is the extraction of relevant information from the huge sequence datasets generated for metagenomics studies. Although a plethora of tools are available, data analysis is still a bottleneck. To overcome the bottleneck of data analysis, we developed an automated computational workflow called RIEMS - Reliable Information Extraction from Metagenomic Sequence datasets. RIEMS assigns every individual read sequence within a dataset taxonomically by cascading different sequence analyses with decreasing stringency of the assignments using various software applications. After completion of the analyses, the results are summarised in a clearly structured result protocol organised taxonomically. The high accuracy and performance of RIEMS analyses were proven in comparison with other tools for metagenomics data analysis using simulated sequencing read datasets. RIEMS has the potential to fill the gap that still exists with regard to data analysis for metagenomics studies. The usefulness and power of RIEMS for the analysis of genuine sequencing datasets was demonstrated with an early version of RIEMS in 2011 when it was used to detect the orthobunyavirus sequences leading to the discovery of Schmallenberg virus.
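The cascading idea described in this abstract, in which each read passes through analyses of decreasing stringency until one of them assigns it, can be sketched as follows. This is an illustration only and not the RIEMS code: the reference sequences, taxon names, the two toy classifiers, and the 0.75 identity threshold are all assumptions made for the example.

```python
# Hypothetical reference sequences and taxon labels for the toy example.
REFERENCES = {"ACGTACGT": "virus_X", "TTTTGGGG": "bacterium_Y"}


def exact_match(read):
    """Most stringent stage: the read must equal a reference exactly."""
    return REFERENCES.get(read)


def relaxed_match(read, min_identity=0.75):
    """Less stringent stage: best equal-length reference above a threshold."""
    best_taxon, best_identity = None, 0.0
    for ref, taxon in REFERENCES.items():
        if len(ref) != len(read):
            continue
        ident = sum(a == b for a, b in zip(read, ref)) / len(ref)
        if ident > best_identity:
            best_taxon, best_identity = taxon, ident
    return best_taxon if best_identity >= min_identity else None


def cascade_assign(reads, stages):
    """Run stages in order; only reads left unassigned move to the next stage.

    `stages` is a list of (stage name, classifier) pairs, ordered from the
    most to the least stringent. Returns (assignments, unassigned reads).
    """
    assignments, remaining = {}, list(reads)
    for name, classify in stages:
        still_unassigned = []
        for read in remaining:
            taxon = classify(read)
            if taxon is None:
                still_unassigned.append(read)
            else:
                assignments[read] = (taxon, name)
        remaining = still_unassigned
    return assignments, remaining


if __name__ == "__main__":
    stages = [("exact", exact_match), ("relaxed", relaxed_match)]
    print(cascade_assign(["ACGTACGT", "ACGTACGA", "CCCCCCCC"], stages))
```

Recording the stage name alongside each taxon keeps track of how stringent the evidence behind each assignment was.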
Logic system aids in evaluation of project readiness
NASA Technical Reports Server (NTRS)
Maris, S. J.; Obrien, T. J.
1966-01-01
Measurement Operational Readiness Requirements (MORR) assignments logic is used for determining the readiness of a complex project to go forward as planned. The system uses a logic network that assigns qualities to all important criteria in a project and establishes a logical sequence of measurements to determine what the conditions are.
Gold, Nicola D; Jackson, Richard M
2006-02-03
The rapid growth in protein structural data and the emergence of structural genomics projects have increased the need for automatic structure analysis and tools for function prediction. Small molecule recognition is critical to the function of many proteins; therefore, determination of ligand binding site similarity is important for understanding ligand interactions and may allow their functional classification. Here, we present a binding sites database (SitesBase) that given a known protein-ligand binding site allows rapid retrieval of other binding sites with similar structure independent of overall sequence or fold similarity. However, each match is also annotated with sequence similarity and fold information to aid interpretation of structure and functional similarity. Similarity in ligand binding sites can indicate common binding modes and recognition of similar molecules, allowing potential inference of function for an uncharacterised protein or providing additional evidence of common function where sequence or fold similarity is already known. Alternatively, the resource can provide valuable information for detailed studies of molecular recognition including structure-based ligand design and in understanding ligand cross-reactivity. Here, we show examples of atomic similarity between superfamily or more distant fold relatives as well as between seemingly unrelated proteins. Assignment of unclassified proteins to structural superfamilies is also undertaken and in most cases substantiates assignments made using sequence similarity. Correct assignment is also possible where sequence similarity fails to find significant matches, illustrating the potential use of binding site comparisons for newly determined proteins.
ERIC Educational Resources Information Center
Miner, Carol; della Villa, Paula
1997-01-01
Describes an activity in which students reverse-translate proteins from their amino acid sequences back to their DNA sequences then assign musical notes to represent the adenine, guanine, cytosine, and thymine bases. Data is obtained from the National Institutes of Health (NIH) on the Internet. (DDR)
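The activity described above can be approximated in a few lines. The sketch below makes two assumptions that are not in the record: it picks one representative codon per amino acid (the genetic code is degenerate, so the reverse translation is not unique), and it uses an invented base-to-note mapping, since the record does not say which notes the students assigned.

```python
# One valid codon per amino acid from the standard genetic code. The choice
# among synonymous codons is arbitrary and made only for this example.
CODON_FOR = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTT", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCT",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAT", "V": "GTT",
}

# Hypothetical mapping from bases to note names: A, C and G keep their
# letters, and thymine is mapped to E because there is no note called T.
NOTE_FOR = {"A": "A", "C": "C", "G": "G", "T": "E"}


def reverse_translate(protein: str) -> str:
    """Return one possible DNA coding sequence for a one-letter protein string."""
    return "".join(CODON_FOR[aa] for aa in protein.upper())


def to_notes(dna: str) -> list:
    """Map each base of a DNA string to a note name."""
    return [NOTE_FOR[base] for base in dna.upper()]


if __name__ == "__main__":
    dna = reverse_translate("MKW")
    print(dna, to_notes(dna))
```

Because several codons encode most amino acids, different students could legitimately produce different melodies from the same protein.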
Du, Xinxin; Liu, Yuezhong; Liu, Jinxiang; Zhang, Quanqi
2016-01-01
Following the two rounds of whole-genome duplication (WGD) during deuterostome evolution, a third genome duplication occurred in the ray-finned fish lineage and is considered to be responsible for the teleost-specific lineage diversification and regulation mechanisms. The function of SMAD3, a receptor-regulated SMAD (R-SMAD), has been widely studied in mammals. However, limited information on its role or putative paralogs is available in ray-finned fishes. In this study, two SMAD3 paralogs were first identified in the transcriptome and genome of Japanese flounder (Paralichthys olivaceus). We also explored SMAD3 duplication in other selected species. Following identification, genomic structure, phylogenetic reconstruction, and synteny analyses performed by MrBayes and online bioinformatic tools confirmed that smad3a/3b most likely originated from the teleost-specific WGD. Additionally, selection pressure analysis and expression pattern of the two genes performed by PAML and quantitative real-time PCR (qRT-PCR) revealed evidence of subfunctionalization of the two SMAD3 paralogs in teleosts. Our results indicate that two SMAD3 genes originate from teleost-specific WGD, remain transcriptionally active, and have likely undergone subfunctionalization. This study provides novel insights into the evolutionary fates of smad3a/3b and draws attention to future functional analysis of the SMAD3 gene family. PMID:27703851
Tu, Ho-Chou; Schwitalla, Sarah; Qian, Zhirong; LaPier, Grace S.; Yermalovich, Alena; Ku, Yuan-Chieh; Chen, Shann-Ching; Viswanathan, Srinivas R.; Zhu, Hao; Nishihara, Reiko; Inamura, Kentaro; Kim, Sun A.; Morikawa, Teppei; Mima, Kosuke; Sukawa, Yasutaka; Yang, Juhong; Meredith, Gavin; Fuchs, Charles S.; Ogino, Shuji
2015-01-01
Colorectal cancer (CRC) remains a major contributor to cancer-related mortality. LIN28A and LIN28B are highly related RNA-binding protein paralogs that regulate biogenesis of let-7 microRNAs and influence development, metabolism, tissue regeneration, and oncogenesis. Here we demonstrate that overexpression of either LIN28 paralog cooperates with the Wnt pathway to promote invasive intestinal adenocarcinoma in murine models. When LIN28 alone is induced genetically, half of the resulting tumors harbor Ctnnb1 (β-catenin) mutation. When overexpressed in ApcMin/+ mice, LIN28 accelerates tumor formation and enhances proliferation and invasiveness. In conditional genetic models, enforced expression of a LIN28-resistant form of the let-7 microRNA reduces LIN28-induced tumor burden, while silencing of LIN28 expression reduces tumor volume and increases tumor differentiation, indicating that LIN28 contributes to tumor maintenance. We detected aberrant expression of LIN28A and/or LIN28B in 38% of a large series of human CRC samples (n = 595), where LIN28 expression levels were associated with invasive tumor growth. Our late-stage CRC murine models and analysis of primary human tumors demonstrate prominent roles for both LIN28 paralogs in promoting CRC growth and progression and implicate the LIN28/let-7 pathway as a therapeutic target. PMID:25956904
Bessho-Uehara, Manabu; Konishi, Kaori; Oba, Yuichi
2017-08-09
Two paralogous genes of firefly luciferase, Luc1 and Luc2, have been isolated from the species in two subfamilies, Luciolinae and Photurinae, of the family Lampyridae. The gene expression profiles have previously been examined only in the species of Luciolinae. Here we isolated Luc1 and Luc2 genes from the Japanese firefly Pyrocoelia atripennis. This is the first report of the presence of both Luc1 and Luc2 genes in the species of the subfamily Lampyrinae and of the exon-intron structure of Luc2 in the family Lampyridae. The luminescence of both gene products peaked at 547 nm under neutral buffer conditions, and the spectrum of Luc1, but not Luc2, was red-shifted under acidic conditions, as observed for Luc2 in the Luciolinae species. The semi-quantitative reverse transcription-polymerase chain reaction suggested that Luc1 was expressed in lanterns of all the stages except eggs, while Luc2 was expressed in the non-lantern bodies of eggs, prepupae, pupae, and female adults. These expression profiles are consistent with those in the Luciolinae species. Considering the distant phylogenetic relationship between Lampyrinae and Luciolinae in Lampyridae, we propose that fireflies generally possess two different luciferase genes and the biochemical properties and gene expression profiles for each paralog are conserved among lampyrid species.
Evolution of the Class IV HD-Zip Gene Family in Streptophytes
Zalewski, Christopher S.; Floyd, Sandra K.; Furumizu, Chihiro; Sakakibara, Keiko; Stevenson, Dennis W.; Bowman, John L.
2013-01-01
Class IV homeodomain leucine zipper (C4HDZ) genes are plant-specific transcription factors that, based on phenotypes in Arabidopsis thaliana, play an important role in epidermal development. In this study, we sampled all major extant lineages and their closest algal relatives for C4HDZ homologs and phylogenetic analyses result in a gene tree that mirrors land plant evolution with evidence for gene duplications in many lineages, but minimal evidence for gene losses. Our analysis suggests an ancestral C4HDZ gene originated in an algal ancestor of land plants and a single ancestral gene was present in the last common ancestor of land plants. Independent gene duplications are evident within several lineages including mosses, lycophytes, euphyllophytes, seed plants, and, most notably, angiosperms. In recently evolved angiosperm paralogs, we find evidence of pseudogenization via mutations in both coding and regulatory sequences. The increasing complexity of the C4HDZ gene family through the diversification of land plants correlates to increasing complexity in epidermal characters. PMID:23894141
Verde, Ignazio; Abbott, Albert G; Scalabrin, Simone; Jung, Sook; Shu, Shengqiang; Marroni, Fabio; Zhebentyayeva, Tatyana; Dettori, Maria Teresa; Grimwood, Jane; Cattonaro, Federica; Zuccolo, Andrea; Rossini, Laura; Jenkins, Jerry; Vendramin, Elisa; Meisel, Lee A; Decroocq, Veronique; Sosinski, Bryon; Prochnik, Simon; Mitros, Therese; Policriti, Alberto; Cipriani, Guido; Dondini, Luca; Ficklin, Stephen; Goodstein, David M; Xuan, Pengfei; Del Fabbro, Cristian; Aramini, Valeria; Copetti, Dario; Gonzalez, Susana; Horner, David S; Falchi, Rachele; Lucas, Susan; Mica, Erica; Maldonado, Jonathan; Lazzari, Barbara; Bielenberg, Douglas; Pirona, Raul; Miculan, Mara; Barakat, Abdelali; Testolin, Raffaele; Stella, Alessandra; Tartarini, Stefano; Tonutti, Pietro; Arús, Pere; Orellana, Ariel; Wells, Christina; Main, Dorrie; Vizzotto, Giannina; Silva, Herman; Salamini, Francesco; Schmutz, Jeremy; Morgante, Michele; Rokhsar, Daniel S
2013-05-01
Rosaceae is the most important fruit-producing clade, and its key commercially relevant genera (Fragaria, Rosa, Rubus and Prunus) show broadly diverse growth habits, fruit types and compact diploid genomes. Peach, a diploid Prunus species, is one of the best genetically characterized deciduous trees. Here we describe the high-quality genome sequence of peach obtained from a completely homozygous genotype. We obtained a complete chromosome-scale assembly using Sanger whole-genome shotgun methods. We predicted 27,852 protein-coding genes, as well as noncoding RNAs. We investigated the path of peach domestication through whole-genome resequencing of 14 Prunus accessions. The analyses suggest major genetic bottlenecks that have substantially shaped peach genome diversity. Furthermore, comparative analyses showed that peach has not undergone recent whole-genome duplication, and even though the ancestral triplicated blocks in peach are fragmentary compared to those in grape, all seven paleosets of paralogs from the putative paleoancestor are detectable.
Orthologs, paralogs and genome comparisons
NASA Technical Reports Server (NTRS)
Gogarten, J. P.; Olendzenski, L.
1999-01-01
During the past decade, ancient gene duplications were recognized as one of the main forces in the generation of diverse gene families and the creation of new functional capabilities. New tools developed to search data banks for homologous sequences, and an increased availability of reliable three-dimensional structural information led to the recognition that proteins with diverse functions can belong to the same superfamily. Analyses of the evolution of these superfamilies promises to provide insights into early evolution but are complicated by several important evolutionary processes. Horizontal transfer of genes can lead to a vertical spread of innovations among organisms, therefore finding a certain property in some descendants of an ancestor does not guarantee that it was present in that ancestor. Complete or partial gene conversion between duplicated genes can yield phylogenetic trees with several, apparently independent gene duplications, suggesting an often surprising parallelism in the evolution of independent lineages. Additionally, the breakup of domains within a protein and the fusion of domains into multifunctional proteins makes the delineation of superfamilies a task that remains difficult to automate.
Evolutionary history of the ABCB2 genomic region in teleosts
Palti, Y.; Rodriguez, M.F.; Gahr, S.A.; Hansen, J.D.
2007-01-01
Gene duplication, silencing and translocation have all been implicated in shaping the unique genomic architecture of the teleost MH regions. Previously, we demonstrated that trout possess five unlinked regions encoding MH genes. One of these regions harbors ABCB2 which in all other vertebrate classes is found in the MHC class II region. In this study, we sequenced a BAC contig for the trout ABCB2 region. Analysis of this region revealed the presence of genes homologous to those located in the human class II (ABCB2, BRD2, ψDAA), extended class II (RGL2, PHF1, SYGP1) and class III (PBX2, Notch-L) regions. The organization and syntenic relationships of this region were then compared to similar regions in humans, Tetraodon and zebrafish to learn more about the evolutionary history of this region. Our analysis indicates that this region was generated during the teleost-specific duplication event while also providing insight about potential MH paralogous regions in teleosts. © 2006 Elsevier Ltd. All rights reserved.
Analysis of Structural MtrC Models Based on Homology with the Crystal Structure of MtrF
DOE Office of Scientific and Technical Information (OSTI.GOV)
Edwards, Marcus; Fredrickson, Jim K.; Zachara, John M.
2012-12-01
The outer-membrane decahaem cytochrome MtrC is part of the transmembrane MtrCAB complex required for mineral respiration by Shewanella oneidensis. MtrC has significant sequence similarity to the paralogous decahaem cytochrome MtrF, which has been structurally solved through X-ray crystallography. This now allows for homology-based models of MtrC to be generated. The structures of these MtrC homology models contain ten bis-histidine-co-ordinated c-type haems arranged in a staggered cross through a four-domain structure. This model is consistent with current spectroscopic data and shows that the areas around haem 5 and haem 10, at the termini of an octahaem chain, are likely to have functions similar to those of the corresponding haems in MtrF. The electrostatic surfaces around haem 7, close to the β-barrels, are different in MtrF and MtrC, indicating that these haems may have different potentials and interact with substrates differently.
Are there laws of genome evolution?
Koonin, Eugene V
2011-08-01
Research in quantitative evolutionary genomics and systems biology led to the discovery of several universal regularities connecting genomic and molecular phenomic variables. These universals include the log-normal distribution of the evolutionary rates of orthologous genes; the power law-like distributions of paralogous family size and node degree in various biological networks; the negative correlation between a gene's sequence evolution rate and expression level; and differential scaling of functional classes of genes with genome size. The universals of genome evolution can be accounted for by simple mathematical models similar to those used in statistical physics, such as the birth-death-innovation model. These models do not explicitly incorporate selection; therefore, the observed universal regularities do not appear to be shaped by selection but rather are emergent properties of gene ensembles. Although a complete physical theory of evolutionary biology is inconceivable, the universals of genome evolution might qualify as "laws of evolutionary genomics" in the same sense "law" is understood in modern physics.
Batista, Marcelo B.; Sfeir, Michelle Z. T.; Faoro, Helisson; Wassem, Roseli; Steffens, Maria B. R.; Pedrosa, Fábio O.; Souza, Emanuel M.; Dixon, Ray; Monteiro, Rose A.
2013-01-01
The transcriptional regulatory protein Fnr, acts as an intracellular redox sensor regulating a wide range of genes in response to changes in oxygen levels. Genome sequencing of Herbaspirillum seropedicae SmR1 revealed the presence of three fnr-like genes. In this study we have constructed single, double and triple fnr deletion mutant strains of H. seropedicae. Transcriptional profiling in combination with expression data from reporter fusions, together with spectroscopic analysis, demonstrates that the Fnr1 and Fnr3 proteins not only regulate expression of the cbb3-type respiratory oxidase, but also control the cytochrome content and other component complexes required for the cytochrome c-based electron transport pathway. Accordingly, in the absence of the three Fnr paralogs, growth is restricted at low oxygen tensions and nitrogenase activity is impaired. Our results suggest that the H. seropedicae Fnr proteins are major players in regulating the composition of the electron transport chain in response to prevailing oxygen concentrations. PMID:23996052
Barkman, Todd J.; Chenery, Gordon; McNeal, Joel R.; Lyons-Weiler, James; Ellisens, Wayne J.; Moore, Gerry; Wolfe, Andrea D.; dePamphilis, Claude W.
2000-01-01
Plant phylogenetic estimates are most likely to be reliable when congruent evidence is obtained independently from the mitochondrial, plastid, and nuclear genomes with all methods of analysis. Here, results are presented from separate and combined genomic analyses of new and previously published data, including six and nine genes (8,911 bp and 12,010 bp, respectively) for different subsets of taxa that suggest Amborella + Nymphaeales (water lilies) are the first-branching angiosperm lineage. Before and after tree-independent noise reduction, most individual genomic compartments and methods of analysis estimated the Amborella + Nymphaeales basal topology with high support. Previous phylogenetic estimates placing Amborella alone as the first extant angiosperm branch may have been misled because of a series of specific problems with paralogy, suboptimal outgroups, long-branch taxa, and method dependence. Ancestral character state reconstructions differ between the two topologies and affect inferences about the features of early angiosperms. PMID:11069280
Uncommonly isolated clinical Pseudomonas: identification and phylogenetic assignation.
Mulet, M; Gomila, M; Ramírez, A; Cardew, S; Moore, E R B; Lalucat, J; García-Valdés, E
2017-02-01
Fifty-two Pseudomonas strains that were difficult to identify at the species level in the phenotypic routine characterizations employed by clinical microbiology laboratories were selected for genotypic-based analysis. Species level identifications were done initially by partial sequencing of the DNA dependent RNA polymerase sub-unit D gene (rpoD). Two other gene sequences, for the small sub-unit ribosomal RNA (16S rRNA) and for DNA gyrase sub-unit B (gyrB), were added in a multilocus sequence analysis (MLSA) study to confirm the species identifications. These sequences were analyzed with a collection of reference sequences from the type strains of 161 Pseudomonas species within an in-house multi-locus sequence analysis database. Whole-cell matrix-assisted laser-desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) analyses of these strains complemented the DNA sequence-based phylogenetic analyses and were observed to be in accordance with the results of the sequence data. Twenty-three out of 52 strains were assigned to 12 recognized species not commonly detected in clinical specimens and 29 (56 %) were considered representatives of at least ten putative new species. Most strains were distributed within the P. fluorescens and P. aeruginosa lineages. The value of rpoD sequences in species-level identifications for Pseudomonas is emphasized. Correct species identification of clinical strains is essential for establishing the intrinsic antibiotic resistance patterns and improved treatment plans.
Killer, Jiří; Skřivanová, Eva; Hochel, Igor; Marounek, Milan
2015-06-01
Cronobacter spp. are bacterial pathogens that affect children and immunocompromised adults. In this study, we used multilocus sequence typing (MLST) to determine sequence types (STs) in 11 Cronobacter spp. strains isolated from retail foods, 29 strains from dust samples obtained from vacuum cleaners, and 4 clinical isolates. Using biochemical tests, species-specific polymerase chain reaction, and MLST analysis, 36 strains were identified as Cronobacter sakazakii, and 6 were identified as Cronobacter malonaticus. In addition, one strain that originated from retail food and one from a dust sample from a vacuum cleaner were identified on the basis of MLST analysis as Cronobacter dublinensis and Cronobacter turicensis, respectively. Cronobacter spp. strains isolated from the retail foods were assigned to eight different MLST sequence types, seven of which were newly identified. The strains isolated from the dust samples were assigned to 7 known STs and 14 unknown STs. Three clinical isolates and one household dust isolate were assigned to ST4, which is the predominant ST associated with neonatal meningitis. One clinical isolate was classified based on MLST analysis as Cronobacter malonaticus and belonged to an as-yet-unknown ST. Three strains isolated from the household dust samples were assigned to ST1, which is another clinically significant ST. It can be concluded that Cronobacter spp. strains of different origin are genetically quite variable. The recovery of C. sakazakii strains belonging to ST1 and ST4 from the dust samples suggests the possibility that contamination could occur during food preparation. All of the novel STs and alleles for C. sakazakii, C. malonaticus, C. dublinensis, and C. turicensis determined in this study were deposited in the Cronobacter MLST database available online ( http://pubmlst.org/cronobacter/).
DNABIT Compress - Genome compression algorithm.
Rajarajeswari, Pothuraju; Apparao, Allam
2011-01-22
Data compression is concerned with how information is organized in data. Efficient storage means removal of redundancy from the data being stored in the DNA molecule. Data compression algorithms remove redundancy and are used to understand biologically important molecules. We present a compression algorithm, "DNABIT Compress", for DNA sequences based on a novel algorithm of assigning binary bits for smaller segments of DNA bases to compress both repetitive and non-repetitive DNA sequences. Our proposed algorithm achieves the best compression ratio for DNA sequences for larger genomes. Significantly better compression results show that the "DNABIT Compress" algorithm is the best among the compared compression algorithms. While achieving the best compression ratios for DNA sequences (genomes), our new DNABIT Compress algorithm significantly improves the running time of all previous DNA compression programs. Assigning binary bits (unique bit codes) for fragments of DNA sequence (exact repeats, reverse repeats) is also a unique concept introduced in this algorithm for the first time in DNA compression. The proposed algorithm could achieve a compression ratio as low as 1.58 bits/base, where the existing best methods could not achieve a ratio of less than 1.72 bits/base.
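For orientation on the bits/base figures quoted above, the following is a minimal Python sketch of the naive fixed-width baseline, in which each of the four bases gets a 2-bit code and the cost is therefore 2 bits/base. It is not the DNABIT Compress algorithm, whose code assignment for repeats is not described in this abstract; the names `CODES`, `pack`, `unpack` and `bits_per_base` are hypothetical and chosen only for illustration.

```python
# Baseline only: fixed 2-bit-per-base packing, NOT the DNABIT Compress algorithm.
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # index in this string equals the 2-bit code


def pack(seq):
    """Pack a string over A/C/G/T into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Left-align a short final chunk so that unpacking stays uniform.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data, n):
    """Recover the first n bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:n])


def bits_per_base(seq):
    """Storage cost of the packed form, in bits per base."""
    return 8 * len(pack(seq)) / len(seq)
```

Any scheme reporting fewer than 2 bits/base, such as the 1.58 and 1.72 figures above, must exploit redundancy (for example repeats) beyond this fixed-width coding.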
Impact of sequencing depth on the characterization of the microbiome and resistome.
Zaheer, Rahat; Noyes, Noelle; Ortega Polo, Rodrigo; Cook, Shaun R; Marinier, Eric; Van Domselaar, Gary; Belk, Keith E; Morley, Paul S; McAllister, Tim A
2018-04-12
Developments in high-throughput next generation sequencing (NGS) technology have rapidly advanced the understanding of overall microbial ecology as well as occurrence and diversity of specific genes within diverse environments. In the present study, we compared the ability of varying sequencing depths to generate meaningful information about the taxonomic structure and prevalence of antimicrobial resistance genes (ARGs) in the bovine fecal microbial community. Metagenomic sequencing was conducted on eight composite fecal samples originating from four beef cattle feedlots. Metagenomic DNA was sequenced to various depths, D1, D0.5 and D0.25, with average sample read counts of 117, 59 and 26 million, respectively. A comparative analysis of the relative abundance of reads aligning to different phyla and antimicrobial classes indicated that the relative proportions of read assignments remained fairly constant regardless of depth. However, the number of reads being assigned to ARGs as well as to microbial taxa increased significantly with increasing depth. We found a depth of D0.5 was suitable to describe the microbiome and resistome of cattle fecal samples. This study helps define a balance between cost and required sequencing depth to acquire meaningful results.
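The depth comparison above can be illustrated with a toy Python sketch: random subsampling of a set of already-classified reads leaves relative proportions roughly unchanged while the absolute number of assigned reads scales with depth. The read labels, counts and helper names (`subsample`, `relative_abundance`) are invented for illustration and do not come from the study, which sequenced samples to different depths rather than subsampling in software.

```python
import random
from collections import Counter


def subsample(reads, fraction, seed=0):
    """Draw a random subset of reads without replacement (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(reads, int(len(reads) * fraction))


def relative_abundance(reads):
    """Fraction of reads carrying each label."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Toy data: each element stands for one read already assigned to a phylum.
reads = (["Firmicutes"] * 6000
         + ["Bacteroidetes"] * 3000
         + ["Proteobacteria"] * 1000)

for fraction in (1.0, 0.5, 0.25):  # loosely mirrors D1, D0.5 and D0.25
    sub = subsample(reads, fraction)
    shares = relative_abundance(sub)
    print(fraction, len(sub), {k: round(v, 3) for k, v in sorted(shares.items())})
```

In this toy setting rare labels are the first to lose absolute counts as the fraction shrinks, which is the intuition behind the reported increase in ARG-assigned reads with depth.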
ERIC Educational Resources Information Center
Attwood, Paul V.
1997-01-01
Describes a self-instructional assignment approach to the teaching of advanced enzymology. Presents an assignment that offers a means of teaching enzymology to students that exposes them to modern computer-based techniques of analyzing protein structure and relates structure to enzyme function. (JRH)
International Students in the Scientific and Technical Writing Class.
ERIC Educational Resources Information Center
Constantinides, Janet C.
A course sequence for teaching the forms and formats of scientific and technical writing to English as a second language (ESL) learners is described. The first assignment, a letter of application, serves as a diagnostic indication of the student's ability. The second assignment, a narrative, is designed to define the importance of audience and…
Developing the Inferential Reasoning of Basic Writers.
ERIC Educational Resources Information Center
Zeller, Robert
1987-01-01
Describes an assignment sequence using photographs to introduce developmental students to conventions of academic inquiry, and to give them practice analyzing and synthesizing. Reports that students link details observed in the photos to inferences drawn about them. Concentrates on the assignment linking a photo of E. B. White with an essay by him…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fogh, R.H.; Mabbutt, B.C.; Kem, W.R.
Sequence-specific assignments are reported for the 500-MHz ¹H nuclear magnetic resonance (NMR) spectrum of the 48-residue polypeptide neurotoxin I from the sea anemone Stichodactyla helianthus (Sh I). Spin systems were first identified by using two-dimensional relayed or multiple quantum filtered correlation spectroscopy, double quantum spectroscopy, and spin lock experiments. Specific resonance assignments were then obtained from nuclear Overhauser enhancement (NOE) connectivities between protons from residues adjacent in the amino acid sequence. Of a total of 265 potentially observable resonances, 248 (i.e., 94%) were assigned, arising from 39 completely and 9 partially assigned amino acid spin systems. The secondary structure of Sh I was defined on the basis of the pattern of sequential NOE connectivities, NOEs between protons on separate strands of the polypeptide backbone, and backbone amide exchange rates. Sh I contains a four-stranded antiparallel β-sheet encompassing residues 1-5, 16-24, 30-33, and 40-46, with a β-bulge at residues 17 and 18 and a reverse turn, probably a type II β-turn, involving residues 27-30. No evidence of α-helical structure was found.
Nishito, Yukari; Osana, Yasunori; Hachiya, Tsuyoshi; Popendorf, Kris; Toyoda, Atsushi; Fujiyama, Asao; Itaya, Mitsuhiro; Sakakibara, Yasubumi
2010-04-16
Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168, and functions as a starter for the production of the traditional Japanese food "natto" made from soybeans. Although re-sequencing whole genomes of several laboratory domesticated B. subtilis 168 derivatives has already been attempted using short read sequencing data, the assembly of the whole genome sequence of a closely related strain, B. subtilis natto, from very short read data is more challenging, particularly with our aim to assemble one fully connected scaffold from short reads around 35 bp in length. We applied a comparative genome assembly method, which combines de novo assembly and reference guided assembly, to one of the B. subtilis natto strains. We successfully assembled 28 scaffolds and managed to avoid substantial fragmentation. Completion of the assembly through long PCR experiments resulted in one connected scaffold for B. subtilis natto. Based on the assembled genome sequence, our orthologous gene analysis between natto BEST195 and Marburg 168 revealed that 82.4% of 4375 predicted genes in BEST195 are one-to-one orthologous to genes in 168, with two genes in-paralog, 3.2% are deleted in 168, 14.3% are inserted in BEST195, and 5.9% of genes present in 168 are deleted in BEST195. The natto genome contains the same alleles in the promoter region of degQ and the coding region of swrAA as the wild strain, RO-FF-1. These are specific for gamma-PGA production ability, which is related to natto production. Further, the B. subtilis natto strain completely lacked a polyketide synthesis operon, disrupted the plipastatin production operon, and possesses previously unidentified transposases. The determination of the whole genome sequence of Bacillus subtilis natto provided detailed analyses of a set of genes related to natto production, demonstrating the number and locations of insertion sequences that B. subtilis natto harbors but B. subtilis 168 lacks. 
Multiple genome-level comparisons among five closely related Bacillus species were also carried out. The determined genome sequence of B. subtilis natto and gene annotations are available from the Natto genome browser http://natto-genome.org/.
West, Claire; James, Stephen A; Davey, Robert P; Dicks, Jo; Roberts, Ian N
2014-07-01
The ribosomal RNA encapsulates a wealth of evolutionary information, including genetic variation that can be used to discriminate between organisms at a wide range of taxonomic levels. For example, the prokaryotic 16S rDNA sequence is very widely used both in phylogenetic studies and as a marker in metagenomic surveys and the internal transcribed spacer region, frequently used in plant phylogenetics, is now recognized as a fungal DNA barcode. However, this widespread use does not escape criticism, principally due to issues such as difficulties in classification of paralogous versus orthologous rDNA units and intragenomic variation, both of which may be significant barriers to accurate phylogenetic inference. We recently analyzed data sets from the Saccharomyces Genome Resequencing Project, characterizing rDNA sequence variation within multiple strains of the baker's yeast Saccharomyces cerevisiae and its nearest wild relative Saccharomyces paradoxus in unprecedented detail. Notably, both species possess single locus rDNA systems. Here, we use these new variation datasets to assess whether a more detailed characterization of the rDNA locus can alleviate the second of these phylogenetic issues, sequence heterogeneity, while controlling for the first. We demonstrate that a strong phylogenetic signal exists within both datasets and illustrate how they can be used, with existing methodology, to estimate intraspecies phylogenies of yeast strains consistent with those derived from whole-genome approaches. We also describe the use of partial Single Nucleotide Polymorphisms, a type of sequence variation found only in repetitive genomic regions, in identifying key evolutionary features such as genome hybridization events and show their consistency with whole-genome Structure analyses. 
We conclude that our approach can transform rDNA sequence heterogeneity from a problem to a useful source of evolutionary information, enabling the estimation of highly accurate phylogenies of closely related organisms, and discuss how it could be extended to future studies of multilocus rDNA systems. [concerted evolution; genome hybridisation; phylogenetic analysis; ribosomal DNA; whole genome sequencing; yeast]. © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
Xie, G.; Chain, P.S.G.; Lo, C.; Liu, K-L.; Gans, J.; Merritt, J.; Qi, F.
2010-01-01
SUMMARY Human dental plaque is a complex microbial community containing an estimated 700 to 19,000 species/phylotypes. Despite numerous studies analysing species richness in healthy and diseased human subjects, the true genomic composition of the human dental plaque microbiota remains unknown. Here we report a metagenomic analysis of a healthy human plaque sample using a combination of second-generation sequencing platforms. A total of 860 million base pairs of non-human sequences were generated. Various analysis tools revealed the presence of 12 well-characterized phyla, members of the TM-7 and BRC1 clade, and sequences that could not be classified. Both pathogens and opportunistic pathogens were identified, supporting the ecological plaque hypothesis for oral diseases. Mapping the metagenomic reads to sequenced reference genomes demonstrated that 4% of the reads could be assigned to the sequenced species. Preliminary annotation identified genes belonging to all known functional categories. Interestingly, although 73% of the total assembled contig sequences were predicted to code for proteins, only 51% of them could be assigned a functional role. Furthermore, ~ 2.8% of the total predicted genes coded for proteins involved in resistance to antibiotics and toxic compounds, suggesting that the oral cavity is an important reservoir for antimicrobial resistance. PMID:21040513
Xie, G; Chain, P S G; Lo, C-C; Liu, K-L; Gans, J; Merritt, J; Qi, F
2010-12-01
Human dental plaque is a complex microbial community containing an estimated 700 to 19,000 species/phylotypes. Despite numerous studies analysing species richness in healthy and diseased human subjects, the true genomic composition of the human dental plaque microbiota remains unknown. Here we report a metagenomic analysis of a healthy human plaque sample using a combination of second-generation sequencing platforms. A total of 860 million base pairs of non-human sequences were generated. Various analysis tools revealed the presence of 12 well-characterized phyla, members of the TM-7 and BRC1 clade, and sequences that could not be classified. Both pathogens and opportunistic pathogens were identified, supporting the ecological plaque hypothesis for oral diseases. Mapping the metagenomic reads to sequenced reference genomes demonstrated that 4% of the reads could be assigned to the sequenced species. Preliminary annotation identified genes belonging to all known functional categories. Interestingly, although 73% of the total assembled contig sequences were predicted to code for proteins, only 51% of them could be assigned a functional role. Furthermore, ~2.8% of the total predicted genes coded for proteins involved in resistance to antibiotics and toxic compounds, suggesting that the oral cavity is an important reservoir for antimicrobial resistance. © 2010 John Wiley & Sons A/S.
The SUPERFAMILY database in 2004: additions and improvements.
Madera, Martin; Vogel, Christine; Kummerfeld, Sarah K; Chothia, Cyrus; Gough, Julian
2004-01-01
The SUPERFAMILY database provides structural assignments to protein sequences and a framework for analysis of the results. At the core of the database is a library of profile Hidden Markov Models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent an entire superfamily. We have applied the library to predicted proteins from all completely sequenced genomes (currently 154), the Swiss-Prot and TrEMBL databases and other sequence collections. Close to 60% of all proteins have at least one match, and one half of all residues are covered by assignments. All models and full results are available for download and online browsing at http://supfam.org. Users can study the distribution of their superfamily of interest across all completely sequenced genomes, investigate with which other superfamilies it combines and retrieve proteins in which it occurs. Alternatively, concentrating on a particular genome as a whole, it is possible first, to find out its superfamily composition, and secondly, to compare it with that of other genomes to detect superfamilies that are over- or under-represented. In addition, the webserver provides the following standard services: sequence search; keyword search for genomes, superfamilies and sequence identifiers; and multiple alignment of genomic, PDB and custom sequences.
Method for assigning sites to projected generic nuclear power plants
DOE Office of Scientific and Technical Information (OSTI.GOV)
Holter, G.M.; Purcell, W.L.; Shutz, M.E.
1986-07-01
Pacific Northwest Laboratory developed a method for forecasting potential locations and startup sequences of nuclear power plants that will be required in the future but have not yet been specifically identified by electric utilities. Use of the method results in numerical ratings for potential nuclear power plant sites located in each of the 10 federal energy regions. The rating for each potential site is obtained from numerical factors assigned to each of 5 primary siting characteristics: (1) cooling water availability, (2) site land area, (3) power transmission land area, (4) proximity to metropolitan areas, and (5) utility plans for the site. The sequence of plant startups in each federal energy region is obtained by use of the numerical ratings and the forecasts of generic nuclear power plant startups obtained from the EIA Middle Case electricity forecast. Sites are assigned to generic plants in chronological order according to startup date.
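The rating-and-assignment procedure described above might be sketched in Python as follows. The abstract does not say how the five factors are combined or which site goes to which plant, so this sketch assumes a plain sum of factors and gives the highest-rated site to the earliest forecast startup; the class, function and key names are hypothetical.

```python
from dataclasses import dataclass

# The five primary siting characteristics named in the abstract.
CHARACTERISTICS = (
    "cooling_water_availability",
    "site_land_area",
    "power_transmission_land_area",
    "proximity_to_metropolitan_areas",
    "utility_plans_for_site",
)


@dataclass
class Site:
    name: str
    factors: dict  # characteristic name -> numerical factor

    def rating(self):
        # Assumption: the overall rating is the plain sum of the five factors.
        return sum(self.factors[c] for c in CHARACTERISTICS)


def assign_sites(sites, startup_years):
    """Pair forecast startups, earliest first, with sites, best-rated first.

    Assumption: the best-rated site in a region is used first. Intended to
    be called once per federal energy region.
    """
    ranked = sorted(sites, key=lambda s: s.rating(), reverse=True)
    return [(year, site.name) for year, site in zip(sorted(startup_years), ranked)]
```

For example, two candidate sites in one region and forecast startups in 2000 and 2005 would yield the better-rated site for 2000 and the other for 2005.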
Kundu, Kunal; Pal, Lipika R; Yin, Yizhou; Moult, John
2017-09-01
The use of gene panel sequence for diagnostic and prognostic testing is now widespread, but there are so far few objective tests of methods to interpret these data. We describe the design and implementation of a gene panel sequencing data analysis pipeline (VarP) and its assessment in a CAGI4 community experiment. The method was applied to clinical gene panel sequencing data of 106 patients, with the goal of determining which of 14 disease classes each patient has and the corresponding causative variant(s). The disease class was correctly identified for 36 cases, including 10 where the original clinical pipeline did not find causative variants. For a further seven cases, we found strong evidence of an alternative disease to that tested. Many of the potentially causative variants are missense, with no previous association with disease, and these proved the hardest to correctly assign pathogenicity or otherwise. Post analysis showed that three-dimensional structure data could have helped for up to half of these cases. Over-reliance on HGMD annotation led to a number of incorrect disease assignments. We used a largely ad hoc method to assign probabilities of pathogenicity for each variant, and there is much work still to be done in this area. © 2017 The Authors. Human Mutation published by Wiley Periodicals, Inc.
Centurion-Lara, Arturo; Giacani, Lorenzo; Godornes, Charmie; Molini, Barbara J.; Brinck Reid, Tara; Lukehart, Sheila A.
2013-01-01
Background The pathogenic non-cultivable treponemes include three subspecies of Treponema pallidum (pallidum, pertenue, endemicum), T. carateum, T. paraluiscuniculi, and the unclassified Fribourg-Blanc treponeme (Simian isolate). These treponemes are morphologically indistinguishable and antigenically and genetically highly similar, yet cross-immunity is variable or non-existent. Although all of these organisms cause chronic, multistage skin and systemic disease, they have historically been classified by mode of transmission, clinical presentations and host ranges. Whole genome studies underscore the high degree of sequence identity among species, subspecies and strains, pinpointing a limited number of genomic regions for variation. Many of these “hot spots” include members of the tpr gene family, composed of 12 paralogs encoding candidate virulence factors. We hypothesize that the distinct clinical presentations, host specificity, and variable cross-immunity might reside on virulence factors such as the tpr genes. Methodology/Principal Findings Sequence analysis of 11 tpr loci (excluding tprK) from 12 strains demonstrated an impressive heterogeneity, including SNPs, indels, chimeric genes, truncated gene products and large deletions. Comparative analyses of sequences and 3D models of predicted proteins in Subfamily I highlight the striking co-localization of discrete variable regions with predicted surface-exposed loops. A hallmark of Subfamily II is the presence of chimeric genes in the tprG and J loci. Diversity in Subfamily III is limited to tprA and tprL. Conclusions/Significance An impressive sequence variability was found in tpr sequences among the Treponema isolates examined in this study, with most of the variation being consistent within subspecies or species, or between syphilis vs. non-syphilis strains. Variability was seen in the pallidum subspecies, which can be divided into 5 genogroups. 
These findings support a genetic basis for the classification of these organisms into their respective subspecies and species. Future functional studies will determine whether the identified genetic differences relate to cross-immunity, clinical differences, or host ranges. PMID:23696912
Komaki, Hisayuki; Ichikawa, Natsuko; Hosoyama, Akira; Fujita, Nobuyuki; Igarashi, Yasuhiro
2015-01-01
Streptomyces sp. TP-A0598, isolated from seawater, produces lydicamycin, a structurally unique type I polyketide bearing two nitrogen-containing five-membered rings, and four congeners, TPU-0037-A, -B, -C, and -D. We herein report the 8 Mb draft genome sequence of this strain, together with classification and features of the organism and generation, annotation and analysis of the genome sequence. The genome encodes 7,240 putative ORFs, of which 4,450 ORFs were assigned to COG categories. Also, 66 tRNA genes and one rRNA operon were identified. The genome contains eight gene clusters involved in the production of polyketides and nonribosomal peptides. Among them, a PKS/NRPS gene cluster was assigned as responsible for lydicamycin biosynthesis, and a plausible biosynthetic pathway was proposed on the basis of gene function prediction. These genome sequence data will facilitate probing the potential of secondary metabolism in marine-derived Streptomyces.
Prestat, Emmanuel; David, Maude M.; Hultman, Jenni; ...
2014-09-26
A new functional gene database, FOAM (Functional Ontology Assignments for Metagenomes), was developed to screen environmental metagenomic sequence datasets. FOAM provides a new functional ontology dedicated to classifying gene functions relevant to environmental microorganisms based on Hidden Markov Models (HMMs). Sets of aligned protein sequences (i.e. ‘profiles’) were tailored to a large group of target KEGG Orthologs (KOs) from which HMMs were trained. The alignments were checked and curated to make them specific to the targeted KO. Within this process, sequence profiles were enriched with the most abundant sequences available to maximize the yield of accurate classifier models. An associated functional ontology was built to describe the functional groups and hierarchy. FOAM allows the user to select the target search space before HMM-based comparison steps and to easily organize the results into different functional categories and subcategories. FOAM is publicly available at http://portal.nersc.gov/project/m1317/FOAM/.
NASA Technical Reports Server (NTRS)
Kirkpatrick, J. D.; Kelly, Douglas M.; Rieke, George H.; Liebert, James; Allard, France; Wehrse, Rainer
1993-01-01
Red/infrared (0.6-1.5 micron) spectra are presented for a sequence of well-studied M dwarfs ranging from M2 through M9. A variety of temperature-sensitive features useful for spectral classification are identified. Using these features, the spectral data are compared to recent theoretical models, from which a temperature scale is assigned. The red portion of the model spectra provides reasonably good fits for dwarfs earlier than M6. For later types, the infrared region provides a more reliable fit to the observations. In each case, the wavelength region used includes the broad peak of the energy distribution. For a given spectral type, the derived temperature sequence assigns higher temperatures than have earlier studies, with the difference becoming more pronounced at lower luminosities. The positions of M dwarfs on the H-R diagram are, as a result, in closer agreement with theoretical tracks of the lower main sequence.
Exploring the sequence-structure protein landscape in the glycosyltransferase family
Zhang, Ziding; Kochhar, Sunil; Grigorov, Martin
2003-01-01
To understand the molecular basis of glycosyltransferases’ (GTFs) catalytic mechanism, extensive structural information is required. Here, fold recognition methods were employed to assign 3D protein shapes (folds) to the currently known GTF sequences, available in public databases such as GenBank and Swissprot. First, GTF sequences were retrieved and classified into clusters, based on sequence similarity only. Intracluster sequence similarity was chosen sufficiently high to ensure that the same fold is found within a given cluster. Then, a representative sequence from each cluster was selected to compose a subset of GTF sequences. The members of this reduced set were processed by three different fold recognition methods: 3D-PSSM, FUGUE, and GeneFold. Finally, the results from different fold recognition methods were analyzed and compared to sequence-similarity search methods (i.e., BLAST and PSI-BLAST). It was established that the folds of about 70% of all currently known GTF sequences can be confidently assigned by fold recognition methods, a value which is higher than the fold identification rate based on sequence comparison alone (48% for BLAST and 64% for PSI-BLAST). The identified folds were submitted to 3D clustering, and we found that most of the GTF sequences adopt the typical GTF A or GTF B folds. Our results indicate a lack of evidence that new GTF folds (i.e., folds other than GTF A and B) exist. Based on cases where fold identification was not possible, we suggest several sequences as the most promising targets for a structural genomics initiative focused on the GTF protein family. PMID:14500887
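The cluster-then-pick-a-representative step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the similarity measure (Python's `difflib.SequenceMatcher` ratio, used here as a crude stand-in for an alignment-based identity score), the greedy strategy, the 0.8 threshold, and the function name `greedy_cluster` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def greedy_cluster(sequences, threshold=0.8):
    """Greedy clustering: each sequence joins the first cluster whose
    representative is at least `threshold` similar; otherwise it founds
    a new cluster and becomes that cluster's representative.

    Similarity is difflib's ratio, a stand-in for a real alignment score
    (assumption; the abstract does not specify the measure).
    """
    clusters = {}  # representative sequence -> list of member sequences
    for seq in sequences:
        for rep in clusters:
            ratio = SequenceMatcher(None, rep, seq, autojunk=False).ratio()
            if ratio >= threshold:
                clusters[rep].append(seq)
                break
        else:
            # No existing representative was similar enough.
            clusters[seq] = [seq]
    return clusters


# Toy input (hypothetical sequences, not GTF data).
toy = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC"]
result = greedy_cluster(toy, threshold=0.8)
representatives = list(result)  # the reduced set passed on to fold recognition
```

In this sketch the representatives form the reduced set; a high threshold mirrors the abstract's requirement that intracluster similarity be high enough for all members to share a fold.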
Molecular and morphologic data reveal multiple species in Peromyscus pectoralis
Bradley, Robert D.; Schmidly, David J.; Amman, Brian R.; Platt, Roy N.; Neumann, Kathy M.; Huynh, Howard M.; Muñiz-Martínez, Raúl; López-González, Celia; Ordóñez-Garza, Nicté
2015-01-01
DNA sequence and morphometric data were used to re-evaluate the taxonomy and systematics of Peromyscus pectoralis. Phylogenetic analyses (maximum likelihood and Bayesian inference) of DNA sequences from the mitochondrial cytochrome-b gene in 44 samples of P. pectoralis indicated 2 well-supported monophyletic clades. The 1st clade contained specimens from Texas historically assigned to P. p. laceianus; the 2nd was comprised of specimens previously referable to P. p. collinus, P. p. laceianus, and P. p. pectoralis obtained from northern and eastern Mexico. Levels of genetic variation (~7%) between these 2 clades indicated that the genetic divergence typically exceeded that reported for other species of Peromyscus. Samples of P. p. laceianus north and south of the Río Grande were not monophyletic. In addition, samples representing P. p. collinus and P. p. pectoralis formed 2 clades that differed genetically by 7.14%. Multivariate analyses of external and cranial measurements from 63 populations of P. pectoralis revealed 4 morpho-groups consistent with clades in the DNA sequence analysis: 1 from Texas and New Mexico assignable to P. p. laceianus; a 2nd from western and southern Mexico assignable to P. p. pectoralis; a 3rd from northern and central Mexico previously assigned to P. p. pectoralis but herein shown to represent an undescribed taxon; and a 4th from southeastern Mexico assignable to P. p. collinus. Based on the concordance of these results, populations from the United States are referred to as P. laceianus, whereas populations from Mexico are referred to as P. pectoralis (including some samples historically assigned to P. p. collinus, P. p. laceianus, and P. p. pectoralis). A new subspecies is described to represent populations south of the Río Grande in northern and central Mexico. Additional research is needed to discern if P. p. collinus warrants species recognition. PMID:26937045
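A percentage divergence such as the ~7% quoted above is a pairwise distance between aligned sequences. The sketch below computes the simplest such measure, the uncorrected p-distance (the fraction of compared sites that differ). This is an assumption made for illustration: the abstract does not state which distance model the authors used, and the function name `p_distance` and the toy sequences are hypothetical.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance between two aligned DNA sequences.

    Only sites where both sequences carry an unambiguous base (A, C, G, T)
    are compared; gaps and ambiguity codes are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy aligned sequences (hypothetical, not cytochrome-b data).
d = p_distance("ACGTACGTAC", "ACGTACGTAA")  # 1 difference over 10 sites
percent_divergence = 100 * d
```

Model-corrected distances, which account for multiple substitutions at the same site, are generally somewhat larger than the p-distance at this level of divergence.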
Larionova, Marina D; Markova, Svetlana V; Vysotski, Eugene S
2017-01-29
The bright bioluminescence of the copepod Metridia longa is conditioned by a small secreted coelenterazine-dependent luciferase (MLuc). To date, three isoforms of MLuc differing in length, sequence, and some properties have been cloned and successfully applied as highly sensitive bioluminescent reporters. In this work, we report cloning of a novel group of genes from M. longa encoding extremely psychrophilic isoforms of MLuc (MLuc2-type). The novel isoforms share only ∼54-64% of protein sequence identity with the previously cloned isoforms and, consequently, are the products of a separate group of paralogous genes. The MLuc2 isoform with the consensus sequence was produced as a natively folded protein using a baculovirus/insect cell expression system, purified, and characterized. MLuc2 displays a very high bioluminescent activity and high thermostability similar to those of the previously characterized M. longa luciferase isoform MLuc7. However, in contrast to MLuc7, which shows its highest activity at 12-17 °C and 0.5 M NaCl, the bioluminescence optima of MLuc2 isoforms are at ∼5 °C and 1 M NaCl. The adaptation of MLuc2 to cold is also accompanied by a decrease in melting temperature and in affinity for the substrate, suggesting greater conformational flexibility of the protein structure. Luciferase isoforms with different temperature optima may allow M. longa bioluminescence to adapt to changes in water temperature during diurnal vertical migrations.