The contribution of Alu elements to mutagenic DNA double-strand break repair.
Morales, Maria E; White, Travis B; Streva, Vincent A; DeFreece, Cecily B; Hedges, Dale J; Deininger, Prescott L
2015-03-01
Alu elements make up the largest family of human mobile elements, numbering 1.1 million copies and comprising 11% of the human genome. As a consequence of evolution and genetic drift, Alu elements of various sequence divergence exist throughout the human genome. Alu/Alu recombination has been shown to cause approximately 0.5% of new human genetic diseases and contribute to extensive genomic structural variation. To begin understanding the molecular mechanisms leading to these rearrangements in mammalian cells, we constructed Alu/Alu recombination reporter cell lines containing Alu elements ranging in sequence divergence from 0%-30% that allow detection of both Alu/Alu recombination and large non-homologous end joining (NHEJ) deletions that range from 1.0 to 1.9 kb in size. Introduction of as little as 0.7% sequence divergence between Alu elements resulted in a significant reduction in recombination, which indicates even small degrees of sequence divergence reduce the efficiency of homology-directed DNA double-strand break (DSB) repair. Further reduction in recombination was observed in a sequence divergence-dependent manner for diverged Alu/Alu recombination constructs with up to 10% sequence divergence. With greater levels of sequence divergence (15%-30%), we observed a significant increase in DSB repair due to a shift from Alu/Alu recombination to variable-length NHEJ which removes sequence between the two Alu elements. This increase in NHEJ deletions depends on the presence of Alu sequence homeology (similar but not identical sequences). Analysis of recombination products revealed that Alu/Alu recombination junctions occur more frequently in the first 100 bp of the Alu element within our reporter assay, just as they do in genomic Alu/Alu recombination events. 
This is the first extensive study characterizing the influence of Alu element sequence divergence on DNA repair, which will inform predictions regarding the effect of Alu element sequence divergence on both the rate and nature of DNA repair events.
Concerted evolution at the population level: pupfish HindIII satellite DNA sequences.
Elder, J F; Turner, B J
1994-01-01
The canonical monomers (approximately 170 bp) of an abundant (1.9 × 10^6 copies per diploid genome) satellite DNA sequence family in the genome of Cyprinodon variegatus, a "pupfish" that ranges along the Atlantic coast from Cape Cod to central Mexico, are divergent in base sequence in 10 of 12 samples collected from natural populations. The divergence involves substitutions, deletions, and insertions, is marked in scope (mean pairwise sequence similarity = 61.6%; range = 35-95.9%), is largely confined to the 3' half of the monomer, and is not correlated with the distance among collecting sites. Repetitive cloning and direct genomic sequencing experiments failed to detect intrapopulation and intraindividual variation, suggesting high levels of sequence homogeneity within populations. The satellite sequence has therefore undergone "concerted evolution" at the level of the local population. Concerted evolution has previously almost always been discussed in terms of the divergence of species or higher taxa; its intraspecific occurrence apparently has not been reported previously. The generality of the observation is difficult to evaluate, for although satellite DNAs from a large number of organisms have been studied in detail, there appear to be little or no other data on their sequence variation in natural populations. The relationship (if any) between concerted, population level, satellite DNA divergence and the extent of gene flow/genetic isolation among conspecific natural populations remains to be established. PMID:8302879
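The "mean pairwise sequence similarity" and its range reported above can be illustrated with a minimal sketch. It assumes the monomers have already been aligned to a common length with `-` as the gap character, and it counts a base opposite a gap as a mismatch; the abstract does not state how gaps were scored, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent of aligned columns in which two sequences agree.

    Columns where both sequences carry a gap are skipped; a base opposite
    a gap counts as a mismatch (an assumption made for this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def pairwise_summary(seqs):
    """Return (mean, minimum, maximum) percent identity over all pairs."""
    values = [percent_identity(a, b) for a, b in combinations(seqs, 2)]
    if not values:
        raise ValueError("need at least two sequences")
    return sum(values) / len(values), min(values), max(values)


# Toy aligned monomers, invented for illustration only.
toy = ["ACGTACGTAC", "ACGTACGTAA", "ACGTAC-TAA"]
mean_id, min_id, max_id = pairwise_summary(toy)
```

Applied to one consensus sequence per population sample, a summary of this kind would yield the mean and range quoted in the abstract.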
Baurens, Franc-Christophe; Bocs, Stéphanie; Rouard, Mathieu; Matsumoto, Takashi; Miller, Robert N G; Rodier-Goud, Marguerite; MBéguié-A-MBéguié, Didier; Yahiaoui, Nabila
2010-07-16
Comparative sequence analysis of complex loci such as resistance gene analog clusters allows estimating the degree of sequence conservation and mechanisms of divergence at the intraspecies level. In banana (Musa sp.), two diploid wild species Musa acuminata (A genome) and Musa balbisiana (B genome) contribute to the polyploid genome of many cultivars. The M. balbisiana species is associated with vigour and tolerance to pests and disease and little is known on the genome structure and haplotype diversity within this species. Here, we compare two genomic sequences of 253 and 223 kb corresponding to two haplotypes of the RGA08 resistance gene analog locus in M. balbisiana "Pisang Klutuk Wulung" (PKW). Sequence comparison revealed two regions of contrasting features. The first is a highly colinear gene-rich region where the two haplotypes diverge only by single nucleotide polymorphisms and two repetitive element insertions. The second corresponds to a large cluster of RGA08 genes, with 13 and 18 predicted RGA genes and pseudogenes spread over 131 and 152 kb respectively on each haplotype. The RGA08 cluster is enriched in repetitive element insertions, in duplicated non-coding intergenic sequences including low complexity regions and shows structural variations between haplotypes. Although some allelic relationships are retained, a large diversity of RGA08 genes occurs in this single M. balbisiana genotype, with several RGA08 paralogs specific to each haplotype. The RGA08 gene family has evolved by mechanisms of unequal recombination, intragenic sequence exchange and diversifying selection. An unequal recombination event taking place between duplicated non-coding intergenic sequences resulted in a different RGA08 gene content between haplotypes pointing out the role of such duplicated regions in the evolution of RGA clusters. Based on the synonymous substitution rate in coding sequences, we estimated a 1 million year divergence time for these M. balbisiana haplotypes. 
A large RGA08 gene cluster identified in wild banana corresponds to a highly variable genomic region between haplotypes surrounded by conserved flanking regions. High level of sequence identity (70 to 99%) of the genic and intergenic regions suggests a recent and rapid evolution of this cluster in M. balbisiana.
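The divergence-time estimate mentioned above rests on the synonymous substitution rate. A minimal sketch of the usual molecular-clock relation, T = Ks / (2r), follows. The abstract reports neither the Ks value nor the rate that was used, so the numbers in the example are placeholders chosen only to show the arithmetic, and the function name is hypothetical.

```python
def divergence_time_years(ks, rate_per_site_per_year):
    """Molecular-clock estimate T = Ks / (2 * r).

    ks: synonymous substitutions per synonymous site between two sequences.
    rate_per_site_per_year: assumed synonymous substitution rate r.
    The factor 2 accounts for changes accumulating on both lineages.
    """
    if ks < 0:
        raise ValueError("Ks cannot be negative")
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return ks / (2.0 * rate_per_site_per_year)


# Placeholder inputs, NOT values from the study.
example_years = divergence_time_years(ks=0.009, rate_per_site_per_year=4.5e-9)
```

Because the result scales inversely with the assumed rate, any such estimate is only as reliable as the clock calibration behind it.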
Hopple, J S; Vilgalys, R
1999-10-01
Phylogenetic relationships were investigated in the mushroom genus Coprinus based on sequence data from the nuclear encoded large-subunit rDNA gene. Forty-seven species of Coprinus and 19 additional species from the families Coprinaceae, Strophariaceae, Bolbitiaceae, Agaricaceae, Podaxaceae, and Montagneaceae were studied. A total of 1360 sites was sequenced across seven divergent domains and intervening sequences. A total of 302 phylogenetically informative characters was found. Ninety-eight percent of the average divergence between taxa was located within the divergent domains, with domains D2 and D8 being most divergent and domains D7 and D10 the least divergent. An empirical test of phylogenetic signal among divergent domains also showed that domains D2 and D3 had the lowest levels of homoplasy. Two equally most parsimonious trees were resolved using Wagner parsimony. A character-state weighted analysis produced 12 equally most parsimonious trees similar to those generated by Wagner parsimony. Phylogenetic analyses employing topological constraints suggest that none of the major taxonomic systems proposed for subgeneric classification is able to completely reflect phylogenetic relationships in Coprinus. A strict consensus integration of the two Wagner trees demonstrates the problematic nature of choosing outgroups within dark-spored mushrooms. The genus Coprinus is found to be polyphyletic and is separated into three distinct clades. Most Coprinus taxa belong to the first two clades, which together form a larger monophyletic group with Lacrymaria and Psathyrella in basal positions. A third clade contains members of Coprinus section Comati as well as the genus Leucocoprinus, Podaxis pistillaris, Montagnea arenaria, and Agaricus pocillator. This third clade is separated from the other species of Coprinus by members of the families Strophariaceae and Bolbitiaceae and the genus Panaeolus. Copyright 1999 Academic Press.
Sequence-Level Mechanisms of Human Epigenome Evolution
Prendergast, James G.D.; Chambers, Emily V.; Semple, Colin A.M.
2014-01-01
DNA methylation and chromatin states play key roles in development and disease. However, the extent of recent evolutionary divergence in the human epigenome and the influential factors that have shaped it are poorly understood. To determine the links between genome sequence and human epigenome evolution, we examined the divergence of DNA methylation and chromatin states following segmental duplication events in the human lineage. Chromatin and DNA methylation states were found to have been generally well conserved following a duplication event, with the evolution of the epigenome largely uncoupled from the total number of genetic changes in the surrounding DNA sequence. However, the epigenome at tissue-specific, distal regulatory regions was observed to be unusually prone to diverge following duplication, with particular sequence differences, altering known sequence motifs, found to be associated with divergence in patterns of DNA methylation and chromatin. Alu elements were found to have played a particularly prominent role in shaping human epigenome evolution, and we show that human-specific AluY insertion events are strongly linked to the evolution of the DNA methylation landscape and gene expression levels, including at key neurological genes in the human brain. Studying paralogous regions within the same sample enables the study of the links between genome and epigenome evolution while controlling for biological and technical variation. We show DNA methylation and chromatin divergence between duplicated regions are linked to the divergence of particular genetic motifs, with Alu elements having played a disproportionate role in the evolution of the epigenome in the human lineage. PMID:24966180
Blazier, J Chris; Ruhlman, Tracey A; Weng, Mao-Lun; Rehman, Sumaiyah K; Sabir, Jamal S M; Jansen, Robert K
2016-04-18
Genes for the plastid-encoded RNA polymerase (PEP) persist in the plastid genomes of all photosynthetic angiosperms. However, three unrelated lineages (Annonaceae, Passifloraceae and Geraniaceae) have been identified with unusually divergent open reading frames (ORFs) in the conserved region of rpoA, the gene encoding the PEP α subunit. We used sequence-based approaches to evaluate whether these genes retain function. Both gene sequences and complete plastid genome sequences were assembled and analyzed from each of the three angiosperm families. Multiple lines of evidence indicated that the rpoA sequences are likely functional despite retaining as low as 30% nucleotide sequence identity with rpoA genes from outgroups in the same angiosperm order. The ratio of non-synonymous to synonymous substitutions indicated that these genes are under purifying selection, and bioinformatic prediction of conserved domains indicated that functional domains are preserved. One of the lineages (Pelargonium, Geraniaceae) contains species with multiple rpoA-like ORFs that show evidence of ongoing inter-paralog gene conversion. The plastid genomes containing these divergent rpoA genes have experienced extensive structural rearrangement, including large expansions of the inverted repeat. We propose that illegitimate recombination, not positive selection, has driven the divergence of rpoA.
Vorticity and divergence in the solar photosphere
NASA Technical Reports Server (NTRS)
Wang, YI; Noyes, Robert W.; Tarbell, Theodore D.; Title, Alan M.
1995-01-01
We have studied an outstanding sequence of continuum images of the solar granulation from Pic du Midi Observatory. We have calculated the horizontal vector flow field using a correlation tracking algorithm, and from this determined three scalar fields: the vertical component of the curl, the horizontal divergence, and the horizontal flow speed. The divergence field has substantially longer coherence time and more power than does the curl field. Statistically, curl is better correlated with regions of negative divergence - that is, the vertical vorticity is higher in downflow regions, suggesting excess vorticity in intergranular lanes. The average value of the divergence is largest (i.e., outflow is largest) where the horizontal speed is large; we associate these regions with exploding granules. A numerical simulation of general convection also shows similar statistical differences between curl and divergence. Some individual small bright points in the granulation pattern show large local vorticities.
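The two derived fields named above, the horizontal divergence and the vertical component of the curl, can be sketched on a regular grid with central differences. This is an illustration under assumed conventions (row index is y, column index is x, uniform spacing); it is not the authors' correlation-tracking pipeline, and the function name is hypothetical.

```python
def divergence_and_curl(vx, vy, dx=1.0, dy=1.0):
    """Central-difference divergence and vertical curl of a 2-D flow.

    vx[j][i] and vy[j][i] hold the x and y velocity components at grid
    row j (y direction) and column i (x direction). Results are returned
    for interior points only, so each output has two fewer rows and columns.
    """
    ny, nx = len(vx), len(vx[0])
    div = [[0.0] * (nx - 2) for _ in range(ny - 2)]
    curl = [[0.0] * (nx - 2) for _ in range(ny - 2)]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            dvx_dx = (vx[j][i + 1] - vx[j][i - 1]) / (2.0 * dx)
            dvy_dy = (vy[j + 1][i] - vy[j - 1][i]) / (2.0 * dy)
            dvy_dx = (vy[j][i + 1] - vy[j][i - 1]) / (2.0 * dx)
            dvx_dy = (vx[j + 1][i] - vx[j - 1][i]) / (2.0 * dy)
            div[j - 1][i - 1] = dvx_dx + dvy_dy    # horizontal divergence
            curl[j - 1][i - 1] = dvy_dx - dvx_dy   # vertical vorticity
    return div, curl


# A pure outflow v = (x, y) has divergence 2 and zero curl everywhere.
n = 5
outflow_x = [[float(i) for i in range(n)] for _ in range(n)]
outflow_y = [[float(j) for _ in range(n)] for j in range(n)]
div, curl = divergence_and_curl(outflow_x, outflow_y)
```

Positive divergence marks outflow (as in an expanding granule), while negative divergence marks converging flow of the kind the abstract associates with intergranular lanes.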
NASA Technical Reports Server (NTRS)
Romano, Laura A.; Wray, Gregory A.
2003-01-01
Evolutionary changes in transcriptional regulation undoubtedly play an important role in creating morphological diversity. However, there is little information about the evolutionary dynamics of cis-regulatory sequences. This study examines the functional consequence of evolutionary changes in the Endo16 promoter of sea urchins. The Endo16 gene encodes a large extracellular protein that is expressed in the endoderm and may play a role in cell adhesion. Its promoter has been characterized in exceptional detail in the purple sea urchin, Strongylocentrotus purpuratus. We have characterized the structure and function of the Endo16 promoter from a second sea urchin species, Lytechinus variegatus. The Endo16 promoter sequences have evolved in a strongly mosaic manner since these species diverged approximately 35 million years ago: the most proximal region (module A) is conserved, but the remaining modules (B-G) are unalignable. Despite extensive divergence in promoter sequences, the pattern of Endo16 transcription is largely conserved during embryonic and larval development. Transient expression assays demonstrate that 2.2 kb of upstream sequence in either species is sufficient to drive GFP reporter expression that correctly mimics this pattern of Endo16 transcription. Reciprocal cross-species transient expression assays imply that changes have also evolved in the set of transcription factors that interact with the Endo16 promoter. Taken together, these results suggest that stabilizing selection on the transcriptional output may have operated to maintain a similar pattern of Endo16 expression in S. purpuratus and L. variegatus, despite dramatic divergence in promoter sequence and mechanisms of transcriptional regulation.
USDA-ARS's Scientific Manuscript database
High-throughput sequencing of reduced representation genomic libraries has ushered in an era of genotyping-by-sequencing (GBS), where genome-wide genotype data can be obtained for nearly any species. However, there remains a need for imputation-free GBS methods for genotyping large samples taken fr...
Blazier, J. Chris; Ruhlman, Tracey A.; Weng, Mao-Lun; Rehman, Sumaiyah K.; Sabir, Jamal S. M.; Jansen, Robert K.
2016-01-01
Genes for the plastid-encoded RNA polymerase (PEP) persist in the plastid genomes of all photosynthetic angiosperms. However, three unrelated lineages (Annonaceae, Passifloraceae and Geraniaceae) have been identified with unusually divergent open reading frames (ORFs) in the conserved region of rpoA, the gene encoding the PEP α subunit. We used sequence-based approaches to evaluate whether these genes retain function. Both gene sequences and complete plastid genome sequences were assembled and analyzed from each of the three angiosperm families. Multiple lines of evidence indicated that the rpoA sequences are likely functional despite retaining as low as 30% nucleotide sequence identity with rpoA genes from outgroups in the same angiosperm order. The ratio of non-synonymous to synonymous substitutions indicated that these genes are under purifying selection, and bioinformatic prediction of conserved domains indicated that functional domains are preserved. One of the lineages (Pelargonium, Geraniaceae) contains species with multiple rpoA-like ORFs that show evidence of ongoing inter-paralog gene conversion. The plastid genomes containing these divergent rpoA genes have experienced extensive structural rearrangement, including large expansions of the inverted repeat. We propose that illegitimate recombination, not positive selection, has driven the divergence of rpoA. PMID:27087667
Olmsted, R A; Langley, R; Roelke, M E; Goeken, R M; Adger-Johnson, D; Goff, J P; Albert, J P; Packer, C; Laurenson, M K; Caro, T M
1992-10-01
The natural occurrence of lentiviruses closely related to feline immunodeficiency virus (FIV) in nondomestic felid species is shown here to be worldwide. Cross-reactive antibodies to FIV were common in several free-ranging populations of large cats, including East African lions and cheetahs of the Serengeti ecosystem and in puma (also called cougar or mountain lion) populations throughout North America. Infectious puma lentivirus (PLV) was isolated from several Florida panthers, a severely endangered relict puma subspecies inhabiting the Big Cypress Swamp and Everglades ecosystems in southern Florida. Phylogenetic analysis of PLV genomic sequences from disparate geographic isolates revealed appreciable divergence from domestic cat FIV sequences as well as between PLV sequences found in different North American locales. The level of sequence divergence between PLV and FIV was greater than the level of divergence between human and certain simian immunodeficiency viruses, suggesting that the transmission of FIV between feline species is infrequent and parallels in time the emergence of HIV from simian ancestors.
2010-01-01
Background Comparative sequence analysis of complex loci such as resistance gene analog clusters allows estimating the degree of sequence conservation and mechanisms of divergence at the intraspecies level. In banana (Musa sp.), two diploid wild species Musa acuminata (A genome) and Musa balbisiana (B genome) contribute to the polyploid genome of many cultivars. The M. balbisiana species is associated with vigour and tolerance to pests and disease and little is known on the genome structure and haplotype diversity within this species. Here, we compare two genomic sequences of 253 and 223 kb corresponding to two haplotypes of the RGA08 resistance gene analog locus in M. balbisiana "Pisang Klutuk Wulung" (PKW). Results Sequence comparison revealed two regions of contrasting features. The first is a highly colinear gene-rich region where the two haplotypes diverge only by single nucleotide polymorphisms and two repetitive element insertions. The second corresponds to a large cluster of RGA08 genes, with 13 and 18 predicted RGA genes and pseudogenes spread over 131 and 152 kb respectively on each haplotype. The RGA08 cluster is enriched in repetitive element insertions, in duplicated non-coding intergenic sequences including low complexity regions and shows structural variations between haplotypes. Although some allelic relationships are retained, a large diversity of RGA08 genes occurs in this single M. balbisiana genotype, with several RGA08 paralogs specific to each haplotype. The RGA08 gene family has evolved by mechanisms of unequal recombination, intragenic sequence exchange and diversifying selection. An unequal recombination event taking place between duplicated non-coding intergenic sequences resulted in a different RGA08 gene content between haplotypes pointing out the role of such duplicated regions in the evolution of RGA clusters. Based on the synonymous substitution rate in coding sequences, we estimated a 1 million year divergence time for these M. balbisiana haplotypes. Conclusions A large RGA08 gene cluster identified in wild banana corresponds to a highly variable genomic region between haplotypes surrounded by conserved flanking regions. High level of sequence identity (70 to 99%) of the genic and intergenic regions suggests a recent and rapid evolution of this cluster in M. balbisiana. PMID:20637079
Echave, Julian; Wilke, Claus O.
2018-01-01
For decades, rates of protein evolution have been interpreted in terms of the vague concept of “functional importance”. Slowly evolving proteins or sites within proteins were assumed to be more functionally important and thus subject to stronger selection pressure. More recently, biophysical models of protein evolution, which combine evolutionary theory with protein biophysics, have completely revolutionized our view of the forces that shape sequence divergence. Slowly evolving proteins have been found to evolve slowly because of selection against toxic misfolding and misinteractions, linking their rate of evolution primarily to their abundance. Similarly, most slowly evolving sites in proteins are not directly involved in function, but mutating them has large impacts on protein structure and stability. Here, we review the studies of the emergent field of biophysical protein evolution that have shaped our current understanding of sequence divergence patterns. We also propose future research directions to develop this nascent field. PMID:28301766
Clustering evolving proteins into homologous families.
Chan, Cheong Xin; Mahbob, Maisarah; Ragan, Mark A
2013-04-08
Clustering sequences into groups of putative homologs (families) is a critical first step in many areas of comparative biology and bioinformatics. The performance of clustering approaches in delineating biologically meaningful families depends strongly on characteristics of the data, including content bias and degree of divergence. New, highly scalable methods have recently been introduced to cluster the very large datasets being generated by next-generation sequencing technologies. However, there has been little systematic investigation of how characteristics of the data impact the performance of these approaches. Using clusters from a manually curated dataset as reference, we examined the performance of a widely used graph-based Markov clustering algorithm (MCL) and a greedy heuristic approach (UCLUST) in delineating protein families coded by three sets of bacterial genomes of different G+C content. Both MCL and UCLUST generated clusters that are comparable to the reference sets at specific parameter settings, although UCLUST tends to under-cluster compositionally biased sequences (G+C content 33% and 66%). Using simulated data, we sought to assess the individual effects of sequence divergence, rate heterogeneity, and underlying G+C content. Performance decreased with increasing sequence divergence, decreasing among-site rate variation, and increasing G+C bias. Two MCL-based methods recovered the simulated families more accurately than did UCLUST. MCL using local alignment distances is more robust across the investigated range of sequence features than are greedy heuristics using distances based on global alignment. Our results demonstrate that sequence divergence, rate heterogeneity and content bias can individually and in combination affect the accuracy with which MCL and UCLUST can recover homologous protein families. 
For application to data that are more divergent, and exhibit higher among-site rate variation and/or content bias, MCL may often be the better choice, especially if computational resources are not limiting.
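For readers unfamiliar with Markov clustering, the core loop of the algorithm (alternating expansion and inflation of a column-stochastic matrix) can be sketched in a few lines. This is a toy, dense-matrix illustration written for this listing, not the MCL software evaluated in the study; the function names, the default inflation value of 2.0, and the thresholds are assumptions.

```python
def _normalise_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def _matmul(a, b):
    """Plain dense matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Toy Markov clustering on a symmetric similarity/adjacency matrix.

    Returns clusters as sorted lists of node indices.
    """
    n = len(adjacency)
    # Self-loops keep every column non-empty and damp oscillation.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    m = _normalise_columns(m)
    for _ in range(max_iter):
        expanded = _matmul(m, m)                       # expansion: flow spreads
        inflated = _normalise_columns(
            [[v ** inflation for v in row] for row in expanded]
        )                                              # inflation: strong flow wins
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that retain flow are attractors; their non-zero columns form a cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, v in enumerate(row) if v > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Two disconnected triangles: nodes 0-2 and nodes 3-5.
toy_graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
families = mcl(toy_graph)
```

In a protein-family setting the matrix entries would be pairwise similarity scores rather than 0/1 edges, and a higher inflation value would generally give finer-grained clusters.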
Whole genome investigation of a divergent clade of the pathogen Streptococcus suis
Baig, Abiyad; Weinert, Lucy A.; Peters, Sarah E.; Howell, Kate J.; Chaudhuri, Roy R.; Wang, Jinhong; Holden, Matthew T. G.; Parkhill, Julian; Langford, Paul R.; Rycroft, Andrew N.; Wren, Brendan W.; Tucker, Alexander W.; Maskell, Duncan J.
2015-01-01
Streptococcus suis is a major porcine and zoonotic pathogen responsible for significant economic losses in the pig industry and an increasing number of human cases. Multiple isolates of S. suis show marked genomic diversity. Here, we report the analysis of whole genome sequences of nine pig isolates that caused disease typical of S. suis and had phenotypic characteristics of S. suis, but whose genomes were divergent from those of many other S. suis isolates. Comparison of protein sequences predicted from divergent genomes with those from normal S. suis reduced the size of the core genome from 793 to only 397 genes. Divergence was clear if phylogenetic analysis was performed on reduced core genes and MLST alleles. Phylogenies based on certain other genes (16S rRNA, sodA, recN, and cpn60) did not show divergence for all isolates, suggesting recombination between some divergent isolates and normal S. suis for these genes. Indeed, there is evidence of recent recombination between the divergent and normal S. suis genomes for 249 of 397 core genes. In addition, phylogenetic analysis based on the 16S rRNA gene and 132 genes that were conserved between the divergent isolates and representatives of the broader Streptococcus genus showed that divergent isolates were more closely related to S. suis. Six out of nine divergent isolates possessed a S. suis-like capsule region with variation in capsular gene sequences but the remaining three did not have a discrete capsule locus. The majority (40/70) of virulence-associated genes in normal S. suis were present in the divergent genomes. Overall, the divergent isolates extend the current diversity of the S. suis species, but the phenotypic similarities and the large amount of gene exchange with normal S. suis give insufficient evidence to assign these isolates to a new species or subspecies. Further sampling and whole genome analysis of more isolates are warranted to understand the diversity of the species. PMID:26583006
Landry, C; Geyer, L B; Arakaki, Y; Uehara, T; Palumbi, Stephen R
2003-01-01
The rich species diversity of the marine Indo-West Pacific (IWP) has been explained largely on the basis of historical observation of large-scale diversity gradients. Careful study of divergence among closely related species can reveal important new information about the pace and mechanisms of their formation, and can illuminate the genesis of biogeographic patterns. Young species inhabiting the IWP include urchins of the genus Echinometra, which diverged over the past 1-5 Myr. Here, we report the most recent divergence of two cryptic species of Echinometra inhabiting this region. Mitochondrial cytochrome oxidase 1 (CO1) sequence data show that in Echinometra oblonga, species-level divergence in sperm morphology, gamete recognition proteins and gamete compatibility arose between central and western Pacific populations in the past 250 000 years. Divergence in sperm attachment proteins suggests rapid evolution of the fertilization system. Divergence of sperm morphology may be a common feature of free-spawning animals, and offers opportunities to simultaneously understand genetic divergence, changes in protein expression patterns and morphological evolution in traits directly related to reproductive isolation. PMID:12964987
Genetic and phylogenetic divergence of feline immunodeficiency virus in the puma (Puma concolor).
Carpenter, M A; Brown, E W; Culver, M; Johnson, W E; Pecon-Slattery, J; Brousset, D; O'Brien, S J
1996-01-01
Feline immunodeficiency virus (FIV) is a lentivirus which causes an AIDS-like disease in domestic cats (Felis catus). A number of other felid species, including the puma (Puma concolor), carry a virus closely related to domestic cat FIV. Serological testing revealed the presence of antibodies to FIV in 22% of 434 samples from throughout the geographic range of the puma. FIV-Pco pol gene sequences isolated from pumas revealed extensive sequence diversity, greater than has been documented in the domestic cat. The puma sequences formed two highly divergent groups, analogous to the clades which have been defined for domestic cat and lion (Panthera leo) FIV. The puma clade A was made up of samples from Florida and California, whereas clade B consisted of samples from other parts of North America, Central America, and Brazil. The difference between these two groups was as great as that reported among three lion FIV clades. Within puma clades, sequence variation is large, comparable to between-clade differences seen for domestic cat clades, allowing recognition of 15 phylogenetic lineages (subclades) among puma FIV-Pco. Large sequence divergence among isolates, nearly complete species monophyly, and widespread geographic distribution suggest that FIV-Pco has evolved within the puma species for a long period. The sequence data provided evidence for vertical transmission of FIV-Pco from mothers to their kittens, for coinfection of individuals by two different viral strains, and for cross-species transmission of FIV from a domestic cat to a puma. These factors may all be important for understanding the epidemiology and natural history of FIV in the puma. PMID:8794304
Divergence of Gene Body DNA Methylation and Evolution of Plant Duplicate Genes
Wang, Jun; Marowsky, Nicholas C.; Fan, Chuanzhu
2014-01-01
It has been shown that gene body DNA methylation is associated with gene expression. However, whether and how deviation of gene body DNA methylation between duplicate genes can influence their divergence remains largely unexplored. Here, we aim to elucidate the potential role of gene body DNA methylation in the fate of duplicate genes. We identified paralogous gene pairs from Arabidopsis and rice (Oryza sativa ssp. japonica) genomes and reprocessed their single-base resolution methylome data. We show that methylation in paralogous genes nonlinearly correlates with several gene properties including exon number/gene length, expression level and mutation rate. Further, we demonstrated that divergence of methylation level and pattern in paralogs indeed positively correlate with their sequence and expression divergences. This result held even after controlling for other confounding factors known to influence the divergence of paralogs. We observed that methylation level divergence might be more relevant to the expression divergence of paralogs than methylation pattern divergence. Finally, we explored the mechanisms that might give rise to the divergence of gene body methylation in paralogs. We found that exonic methylation divergence more closely correlates with expression divergence than intronic methylation divergence. We show that genomic environments (e.g., flanked by transposable elements and repetitive sequences) of paralogs generated by various duplication mechanisms are associated with the methylation divergence of paralogs. Overall, our results suggest that the changes in gene body DNA methylation could provide another avenue for duplicate genes to develop differential expression patterns and undergo different evolutionary fates in plant genomes. PMID:25310342
Czesny, Sergiusz; Epifanio, John; Michalak, Pawel
2012-01-01
Alewife Alosa pseudoharengus, a small clupeid fish native to the Atlantic Ocean, has recently (∼150 years ago) invaded the North American Great Lakes, and despite the challenges of a freshwater environment its populations exploded and disrupted local food web structures. This range expansion has been accompanied by dramatic changes at all levels of organization. Growth rates, size at maturation, and fecundity are only a few of the most distinct morphological and life history traits that contrast the two alewife morphs. The question arises to what extent these rapidly evolving differences between marine and freshwater varieties result from regulatory changes (including phenotypic plasticity) or structural mutations. To gain insights into expression changes and sequence divergence between marine and freshwater alewives, we sequenced transcriptomes of individuals from Lake Michigan and the Atlantic Ocean. Population-specific single nucleotide polymorphisms were rare but, interestingly, occurred in sequences of genes that also tended to show large differences in expression. Our results show that the striking phenotypic divergence between anadromous and lake alewives can be attributed to massive regulatory modifications rather than coding changes. PMID:22438868
Chi, Hongshu; Taik, Patricia; Foley, Emily J; Racicot, Alycia C; Gray, Hilary M; Guzzetta, Katherine E; Lin, Hsin-Yun; Song, Yen-Ling; Tung, Che-Huang; Zenke, Kosuke; Yoshinaga, Tomoyoshi; Cheng, Chao-Yin; Chang, Wei-Jen; Gong, Hui
2017-07-01
The ciliate protozoan Cryptocaryon irritans parasitizes marine fish and causes lethal white spot disease. Sporadic infections as well as large-scale outbreaks have been reported globally, and the parasite's broad host range poses a particular threat to the aquaculture and ornamental fish markets. In order to better understand C. irritans' population structure, we sequenced and compared mitochondrial cox-1, SSU rRNA, and ITS-1 sequences from 8 new isolates of C. irritans collected in China, Japan, and Taiwan. We detected two SSU rRNA haplotypes, which differ at three positions, separating the isolates into two main groups (I and II). Cox-1 sequences also support the division into two groups, and the cox-1 divergence between these two groups is unexpectedly high (9.28% for 1582 nucleotide positions). The divergence is much greater than that detected in Ichthyophthirius multifiliis, the ciliate protozoan causing freshwater white spot disease in fish, where intraspecies divergence on cox-1 sequence is only 1.95%. ITS-1 sequences derived from these eight isolates and from all other C. irritans isolates (deposited in GenBank) not only support the two groups, but further suggest the presence of a third group with even greater sequence divergence. Finally, a small Ka/Ks ratio estimated from cox-1 sequences suggests that this gene in C. irritans remains under strong purifying selection. Taken together, the C. irritans species may consist of many subspecies and/or syngens. Further work is needed to determine if there is reproductive isolation between the groups we have defined. Copyright © 2017 Elsevier Inc. All rights reserved.
Horn, T; Chang, C A; Urdea, M S
1997-12-01
The divergent synthesis of branched DNA (bDNA) comb structures is described. This new type of bDNA contains one unique oligonucleotide, the primary sequence, covalently attached through a comb-like branch network to many identical copies of a different oligonucleotide, the secondary sequence. The bDNA comb structures were assembled on a solid support and several synthesis parameters were investigated and optimized. The bDNA comb molecules were characterized by polyacrylamide gel electrophoretic methods and by controlled cleavage at periodate-cleavable moieties incorporated during synthesis. The developed chemistry allows synthesis of bDNA comb molecules containing multiple secondary sequences. In the accompanying article we describe the synthesis and characterization of large bDNA combs containing all four deoxynucleotides for use as signal amplifiers in nucleic acid quantification assays. PMID:9365265
Candida ficus sp. nov., a novel yeast species from the gut of Apriona germari larvae.
Hui, Feng-Li; Niu, Qiu-Hong; Ke, Tao; Liu, Zheng
2012-11-01
A novel yeast species is described based on three strains from the gut of wood-boring larvae collected in a tree trunk of Ficus carica cultivated in parks near Nanyang, central China. Phylogenetic analysis based on sequences of the D1/D2 domains of the large subunit rRNA gene showed that these strains occurred in a separate clade that was genetically distinct from all known ascomycetous yeasts. In terms of pairwise sequence divergence, the novel strains differed by 15.3% divergence from the type strain of Pichia terricola, and by 15.8% divergence from the type strains of Pichia exigua and Candida rugopelliculosa in the D1/D2 domains. All three are ascomycetous yeasts in the Pichia clade. Unlike P. terricola, P. exigua and C. rugopelliculosa, the novel isolates did not ferment glucose. The name Candida ficus sp. nov. is proposed to accommodate these highly divergent organisms, with STN-8(T) (=CICC 1980(T)=CBS 12638(T)) as the type strain.
Mhc class II B gene evolution in East African cichlid fishes.
Figueroa, F; Mayer, W E; Sültmann, H; O'hUigin, C; Tichy, H; Satta, Y; Takezaki, N; Takahata, N; Klein, J
2000-06-01
A distinctive feature of essential major histocompatibility complex (Mhc) loci is their polymorphism characterized by large genetic distances between alleles and long persistence times of allelic lineages. Since the lineages often span several successive speciations, we investigated the behavior of the Mhc alleles during or close to the speciation phase. We sequenced exon 2 of the class II B locus 4 from 232 East African cichlid fishes representing 32 related species. The divergence times of the (sub)species ranged from 6,000 to 8.4 million years. Two types of evolutionary analysis were used to elucidate the pattern of exon 2 sequence divergence. First, phylogenetic methods were applied to reconstruct the most likely evolutionary pathways leading from the last common ancestor of the set to the extant sequences, and to assess the probable mechanisms involved in allelic diversification. Second, pairwise comparisons of sequences were carried out to detect differences seemingly incompatible with origin by nonparallel point mutations. The analysis revealed point mutations to be the most important mechanism behind allelic divergences, with recombination playing only an auxiliary part. Comparison of sequences from related species revealed evidence of random allelic (lineage) losses apparently associated with speciation. Sharing of identical alleles could be demonstrated between species that diverged 2 million years ago. The phylogeny of the exon was incongruent with that of the flanking introns, indicating either a high degree of convergent evolution at the peptide-binding region-encoding sites, or intron homogenization.
Amoikon, Tiemele Laurent Simon; Grondin, Cécile; Djéni, Théodore N'Dédé; Jacques, Noémie; Casaregola, Serge
2018-05-21
Analysis of yeasts isolated from various biotopes in French Guiana led to the identification of two strains isolated from flowers and designated CLIB 1634(T) and CLIB 1707(T). Comparison of the D1/D2 domain of the large subunit (LSU D1/D2) rRNA gene sequences of CLIB 1634(T) and CLIB 1707(T) to those in the GenBank database revealed that these strains belong to the Starmerella clade. Strain CLIB 1634(T) was shown to diverge from the closely related Starmerella apicola type strain CBS 2868(T) with a sequence divergence of 1.34 and 1.30%, in the LSU D1/D2 rRNA gene and internal transcribed spacer (ITS) sequences respectively. Strain CLIB 1634(T) and Candida apicola CBS 2868(T) diverged by 3.81 and 14.96% at the level of the protein-coding gene partial sequences EF-1α and RPB2, respectively. CLIB 1707(T) was found to have sequence divergence of 3.88 and 9.16% in the LSU D1/D2 rRNA gene and ITS, respectively, from that of the most closely related species Starmerella ratchasimensis type strain CBS 10611(T). The species Starmerella reginensis f.a., sp. nov. and Starmerella kourouensis f.a., sp. nov. are proposed to accommodate strains CLIB 1634(T) (=CBS 15247(T)) and CLIB 1707(T) (=CBS 15257(T)), respectively.
Hybridization Reveals the Evolving Genomic Architecture of Speciation
Kronforst, Marcus R.; Hansen, Matthew E.B.; Crawford, Nicholas G.; Gallant, Jason R.; Zhang, Wei; Kulathinal, Rob J.; Kapan, Durrell D.; Mullen, Sean P.
2014-01-01
The rate at which genomes diverge during speciation is unknown, as are the physical dynamics of the process. Here, we compare full genome sequences of 32 butterflies, representing five species from a hybridizing Heliconius butterfly community, to examine genome-wide patterns of introgression and infer how divergence evolves during the speciation process. Our analyses reveal that initial divergence is restricted to a small fraction of the genome, largely clustered around known wing-patterning genes. Over time, divergence evolves rapidly, due primarily to the origin of new divergent regions. Furthermore, divergent genomic regions display signatures of both selection and adaptive introgression, demonstrating the link between microevolutionary processes acting within species and the origin of species across macroevolutionary timescales. Our results provide a uniquely comprehensive portrait of the evolving species boundary due to the role that hybridization plays in reducing the background accumulation of divergence at neutral sites. PMID:24183670
When are pathogen genome sequences informative of transmission events?
Ferguson, Neil; Jombart, Thibaut
2018-01-01
Recent years have seen the development of numerous methodologies for reconstructing transmission trees in infectious disease outbreaks from densely sampled whole genome sequence data. However, a fundamental and as of yet poorly addressed limitation of such approaches is the requirement for genetic diversity to arise on epidemiological timescales. Specifically, the position of infected individuals in a transmission tree can only be resolved by genetic data if mutations have accumulated between the sampled pathogen genomes. To quantify and compare the useful genetic diversity expected from genetic data in different pathogen outbreaks, we introduce here the concept of ‘transmission divergence’, defined as the number of mutations separating whole genome sequences sampled from transmission pairs. Using parameter values obtained by literature review, we simulate outbreak scenarios alongside sequence evolution using two models described in the literature to describe transmission divergence of ten major outbreak-causing pathogens. We find that while mean values vary significantly between the pathogens considered, their transmission divergence is generally very low, with many outbreaks characterised by large numbers of genetically identical transmission pairs. We describe the impact of transmission divergence on our ability to reconstruct outbreaks using two outbreak reconstruction tools, the R packages outbreaker and phybreak, and demonstrate that, in agreement with previous observations, genetic sequence data of rapidly evolving pathogens such as RNA viruses can provide valuable information on individual transmission events. Conversely, sequence data of pathogens with lower mean transmission divergence, including Streptococcus pneumoniae, Shigella sonnei and Clostridium difficile, provide little to no information about individual transmission events. 
Our results highlight the informational limitations of genetic sequence data in certain outbreak scenarios, and demonstrate the need to expand the toolkit of outbreak reconstruction tools to integrate other types of epidemiological data. PMID:29420641
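The definition of transmission divergence given above (the number of mutations separating genomes sampled from a transmission pair) can be illustrated with a back-of-the-envelope calculation. The following is only a rough sketch, not the simulation models used in the study: it assumes mutations accumulate as a Poisson process at a constant per-site rate, and the function names and parameters are hypothetical.

```python
import math


def expected_transmission_divergence(rate_per_site_per_day, genome_length, days_between_samples):
    """Expected number of mutations separating two genomes from a transmission pair.

    Assumption (not from the abstract): mutations accumulate as a Poisson
    process with a constant per-site, per-day rate.
    """
    return rate_per_site_per_day * genome_length * days_between_samples


def prob_identical_pair(expected_mutations):
    """Probability that a transmission pair is genetically identical.

    Under the Poisson assumption this is P(0 mutations) = exp(-lambda).
    """
    return math.exp(-expected_mutations)


if __name__ == "__main__":
    # Hypothetical parameter values chosen purely for illustration.
    lam = expected_transmission_divergence(1e-6, 10_000, 10)
    print(lam, prob_identical_pair(lam))
```

Under this simplification, a low expected divergence implies that most transmission pairs are identical, which is the situation in which sequence data alone cannot order the pairs in a transmission tree.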
Dissecting the relationship between protein structure and sequence variation
NASA Astrophysics Data System (ADS)
Shahmoradi, Amir; Wilke, Claus; Wilke Lab Team
2015-03-01
Over the past decade several independent works have shown that some structural properties of proteins are capable of predicting protein evolution. The strength and significance of these structure-sequence relations, however, appear to vary widely among different proteins, with absolute correlation strengths ranging from 0.1 to 0.8. Here we present the results from a comprehensive search for the potential biophysical and structural determinants of protein evolution by studying more than 200 structural and evolutionary properties in a dataset of 209 monomeric enzymes. We discuss the main protein characteristics responsible for the general patterns of protein evolution, and identify sequence divergence as the main determinant of the strengths of virtually all structure-evolution relationships, explaining ~10-30% of observed variation in sequence-structure relations. In addition to sequence divergence, we identify several protein structural properties that are moderately but significantly coupled with the strength of sequence-structure relations. In particular, proteins with more homogeneous backbone hydrogen bond energies, large fractions of helical secondary structures and low fraction of beta sheets tend to have the strongest sequence-structure relation. BEACON-NSF center for the study of evolution in action.
Brettanomyces acidodurans sp. nov., a new acetic acid producing yeast species from olive oil.
Péter, Gábor; Dlauchy, Dénes; Tóbiás, Andrea; Fülöp, László; Podgoršek, Martina; Čadež, Neža
2017-05-01
Two yeast strains representing a hitherto undescribed yeast species were isolated from olive oil and spoiled olive oil originating from Spain and Israel, respectively. Both strains are strong acetic acid producers, equipped with considerable tolerance to acetic acid. The cultures are not short-lived. Cellobiose is fermented as well as several other sugars. The sequences of their large subunit (LSU) rRNA gene D1/D2 domain are very divergent from the sequences available in GenBank. They differ from the closest hit, Brettanomyces naardenensis, by about 27%, mainly substitutions. Sequence analyses of the concatenated dataset from genes of the small subunit (SSU) rRNA, LSU rRNA and translation elongation factor-1α (EF-1α) placed the two strains as an early diverging member of the Brettanomyces/Dekkera clade with high bootstrap support. Sexual reproduction was not observed. The name Brettanomyces acidodurans sp. nov. (holotype: NCAIM Y.02178(T); isotypes: CBS 14519(T) = NRRL Y-63865(T) = ZIM 2626(T), MycoBank no.: MB 819608) is proposed for this highly divergent new yeast species.
Generalization of Entropy Based Divergence Measures for Symbolic Sequence Analysis
Ré, Miguel A.; Azad, Rajeev K.
2014-01-01
Entropy based measures have been frequently used in symbolic sequence analysis. A symmetrized and smoothed form of Kullback-Leibler divergence or relative entropy, the Jensen-Shannon divergence (JSD), is of particular interest because of its sharing properties with families of other divergence measures and its interpretability in different domains including statistical physics, information theory and mathematical statistics. The uniqueness and versatility of this measure arise because of a number of attributes including generalization to any number of probability distributions and association of weights to the distributions. Furthermore, its entropic formulation allows its generalization in different statistical frameworks, such as, non-extensive Tsallis statistics and higher order Markovian statistics. We revisit these generalizations and propose a new generalization of JSD in the integrated Tsallis and Markovian statistical framework. We show that this generalization can be interpreted in terms of mutual information. We also investigate the performance of different JSD generalizations in deconstructing chimeric DNA sequences assembled from bacterial genomes including that of E. coli, S. enterica typhi, Y. pestis and H. influenzae. Our results show that the JSD generalizations bring in more pronounced improvements when the sequences being compared are from phylogenetically proximal organisms, which are often difficult to distinguish because of their compositional similarity. While small but noticeable improvements were observed with the Tsallis statistical JSD generalization, relatively large improvements were observed with the Markovian generalization. In contrast, the proposed Tsallis-Markovian generalization yielded more pronounced improvements relative to the Tsallis and Markovian generalizations, specifically when the sequences being compared arose from phylogenetically proximal organisms. PMID:24728338
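For orientation, the standard (Shannon) form of the Jensen-Shannon divergence mentioned above, including its generalization to any number of weighted distributions, can be sketched as follows. This is a minimal illustration only: it does not implement the Tsallis or Markovian generalizations the abstract proposes, and the function names and the choice of single-nucleotide frequencies as the distributions are assumptions made for the example.

```python
import math
from collections import Counter


def symbol_distribution(seq, alphabet="ACGT"):
    """Relative frequency of each alphabet symbol in seq (other symbols are ignored)."""
    counts = Counter(seq.upper())
    total = sum(counts[s] for s in alphabet)
    if total == 0:
        raise ValueError("sequence contains no symbols from the alphabet")
    return [counts[s] / total for s in alphabet]


def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jensen_shannon(dists, weights=None):
    """Weighted JSD: H(sum_i w_i * P_i) - sum_i w_i * H(P_i).

    dists is a list of probability vectors over the same alphabet;
    weights default to uniform and are assumed to sum to 1.
    """
    n = len(dists)
    if weights is None:
        weights = [1.0 / n] * n
    mixture = [
        sum(w * d[i] for w, d in zip(weights, dists))
        for i in range(len(dists[0]))
    ]
    return shannon_entropy(mixture) - sum(
        w * shannon_entropy(d) for w, d in zip(weights, dists)
    )


if __name__ == "__main__":
    left = symbol_distribution("ACGTACGTAACC")
    right = symbol_distribution("GGGCGCGGTTGC")
    print(jensen_shannon([left, right]))
```

In a segmentation setting, the weights would typically be the relative lengths of the two subsequences on either side of a candidate breakpoint; that choice is likewise an assumption here rather than something stated in the abstract.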
Expression Divergence Is Correlated with Sequence Evolution but Not Positive Selection in Conifers.
Hodgins, Kathryn A; Yeaman, Sam; Nurkowski, Kristin A; Rieseberg, Loren H; Aitken, Sally N
2016-06-01
The evolutionary and genomic determinants of sequence evolution in conifers are poorly understood, and previous studies have found only limited evidence for positive selection. Using RNAseq data, we compared gene expression profiles to patterns of divergence and polymorphism in 44 seedlings of lodgepole pine (Pinus contorta) and 39 seedlings of interior spruce (Picea glauca × engelmannii) to elucidate the evolutionary forces that shape their genomes and their plastic responses to abiotic stress. We found that rapidly diverging genes tend to have greater expression divergence, lower expression levels, reduced levels of synonymous site diversity, and longer proteins than slowly diverging genes. Similar patterns were identified for the untranslated regions, but with some exceptions. We found evidence that genes with low expression levels had a larger fraction of nearly neutral sites, suggesting a primary role for negative selection in determining the association between evolutionary rate and expression level. There was limited evidence for differences in the rate of positive selection among genes with divergent versus conserved expression profiles and some evidence supporting relaxed selection in genes diverging in expression between the species. Finally, we identified a small number of genes that showed evidence of site-specific positive selection using divergence data alone. However, estimates of the proportion of sites fixed by positive selection (α) were in the range of other plant species with large effective population sizes suggesting relatively high rates of adaptive divergence among conifers. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
The D1-D2 region of the large subunit ribosomal DNA as barcode for ciliates.
Stoeck, T; Przybos, E; Dunthorn, M
2014-05-01
Ciliates are a major evolutionary lineage within the alveolates, which are distributed in nearly all habitats on our planet and are an essential component for ecosystem function, processes and stability. Accurate identification of these unicellular eukaryotes through, for example, microscopy or mating type reactions is reserved to few specialists. To satisfy the demand for a DNA barcode for ciliates, which meets the standard criteria for DNA barcodes defined by the Consortium for the Barcode of Life (CBOL), we here evaluated the D1-D2 region of the ribosomal DNA large subunit (LSU-rDNA). Primer universality for the phylum Ciliophora was tested in silico with available database sequences as well as in the laboratory with 73 ciliate species, which represented nine of 12 ciliate classes. Primers tested in this study were successful for all tested classes. To test the ability of the D1-D2 region to resolve conspecific and congeneric sequence divergence, 63 Paramecium strains were sampled from 24 mating species. The average conspecific D1-D2 variation was 0.18%, whereas congeneric sequence divergence averaged 4.83%. In pairwise genetic distance analyses, we identified a D1-D2 sequence divergence of <0.6% as an ideal threshold to discriminate Paramecium species. Using this definition, only 3.8% of all conspecific and 3.9% of all congeneric sequence comparisons had the potential of false assignments. Neighbour-joining analyses inferred monophyly for all taxa but for two Paramecium octaurelia strains. Here, we present a protocol for easy DNA amplification of single cells and voucher deposition. In conclusion, the presented data pinpoint the D1-D2 region as an excellent candidate for an official CBOL barcode for ciliated protists. © 2013 John Wiley & Sons Ltd.
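The pairwise-distance threshold described above can be sketched as a simple calculation. This is a minimal illustration, assuming the two D1-D2 sequences are already aligned to equal length and that divergence is measured as an uncorrected p-distance over ungapped columns; the abstract does not state which distance model was used, and the function names are hypothetical.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap are skipped (an assumption
    made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def below_species_threshold(seq_a, seq_b, threshold=0.006):
    """True if divergence is below the <0.6% threshold reported for Paramecium."""
    return p_distance(seq_a, seq_b) < threshold


if __name__ == "__main__":
    reference = "ACGT" * 250                    # 1000 aligned sites
    variant = "T" * 5 + reference[5:]           # differs at 4 of the first 5 sites
    print(p_distance(reference, variant), below_species_threshold(reference, variant))
```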
Evolution of the arginase fold and functional diversity
Dowling, Daniel P.; Costanzo, Luigi Di; Gennadios, Heather A.; Christianson, David W.
2009-01-01
The large number of protein structures deposited in the Protein Data Bank allows for the identification of novel structural superfamilies based on conservation of fold in addition to conservation of amino acid sequence. Since sequence diverges more rapidly than fold in protein evolution, proteins with little or no significant sequence identity are occasionally observed to adopt similar folds, thereby reflecting unanticipated evolutionary relationships. Here, we review the unique α/β fold first observed in the manganese metalloenzyme rat liver arginase, consisting of a parallel 8 stranded β-sheet surrounded by several helices, and its evolutionary relationship with the zinc-requiring and/or iron-requiring histone deacetylases and acetylpolyamine amidohydrolases. Structural comparisons reveal key features of the core α/β fold that contribute to the divergent metal ion specificity and stoichiometry required for the chemical and biological functions of these enzymes. PMID:18360740
Kullback Leibler divergence in complete bacterial and phage genomes
Akhter, Sajia; Aziz, Ramy K; Kashef, Mona T; Ibrahim, Eslam S; Bailey, Barbara; Edwards, Robert A
2017-01-01
The amino acid content of the proteins encoded by a genome may predict the coding potential of that genome and may reflect lifestyle restrictions of the organism. Here, we calculated the Kullback-Leibler divergence from the mean amino acid content as a metric to compare the amino acid composition for a large set of bacterial and phage genome sequences. Using these data, we demonstrate that (i) there is a significant difference between amino acid utilization in different phylogenetic groups of bacteria and phages; (ii) many of the bacteria with the most skewed amino acid utilization profiles, or the bacteria that host phages with the most skewed profiles, are endosymbionts or parasites; (iii) the skews in the distribution are not restricted to certain metabolic processes but are common across all bacterial genomic subsystems; (iv) amino acid utilization profiles strongly correlate with GC content in bacterial genomes but very weakly correlate with the G+C percent in phage genomes. These findings might be exploited to distinguish coding from non-coding sequences in large data sets, such as metagenomic sequence libraries, to help in prioritizing subsequent analyses. PMID:29204318
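The metric described above, the Kullback-Leibler divergence of a genome's amino acid composition from the mean composition, might be computed along the following lines. This is a sketch under stated assumptions rather than the authors' pipeline: the pseudocount, the unweighted mean across genomes, the base-2 logarithm, and all function names are choices made for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(protein_seqs, pseudocount=1.0):
    """Amino acid frequencies over all proteins of one genome.

    A pseudocount (an assumption of this sketch) keeps every frequency
    positive so the divergence is always finite.
    """
    counts = Counter()
    for seq in protein_seqs:
        counts.update(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def kl_divergence(p, q):
    """D_KL(P || Q) in bits for two distributions over the 20 amino acids."""
    return sum(p[a] * math.log2(p[a] / q[a]) for a in AMINO_ACIDS if p[a] > 0)


def kl_from_mean(genomes):
    """Map each genome name to its KL divergence from the mean composition.

    genomes maps a name to a list of protein sequences; the mean is the
    unweighted average of the per-genome frequency vectors.
    """
    freqs = {name: aa_frequencies(seqs) for name, seqs in genomes.items()}
    mean = {a: sum(f[a] for f in freqs.values()) / len(freqs) for a in AMINO_ACIDS}
    return {name: kl_divergence(f, mean) for name, f in freqs.items()}


if __name__ == "__main__":
    # Toy protein sets, invented purely for illustration.
    toy = {
        "genome_1": ["MKKLLAAGG", "MSTNPKPQR"],
        "genome_2": ["MKKKKKKKK", "MNNNNKKKK"],
    }
    for name, value in kl_from_mean(toy).items():
        print(name, round(value, 4))
```

Genomes with strongly skewed amino acid usage receive larger values, which is the sense in which the abstract speaks of "skewed amino acid utilization profiles".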
Renz, Adina J.; Meyer, Axel; Kuraku, Shigehiro
2013-01-01
Cartilaginous fishes, divided into Holocephali (chimaeras) and Elasmobranchii (sharks, rays and skates), occupy a key phylogenetic position among extant vertebrates in reconstructing their evolutionary processes. Their accurate evolutionary time scale is indispensable for better understanding of the relationship between phenotypic and molecular evolution of cartilaginous fishes. However, our current knowledge on the time scale of cartilaginous fish evolution largely relies on estimates using mitochondrial DNA sequences. In this study, making the best use of the still partial, but large-scale sequencing data of cartilaginous fish species, we estimate the divergence times between the major cartilaginous fish lineages employing nuclear genes. By rigorous orthology assessment based on available genomic and transcriptomic sequence resources for cartilaginous fishes, we selected 20 protein-coding genes in the nuclear genome, spanning 2973 amino acid residues. Our analysis based on Bayesian inference yielded a mean divergence time of 421 Ma, the late Silurian, for the Holocephali-Elasmobranchii split, and 306 Ma, the late Carboniferous, for the split between sharks and rays/skates. By applying these results and other documented divergence times, we measured the relative evolutionary rate of the Hox A cluster sequences in the cartilaginous fish lineages, which resulted in a lower substitution rate with a factor of at least 2.4 in comparison to tetrapod lineages. The obtained time scale enables mapping phenotypic and molecular changes in a quantitative framework. It is of great interest to corroborate the less derived nature of cartilaginous fish at the molecular level as a genome-wide phenomenon. PMID:23825540
Jensen, Annette Bruun; Eilenberg, Jørgen; López Lastra, Claudia
2009-11-01
Three DNA regions (ITS 1, LSU rRNA and GPD) of isolates from the insect-pathogenic fungus genus Entomophthora originating from different fly (Diptera) and aphid (Hemiptera) host taxa were sequenced. The results documented a large genetic diversity among the fly-pathogenic Entomophthora and only minor differences among aphid-pathogenic Entomophthora. The evolutionary time of divergence of the fly and the aphid host taxa included cannot account for this difference. The host-driven divergence of Entomophthora, therefore, has been much greater in flies than in aphids. Host-range differences or a recent host shift to aphids are possible explanations.
RECOVIR Software for Identifying Viruses
NASA Technical Reports Server (NTRS)
Chakravarty, Sugoto; Fox, George E.; Zhu, Dianhui
2013-01-01
Most single-stranded RNA (ssRNA) viruses mutate rapidly to generate a large number of strains with highly divergent capsid sequences. Determining the capsid residues or nucleotides that uniquely characterize these strains is critical in understanding the strain diversity of these viruses. RECOVIR (an acronym for "recognize viruses") software predicts the strains of some ssRNA viruses from their limited sequence data. Novel phylogenetic-tree-based databases of protein or nucleic acid residues that uniquely characterize these virus strains are created. Strains of input virus sequences (partial or complete) are predicted through residue-wise comparisons with the databases. RECOVIR uses unique characterizing residues to identify automatically strains of partial or complete capsid sequences of picorna and caliciviruses, two of the most highly diverse ssRNA virus families. Partition-wise comparisons of the database residues with the corresponding residues of more than 300 complete and partial sequences of these viruses resulted in correct strain identification for all of these sequences. This study shows the feasibility of creating databases of hitherto unknown residues uniquely characterizing the capsid sequences of two of the most highly divergent ssRNA virus families. These databases enable automated strain identification from partial or complete capsid sequences of these human and animal pathogens.
Sequence space and the ongoing expansion of the protein universe.
Povolotskaya, Inna S; Kondrashov, Fyodor A
2010-06-17
The need to maintain the structural and functional integrity of an evolving protein severely restricts the repertoire of acceptable amino-acid substitutions. However, it is not known whether these restrictions impose a global limit on how far homologous protein sequences can diverge from each other. Here we explore the limits of protein evolution using sequence divergence data. We formulate a computational approach to study the rate of divergence of distant protein sequences and measure this rate for ancient proteins, those that were present in the last universal common ancestor. We show that ancient proteins are still diverging from each other, indicating an ongoing expansion of the protein sequence universe. The slow rate of this divergence is imposed by the sparseness of functional protein sequences in sequence space and the ruggedness of the protein fitness landscape: approximately 98 per cent of sites cannot accept an amino-acid substitution at any given moment but a vast majority of all sites may eventually be permitted to evolve when other, compensatory, changes occur. Thus, approximately 3.5 × 10^9 yr has not been enough to reach the limit of divergent evolution of proteins, and for most proteins the limit of sequence similarity imposed by common function may not exceed that of random sequences.
Candida ruelliae sp. nov., a novel yeast species isolated from flowers of Ruellia sp. (Acanthaceae).
Saluja, Puja; Prasad, Gandham S
2008-06-01
Two novel yeast strains designated as 16Q1 and 16Q3 were isolated from flowers of the Ruellia species of the Acanthaceae family. The D1/D2 domain and ITS sequences of these two strains were identical. Sequence analysis of the D1/D2 domain of large-subunit rRNA gene indicated their relationship to species of the Candida haemulonii cluster. However, they differ from C. haemulonii by 14% nucleotide sequence divergence, from Candida pseudohaemulonii by 16.1% and from C. haemulonii type II by 16.5%. These strains also differ in 18 physiological tests from the type strain of C. haemulonii, and 12 and 16 tests, respectively, from C. pseudohaemulonii and C. haemulonii type II. They also differ from C. haemulonii and other related species by more than 13% sequence divergence in the internal transcribed spacer region. In the SSU rRNA gene sequences, strain 16Q1 differs by 1.7% nucleotide divergence from C. haemulonii. Sporulation was not observed in pure or mixed cultures on several media examined. All these data support the assignment of these strains to a novel species; we have named them as Candida ruelliae sp. nov., and designate strain 16Q1(T)=MTCC 7739(T)=CBS10815(T) as type strain of the novel species.
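Percent sequence divergence figures such as those quoted above are typically the share of aligned positions at which two sequences differ. The abstract does not say how gaps or ambiguous bases were handled, so the following is only a minimal sketch under assumed conventions: the two sequences are already aligned to equal length, positions with a gap in either sequence are skipped, and the function name `percent_divergence` is hypothetical.

```python
def percent_divergence(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent of aligned, ungapped sites at which two sequences differ.

    Assumes seq_a and seq_b come from the same alignment (equal length).
    Skipping gapped columns is an assumption, not a rule from the source.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * mismatches / compared
```

For example, two aligned 10-nucleotide fragments that differ at one site give `percent_divergence("ACGTACGTAC", "ACGTTCGTAC")`, which is 10.0.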
Next generation sequencing and analysis of a conserved transcriptome of New Zealand's kiwi.
Subramanian, Sankar; Huynen, Leon; Millar, Craig D; Lambert, David M
2010-12-15
Kiwi is a highly distinctive, flightless and endangered ratite bird endemic to New Zealand. To understand the patterns of molecular evolution of the nuclear protein-coding genes in brown kiwi (Apteryx australis mantelli) and to determine the timescale of avian history we sequenced a transcriptome obtained from a kiwi embryo using next generation sequencing methods. We then assembled the conserved protein-coding regions using the chicken proteome as a scaffold. Using 1,543 conserved protein coding genes we estimated the neutral evolutionary divergence between the kiwi and chicken to be ~45%, which is approximately equal to the divergence computed for the human-mouse pair using the same set of genes. A large fraction of genes was found to be under high selective constraint, as most of the expressed genes appeared to be involved in developmental gene regulation. Our study suggests a significant relationship between gene expression levels and protein evolution. Using sequences from over 700 nuclear genes we estimated the divergence between the two basal avian groups, Palaeognathae and Neognathae to be 132 million years, which is consistent with previous studies using mitochondrial genes. The results of this investigation revealed patterns of mutation and purifying selection in conserved protein coding regions in birds. Furthermore this study suggests a relatively cost-effective way of obtaining a glimpse into the fundamental molecular evolutionary attributes of a genome, particularly when no closely related genomic sequence is available.
Amazonian phylogeography: mtDNA sequence variation in arboreal echimyid rodents (Caviomorpha).
da Silva, M N; Patton, J L
1993-09-01
Patterns of evolutionary relationships among haplotype clades of sequences of the mitochondrial cytochrome b DNA gene are examined for five genera of arboreal rodents of the Caviomorph family Echimyidae from the Amazon Basin. Data are available for 798 bp of sequence from a total of 24 separate localities in Peru, Venezuela, Bolivia, and Brazil for Mesomys, Isothrix, Makalata, Dactylomys, and Echimys. Sequence divergence, corrected for multiple hits, is extensive, ranging from less than 1% for comparisons within populations to over 20% among geographic units within genera. Both the degree of differentiation and the geographic patterning of the variation suggest that more than one species composes the Amazonian distribution of the currently recognized Mesomys hispidus, Isothrix bistriata, Makalata didelphoides, and Dactylomys dactylinus. There is general concordance in the geographic range of haplotype clades for each of these taxa, and the overall level of differentiation within them is largely equivalent. These observations suggest that a common vicariant history underlies the respective diversification of each genus. However, estimated times of divergence based on the rate of third position transversion substitutions for the major clades within each genus typically range above 1 million years. Thus, allopatric isolation precipitating divergence must have been considerably earlier than the late Pleistocene forest fragmentation events commonly invoked for Amazonian biota.
Poomtien, Jamroonsri; Jindamorakot, Sasitorn; Limtong, Savitree; Pinphanichakarn, Pairoh; Thaniyavarn, Jiraporn
2013-01-01
Three yeast strains were isolated from industrial wastes in Thailand. Based on the phylogenetic sequence analysis of the D1/D2 region of the large subunit rRNA gene, the internal transcribed spacer (ITS1-5.8S rRNA gene-ITS2; ITS1-2) region, and their physiological characteristics, the three strains were found to represent two novel species of the ascomycetous anamorphic yeast. Strain JP52(T) represents a novel species which was named Cyberlindnera samutprakarnensis sp. nov. (type strain JP52(T); = BCC 46825(T) = JCM 17816(T) = CBS 12528(T), MycoBank no. MB800879), which was differentiated from the closely related species Cyberlindnera mengyuniae CBS 10845(T) by 2.9 % sequence divergence in the D1/D2 region and 4.4 % sequence divergence in the ITS1-2. Strains JP59(T) and JP60 were identical in their D1/D2 and ITS1-2 regions, which were closely related to those of Scheffersomyces spartinae CBS 6059(T) by 0.9 and 1.0 % sequence divergence, respectively. In addition, supportive evidence of actin gene and translational elongation factor gene by sequence divergence of 6.5 % each confirmed their distinct status. Furthermore, JP59(T) and JP60 were differentiated from the closely related species in some biochemical and physiological characteristics. These two strains were assigned as a single novel species which was named Candida thasaenensis sp. nov. (type JP59(T) = BCC 46828(T) = JCM 17817(T) = CBS 12529(T), MycoBank no. MB800880).
The complete chloroplast genome sequence of the medicinal plant Salvia miltiorrhiza.
Qian, Jun; Song, Jingyuan; Gao, Huanhuan; Zhu, Yingjie; Xu, Jiang; Pang, Xiaohui; Yao, Hui; Sun, Chao; Li, Xian'en; Li, Chuyuan; Liu, Juyan; Xu, Haibin; Chen, Shilin
2013-01-01
Salvia miltiorrhiza is an important medicinal plant with great economic and medicinal value. The complete chloroplast (cp) genome sequence of Salvia miltiorrhiza, the first sequenced member of the Lamiaceae family, is reported here. The genome is 151,328 bp in length and exhibits a typical quadripartite structure of the large (LSC, 82,695 bp) and small (SSC, 17,555 bp) single-copy regions, separated by a pair of inverted repeats (IRs, 25,539 bp). It contains 114 unique genes, including 80 protein-coding genes, 30 tRNAs and four rRNAs. The genome structure, gene order, GC content and codon usage are similar to the typical angiosperm cp genomes. Four forward, three inverted and seven tandem repeats were detected in the Salvia miltiorrhiza cp genome. Simple sequence repeat (SSR) analysis among the 30 asterid cp genomes revealed that most SSRs are AT-rich, which contribute to the overall AT richness of these cp genomes. Additionally, fewer SSRs are distributed in the protein-coding sequences compared to the non-coding regions, indicating an uneven distribution of SSRs within the cp genomes. Entire cp genome comparison of Salvia miltiorrhiza and three other Lamiales cp genomes showed a high degree of sequence similarity and a relatively high divergence of intergenic spacers. Sequence divergence analysis discovered the ten most divergent and ten most conserved genes as well as their length variation, which will be helpful for phylogenetic studies in asterids. Our analysis also supports that both regional and functional constraints affect gene sequence evolution. Further, phylogenetic analysis demonstrated a sister relationship between Salvia miltiorrhiza and Sesamum indicum. The complete cp genome sequence of Salvia miltiorrhiza reported in this paper will facilitate population, phylogenetic and cp genetic engineering studies of this medicinal plant.
2012-01-01
Background The NCBI Conserved Domain Database (CDD) consists of a collection of multiple sequence alignments of protein domains that are at various stages of being manually curated into evolutionary hierarchies based on conserved and divergent sequence and structural features. These domain models are annotated to provide insights into the relationships between sequence, structure and function via web-based BLAST searches. Results Here we automate the generation of conserved domain (CD) hierarchies using a combination of heuristic and Markov chain Monte Carlo (MCMC) sampling procedures and starting from a (typically very large) multiple sequence alignment. This procedure relies on statistical criteria to define each hierarchy based on the conserved and divergent sequence patterns associated with protein functional-specialization. At the same time this facilitates the sequence and structural annotation of residues that are functionally important. These statistical criteria also provide a means to objectively assess the quality of CD hierarchies, a non-trivial task considering that the protein subgroups are often very distantly related—a situation in which standard phylogenetic methods can be unreliable. Our aim here is to automatically generate (typically sub-optimal) hierarchies that, based on statistical criteria and visual comparisons, are comparable to manually curated hierarchies; this serves as the first step toward the ultimate goal of obtaining optimal hierarchical classifications. A plot of runtimes for the most time-intensive (non-parallelizable) part of the algorithm indicates a nearly linear time complexity so that, even for the extremely large Rossmann fold protein class, results were obtained in about a day. Conclusions This approach automates the rapid creation of protein domain hierarchies and thus will eliminate one of the most time consuming aspects of conserved domain database curation. 
At the same time, it also facilitates protein domain annotation by identifying those pattern residues that most distinguish each protein domain subgroup from other related subgroups. PMID:22726767
Resolving the tips of the tree of life: How much mitochondrial data do we need?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bonett, Ronald M.; Macey, J. Robert; Boore, Jeffrey L.
2005-04-29
Mitochondrial (mt) DNA sequences are used extensively to reconstruct evolutionary relationships among recently diverged animals, and have constituted the most widely used markers for species- and generic-level relationships for the last decade or more. However, most studies to date have employed relatively small portions of the mt-genome. In contrast, complete mt-genomes primarily have been used to investigate deep divergences, including several studies of the amount of mt sequence necessary to recover ancient relationships. We sequenced and analyzed 24 complete mt-genomes from a group of salamander species exhibiting divergences typical of those in many species-level studies. We present the first comprehensive investigation of the amount of mt sequence data necessary to consistently recover the mt-genome tree at this level, using parsimony and Bayesian methods. Both methods of phylogenetic analysis revealed extremely similar results. A surprising number of well supported, yet conflicting, relationships were found in trees based on fragments less than ~2000 nucleotides (nt), typical of the vast majority of the thousands of mt-based studies published to date. Large amounts of data (11,500+ nt) were necessary to consistently recover the whole mt-genome tree. Some relationships consistently were recovered with fragments of all sizes, but many nodes required the majority of the mt-genome to stabilize, particularly those associated with short internal branches. Although moderate amounts of data (2000-3000 nt) were adequate to recover mt-based relationships for which most nodes were congruent with the whole mt-genome tree, many thousands of nucleotides were necessary to resolve rapid bursts of evolution. 
Recent advances in genomics are making collection of large amounts of sequence data highly feasible, and our results provide the basis for comparative studies of other closely related groups to optimize mt sequence sampling and phylogenetic resolution at the "tips" of the Tree of Life.
Centromere location in Arabidopsis is unaltered by extreme divergence in CENH3 protein sequence
2017-01-01
During cell division, spindle fibers attach to chromosomes at centromeres. The DNA sequence at regional centromeres is fast evolving with no conserved genetic signature for centromere identity. Instead CENH3, a centromere-specific histone H3 variant, is the epigenetic signature that specifies centromere location across both plant and animal kingdoms. Paradoxically, CENH3 is also adaptively evolving. An ongoing question is whether CENH3 evolution is driven by a functional relationship with the underlying DNA sequence. Here, we demonstrate that despite extensive protein sequence divergence, CENH3 histones from distant species assemble centromeres on the same underlying DNA sequence. We first characterized the organization and diversity of centromere repeats in wild-type Arabidopsis thaliana. We show that A. thaliana CENH3-containing nucleosomes exhibit a strong preference for a unique subset of centromeric repeats. These sequences are largely missing from the genome assemblies and represent the youngest and most homogeneous class of repeats. Next, we tested the evolutionary specificity of this interaction in a background in which the native A. thaliana CENH3 is replaced with CENH3s from distant species. Strikingly, we find that CENH3 from Lepidium oleraceum and Zea mays, although specifying epigenetically weaker centromeres that result in genome elimination upon outcrossing, show a binding pattern on A. thaliana centromere repeats that is indistinguishable from the native CENH3. Our results demonstrate positional stability of a highly diverged CENH3 on independently evolved repeats, suggesting that the sequence specificity of centromeres is determined by a mechanism independent of CENH3. PMID:28223399
Ross, Cody T.; Roodgar, Morteza; Smith, David Glenn
2015-01-01
We use the Reciprocal Smallest Distance (RSD) algorithm to identify amino acid sequence orthologs in the Chinese and Indian rhesus macaque draft sequences and estimate the evolutionary distance between such orthologs. We then use GOanna to map gene function annotations and human gene identifiers to the rhesus macaque amino acid sequences. We conclude methodologically by cross-tabulating a list of amino acid orthologs with large divergence scores with a list of genes known to be involved in SIV or HIV pathogenesis. We find that many of the amino acid sequences with large evolutionary divergence scores, as calculated by the RSD algorithm, have been shown to be related to HIV pathogenesis in previous laboratory studies. Four of the strongest candidate genes for SIVmac resistance in Chinese rhesus macaques identified in this study are CDK9, CXCL12, TRIM21, and TRIM32. Additionally, ANKRD30A, CTSZ, GORASP2, GTF2H1, IL13RA1, MUC16, NMDAR1, Notch1, NT5M, PDCD5, RAD50, and TM9SF2 were identified as possible candidates, among others. We failed to find many laboratory experiments contrasting the effects of Indian and Chinese orthologs at these sites on SIVmac pathogenesis, but future comparative studies might hold fertile ground for research into the biological mechanisms underlying innate resistance to SIVmac in Chinese rhesus macaques. PMID:25884674
Structure and evolution of cereal genomes.
Paterson, Andrew H; Bowers, John E; Peterson, Daniel G; Estill, James C; Chapman, Brad A
2003-12-01
The cereal species, of central importance to our diet, began to diverge 50-70 million years ago. For the past few thousand years, these species have undergone largely parallel selection regimes associated with domestication and improvement. The rice genome sequence provides a platform for organizing information about diverse cereals, and together with genetic maps and sequence samples from other cereals is yielding new insights into both the shared and the independent dimensions of cereal evolution. New data and population-based approaches are identifying genes that have been involved in cereal improvement. Reduced-representation sequencing promises to accelerate gene discovery in many large-genome cereals, and to better link the under-explored genomes of 'orphan' cereals with state-of-the-art knowledge.
Phylogeography of Canada Geese (Branta canadensis) in western North America
Scribner, K.T.; Talbot, S.L.; Pearce, J.M.; Pierson, Barbara J.; Bollinger, K.S.; Derksen, D.V.
2003-01-01
Using molecular genetic markers that differ in mode of inheritance and rate of evolution, we examined levels and partitioning of genetic variation for seven nominal subspecies (11 breeding populations) of Canada Geese (Branta canadensis) in western North America. Gene trees constructed from mtDNA control region sequence data show that subspecies of Canada Geese do not have distinct mtDNA. Large- and small-bodied forms of Canada Geese were highly diverged (0.077 average sequence divergence) and represent monophyletic groups. A majority (65%) of 20 haplotypes resolved were observed in single breeding locales. However, within both large- and small-bodied forms certain haplotypes occurred across multiple subspecies. Population trees for both nuclear (microsatellites) and mitochondrial markers were generally concordant and provide resolution of population and subspecific relationships indicating incomplete lineage sorting. All populations and subspecies were genetically diverged, but to varying degrees. Analyses of molecular variance, nested-clade and coalescence-based analyses of mtDNA suggest that both historical (past fragmentation) and contemporary forces have been important in shaping current spatial genetic distributions. Gene flow appears to be ongoing though at different rates, even among currently recognized subspecies. The efficacy of current subspecific taxonomy is discussed in light of hypothesized historical vicariance and current demographic trends of management and conservation concern.
Evolution of Enzyme Superfamilies: Comprehensive Exploration of Sequence-Function Relationships.
Baier, F; Copp, J N; Tokuriki, N
2016-11-22
The sequence and functional diversity of enzyme superfamilies have expanded through billions of years of evolution from a common ancestor. Understanding how protein sequence and functional "space" have expanded, at both the evolutionary and molecular level, is central to biochemistry, molecular biology, and evolutionary biology. Integrative approaches that examine protein sequence, structure, and function have begun to provide comprehensive views of the functional diversity and evolutionary relationships within enzyme superfamilies. In this review, we outline the recent advances in our understanding of enzyme evolution and superfamily functional diversity. We describe the tools that have been used to comprehensively analyze sequence relationships and to characterize sequence and function relationships. We also highlight recent large-scale experimental approaches that systematically determine the activity profiles across enzyme superfamilies. We identify several intriguing insights from this recent body of work. First, promiscuous activities are prevalent among extant enzymes. Second, many divergent proteins retain "function connectivity" via enzyme promiscuity, which can be used to probe the evolutionary potential and history of enzyme superfamilies. Finally, we discuss open questions regarding the intricacies of enzyme divergence, as well as potential research directions that will deepen our understanding of enzyme superfamily evolution.
Genome analysis and polar tube firing dynamics of mosquito-infecting microsporidia
USDA-ARS's Scientific Manuscript database
Microsporidia are highly divergent fungi that are obligate intracellular pathogens of a wide range of host organisms. Here we review recent findings from the genome sequences of mosquito-infecting microsporidian species Edhazardia aedis and Vavraia culicis, which show large differences in genome siz...
Centromere location in Arabidopsis is unaltered by extreme divergence in CENH3 protein sequence.
Maheshwari, Shamoni; Ishii, Takayoshi; Brown, C Titus; Houben, Andreas; Comai, Luca
2017-03-01
During cell division, spindle fibers attach to chromosomes at centromeres. The DNA sequence at regional centromeres is fast evolving with no conserved genetic signature for centromere identity. Instead CENH3, a centromere-specific histone H3 variant, is the epigenetic signature that specifies centromere location across both plant and animal kingdoms. Paradoxically, CENH3 is also adaptively evolving. An ongoing question is whether CENH3 evolution is driven by a functional relationship with the underlying DNA sequence. Here, we demonstrate that despite extensive protein sequence divergence, CENH3 histones from distant species assemble centromeres on the same underlying DNA sequence. We first characterized the organization and diversity of centromere repeats in wild-type Arabidopsis thaliana. We show that A. thaliana CENH3-containing nucleosomes exhibit a strong preference for a unique subset of centromeric repeats. These sequences are largely missing from the genome assemblies and represent the youngest and most homogeneous class of repeats. Next, we tested the evolutionary specificity of this interaction in a background in which the native A. thaliana CENH3 is replaced with CENH3s from distant species. Strikingly, we find that CENH3 from Lepidium oleraceum and Zea mays, although specifying epigenetically weaker centromeres that result in genome elimination upon outcrossing, show a binding pattern on A. thaliana centromere repeats that is indistinguishable from the native CENH3. Our results demonstrate positional stability of a highly diverged CENH3 on independently evolved repeats, suggesting that the sequence specificity of centromeres is determined by a mechanism independent of CENH3. © 2017 Maheshwari et al.; Published by Cold Spring Harbor Laboratory Press.
Genotype imputation in a coalescent model with infinitely-many-sites mutation
Huang, Lucy; Buzbas, Erkan O.; Rosenberg, Noah A.
2012-01-01
Empirical studies have identified population-genetic factors as important determinants of the properties of genotype-imputation accuracy in imputation-based disease association studies. Here, we develop a simple coalescent model of three sequences that we use to explore the theoretical basis for the influence of these factors on genotype-imputation accuracy, under the assumption of infinitely-many-sites mutation. Employing a demographic model in which two populations diverged at a given time in the past, we derive the approximate expectation and variance of imputation accuracy in a study sequence sampled from one of the two populations, choosing between two reference sequences, one sampled from the same population as the study sequence and the other sampled from the other population. We show that under this model, imputation accuracy—as measured by the proportion of polymorphic sites that are imputed correctly in the study sequence—increases in expectation with the mutation rate, the proportion of the markers in a chromosomal region that are genotyped, and the time to divergence between the study and reference populations. Each of these effects derives largely from an increase in information available for determining the reference sequence that is genetically most similar to the sequence targeted for imputation. We analyze as a function of divergence time the expected gain in imputation accuracy in the target using a reference sequence from the same population as the target rather than from the other population. Together with a growing body of empirical investigations of genotype imputation in diverse human populations, our modeling framework lays a foundation for extending imputation techniques to novel populations that have not yet been extensively examined. PMID:23079542
Conceptual issues in Bayesian divergence time estimation
2016-01-01
Bayesian inference of species divergence times is an unusual statistical problem, because the divergence time parameters are not identifiable unless both fossil calibrations and sequence data are available. Commonly used marginal priors on divergence times derived from fossil calibrations may conflict with node order on the phylogenetic tree causing a change in the prior on divergence times for a particular topology. Care should be taken to avoid confusing this effect with changes due to informative sequence data. This effect is illustrated with examples. A topology-consistent prior that preserves the marginal priors is defined and examples are constructed. Conflicts between fossil calibrations and relative branch lengths (based on sequence data) can cause estimates of divergence times that are grossly incorrect, yet have a narrow posterior distribution. An example of this effect is given; it is recommended that overly narrow posterior distributions of divergence times should be carefully scrutinized. This article is part of the themed issue ‘Dating species divergences using rocks and clocks’. PMID:27325831
Conceptual issues in Bayesian divergence time estimation.
Rannala, Bruce
2016-07-19
Bayesian inference of species divergence times is an unusual statistical problem, because the divergence time parameters are not identifiable unless both fossil calibrations and sequence data are available. Commonly used marginal priors on divergence times derived from fossil calibrations may conflict with node order on the phylogenetic tree causing a change in the prior on divergence times for a particular topology. Care should be taken to avoid confusing this effect with changes due to informative sequence data. This effect is illustrated with examples. A topology-consistent prior that preserves the marginal priors is defined and examples are constructed. Conflicts between fossil calibrations and relative branch lengths (based on sequence data) can cause estimates of divergence times that are grossly incorrect, yet have a narrow posterior distribution. An example of this effect is given; it is recommended that overly narrow posterior distributions of divergence times should be carefully scrutinized. This article is part of the themed issue 'Dating species divergences using rocks and clocks'. © 2016 The Author(s).
Penny, D; Hasegawa, M; Waddell, P J; Hendy, M D
1999-03-01
We explore the tree of mammalian mtDNA sequences, using particularly the LogDet transform on amino acid sequences, the distance Hadamard transform, and the Closest Tree selection criterion. The amino acid composition of different species shows significant differences, even within mammals. After compensating for these differences, nearest-neighbor bootstrap results suggest that the tree is locally stable, though a few groups show slightly greater rearrangements when a large proportion of the constant sites are removed. Many parts of the trees we obtain agree with those on published protein ML trees. Interesting results include a preference for rodent monophyly. A few alternative signals to those on the optimal tree were detected using the distance Hadamard transform (with results expressed as a Lento plot). One rearrangement suggested was the interchange of the position of primates and rodents on the optimal tree. The basic stability of the tree, combined with two calibration points (whale/cow and horse/rhinoceros), together with a distant secondary calibration from the mammal/bird divergence, allows inferences of the times of divergence of putative clades. Allowing for sampling variances due to finite sequence length, most major divergences amongst lineages leading to modern orders appear to occur well before the Cretaceous/Tertiary (K/T) boundary. Implications arising from these early divergences are discussed, particularly the possibility of competition between the small dinosaurs and the new mammal clades.
Can DNA barcoding accurately discriminate megadiverse Neotropical freshwater fish fauna?
2013-01-01
Background The megadiverse Neotropical freshwater ichthyofauna is the richest in the world with approximately 6,000 recognized species. Interestingly, they are distributed among only 17 orders, and almost 80% of them belong to only three orders: Characiformes, Siluriformes and Perciformes. Moreover, evidence based on molecular data has shown that most of the diversification of the Neotropical ichthyofauna occurred recently. These characteristics make the taxonomy and identification of this fauna a great challenge, even when using molecular approaches. In this context, the present study aimed to test the effectiveness of the barcoding methodology (COI gene) to identify the mega diverse freshwater fish fauna from the Neotropical region. For this purpose, 254 species of fishes were analyzed from the Upper Parana River basin, an area representative of the larger Neotropical region. Results Of the 254 species analyzed, 252 were correctly identified by their barcode sequences (99.2%). The main K2P intra- and inter-specific genetic divergence values (0.3% and 6.8%, respectively) were relatively low compared with similar values reported in the literature, reflecting the higher number of closely related species belonging to a few higher taxa and their recent radiation. Moreover, for 84 pairs of species that showed low levels of genetic divergence (<2%), application of a complementary character-based nucleotide diagnostic approach proved useful in discriminating them. Additionally, 14 species displayed high intra-specific genetic divergence (>2%), pointing to at least 23 strong candidates for new species. Conclusions Our study is the first to examine a large number of freshwater fish species from the Neotropical area, including a large number of closely related species. The results confirmed the efficacy of the barcoding methodology to identify a recently radiated, megadiverse fauna, discriminating 99.2% of the analyzed species. 
The power of the barcode sequences to identify species, even with low interspecific divergence, gives us an idea of the distribution of inter-specific genetic divergence in these megadiverse fauna. The results also revealed hidden genetic divergences suggestive of reproductive isolation and putative cryptic speciation in some species (23 candidates for new species). Finally, our study constituted an important contribution to the international Barcoding of Life (iBOL.org) project, providing barcode sequences for use in identification of these species by experts and non-experts, and allowing them to be available for use in other applications. PMID:23497346
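The K2P (Kimura 2-parameter) divergence values quoted in the abstract above can be illustrated with a minimal sketch. It assumes two already aligned COI sequences and uses the standard K2P formula d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions. The function name `k2p_distance` and the handling of gaps are illustrative assumptions, not the authors' pipeline.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration; sites with gaps or ambiguous
    bases are skipped (an assumption, not taken from the study).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    term = (1 - 2 * p - q) * math.sqrt(1 - 2 * q) if q < 0.5 else 0.0
    if term <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term)
```

A pair of sequences could then be compared against the 2% figure mentioned in the abstract with something like `k2p_distance(seq_1, seq_2) < 0.02`, where `seq_1` and `seq_2` are placeholder names for aligned barcodes.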
Microbial evolution of sulphate reduction when lateral gene transfer is geographically restricted.
Chi Fru, E
2011-07-01
Lateral gene transfer (LGT) is an important mechanism by which micro-organisms acquire new functions. This process has been suggested to be central to prokaryotic evolution in various environments. However, the influence of geographical constraints on the evolution of laterally acquired genes in microbial metabolic evolution is not yet well understood. In this study, the influence of geographical isolation on the evolution of laterally acquired dissimilatory sulphite reductase (dsr) gene sequences in the sulphate-reducing micro-organisms (SRM) was investigated. Sequences on four continental blocks related to SRM known to have received dsr by LGT were analysed using standard phylogenetic and multidimensional statistical methods. Sequences related to lineages with large genetic diversity correlated positively with habitat divergence. Those affiliated to Thermodesulfobacterium indicated strong biogeographical delineation; hydrothermal-vent sequences clustered independently from hot-spring sequences. Some of the hydrothermal-vent and hot-spring sequences suggested to have been acquired from a common ancestral source may have diverged upon isolation within distinct habitats. In contrast, analysis of some Desulfotomaculum sequences indicated they could have been transferred from different ancestral sources but converged upon isolation within the same niche. These results hint that, after lateral acquisition of dsr genes, barriers to gene flow probably play a strong role in their subsequent evolution.
Evolutionary distances in the twilight zone--a rational kernel approach.
Schwarz, Roland F; Fletcher, William; Förster, Frank; Merget, Benjamin; Wolf, Matthias; Schultz, Jörg; Markowetz, Florian
2010-12-31
Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets.
Troggio, Michela; Surbanovski, Nada; Bianco, Luca; Moretto, Marco; Giongo, Lara; Banchi, Elisa; Viola, Roberto; Fernández, Felicdad Fernández; Costa, Fabrizio; Velasco, Riccardo; Cestaro, Alessandro; Sargent, Daniel James
2013-01-01
High throughput arrays for the simultaneous genotyping of thousands of single-nucleotide polymorphisms (SNPs) have made the rapid genetic characterisation of plant genomes and the development of saturated linkage maps a realistic prospect for many plant species of agronomic importance. However, the correct calling of SNP genotypes in divergent polyploid genomes using array technology can be problematic due to paralogy, and to divergence in probe sequences causing changes in probe binding efficiencies. An Illumina Infinium II whole-genome genotyping array was recently developed for the cultivated apple and used to develop a molecular linkage map for an apple rootstock progeny (M432), but a large proportion of segregating SNPs were not mapped in the progeny, due to unexpected genotype clustering patterns. To investigate the causes of this unexpected clustering we performed BLAST analysis of all probe sequences against the 'Golden Delicious' genome sequence and discovered evidence for paralogous annealing sites and probe sequence divergence for a high proportion of probes contained on the array. Following visual re-evaluation of the genotyping data generated for 8,788 SNPs for the M432 progeny using the array, we manually re-scored genotypes at 818 loci and mapped a further 797 markers to the M432 linkage map. The newly mapped markers included the majority of those that could not be mapped previously, as well as loci that were previously scored as monomorphic, but which segregated due to divergence leading to heterozygosity in probe annealing sites. An evaluation of the 8,788 probes in a diverse collection of Malus germplasm showed that more than half the probes returned genotype clustering patterns that were difficult or impossible to interpret reliably, highlighting implications for the use of the array in genome-wide association studies.
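The probe check described in the abstract above (BLAST of probe sequences against a reference genome to find paralogous annealing sites and probe sequence divergence) can be sketched as follows. The sketch assumes BLAST tabular output (the standard 12-column `-outfmt 6` layout); the function name `flag_probes` and the identity and length cut-offs are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def flag_probes(blast_tab_lines, min_identity=90.0, min_length=40):
    """Classify probes from BLAST tabular (outfmt 6) lines.

    Returns two sets of probe IDs:
      multi_site - probes with more than one qualifying genomic hit
                   (candidate paralogous annealing sites)
      imperfect  - single-site probes whose best hit is below 100%
                   identity (candidate probe sequence divergence)
    Thresholds are illustrative assumptions.
    """
    sites = defaultdict(set)          # probe -> {(subject, start), ...}
    best_identity = defaultdict(float)

    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        probe, subject = fields[0], fields[1]
        pident, length = float(fields[2]), int(fields[3])
        sstart = int(fields[8])
        if pident >= min_identity and length >= min_length:
            sites[probe].add((subject, sstart))
            best_identity[probe] = max(best_identity[probe], pident)

    multi_site = {p for p, s in sites.items() if len(s) > 1}
    imperfect = {
        p for p in sites
        if p not in multi_site and best_identity[p] < 100.0
    }
    return multi_site, imperfect
```

Probes in either set would be candidates for the kind of manual re-scoring the abstract describes; the sketch only flags them and does not attempt to call genotypes.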
Tzika, Athanasia C; Helaers, Raphaël; Schramm, Gerrit; Milinkovitch, Michel C
2011-09-26
Reptiles are largely under-represented in comparative genomics despite the fact that they are substantially more diverse in many respects than mammals. Given the high divergence of reptiles from classical model species, next-generation sequencing of their transcriptomes is an approach of choice for gene identification and annotation. Here, we use 454 technology to sequence the brain transcriptome of four divergent reptilian and one reference avian species: the Nile crocodile, the corn snake, the bearded dragon, the red-eared turtle, and the chicken. Using an in-house pipeline for recursive similarity searches of >3,000,000 reads against multiple databases from 7 reference vertebrates, we compile a reptilian comparative transcriptomics dataset, with homology assignment for 20,000 to 31,000 transcripts per species and a cumulated non-redundant sequence length of 248.6 Mbases. Our approach identifies the majority (87%) of chicken brain transcripts and about 50% of de novo assembled reptilian transcripts. In addition to 57,502 microsatellite loci, we identify thousands of SNP and indel polymorphisms for population genetic and linkage analyses. We also build very large multiple alignments for Sauropsida and mammals (two million residues per species) and perform extensive phylogenetic analyses suggesting that turtles are not basal living reptiles but are rather associated with Archosaurians, hence, potentially answering a long-standing question in the phylogeny of Amniotes. The reptilian transcriptome (freely available at http://www.reptilian-transcriptomes.org) should prove a useful new resource as reptiles are becoming important new models for comparative genomics, ecology, and evolutionary developmental genetics.
USDA-ARS's Scientific Manuscript database
Phylogenetic relatedness among ascomycetous yeast genera (subphylum Saccharomycotina, phylum Ascomycota) has been uncertain. In the present study, type species of 70 currently recognized genera are compared from divergence in the nearly entire nuclear gene sequences for large subunit rRNA, small sub...
Molecular cloning and expression analysis of WRKY transcription factor genes in Salvia miltiorrhiza.
Li, Caili; Li, Dongqiao; Shao, Fenjuan; Lu, Shanfa
2015-03-17
WRKY proteins comprise a large family of transcription factors and play important regulatory roles in plant development and defense response. The WRKY gene family in Salvia miltiorrhiza has not been characterized. A total of 61 SmWRKYs were cloned from S. miltiorrhiza. Multiple sequence alignment showed that SmWRKYs could be classified into 3 groups and 8 subgroups. Sequence features, the WRKY domain and other motifs of SmWRKYs are largely conserved with Arabidopsis AtWRKYs. Each group of WRKY domains contains characteristic conserved sequences, and group-specific motifs might contribute to functional divergence of WRKYs. A total of 17 pairs of orthologous SmWRKY and AtWRKY genes and 21 pairs of paralogous SmWRKY genes were identified. Maximum likelihood analysis showed that SmWRKYs had undergone strong selective pressure for adaptive evolution. Functional divergence analysis suggested that the SmWRKY subgroup genes and many paralogous SmWRKY gene pairs were divergent in functions. Various critical amino acids contributing to functional divergence among subgroups were detected. Of the 61 SmWRKYs, 22, 13, 4 and 1 were predominantly expressed in roots, stems, leaves, and flowers, respectively. The other 21 were mainly expressed in at least two tissues analyzed. In S. miltiorrhiza roots treated with MeJA, significant changes of gene expression were observed for 49 SmWRKYs, of which 26 were up-regulated, 18 were down-regulated, while the other 5 were either up-regulated or down-regulated at different time-points of treatment. Analysis of published RNA-seq data showed that 42 of the 61 identified SmWRKYs were yeast extract and Ag(+)-responsive. Through a systematic analysis, SmWRKYs potentially involved in tanshinone biosynthesis were predicted. These results provide insights into functional conservation and diversification of SmWRKYs and are useful information for further elucidating SmWRKY functions.
Zhao, Shancen; Zheng, Pingping; Dong, Shanshan; Zhan, Xiangjiang; Wu, Qi; Guo, Xiaosen; Hu, Yibo; He, Weiming; Zhang, Shanning; Fan, Wei; Zhu, Lifeng; Li, Dong; Zhang, Xuemei; Chen, Quan; Zhang, Hemin; Zhang, Zhihe; Jin, Xuelin; Zhang, Jinguo; Yang, Huanming; Wang, Jian; Wang, Jun; Wei, Fuwen
2013-01-01
The panda lineage dates back to the late Miocene and ultimately leads to only one extant species, the giant panda (Ailuropoda melanoleuca). Although global climate change and anthropogenic disturbances are recognized to shape animal population demography their contribution to panda population dynamics remains largely unknown. We sequenced the whole genomes of 34 pandas at an average 4.7-fold coverage and used this data set together with the previously deep-sequenced panda genome to reconstruct a continuous demographic history of pandas from their origin to the present. We identify two population expansions, two bottlenecks and two divergences. Evidence indicated that, whereas global changes in climate were the primary drivers of population fluctuation for millions of years, human activities likely underlie recent population divergence and serious decline. We identified three distinct panda populations that show genetic adaptation to their environments. However, in all three populations, anthropogenic activities have negatively affected pandas for 3,000 years.
Cao, Ya-Nan; Wang, Ian J; Chen, Lu-Yao; Ding, Yan-Qian; Liu, Lu-Xian; Qiu, Ying-Xiong
2018-04-17
The relative roles of geography, climate and ecology in driving population divergence and (incipient) speciation have so far been largely neglected in studies addressing the evolution of East Asia's island flora. Here, we employed chloroplast and ribosomal DNA sequences and restriction site-associated DNA sequencing (RADseq) loci to investigate the phylogeography and drivers of population divergence of Neolitsea sericea. These data sets support the subdivision of N. sericea populations into the Southern and Northern lineages across the 'Tokara gap'. Two distinct sublineages were further identified for the Northern lineage of N. sericea from the RADseq data. RADseq was also used along with approximate Bayesian computation to show that the current distribution and differentiation of N. sericea populations resulted from a combination of relatively ancient migration and successive vicariant events that likely occurred during the mid to late Pleistocene. Landscape genomic analyses showed that, apart from geographic barriers, potentially local adaptation to different climatic conditions appears to be one of the major drivers for lineage diversification of N. sericea. Copyright © 2018 Elsevier Inc. All rights reserved.
Spielmann, A; Stutz, E
1983-10-25
The soybean chloroplast psb A gene (photosystem II thylakoid membrane protein of Mr 32 000, lysine-free) and the trn H gene (tRNAHisGUG), which both map in the large single copy region adjacent to one of the inverted repeat structures (IR1), have been sequenced including flanking regions. The psb A gene shows in its structural part 92% sequence homology with the corresponding genes of spinach and N. debneyi and also contains an open reading frame for 353 amino acids. The amino acid sequence of a potential primary translation product (calculated Mr, 38 904, no lysine) diverges from that of spinach and N. debneyi in only two positions in the C-terminal part. The trn H gene has the same polarity as the psb A gene and the coding region is located at the very end of the large single copy region. The deduced sequence of the soybean chloroplast tRNAHisGUG is identical with that of Zea mays chloroplasts. Both ends of the large single copy region were sequenced including a small segment of the adjacent IR1 and IR2.
Wei, Chaoling; Yang, Hua; Wang, Songbo; Zhao, Jian; Liu, Chun; Gao, Liping; Xia, Enhua; Lu, Ying; Tai, Yuling; She, Guangbiao; Sun, Jun; Cao, Haisheng; Tong, Wei; Gao, Qiang; Li, Yeyun; Deng, Weiwei; Jiang, Xiaolan; Wang, Wenzhao; Chen, Qi; Zhang, Shihua; Li, Haijing; Wu, Junlan; Wang, Ping; Li, Penghui; Shi, Chengying; Zheng, Fengya; Jian, Jianbo; Huang, Bei; Shan, Dai; Shi, Mingming; Fang, Congbing; Yue, Yi; Li, Fangdong; Li, Daxiang; Wei, Shu; Han, Bin; Jiang, Changjun; Yin, Ye; Xia, Tao; Zhang, Zhengzhu; Bennetzen, Jeffrey L; Zhao, Shancen; Wan, Xiaochun
2018-05-01
Tea, one of the world's most important beverage crops, provides numerous secondary metabolites that account for its rich taste and health benefits. Here we present a high-quality sequence of the genome of tea, Camellia sinensis var. sinensis (CSS), using both Illumina and PacBio sequencing technologies. At least 64% of the 3.1-Gb genome assembly consists of repetitive sequences, and the rest yields 33,932 high-confidence predictions of encoded proteins. Divergence between two major lineages, CSS and Camellia sinensis var. assamica (CSA), is calculated to ∼0.38 to 1.54 million years ago (Mya). Analysis of genic collinearity reveals that the tea genome is the product of two rounds of whole-genome duplications (WGDs) that occurred ∼30 to 40 and ∼90 to 100 Mya. We provide evidence that these WGD events, and subsequent paralogous duplications, had major impacts on the copy numbers of secondary metabolite genes, particularly genes critical to producing three key quality compounds: catechins, theanine, and caffeine. Analyses of transcriptome and phytochemistry data show that amplification and transcriptional divergence of genes encoding a large acyltransferase family and leucoanthocyanidin reductases are associated with the characteristic young leaf accumulation of monomeric galloylated catechins in tea, while functional divergence of a single member of the glutamine synthetase gene family yielded theanine synthetase. This genome sequence will facilitate understanding of tea genome evolution and tea metabolite pathways, and will promote germplasm utilization for breeding improved tea varieties. Copyright © 2018 the Author(s). Published by PNAS.
Wei, Chaoling; Yang, Hua; Wang, Songbo; Zhao, Jian; Liu, Chun; Gao, Liping; Xia, Enhua; Lu, Ying; Tai, Yuling; She, Guangbiao; Sun, Jun; Cao, Haisheng; Tong, Wei; Gao, Qiang; Li, Yeyun; Deng, Weiwei; Jiang, Xiaolan; Wang, Wenzhao; Chen, Qi; Zhang, Shihua; Li, Haijing; Wu, Junlan; Wang, Ping; Li, Penghui; Shi, Chengying; Zheng, Fengya; Jian, Jianbo; Huang, Bei; Shan, Dai; Shi, Mingming; Fang, Congbing; Yue, Yi; Li, Fangdong; Li, Daxiang; Wei, Shu; Han, Bin; Jiang, Changjun; Yin, Ye; Xia, Tao; Zhang, Zhengzhu; Bennetzen, Jeffrey L.; Zhao, Shancen; Wan, Xiaochun
2018-01-01
Tea, one of the world’s most important beverage crops, provides numerous secondary metabolites that account for its rich taste and health benefits. Here we present a high-quality sequence of the genome of tea, Camellia sinensis var. sinensis (CSS), using both Illumina and PacBio sequencing technologies. At least 64% of the 3.1-Gb genome assembly consists of repetitive sequences, and the rest yields 33,932 high-confidence predictions of encoded proteins. Divergence between two major lineages, CSS and Camellia sinensis var. assamica (CSA), is calculated to ∼0.38 to 1.54 million years ago (Mya). Analysis of genic collinearity reveals that the tea genome is the product of two rounds of whole-genome duplications (WGDs) that occurred ∼30 to 40 and ∼90 to 100 Mya. We provide evidence that these WGD events, and subsequent paralogous duplications, had major impacts on the copy numbers of secondary metabolite genes, particularly genes critical to producing three key quality compounds: catechins, theanine, and caffeine. Analyses of transcriptome and phytochemistry data show that amplification and transcriptional divergence of genes encoding a large acyltransferase family and leucoanthocyanidin reductases are associated with the characteristic young leaf accumulation of monomeric galloylated catechins in tea, while functional divergence of a single member of the glutamine synthetase gene family yielded theanine synthetase. This genome sequence will facilitate understanding of tea genome evolution and tea metabolite pathways, and will promote germplasm utilization for breeding improved tea varieties. PMID:29678829
Makowsky, Robert; Cox, Christian L; Roelke, Corey; Chippindale, Paul T
2010-11-01
Determining the appropriate gene for phylogeny reconstruction can be a difficult process. Rapidly evolving genes tend to resolve recent relationships, but suffer from alignment issues and increased homoplasy among distantly related species. Conversely, slowly evolving genes generally perform best for deeper relationships, but lack sufficient variation to resolve recent relationships. We determine the relationship between sequence divergence and Bayesian phylogenetic reconstruction ability using both natural and simulated datasets. The natural data are based on 28 well-supported relationships within the subphylum Vertebrata. Sequences of 12 genes were acquired and Bayesian analyses were used to determine phylogenetic support for correct relationships. Simulated datasets were designed to determine whether an optimal range of sequence divergence exists across extreme phylogenetic conditions. Across all genes we found that an optimal range of divergence for resolving the correct relationships does exist, although this level of divergence expectedly depends on the distance metric. Simulated datasets show that an optimal range of sequence divergence exists across diverse topologies and models of evolution. We determine that a simple to measure property of genetic sequences (genetic distance) is related to phylogenic reconstruction ability in Bayesian analyses. This information should be useful for selecting the most informative gene to resolve any relationships, especially those that are difficult to resolve, as well as minimizing both cost and confounding information during project design. Copyright © 2010. Published by Elsevier Inc.
Troyer, Jennifer L; Pecon-Slattery, Jill; Roelke, Melody E; Black, Lori; Packer, Craig; O'Brien, Stephen J
2004-04-01
Feline immunodeficiency virus (FIV) is a lentivirus that causes AIDS-like immunodeficiency disease in domestic cats. Free-ranging lions, Panthera leo, carry a chronic species-specific strain of FIV, FIV-Ple, which so far has not been convincingly connected with immune pathology or mortality. FIV-Ple, harboring the three distinct strains A, B, and C defined by pol gene sequence divergences, is endemic in the large outbred population of lions in the Serengeti ecosystem in Tanzania. Here we describe the pattern of variation in the three FIV genes gag, pol-RT, and pol-RNase among lions within 13 prides to assess the occurrence of FIV infection and coinfection. Genome diversity within and among FIV-Ple strains is shown to be large, with strain divergence for each gene approaching genetic distances observed for FIV between different species of cats. Multiple infections with two or three strains were found in 43% of the FIV-positive individuals based on pol-RT sequence analysis, which may suggest that antiviral immunity or interference evoked by one strain is not consistently protective against infection by a second. This comprehensive study of FIV-Ple in a free-ranging population of lions reveals a dynamic transmission of virus in a social species that has historically adapted to render the virus benign.
Troyer, Jennifer L.; Pecon-Slattery, Jill; Roelke, Melody E.; Black, Lori; Packer, Craig; O'Brien, Stephen J.
2004-01-01
Feline immunodeficiency virus (FIV) is a lentivirus that causes AIDS-like immunodeficiency disease in domestic cats. Free-ranging lions, Panthera leo, carry a chronic species-specific strain of FIV, FIV-Ple, which so far has not been convincingly connected with immune pathology or mortality. FIV-Ple, harboring the three distinct strains A, B, and C defined by pol gene sequence divergences, is endemic in the large outbred population of lions in the Serengeti ecosystem in Tanzania. Here we describe the pattern of variation in the three FIV genes gag, pol-RT, and pol-RNase among lions within 13 prides to assess the occurrence of FIV infection and coinfection. Genome diversity within and among FIV-Ple strains is shown to be large, with strain divergence for each gene approaching genetic distances observed for FIV between different species of cats. Multiple infections with two or three strains were found in 43% of the FIV-positive individuals based on pol-RT sequence analysis, which may suggest that antiviral immunity or interference evoked by one strain is not consistently protective against infection by a second. This comprehensive study of FIV-Ple in a free-ranging population of lions reveals a dynamic transmission of virus in a social species that has historically adapted to render the virus benign. PMID:15016897
New Hepatitis B Virus of Cranes That Has an Unexpected Broad Host Range
Prassolov, Alexej; Hohenberg, Heinz; Kalinina, Tatyana; Schneider, Carola; Cova, Lucyna; Krone, Oliver; Frölich, Kai; Will, Hans; Sirma, Hüseyin
2003-01-01
All hepadnaviruses known so far have a very limited host range, restricted to their natural hosts and a few closely related species. This is thought to be due mainly to sequence divergence in the large envelope protein and species-specific differences in host components essential for virus propagation. Here we report an infection of cranes with a novel hepadnavirus, designated CHBV, that has an unexpectedly broad host range and is only distantly evolutionarily related to avihepadnaviruses of related hosts. Direct DNA sequencing of amplified CHBV DNA as well as sequencing of cloned viral genomes revealed that CHBV is most closely related to, although distinct from, Ross' goose hepatitis B virus (RGHBV) and slightly less closely related to duck hepatitis B virus (DHBV). Phylogenetically, cranes are very distant from geese and ducks and are most closely related to herons and storks. Naturally occurring hepadnaviruses in the last two species are highly divergent in sequence from RGHBV and DHBV and do not infect ducks or do so only marginally. In contrast, CHBV from crane sera and recombinant CHBV produced from LMH cells infected primary duck hepatocytes almost as efficiently as DHBV did. This is the first report of a rather broad host range of an avihepadnavirus. Our data imply either usage of similar or identical entry pathways and receptors by DHBV and CHBV, unusual host and virus adaptation mechanisms, or divergent evolution of the host genomes and cellular components required for virus propagation. PMID:12525630
New hepatitis B virus of cranes that has an unexpected broad host range.
Prassolov, Alexej; Hohenberg, Heinz; Kalinina, Tatyana; Schneider, Carola; Cova, Lucyna; Krone, Oliver; Frölich, Kai; Will, Hans; Sirma, Hüseyin
2003-02-01
All hepadnaviruses known so far have a very limited host range, restricted to their natural hosts and a few closely related species. This is thought to be due mainly to sequence divergence in the large envelope protein and species-specific differences in host components essential for virus propagation. Here we report an infection of cranes with a novel hepadnavirus, designated CHBV, that has an unexpectedly broad host range and is only distantly evolutionarily related to avihepadnaviruses of related hosts. Direct DNA sequencing of amplified CHBV DNA as well as sequencing of cloned viral genomes revealed that CHBV is most closely related to, although distinct from, Ross' goose hepatitis B virus (RGHBV) and slightly less closely related to duck hepatitis B virus (DHBV). Phylogenetically, cranes are very distant from geese and ducks and are most closely related to herons and storks. Naturally occurring hepadnaviruses in the last two species are highly divergent in sequence from RGHBV and DHBV and do not infect ducks or do so only marginally. In contrast, CHBV from crane sera and recombinant CHBV produced from LMH cells infected primary duck hepatocytes almost as efficiently as DHBV did. This is the first report of a rather broad host range of an avihepadnavirus. Our data imply either usage of similar or identical entry pathways and receptors by DHBV and CHBV, unusual host and virus adaptation mechanisms, or divergent evolution of the host genomes and cellular components required for virus propagation.
An improved approximate-Bayesian model-choice method for estimating shared evolutionary history
2014-01-01
Background To understand biological diversification, it is important to account for large-scale processes that affect the evolutionary history of groups of co-distributed populations of organisms. Such events predict temporally clustered divergences times, a pattern that can be estimated using genetic data from co-distributed species. I introduce a new approximate-Bayesian method for comparative phylogeographical model-choice that estimates the temporal distribution of divergences across taxa from multi-locus DNA sequence data. The model is an extension of that implemented in msBayes. Results By reparameterizing the model, introducing more flexible priors on demographic and divergence-time parameters, and implementing a non-parametric Dirichlet-process prior over divergence models, I improved the robustness, accuracy, and power of the method for estimating shared evolutionary history across taxa. Conclusions The results demonstrate the improved performance of the new method is due to (1) more appropriate priors on divergence-time and demographic parameters that avoid prohibitively small marginal likelihoods for models with more divergence events, and (2) the Dirichlet-process providing a flexible prior on divergence histories that does not strongly disfavor models with intermediate numbers of divergence events. The new method yields more robust estimates of posterior uncertainty, and thus greatly reduces the tendency to incorrectly estimate models of shared evolutionary history with strong support. PMID:24992937
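The Dirichlet-process prior over divergence models mentioned in the abstract above can be illustrated with a generic Chinese restaurant process, which assigns each taxon pair to a shared divergence event. This is a minimal sketch of the general technique, not the author's implementation: the function names, the concentration parameter name `alpha`, and the example values are all assumptions.

```python
import random


def expected_num_events(n_pairs: int, alpha: float) -> float:
    """Prior expected number of distinct divergence events under a
    Dirichlet process with concentration `alpha` (hypothetical name)."""
    return sum(alpha / (alpha + i) for i in range(n_pairs))


def sample_partition(n_pairs: int, alpha: float, rng: random.Random) -> list:
    """One Chinese-restaurant-process draw.

    Returns a list in which entry i is the index of the divergence event
    that taxon pair i is assigned to. Event indices start at 0 and are
    created in order of first appearance.
    """
    counts = []        # number of taxon pairs already assigned to each event
    assignments = []
    for _ in range(n_pairs):
        # Join an existing event with weight equal to its current size,
        # or open a new event with weight alpha.
        weights = counts + [alpha]
        event = rng.choices(range(len(weights)), weights=weights)[0]
        if event == len(counts):
            counts.append(1)
        else:
            counts[event] += 1
        assignments.append(event)
    return assignments
```

For example, `sample_partition(22, 1.5, random.Random(0))` draws one possible grouping of 22 hypothetical taxon pairs into divergence events. Small values of `alpha` favor few shared events and large values favor many independent ones, which is the sense in which the concentration parameter controls how strongly the prior favors shared history.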
The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences
2010-01-01
Background In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24). The size and complexity of these genomes have prompted much speculation as to the feasibility of completing a conifer genome sequence. Conifer genomes are reputed to be highly repetitive, but there is little information available on the nature and identity of repetitive units in gymnosperms. The pines have extensive genetic resources, with approximately 329000 ESTs from eleven species and genetic maps in eight species, including a dense genetic map of the twelve linkage groups in Pinus taeda. Results We present here the Sanger sequence and annotation of ten P. taeda BAC clones and Genome Analyzer II whole genome shotgun (WGS) sequences representing 7.5% of the genome. Computational annotation of ten BACs predicts three putative protein-coding genes and at least fifteen likely pseudogenes in nearly one megabase of sequence. We found three conifer-specific LTR retroelements in the BACs, and tentatively identified at least 15 others based on evidence from the distantly related angiosperms. Alignment of WGS sequences to the BACs indicates that 80% of BAC sequences have similar copies (≥ 75% nucleotide identity) elsewhere in the genome, but only 23% have identical copies (99% identity). The three most common repetitive elements in the genome were identified and, when combined, represent less than 5% of the genome. Conclusions This study indicates that the majority of repeats in the P. taeda genome are 'novel' and will therefore require additional BAC or genomic sequencing for accurate characterization. The pine genome contains a very large number of diverged and probably defunct repetitive elements. This study also provides new evidence that sequencing a pine genome using a WGS approach is a feasible goal. 
PMID:20609256
Wang, Xiao-Wei; Zhao, Qiong-Yi; Luan, Jun-Bo; Wang, Yu-Jun; Yan, Gen-Hong; Liu, Shu-Sheng
2012-10-04
Genomic divergence between invasive and native species may provide insight into the molecular basis underlying specific characteristics that drive the invasion and displacement of closely related species. In this study, we sequenced the transcriptome of an indigenous species, Asia II 3, of the Bemisia tabaci complex and compared its genetic divergence with the transcriptomes of two invasive whitefly species, Middle East Asia Minor 1 (MEAM1) and Mediterranean (MED), respectively. More than 16 million reads of 74 base pairs in length were obtained for the Asia II 3 species using the Illumina sequencing platform. These reads were assembled into 52,535 distinct sequences (mean size: 466 bp) and 16,596 sequences were annotated with an E-value above 10^-5. Protein family comparisons revealed obvious diversification among the transcriptomes of these species suggesting species-specific adaptations during whitefly evolution. On the contrary, substantial conservation of the whitefly transcriptomes was also evident, despite their differences. The overall divergence of coding sequences between the orthologous gene pairs of Asia II 3 and MEAM1 is 1.73%, which is comparable to the average divergence of Asia II 3 and MED transcriptomes (1.84%) and much higher than that of MEAM1 and MED (0.83%). This is consistent with the previous phylogenetic analyses and crossing experiments suggesting these are distinct species. We also identified hundreds of highly diverged genes and compiled sequence identity data into gene functional groups and found the most divergent gene classes are Cytochrome P450, Glutathione metabolism and Oxidative phosphorylation. These results strongly suggest that the divergence of genes related to metabolism might be the driving force of the MEAM1 and Asia II 3 differentiation.
We also analyzed single nucleotide polymorphisms within the orthologous gene pairs of indigenous and invasive whiteflies which are helpful for the investigation of association between allelic and phenotypes. Our data present the most comprehensive sequences for the indigenous whitefly species Asia II 3. The extensive comparisons of Asia II 3, MEAM1 and MED transcriptomes will serve as an invaluable resource for revealing the genetic basis of whitefly invasion and the molecular mechanisms underlying their biological differences.
2012-01-01
Background Genomic divergence between invasive and native species may provide insight into the molecular basis underlying specific characteristics that drive the invasion and displacement of closely related species. In this study, we sequenced the transcriptome of an indigenous species, Asia II 3, of the Bemisia tabaci complex and compared its genetic divergence with the transcriptomes of two invasive whiteflies species, Middle East Asia Minor 1 (MEAM1) and Mediterranean (MED), respectively. Results More than 16 million reads of 74 base pairs in length were obtained for the Asia II 3 species using the Illumina sequencing platform. These reads were assembled into 52,535 distinct sequences (mean size: 466 bp) and 16,596 sequences were annotated with an E-value above 10-5. Protein family comparisons revealed obvious diversification among the transcriptomes of these species suggesting species-specific adaptations during whitefly evolution. On the contrary, substantial conservation of the whitefly transcriptomes was also evident, despite their differences. The overall divergence of coding sequences between the orthologous gene pairs of Asia II 3 and MEAM1 is 1.73%, which is comparable to the average divergence of Asia II 3 and MED transcriptomes (1.84%) and much higher than that of MEAM1 and MED (0.83%). This is consistent with the previous phylogenetic analyses and crossing experiments suggesting these are distinct species. We also identified hundreds of highly diverged genes and compiled sequence identify data into gene functional groups and found the most divergent gene classes are Cytochrome P450, Glutathione metabolism and Oxidative phosphorylation. These results strongly suggest that the divergence of genes related to metabolism might be the driving force of the MEAM1 and Asia II 3 differentiation. 
We also analyzed single nucleotide polymorphisms within the orthologous gene pairs of indigenous and invasive whiteflies which are helpful for the investigation of association between allelic and phenotypes. Conclusions Our data present the most comprehensive sequences for the indigenous whitefly species Asia II 3. The extensive comparisons of Asia II 3, MEAM1 and MED transcriptomes will serve as an invaluable resource for revealing the genetic basis of whitefly invasion and the molecular mechanisms underlying their biological differences. PMID:23036081
Zhu, Tianqi; Dos Reis, Mario; Yang, Ziheng
2015-03-01
Genetic sequence data provide information about the distances between species or branch lengths in a phylogeny, but not about the absolute divergence times or the evolutionary rates directly. Bayesian methods for dating species divergences estimate times and rates by assigning priors on them. In particular, the prior on times (node ages on the phylogeny) incorporates information in the fossil record to calibrate the molecular tree. Because times and rates are confounded, our posterior time estimates will not approach point values even if an infinite amount of sequence data are used in the analysis. In a previous study we developed a finite-sites theory to characterize the uncertainty in Bayesian divergence time estimation in analysis of large but finite sequence data sets under a strict molecular clock. As most modern clock dating analyses use more than one locus and are conducted under relaxed clock models, here we extend the theory to the case of relaxed clock analysis of data from multiple loci (site partitions). Uncertainty in posterior time estimates is partitioned into three sources: sampling errors in the estimates of branch lengths in the tree for each locus due to limited sequence length, variation of substitution rates among lineages and among loci, and uncertainty in fossil calibrations. Using a simple but analogous estimation problem involving the multivariate normal distribution, we predict that as the number of loci (L) goes to infinity, the variance in posterior time estimates decreases and approaches the infinite-data limit at the rate of 1/L, and the limit is independent of the number of sites in the sequence alignment. We then confirmed the predictions by using computer simulation on phylogenies of two or three species, and by analyzing a real genomic data set for six primate species.
Our results suggest that with the fossil calibrations fixed, analyzing multiple loci or site partitions is the most effective way for improving the precision of posterior time estimation. However, even if a huge amount of sequence data is analyzed, considerable uncertainty will persist in time estimates. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society of Systematic Biologists.
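The scaling described in this abstract can be written schematically as follows. This is only an illustrative form, not the authors' derivation: writing L for the number of loci, the symbols V_\infty (the infinite-data limit of the posterior variance) and c (a positive constant that depends on rate variation and calibration uncertainty) are placeholders introduced here.

```latex
\operatorname{Var}(t \mid \text{data}) \;\approx\; V_{\infty} + \frac{c}{L},
\qquad
\lim_{L \to \infty} \operatorname{Var}(t \mid \text{data}) = V_{\infty} > 0 .
```

Read this way, adding loci shrinks only the c/L term, so the residual uncertainty V_\infty set by the fossil calibrations remains however much sequence is analyzed.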
2012-01-01
Background Adaptive divergence driven by environmental heterogeneity has long been a fascinating topic in ecology and evolutionary biology. The study of the genetic basis of adaptive divergence has, however, been greatly hampered by a lack of genomic information. The recent development of transcriptome sequencing provides an unprecedented opportunity to generate large amounts of genomic data for detailed investigations of the genetics of adaptive divergence in non-model organisms. Herein, we used the Illumina sequencing platform to sequence the transcriptome of brain and liver tissues from a single individual of the Vinous-throated Parrotbill, Paradoxornis webbianus bulomachus, an ecologically important avian species in Taiwan with a wide elevational range of sea level to 3100 m. Results Our 10.1 Gbp of sequences were first assembled based on Zebra Finch (Taeniopygia guttata) and chicken (Gallus gallus) RNA references. The remaining reads were then de novo assembled. After filtering out contigs with low coverage (<10X), we retained 67,791 of 487,336 contigs, which covered approximately 5.3% of the P. w. bulomachus genome. Of 7,779 contigs retained for a top-hit species distribution analysis, the majority (about 86%) were matched to known Zebra Finch and chicken transcripts. We also annotated 6,365 contigs to gene ontology (GO) terms: in total, 122 GO-slim terms were assigned, including biological process (41%), molecular function (32%), and cellular component (27%). Many potential genetic markers for future adaptive genomic studies were also identified: 8,589 single nucleotide polymorphisms, 1,344 simple sequence repeats and 109 candidate genes that might be involved in elevational or climate adaptation. Conclusions Our study shows that transcriptome data can serve as a rich genetic resource, even for a single run of short-read sequencing from a single individual of a non-model species. 
This is the first study providing transcriptomic information for species in the avian superfamily Sylvioidea, which comprises more than 1,000 species. Our data can be used to study adaptive divergence in heterogeneous environments and investigate other important ecological and evolutionary questions in parrotbills from different populations and even in other species in the Sylvioidea. PMID:22530590
2011-01-01
Background Reptiles are largely under-represented in comparative genomics despite the fact that they are substantially more diverse in many respects than mammals. Given the high divergence of reptiles from classical model species, next-generation sequencing of their transcriptomes is an approach of choice for gene identification and annotation. Results Here, we use 454 technology to sequence the brain transcriptome of four divergent reptilian and one reference avian species: the Nile crocodile, the corn snake, the bearded dragon, the red-eared turtle, and the chicken. Using an in-house pipeline for recursive similarity searches of >3,000,000 reads against multiple databases from 7 reference vertebrates, we compile a reptilian comparative transcriptomics dataset, with homology assignment for 20,000 to 31,000 transcripts per species and a cumulated non-redundant sequence length of 248.6 Mbases. Our approach identifies the majority (87%) of chicken brain transcripts and about 50% of de novo assembled reptilian transcripts. In addition to 57,502 microsatellite loci, we identify thousands of SNP and indel polymorphisms for population genetic and linkage analyses. We also build very large multiple alignments for Sauropsida and mammals (two million residues per species) and perform extensive phylogenetic analyses suggesting that turtles are not basal living reptiles but are rather associated with Archosaurians, hence, potentially answering a long-standing question in the phylogeny of Amniotes. Conclusions The reptilian transcriptome (freely available at http://www.reptilian-transcriptomes.org) should prove a useful new resource as reptiles are becoming important new models for comparative genomics, ecology, and evolutionary developmental genetics. PMID:21943375
2012-01-01
Background The central role of the somatotrophic axis in animal post-natal growth, development and fertility is well established. Therefore, the identification of genetic variants affecting quantitative traits within this axis is an attractive goal. However, large sample numbers are a pre-requisite for the identification of genetic variants underlying complex traits and although technologies are improving rapidly, high-throughput sequencing of large numbers of complete individual genomes remains prohibitively expensive. Therefore using a pooled DNA approach coupled with target enrichment and high-throughput sequencing, the aim of this study was to identify polymorphisms and estimate allele frequency differences across 83 candidate genes of the somatotrophic axis, in 150 Holstein-Friesian dairy bulls divided into two groups divergent for genetic merit for fertility. Results In total, 4,135 SNPs and 893 indels were identified during the resequencing of the 83 candidate genes. Nineteen percent (n = 952) of variants were located within 5' and 3' UTRs. Seventy-two percent (n = 3,612) were intronic and 9% (n = 464) were exonic, including 65 indels and 236 SNPs resulting in non-synonymous substitutions (NSS). Significant (P < 0.01) mean allele frequency differentials between the low and high fertility groups were observed for 720 SNPs (58 NSS). Allele frequencies for 43 of the SNPs were also determined by genotyping the 150 individual animals (Sequenom® MassARRAY). No significant differences (P > 0.1) were observed between the two methods for any of the 43 SNPs across both pools (i.e., 86 tests in total). Conclusions The results of the current study support previous findings of the use of DNA sample pooling and high-throughput sequencing as a viable strategy for polymorphism discovery and allele frequency estimation. 
Using this approach we have characterised the genetic variation within genes of the somatotrophic axis and related pathways, central to mammalian post-natal growth and development and subsequent lactogenesis and fertility. We have identified a large number of variants segregating at significantly different frequencies between cattle groups divergent for calving interval plausibly harbouring causative variants contributing to heritable variation. To our knowledge, this is the first report describing sequencing of targeted genomic regions in any livestock species using groups with divergent phenotypes for an economically important trait. PMID:22235840
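As a rough illustration of how allele frequencies might be estimated from pooled sequencing and compared between two groups, the sketch below computes a per-pool frequency from read counts and a simple two-proportion z statistic. The function names and read counts are hypothetical, and the z statistic is a stand-in chosen for brevity; the abstract does not say which test the authors used, and reads from a pool are not independent draws of chromosomes, so this is not a substitute for a proper pooled-sequencing model.

```python
import math


def pooled_frequency(alt_reads: int, total_reads: int) -> float:
    """Estimate an allele frequency in a DNA pool as the fraction of reads carrying the allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def frequency_differential(low_pool, high_pool) -> float:
    """Absolute allele-frequency difference; each pool is an (alt_reads, total_reads) tuple."""
    return abs(pooled_frequency(*high_pool) - pooled_frequency(*low_pool))


def two_proportion_z(low_pool, high_pool) -> float:
    """Two-proportion z statistic on read counts (illustrative assumption, not the study's test)."""
    p_low = pooled_frequency(*low_pool)
    p_high = pooled_frequency(*high_pool)
    n_low, n_high = low_pool[1], high_pool[1]
    # Frequency under the null hypothesis that both pools share one allele frequency.
    p_null = (low_pool[0] + high_pool[0]) / (n_low + n_high)
    se = math.sqrt(p_null * (1 - p_null) * (1 / n_low + 1 / n_high))
    if se == 0:
        return 0.0
    return (p_high - p_low) / se


if __name__ == "__main__":
    low = (30, 100)    # hypothetical: 30 of 100 reads carry the allele in the low-fertility pool
    high = (60, 120)   # hypothetical: 60 of 120 reads in the high-fertility pool
    print(frequency_differential(low, high), two_proportion_z(low, high))
```

Validating a subset of SNPs by individual genotyping, as the study did for 43 SNPs, is the kind of check that tells you whether such read-count estimates can be trusted.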
Patterns of DNA barcode variation in Canadian marine molluscs.
Layton, Kara K S; Martel, André L; Hebert, Paul D N
2014-01-01
Molluscs are the most diverse marine phylum and this high diversity has resulted in considerable taxonomic problems. Because the number of species in Canadian oceans remains uncertain, there is a need to incorporate molecular methods into species identifications. A 648 base pair segment of the cytochrome c oxidase subunit I gene has proven useful for the identification and discovery of species in many animal lineages. While the utility of DNA barcoding in molluscs has been demonstrated in other studies, this is the first effort to construct a DNA barcode registry for marine molluscs across such a large geographic area. This study examines patterns of DNA barcode variation in 227 species of Canadian marine molluscs. Intraspecific sequence divergences ranged from 0-26.4% and a barcode gap existed for most taxa. Eleven cases of relatively deep (>2%) intraspecific divergence were detected, suggesting the possible presence of overlooked species. Structural variation was detected in COI with indels found in 37 species, mostly bivalves. Some indels were present in divergent lineages, primarily in the region of the first external loop, suggesting certain areas are hotspots for change. Lastly, mean GC content varied substantially among orders (24.5%-46.5%), and showed a significant positive correlation with nearest neighbour distances. DNA barcoding is an effective tool for the identification of Canadian marine molluscs and for revealing possible cases of overlooked species. Some species with deep intraspecific divergence showed a biogeographic partition between lineages on the Atlantic, Arctic and Pacific coasts, suggesting the role of Pleistocene glaciations in the subdivision of their populations. Indels were prevalent in the barcode region of the COI gene in bivalves and gastropods. This study highlights the efficacy of DNA barcoding for providing insights into sequence variation across a broad taxonomic group on a large geographic scale.
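The notion of a barcode gap can be sketched in a few lines: compute pairwise distances among aligned barcode sequences, then compare the largest within-species distance with the smallest between-species distance. The example below uses the uncorrected p-distance as a simplifying assumption (the abstract does not state the distance model), and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gaps and Ns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue  # ignore positions that are missing in either sequence
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def barcode_gap(seqs_by_species):
    """Return (max intraspecific distance, min interspecific distance).

    The second value is None when only one species is supplied.
    """
    labelled = [(sp, s) for sp, seqs in seqs_by_species.items() for s in seqs]
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(labelled, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return max(intra, default=0.0), min(inter, default=None)


if __name__ == "__main__":
    toy = {
        "species_A": ["ACGTACGTAC", "ACGTACGTAT"],  # hypothetical 10-bp barcodes
        "species_B": ["ACGAACGAAC"],
    }
    max_intra, min_inter = barcode_gap(toy)
    print("barcode gap present:", min_inter is not None and min_inter > max_intra)
```

A gap exists when the smallest between-species distance exceeds the largest within-species distance; deep within-species values, like the >2% cases reported here, are what flag possible overlooked species.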
Kim, Young Bun; Oh, Jung Hun; McIver, Lauren J.; Rashkovetsky, Eugenia; Michalak, Katarzyna; Garner, Harold R.; Kang, Lin; Nevo, Eviatar; Korol, Abraham B.; Michalak, Pawel
2014-01-01
Repeat sequences, especially mobile elements, make up large portions of most eukaryotic genomes and provide enormous, albeit commonly underappreciated, evolutionary potential. We analyzed repeatomes of Drosophila melanogaster that have been diverging in response to a microclimate contrast in Evolution Canyon (Mount Carmel, Israel), a natural evolutionary laboratory with two abutting slopes at an average distance of only 200 m, which pose a constant ecological challenge to their local biotas. Flies inhabiting the colder and more humid north-facing slope carried about 6% more transposable elements than those from the hot and dry south-facing slope, in parallel to a suite of other genetic and phenotypic differences between the two populations. Nearly 50% of all mobile element insertions were slope unique, with many of them disrupting coding sequences of genes critical for cognition, olfaction, and thermotolerance, consistent with the observed patterns of thermotolerance differences and assortative mating. PMID:25006263
A Bayesian Sampler for Optimization of Protein Domain Hierarchies
2014-01-01
Abstract The process of identifying and modeling functionally divergent subgroups for a specific protein domain class and arranging these subgroups hierarchically has, thus far, largely been done via manual curation. How to accomplish this automatically and optimally is an unsolved statistical and algorithmic problem that is addressed here via Markov chain Monte Carlo sampling. Taking as input a (typically very large) multiple-sequence alignment, the sampler creates and optimizes a hierarchy by adding and deleting leaf nodes, by moving nodes and subtrees up and down the hierarchy, by inserting or deleting internal nodes, and by redefining the sequences and conserved patterns associated with each node. All such operations are based on a probability distribution that models the conserved and divergent patterns defining each subgroup. When we view these patterns as sequence determinants of protein function, each node or subtree in such a hierarchy corresponds to a subgroup of sequences with similar biological properties. The sampler can be applied either de novo or to an existing hierarchy. When applied to 60 protein domains from multiple starting points in this way, it converged on similar solutions with nearly identical log-likelihood ratio scores, suggesting that it typically finds the optimal peak in the posterior probability distribution. Similarities and differences between independently generated, nearly optimal hierarchies for a given domain help distinguish robust from statistically uncertain features. Thus, a future application of the sampler is to provide confidence measures for various features of a domain hierarchy. PMID:24494927
Foster, Charles S P; Henwood, Murray J; Ho, Simon Y W
2018-05-25
Data sets comprising small numbers of genetic markers are not always able to resolve phylogenetic relationships. This has frequently been the case in molecular systematic studies of plants, with many analyses being based on sequence data from only two or three chloroplast genes. An example of this comes from the riceflowers Pimelea Banks & Sol. ex Gaertn. (Thymelaeaceae), a large genus of flowering plants predominantly distributed in Australia. Despite the considerable morphological variation in the genus, low sequence divergence in chloroplast markers has led to the phylogeny of Pimelea remaining largely uncertain. In this study, we resolve the backbone of the phylogeny of Pimelea in comprehensive Bayesian and maximum-likelihood analyses of plastome sequences from 41 taxa. However, some relationships received only moderate to poor support, and the Pimelea clade contained extremely short internal branches. By using topology-clustering analyses, we demonstrate that conflicting phylogenetic signals can be found across the trees estimated from individual chloroplast protein-coding genes. A relaxed-clock dating analysis reveals that Pimelea arose in the mid-Miocene, with most divergences within the genus occurring during a subsequent rapid diversification. Our new phylogenetic estimate offers better resolution and is more strongly supported than previous estimates, providing a platform for future taxonomic revisions of both Pimelea and the broader subfamily. Our study has demonstrated the substantial improvements in phylogenetic resolution that can be achieved using plastome-scale data sets in plant molecular systematics. Copyright © 2018 Elsevier Inc. All rights reserved.
Tong, Ying; Zheng, Kang; Zhao, Shufang; Xiao, Guanxiu; Luo, Chen
2012-11-01
Recent studies demonstrated that sequence divergence in both the transcriptional regulatory region and the coding region contributes to the subfunctionalization of duplicate genes. However, whether sequence divergence in the 3'-untranslated region (3'-UTR) has an impact on the subfunctionalization of duplicate genes remains unclear. Here, we identified two diverging duplicate vsx1 (visual system homeobox-1) loci in goldfish, named vsx1A1 and vsx1A2. Phylogenetic analysis suggests that vsx1A1 and vsx1A2 may have arisen from a duplication of vsx1 after the separation of goldfish and zebrafish. Sequence comparison revealed that divergence in both transcriptional and translational regulatory regions is higher than divergence in the introns. vsx1A2 is expressed during the blastula and gastrula stages and in the adult retina but is silent from the segmentation stage to the hatching stage, whereas vsx1A1 is expressed from segmentation onward. Given that zebrafish vsx1 is expressed at all developmental stages and in the adult retina, goldfish vsx1A1 and vsx1A2 appear to be in the process of dividing the functions of ancestral vsx1 between them. The different but overlapping temporal expression patterns of vsx1A1 and vsx1A2 suggest that sequence divergence in the promoter region of duplicate vsx1 is not sufficient for partitioning the functions of ancestral vsx1. By comparing the expression patterns of a green fluorescent protein gene linked to the vsx1A1 or vsx1A2 3'-UTR, we demonstrated that the 3'-UTR of vsx1A1 retains, but the 3'-UTR of vsx1A2 has lost, the capability of mediating bipolar cell-specific expression during retina development. These results indicate that sequence divergence in the 3'-UTRs has a clear effect on subfunctionalization of the duplicate genes. © 2012 WILEY PERIODICALS, INC.
Sperm Bindin Divergence under Sexual Selection and Concerted Evolution in Sea Stars.
Patiño, Susana; Keever, Carson C; Sunday, Jennifer M; Popovic, Iva; Byrne, Maria; Hart, Michael W
2016-08-01
Selection associated with competition among males or sexual conflict between mates can create positive selection for high rates of molecular evolution of gamete recognition genes and lead to reproductive isolation between species. We analyzed coding sequence and repetitive domain variation in the gene encoding the sperm acrosomal protein bindin in 13 diverse sea star species. We found that bindin has a conserved coding sequence domain structure in all 13 species, with several repeated motifs in a large central region that is similar among all sea stars in organization but highly divergent among genera in nucleotide and predicted amino acid sequence. More bindin codons and lineages showed positive selection for high relative rates of amino acid substitution in genera with gonochoric outcrossing adults (and greater expected strength of sexual selection) than in selfing hermaphrodites. That difference is consistent with the expectation that selfing (a highly derived mating system) may moderate the strength of sexual selection and limit the accumulation of bindin amino acid differences. The results implicate both positive selection on single codons and concerted evolution within the repetitive region in bindin divergence, and suggest that both single amino acid differences and repeat differences may affect sperm-egg binding and reproductive compatibility. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Chromosome rearrangements via template switching between diverged repeated sequences
Anand, Ranjith P.; Tsaponina, Olga; Greenwell, Patricia W.; Lee, Cheng-Sheng; Du, Wei; Petes, Thomas D.
2014-01-01
Recent high-resolution genome analyses of cancer and other diseases have revealed the occurrence of microhomology-mediated chromosome rearrangements and copy number changes. Although some of these rearrangements appear to involve nonhomologous end-joining, many must have involved mechanisms requiring new DNA synthesis. Models such as microhomology-mediated break-induced replication (MM-BIR) have been invoked to explain these rearrangements. We examined BIR and template switching between highly diverged sequences in Saccharomyces cerevisiae, induced during repair of a site-specific double-strand break (DSB). Our data show that such template switches are robust mechanisms that give rise to complex rearrangements. Template switches between highly divergent sequences appear to be mechanistically distinct from the initial strand invasions that establish BIR. In particular, such jumps are less constrained by sequence divergence and exhibit a different pattern of microhomology junctions. BIR traversing repeated DNA sequences frequently results in complex translocations analogous to those seen in mammalian cells. These results suggest that template switching among repeated genes is a potent driver of genome instability and evolution. PMID:25367035
Thompson, Owen A.; Snoek, L. Basten; Nijveen, Harm; Sterken, Mark G.; Volkers, Rita J. M.; Brenchley, Rachel; van’t Hof, Arjen; Bevers, Roel P. J.; Cossins, Andrew R.; Yanai, Itai; Hajnal, Alex; Schmid, Tobias; Perkins, Jaryn D.; Spencer, David; Kruglyak, Leonid; Andersen, Erik C.; Moerman, Donald G.; Hillier, LaDeana W.; Kammenga, Jan E.; Waterston, Robert H.
2015-01-01
The Hawaiian strain (CB4856) of Caenorhabditis elegans is one of the most divergent from the canonical laboratory strain N2 and has been widely used in developmental, population, and evolutionary studies. To enhance the utility of the strain, we have generated a draft sequence of the CB4856 genome, exploiting a variety of resources and strategies. When compared against the N2 reference, the CB4856 genome has 327,050 single nucleotide variants (SNVs) and 79,529 insertion–deletion events that result in a total of 3.3 Mb of N2 sequence missing from CB4856 and 1.4 Mb of sequence present in CB4856 but not present in N2. As previously reported, the density of SNVs varies along the chromosomes, with the arms of chromosomes showing greater average variation than the centers. In addition, we find 61 regions totaling 2.8 Mb, distributed across all six chromosomes, which have a greatly elevated SNV density, ranging from 2 to 16% SNVs. A survey of other wild isolates show that the two alternative haplotypes for each region are widely distributed, suggesting they have been maintained by balancing selection over long evolutionary times. These divergent regions contain an abundance of genes from large rapidly evolving families encoding F-box, MATH, BATH, seven-transmembrane G-coupled receptors, and nuclear hormone receptors, suggesting that they provide selective advantages in natural environments. The draft sequence makes available a comprehensive catalog of sequence differences between the CB4856 and N2 strains that will facilitate the molecular dissection of their phenotypic differences. Our work also emphasizes the importance of going beyond simple alignment of reads to a reference genome when assessing differences between genomes. PMID:25995208
López-Alvarez, Diana; López-Herranz, Maria Luisa; Betekhtin, Alexander; Catalán, Pilar
2012-01-01
Background Brachypodium distachyon s. l. has been widely investigated across the world as a model plant for temperate cereals and biofuel grasses. However, this annual plant shows three cytotypes that have been recently recognized as three independent species, the diploids B. distachyon (2n = 10) and B. stacei (2n = 20) and their derived allotetraploid B. hybridum (2n = 30). Methodology/Principal Findings We propose a DNA barcoding approach that consists of a rapid, accurate and automatable species identification method using the standard DNA sequences of complementary plastid (trnLF) and nuclear (ITS, GI) loci. The highly homogenous but largely divergent B. distachyon and B. stacei diploids could be easily distinguished (100% identification success) using direct trnLF (2.4%), ITS (5.5%) or GI (3.8%) sequence divergence. By contrast, B. hybridum could only be unambiguously identified through the use of combined trnLF+ITS sequences (90% of identification success) or by cloned GI sequences (96.7%) that showed 5.4% (ITS) and 4% (GI) rate divergence between the two parental sequences found in the allopolyploid. Conclusion/Significance Our data provide an unbiased and effective barcode to differentiate these three closely-related species from one another. This procedure overcomes the taxonomic uncertainty generated from methods based on morphology or flow cytometry identifications that have resulted in some misclassifications of the model plant and its allies. Our study also demonstrates that the allotetraploid B. hybridum has resulted from bi-directional crosses of B. distachyon and B. stacei plants acting either as maternal or paternal parents. PMID:23240000
2010-01-01
Background Multiple sequence alignments are used to study gene or protein function, phylogenetic relations, genome evolution hypotheses and even gene polymorphisms. Virtually without exception, all available tools focus on conserved segments or residues. Small divergent regions, however, are biologically important for specific quantitative polymerase chain reaction, genotyping, molecular markers and preparation of specific antibodies, and yet have received little attention. As a consequence, they must be selected empirically by the researcher. AlignMiner has been developed to fill this gap in bioinformatic analyses. Results AlignMiner is a Web-based application for detection of conserved and divergent regions in alignments of conserved sequences, focusing particularly on divergence. It accepts alignments (protein or nucleic acid) obtained using any of a variety of algorithms; the choice of algorithm does not appear to have a significant impact on the final results. AlignMiner uses different scoring methods for assessing conserved/divergent regions, Entropy being the method that provides the highest number of regions with the greatest length, and Weighted being the most restrictive. Conserved/divergent regions can be generated either with respect to the consensus sequence or to one master sequence. The resulting data are presented in a graphical interface developed in AJAX, which provides remarkable user interaction capabilities. Users do not need to wait until execution is complete and can even inspect their results on a different computer. Data can be downloaded onto a user disk, in standard formats. In silico and experimental proof-of-concept cases have shown that AlignMiner can be successfully used to design specific polymerase chain reaction primers as well as potential epitopes for antibodies. Primer design is assisted by a module that deploys several oligonucleotide parameters for designing primers "on the fly".
Conclusions AlignMiner can be used to reliably detect divergent regions via several scoring methods that provide different levels of selectivity. Its predictions have been verified by experimental means. Hence, it is expected that its usage will save researchers' time and ensure an objective selection of the best-possible divergent region when closely related sequences are analysed. AlignMiner is freely available at http://www.scbi.uma.es/alignminer. PMID:20525162
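To make the idea of entropy-based scoring concrete, here is a minimal sketch that computes the Shannon entropy of each alignment column and reports columns above a threshold as divergent. It is not AlignMiner's implementation: the function names, the threshold value and the toy alignment are assumptions for illustration, and gaps are simply treated as one more symbol.

```python
import math
from collections import Counter


def column_entropy(column) -> float:
    """Shannon entropy (bits) of the symbols in one alignment column; 0 means fully conserved."""
    n = len(column)
    counts = Counter(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def divergent_columns(alignment, threshold: float = 1.0):
    """Return 0-based indices of columns whose entropy is at least `threshold` bits."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences in an alignment must have the same length")
    return [
        i
        for i in range(length)
        if column_entropy([seq[i] for seq in alignment]) >= threshold
    ]


if __name__ == "__main__":
    toy_alignment = ["ACGT", "ACGA", "ACCT", "ACTG"]  # hypothetical aligned sequences
    print(divergent_columns(toy_alignment))
```

Runs of adjacent high-entropy columns would be the candidate divergent regions for primer or epitope design; a stricter score or a higher threshold yields fewer, shorter regions.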
Spielmann, A; Stutz, E
1983-01-01
The soybean chloroplast psb A gene (photosystem II thylakoid membrane protein of Mr 32 000, lysine-free) and the trn H gene (tRNAHisGUG), which both map in the large single copy region adjacent to one of the inverted repeat structures (IR1), have been sequenced including flanking regions. The structural part of the psb A gene shows 92% sequence homology with the corresponding genes of spinach and N. debneyi and also contains an open reading frame for 353 amino acids. The amino acid sequence of a potential primary translation product (calculated Mr, 38 904, no lysine) diverges from that of spinach and N. debneyi in only two positions in the C-terminal part. The trn H gene has the same polarity as the psb A gene and the coding region is located at the very end of the large single copy region. The deduced sequence of the soybean chloroplast tRNAHisGUG is identical with that of Zea mays chloroplasts. Both ends of the large single copy region were sequenced including a small segment of the adjacent IR1 and IR2. PMID:6314279
Divergent copies of the large inverted repeat in the chloroplast genomes of ulvophycean green algae.
Turmel, Monique; Otis, Christian; Lemieux, Claude
2017-04-20
The chloroplast genomes of many algae and almost all land plants carry two identical copies of a large inverted repeat (IR) sequence that can pair for flip-flop recombination and undergo expansion/contraction. Although the IR has been lost multiple times during the evolution of the green algae, the underlying mechanisms are still largely unknown. A recent comparison of IR-lacking and IR-containing chloroplast genomes of chlorophytes from the Ulvophyceae (Ulotrichales) suggested that differential elimination of genes from the IR copies might lead to IR loss. To gain deeper insights into the evolutionary history of the chloroplast genome in the Ulvophyceae, we analyzed the genomes of Ignatius tetrasporus and Pseudocharacium americanum (Ignatiales, an order not previously sampled), Dangemannia microcystis (Oltmannsiellopsidales), Pseudoneochloris marina (Ulvales) and also Chamaetrichon capsulatum and Trichosarcina mucosa (Ulotrichales). Our comparison of these six chloroplast genomes with those previously reported for nine ulvophyceans revealed unsuspected variability. All newly examined genomes feature an IR, but remarkably, the copies of the IR present in the Ignatiales, Pseudoneochloris, and Chamaetrichon diverge in sequence, with the tRNA genes from the rRNA operon missing in one IR copy. The implications of this unprecedented finding for the mechanism of IR loss and flip-flop recombination are discussed.
Bodewes, R; Kik, M J L; Raj, V Stalin; Schapendonk, C M E; Haagmans, B L; Smits, S L; Osterhaus, A D M E
2013-06-01
Arenaviruses are bi-segmented negative-stranded RNA viruses, which were until recently only detected in rodents and humans. Now highly divergent arenaviruses have been identified in boid snakes with inclusion body disease (IBD). Here, we describe the identification of a new species and variants of the highly divergent arenaviruses, which were detected in tissues of captive boid snakes with IBD in The Netherlands by next-generation sequencing. Phylogenetic analysis of the complete sequence of the open reading frames of the four predicted proteins of one of the detected viruses revealed that this virus was most closely related to the recently identified Golden Gate virus, while considerable sequence differences were observed between the highly divergent arenaviruses detected in this study. These findings add to the recent identification of the highly divergent arenaviruses in boid snakes with IBD in the United States and indicate that these viruses also circulate among boid snakes in Europe.
Gómez, Fernando; Moreira, David; López-García, Purificación
2012-01-01
Dinophysoid dinoflagellates are usually considered a large monophyletic group. Large subunit and small subunit (SSU) rDNA phylogenies suggest a basal position for Amphisoleniaceae (Amphisolenia, Triposolenia) with respect to two sister groups, one containing most Phalacroma species plus Oxyphysis and the other Dinophysis, Ornithocercus, Histioneis, Citharistes and some Phalacroma species. We provide here new SSU rDNA sequences of Pseudophalacroma (pelagic) and Sinophysis (the only benthic dinophysoid genus). Molecular phylogenies support that they are very divergent with respect to the main clade of Dinophysales. Additional molecular markers of these two key genera are needed to elucidate the evolutionary relations among the dinophysoid dinoflagellates. © 2011 The Author(s) Journal of Eukaryotic Microbiology © 2011 International Society of Protistologists.
Century-scale Methylome Stability in a Recently Diverged Arabidopsis thaliana Lineage
Müller, Jonas; Stegle, Oliver; Meyer, Rhonda C.; Wang, George; Schneeberger, Korbinian; Fitz, Joffrey; Altmann, Thomas; Bergelson, Joy; Borgwardt, Karsten; Weigel, Detlef
2015-01-01
There has been much excitement about the possibility that exposure to specific environments can induce an ecological memory in the form of whole-sale, genome-wide epigenetic changes that are maintained over many generations. In the model plant Arabidopsis thaliana, numerous heritable DNA methylation differences have been identified in greenhouse-grown isogenic lines, but it remains unknown how natural, highly variable environments affect the rate and spectrum of such changes. Here we present detailed methylome analyses in a geographically dispersed A. thaliana population that constitutes a collection of near-isogenic lines, diverged for at least a century from a common ancestor. Methylome variation largely reflected genetic distance, and was in many aspects similar to that of lines raised in uniform conditions. Thus, even when plants are grown in varying and diverse natural sites, genome-wide epigenetic variation accumulates mostly in a clock-like manner, and epigenetic divergence thus parallels the pattern of genome-wide DNA sequence divergence. PMID:25569172
Choi, Young-Joon; Thines, Marco
2015-01-01
Even though the microevolution of plant hosts and pathogens has been intensely studied, knowledge regarding macro-evolutionary patterns is limited. Having the highest species diversity and host-specificity among Oomycetes, downy mildews are a useful model for investigating long-term host-pathogen coevolution. We show that phylogenies of Bremia and Asteraceae are significantly congruent. The accepted hypothesis is that pathogens have diverged contemporarily with their hosts. But maximum clade age estimation and sequence divergence comparison reveal that congruence is not due to long-term coevolution but rather due to host-shift driven speciation (pseudo-cospeciation). This pattern results from parasite radiation in related hosts, long after radiation and speciation of the hosts. As large host shifts free pathogens from hosts with effector-triggered immunity, subsequent radiation and diversification in related hosts with similar innate immunity may follow, resulting in a pattern mimicking true co-divergence, which is probably limited to the terminal nodes in many pathogen groups. PMID:26230508
Pryer, Kathleen M; Schuettpelz, Eric; Wolf, Paul G; Schneider, Harald; Smith, Alan R; Cranfill, Raymond
2004-10-01
The phylogenetic structure of ferns (= monilophytes) is explored here, with a special focus on the early divergences among leptosporangiate lineages. Despite considerable progress in our understanding of fern relationships, a rigorous and comprehensive analysis of the early leptosporangiate divergences was lacking. Therefore, a data set was designed here to include critical taxa that were not included in earlier studies. More than 5000 bp from the plastid (rbcL, atpB, rps4) and the nuclear (18S rDNA) genomes were sequenced for 62 taxa. Phylogenetic analyses of these data (1) confirm that Osmundaceae are sister to the rest of the leptosporangiates, (2) resolve a diverse set of ferns formerly thought to be a subsequent grade as possibly monophyletic (((Dipteridaceae, Matoniaceae), Gleicheniaceae), Hymenophyllaceae), and (3) place schizaeoid ferns as sister to a large clade of "core leptosporangiates" that includes heterosporous ferns, tree ferns, and polypods. Divergence time estimates for ferns are reported from penalized likelihood analyses of our molecular data, with constraints from a reassessment of the fossil record.
Laughter and the Management of Divergent Positions in Peer Review Interactions
Raclaw, Joshua; Ford, Cecilia E.
2017-01-01
In this paper we focus on how participants in peer review interactions use laughter as a resource as they publicly report divergence of evaluative positions, divergence that is typical in the give and take of joint grant evaluation. Using the framework of conversation analysis, we examine the infusion of laughter and multimodal laugh-relevant practices into sequences of talk in meetings of grant reviewers deliberating on the evaluation and scoring of high-level scientific grant applications. We focus on a recurrent sequence in these meetings, what we call the score-reporting sequence, in which the assigned reviewers first announce the preliminary scores they have assigned to the grant. We demonstrate that such sequences are routine sites for the use of laugh practices to navigate the initial moments in which divergence of opinion is made explicit. In the context of meetings convened for the purposes of peer review, laughter thus serves as a valuable resource for managing the socially delicate but institutionally required reporting of divergence and disagreement that is endemic to meetings where these types of evaluative tasks are a focal activity. PMID:29170594
Chloroplast DNA Structural Variation, Phylogeny, and Age of Divergence among Diploid Cotton Species.
Chen, Zhiwen; Feng, Kun; Grover, Corrinne E; Li, Pengbo; Liu, Fang; Wang, Yumei; Xu, Qin; Shang, Mingzhao; Zhou, Zhongli; Cai, Xiaoyan; Wang, Xingxing; Wendel, Jonathan F; Wang, Kunbo; Hua, Jinping
2016-01-01
The cotton genus (Gossypium spp.) contains 8 monophyletic diploid genome groups (A, B, C, D, E, F, G, K) and a single allotetraploid clade (AD). To gain insight into the phylogeny of Gossypium and molecular evolution of the chloroplast genome in this group, we performed a comparative analysis of 19 Gossypium chloroplast genomes, six reported here for the first time. Nucleotide distance in non-coding regions was about three times that of coding regions. As expected, distances were smaller within than among genome groups. Phylogenetic topologies based on nucleotide and indel data support the resolution of the 8 genome groups into 6 clades. Phylogenetic analysis of indel distribution among the 19 genomes demonstrates contrasting evolutionary dynamics in different clades, with a parallel genome downsizing in two genome groups and a biased accumulation of insertions in the clade containing the cultivated cottons leading to large (for Gossypium) chloroplast genomes. Divergence time estimates derived from the cpDNA sequence suggest that the major diploid clades had diverged approximately 10 to 11 million years ago. The complete nucleotide sequences of 6 cpDNA genomes are provided, offering a resource for cytonuclear studies in Gossypium. PMID:27309527
DNA barcodes for 1/1000 of the animal kingdom.
Hebert, Paul D N; Dewaard, Jeremy R; Landry, Jean-François
2010-06-23
This study reports DNA barcodes for more than 1300 Lepidoptera species from the eastern half of North America, establishing that 99.3 per cent of these species possess diagnostic barcode sequences. Intraspecific divergences averaged just 0.43 per cent among this assemblage, but most values were lower. The mean was elevated by deep barcode divergences (greater than 2%) in 5.1 per cent of the species, often involving the sympatric occurrence of two barcode clusters. A few of these cases have been analysed in detail, revealing species overlooked by the current taxonomic system. This study also provided a large-scale test of the extent of regional divergence in barcode sequences, indicating that geographical differentiation in the Lepidoptera of eastern North America is small, even when comparisons involve populations as much as 2800 km apart. The present results affirm that a highly effective system for the identification of Lepidoptera in this region can be built with few records per species because of the limited intra-specific variation. As most terrestrial and marine taxa are likely to possess a similar pattern of population structure, an effective DNA-based identification system can be developed with modest effort.
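To make the percentages above concrete, here is a minimal sketch of how a mean pairwise divergence within one species could be computed from aligned barcode sequences. It assumes an uncorrected p-distance (percent of differing sites), which is a simplification; the abstract does not state which distance model the study used, and the function names are illustrative.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Percent of differing sites between two aligned sequences.

    Positions where either sequence carries a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * differing / compared


def mean_pairwise_divergence(sequences):
    """Mean p-distance (percent) over all pairs of sequences from one species."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy 10-bp fragments; real barcodes are far longer.
    specimens = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT"]
    print(round(mean_pairwise_divergence(specimens), 2))
```

Under this measure, a species would count as showing deep divergence in the sense used above if some pair of its specimens differed at more than 2% of compared sites.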
Martínez-Castilla, León Patricio; Alvarez-Buylla, Elena R.
2003-01-01
Gene duplication is a substrate of evolution. However, the relative importance of positive selection versus relaxation of constraints in the functional divergence of gene copies is still under debate. Plant MADS-box genes encode transcriptional regulators key in various aspects of development and have undergone extensive duplications to form a large family. We recovered 104 MADS sequences from the Arabidopsis genome. Bayesian phylogenetic trees recover type II lineage as a monophyletic group and resolve a branching sequence of monophyletic groups within this lineage. The type I lineage is comprised of several divergent groups. However, contrasting gene structure and patterns of chromosomal distribution between type I and II sequences suggest that they had different evolutionary histories and support the placement of the root of the gene family between these two groups. Site-specific and site-branch analyses of positive Darwinian selection (PDS) suggest that different selection regimes could have affected the evolution of these lineages. We found evidence for PDS along the branch leading to flowering time genes that have a direct impact on plant fitness. Sites with high probabilities of having been under PDS were found in the MADS and K domains, suggesting that these played important roles in the acquisition of novel functions during MADS-box diversification. Detected sites are targets for further experimental analyses. We argue that adaptive changes in MADS-domain protein sequences have been important for their functional divergence, suggesting that changes within coding regions of transcriptional regulators have influenced phenotypic evolution of plants. PMID:14597714
Fossils matter: improved estimates of divergence times in Pinus reveal older diversification.
Saladin, Bianca; Leslie, Andrew B; Wüest, Rafael O; Litsios, Glenn; Conti, Elena; Salamin, Nicolas; Zimmermann, Niklaus E
2017-04-04
The taxonomy of pines (genus Pinus) is widely accepted and a robust gene tree based on entire plastome sequences exists. However, there is a large discrepancy in estimated divergence times of major pine clades among existing studies, mainly due to differences in fossil placement and dating methods used. We currently lack a dated molecular phylogeny that makes use of the rich pine fossil record, and this study is the first to estimate the divergence dates of pines based on a large number of fossils (21) evenly distributed across all major clades, in combination with applying both node and tip dating methods. We present a range of molecular phylogenetic trees of Pinus generated within a Bayesian framework. We find the origin of crown Pinus is likely up to 30 Myr older (Early Cretaceous) than inferred in most previous studies (Late Cretaceous) and propose generally older divergence times for major clades within Pinus than previously thought. Our age estimates vary significantly between the different dating approaches, but the results generally agree on older divergence times. We present a revised list of 21 fossils that are suitable to use in dating or comparative analyses of pines. Reliable estimates of divergence times in pines are essential if we are to link diversification processes and functional adaptation of this genus to geological events or to changing climates. In addition to older divergence times in Pinus, our results also indicate that node age estimates in pines depend on dating approaches and the specific fossil sets used, reflecting inherent differences in various dating approaches. The sets of dated phylogenetic trees of pines presented here provide a way to account for uncertainties in age estimations when applying comparative phylogenetic methods.
Troggio, Michela; Šurbanovski, Nada; Bianco, Luca; Moretto, Marco; Giongo, Lara; Banchi, Elisa; Viola, Roberto; Fernández, Felicidad Fernández; Costa, Fabrizio; Velasco, Riccardo; Cestaro, Alessandro; Sargent, Daniel James
2013-01-01
High throughput arrays for the simultaneous genotyping of thousands of single-nucleotide polymorphisms (SNPs) have made the rapid genetic characterisation of plant genomes and the development of saturated linkage maps a realistic prospect for many plant species of agronomic importance. However, the correct calling of SNP genotypes in divergent polyploid genomes using array technology can be problematic due to paralogy, and to divergence in probe sequences causing changes in probe binding efficiencies. An Illumina Infinium II whole-genome genotyping array was recently developed for the cultivated apple and used to develop a molecular linkage map for an apple rootstock progeny (M432), but a large proportion of segregating SNPs were not mapped in the progeny, due to unexpected genotype clustering patterns. To investigate the causes of this unexpected clustering we performed BLAST analysis of all probe sequences against the ‘Golden Delicious’ genome sequence and discovered evidence for paralogous annealing sites and probe sequence divergence for a high proportion of probes contained on the array. Following visual re-evaluation of the genotyping data generated for 8,788 SNPs for the M432 progeny using the array, we manually re-scored genotypes at 818 loci and mapped a further 797 markers to the M432 linkage map. The newly mapped markers included the majority of those that could not be mapped previously, as well as loci that were previously scored as monomorphic, but which segregated due to divergence leading to heterozygosity in probe annealing sites. An evaluation of the 8,788 probes in a diverse collection of Malus germplasm showed that more than half the probes returned genotype clustering patterns that were difficult or impossible to interpret reliably, highlighting implications for the use of the array in genome-wide association studies. PMID:23826289
Estimating Divergence Parameters With Small Samples From a Large Number of Loci
Wang, Yong; Hey, Jody
2010-01-01
Most methods for studying divergence with gene flow rely upon data from many individuals at few loci. Such data can be useful for inferring recent population history but they are unlikely to contain sufficient information about older events. However, the growing availability of genome sequences suggests a different kind of sampling scheme, one that may be more suited to studying relatively ancient divergence. Data sets extracted from whole-genome alignments may represent very few individuals but contain a very large number of loci. To take advantage of such data we developed a new maximum-likelihood method for genomic data under the isolation-with-migration model. Unlike many coalescent-based likelihood methods, our method does not rely on Monte Carlo sampling of genealogies, but rather provides a precise calculation of the likelihood by numerical integration over all genealogies. We demonstrate that the method works well on simulated data sets. We also consider two models for accommodating mutation rate variation among loci and find that the model that treats mutation rates as random variables leads to better estimates. We applied the method to the divergence of Drosophila melanogaster and D. simulans and detected a low, but statistically significant, signal of gene flow from D. simulans to D. melanogaster. PMID:19917765
Nadachowska-Brzyska, Krystyna; Burri, Reto; Olason, Pall I.; Kawakami, Takeshi; Smeds, Linnéa; Ellegren, Hans
2013-01-01
Profound knowledge of demographic history is a prerequisite for the understanding and inference of processes involved in the evolution of population differentiation and speciation. Together with new coalescent-based methods, the recent availability of genome-wide data enables investigation of differentiation and divergence processes at unprecedented depth. We combined two powerful approaches, full Approximate Bayesian Computation analysis (ABC) and pairwise sequentially Markovian coalescent modeling (PSMC), to reconstruct the demographic history of the split between two avian speciation model species, the pied flycatcher and collared flycatcher. Using whole-genome re-sequencing data from 20 individuals, we investigated 15 demographic models including different levels and patterns of gene flow, and changes in effective population size over time. ABC provided high support for recent (mode 0.3 my, range <0.7 my) species divergence, declines in effective population size of both species since their initial divergence, and unidirectional recent gene flow from pied flycatcher into collared flycatcher. The estimated divergence time and population size changes, supported by PSMC results, suggest that the ancestral species persisted through one of the glacial periods of middle Pleistocene and then split into two large populations that first increased in size before going through severe bottlenecks and expanding into their current ranges. Secondary contact appears to have been established after the last glacial maximum. The severity of the bottlenecks at the last glacial maximum is indicated by the discrepancy between current effective population sizes (20,000–80,000) and census sizes (5–50 million birds) of the two species. The recent divergence time challenges the supposition that avian speciation is a relatively slow process with extended times for intrinsic postzygotic reproductive barriers to evolve. 
Our study emphasizes the importance of using genome-wide data to unravel tangled demographic histories. Moreover, it constitutes one of the first examples of the inference of divergence history from genome-wide data in non-model species. PMID:24244198
Slatyer, Rachel A; Nash, Michael A; Miller, Adam D; Endo, Yoshinori; Umbers, Kate D L; Hoffmann, Ary A
2014-10-02
Mountain landscapes are topographically complex, creating discontinuous 'islands' of alpine and sub-alpine habitat with a dynamic history. Changing climatic conditions drive their expansion and contraction, leaving signatures on the genetic structure of their flora and fauna. Australia's high country covers a small, highly fragmented area. Although the area is thought to have experienced periods of relative continuity during Pleistocene glacial periods, small-scale studies suggest deep lineage divergence across low-elevation gaps. Using both DNA sequence data and microsatellite markers, we tested the hypothesis that genetic partitioning reflects observable geographic structuring across Australia's mainland high country, in the widespread alpine grasshopper Kosciuscola tristis (Sjösted). We found broadly congruent patterns of regional structure between the DNA sequence and microsatellite datasets, corresponding to strong divergence among isolated mountain regions. Small and isolated mountains in the south of the range were particularly distinct, with well-supported divergence corresponding to climate cycles during the late Pliocene and Pleistocene. We found mixed support, however, for divergence among other mountain regions. Interestingly, within areas of largely contiguous alpine and sub-alpine habitat around Mt Kosciuszko, microsatellite data suggested significant population structure, accompanied by a strong signature of isolation-by-distance. Consistent patterns of strong lineage divergence among different molecular datasets indicate genetic breaks between populations inhabiting geographically distinct mountain regions. Three primary phylogeographic groups were evident in the highly fragmented Victorian high country, while within-region structure detected with microsatellites may reflect more recent population isolation. 
Despite the small area of Australia's alpine and sub-alpine habitats, their low topographic relief and lack of extensive glaciation, divergence among populations was on the same scale as that detected in much more extensive Northern hemisphere mountain systems. The processes driving divergence in the Australian mountains might therefore differ from their Northern hemisphere counterparts.
Analysis of the Macaca mulatta transcriptome and the sequence divergence between Macaca and human.
Magness, Charles L; Fellin, P Campion; Thomas, Matthew J; Korth, Marcus J; Agy, Michael B; Proll, Sean C; Fitzgibbon, Matthew; Scherer, Christina A; Miner, Douglas G; Katze, Michael G; Iadonato, Shawn P
2005-01-01
We report the initial sequencing and comparative analysis of the Macaca mulatta transcriptome. Cloned sequences from 11 tissues, nine animals, and three species (M. mulatta, M. fascicularis, and M. nemestrina) were sampled, resulting in the generation of 48,642 sequence reads. These data represent an initial sampling of the putative rhesus orthologs for 6,216 human genes. Mean nucleotide diversity within M. mulatta and sequence divergence among M. fascicularis, M. nemestrina, and M. mulatta are also reported.
Quantiprot - a Python package for quantitative analysis of protein sequences.
Konopka, Bogumił M; Marciniak, Marta; Dyrka, Witold
2017-07-17
The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of the approach is that quantitative properties define a multidimensional solution space, where sequences can be related to each other and differences can be meaningfully interpreted. Quantiprot is a software package in Python, which provides a simple and consistent interface to multiple methods for quantitative characterization of protein sequences. The package can be used to calculate dozens of characteristics directly from sequences or using physico-chemical properties of amino acids. Besides basic measures, Quantiprot performs quantitative analysis of recurrence and determinism in the sequence, calculates the distribution of n-grams and computes the Zipf's law coefficient. We propose three main fields of application of the Quantiprot package. First, quantitative characteristics can be used in alignment-free similarity searches, and in clustering of large and/or divergent sequence sets. Second, a feature space defined by quantitative properties can be used in comparative studies of protein families and organisms. Third, the feature space can be used for evaluating generative models, where a large number of sequences generated by the model can be compared to actually observed sequences.
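As an illustration of two of the measures named above, the sketch below counts n-grams in a sequence and estimates a Zipf-style coefficient as the least-squares slope of log(count) against log(rank). It uses only the Python standard library and does not call Quantiprot; the function names and the exact definition of the coefficient are assumptions made for this example and may differ from the package's own.

```python
import math
from collections import Counter


def ngram_counts(sequence, n=2):
    """Count overlapping n-grams in a sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def zipf_slope(counts):
    """Least-squares slope of log(count) versus log(rank).

    counts maps each n-gram to its number of occurrences. A slope close
    to -1 corresponds to the classic Zipf distribution.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct n-grams")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    toy_protein = "MKVLAAGMKVLAAGMKV"  # made-up sequence for illustration
    bigrams = ngram_counts(toy_protein, n=2)
    print(bigrams.most_common(3))
    print(zipf_slope(bigrams))
```

Because both functions return plain numbers and counters, their outputs can serve as alignment-free features when comparing sets of sequences, in the spirit of the applications listed in the abstract.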
Middleton, Christopher P.; Senerchia, Natacha; Stein, Nils; Akhunov, Eduard D.; Keller, Beat
2014-01-01
Using Roche/454 technology, we sequenced the chloroplast genomes of 12 Triticeae species, including bread wheat, barley and rye, as well as the diploid progenitors and relatives of bread wheat Triticum urartu, Aegilops speltoides and Ae. tauschii. Two wild tetraploid taxa, Ae. cylindrica and Ae. geniculata, were also included. Additionally, we incorporated wild Einkorn wheat Triticum boeoticum and its domesticated form T. monococcum and two Hordeum spontaneum (wild barley) genotypes. Chloroplast genomes were used for overall sequence comparison, phylogenetic analysis and dating of divergence times. We estimate that barley diverged from rye and wheat approximately 8–9 million years ago (MYA). The genome donors of hexaploid wheat diverged between 2.1–2.9 MYA, while rye diverged from Triticum aestivum approximately 3–4 MYA, more recently than previously estimated. Interestingly, the A genome taxa T. boeoticum and T. urartu were estimated to have diverged approximately 570,000 years ago. As these two have a reproductive barrier, the divergence time estimate also provides an upper limit for the time required for the formation of a species boundary between the two. Furthermore, we conclusively show that the chloroplast genome of hexaploid wheat was contributed by the B genome donor and that this unknown species diverged from Ae. speltoides about 980,000 years ago. Additionally, sequence alignments identified a translocation of a chloroplast segment to the nuclear genome which is specific to the rye/wheat lineage. We propose the presented phylogeny and divergence time estimates as a reference framework for future studies on Triticeae. PMID:24614886
Determining divergence times with a protein clock: update and reevaluation
NASA Technical Reports Server (NTRS)
Feng, D. F.; Cho, G.; Doolittle, R. F.; Bada, J. L. (Principal Investigator)
1997-01-01
A recent study of the divergence times of the major groups of organisms as gauged by amino acid sequence comparison has been expanded and the data have been reanalyzed with a distance measure that corrects for both constraints on amino acid interchange and variation in substitution rate at different sites. Beyond that, the availability of complete genome sequences for several eubacteria and an archaebacterium has had a great impact on the interpretation of certain aspects of the data. Thus, the majority of the archaebacterial sequences are not consistent with currently accepted views of the Tree of Life which cluster the archaebacteria with eukaryotes. Instead, they are either outliers or mixed in with eubacterial orthologs. The simplest resolution of the problem is to postulate that many of these sequences were carried into eukaryotes by early eubacterial endosymbionts about 2 billion years ago, only very shortly after or even coincident with the divergence of eukaryotes and archaebacteria. The strong resemblances of these same enzymes among the major eubacterial groups suggest that the cyanobacteria and Gram-positive and Gram-negative eubacteria also diverged at about this same time, whereas the much greater differences between archaebacterial and eubacterial sequences indicate these two groups may have diverged between 3 and 4 billion years ago.
Neuwald, Andrew F
2009-08-01
The patterns of sequence similarity and divergence present within functionally diverse, evolutionarily related proteins contain implicit information about corresponding biochemical similarities and differences. A first step toward accessing such information is to statistically analyze these patterns, which, in turn, requires that one first identify and accurately align a very large set of protein sequences. Ideally, the set should include many distantly related, functionally divergent subgroups. Because it is extremely difficult, if not impossible for fully automated methods to align such sequences correctly, researchers often resort to manual curation based on detailed structural and biochemical information. However, multiply-aligning vast numbers of sequences in this way is clearly impractical. This problem is addressed using Multiply-Aligned Profiles for Global Alignment of Protein Sequences (MAPGAPS). The MAPGAPS program uses a set of multiply-aligned profiles both as a query to detect and classify related sequences and as a template to multiply-align the sequences. It relies on Karlin-Altschul statistics for sensitivity and on PSI-BLAST (and other) heuristics for speed. Using as input a carefully curated multiple-profile alignment for P-loop GTPases, MAPGAPS correctly aligned weakly conserved sequence motifs within 33 distantly related GTPases of known structure. By comparison, the sequence- and structurally based alignment methods hmmalign and PROMALS3D misaligned at least 11 and 23 of these regions, respectively. When applied to a dataset of 65 million protein sequences, MAPGAPS identified, classified and aligned (with comparable accuracy) nearly half a million putative P-loop GTPase sequences. A C++ implementation of MAPGAPS is available at http://mapgaps.igs.umaryland.edu. Supplementary data are available at Bioinformatics online.
Ned B. Klopfenstein; John W. Hanna; Amy L. Ross-Davis; Jane E. Stewart; Yuko Ota; Rosario Medel-Ortiz; Miguel Armando Lopez-Ramirez; Ruben Damian Elias-Roman; Dionicio Alvarado-Rosales; Mee-Sook Kim
2013-01-01
Armillaria plays diverse ecological roles in forests worldwide, which has inspired interest in understanding phylogenetic relationships within and among species of this genus. Previous rDNA sequence-based phylogenetic analyses of Armillaria have shown general relationships among widely divergent taxa, but rDNA sequences were not reliable for separating closely related...
Longo, Mark S; Carone, Dawn M; Green, Eric D; O'Neill, Michael J; O'Neill, Rachel J
2009-01-01
Background Large-scale genome rearrangements brought about by chromosome breaks underlie numerous inherited diseases, initiate or promote many cancers and are also associated with karyotype diversification during species evolution. Recent research has shown that these breakpoints are nonrandomly distributed throughout the mammalian genome and many, termed "evolutionary breakpoints" (EB), are specific genomic locations that are "reused" during karyotypic evolution. When the phylogenetic trajectory of orthologous chromosome segments is considered, many of these EB are coincident with ancient centromere activity as well as new centromere formation. While EB have been characterized as repeat-rich regions, it has not been determined whether specific sequences have been retained during evolution that would indicate previous centromere activity or a propensity for new centromere formation. Likewise, the conservation of specific sequence motifs or classes at EBs among divergent mammalian taxa has not been determined. Results To define conserved sequence features of EBs associated with centromere evolution, we performed comparative sequence analysis of more than 4.8 Mb within the tammar wallaby, Macropus eugenii, derived from centromeric regions (CEN), euchromatic regions (EU), and an evolutionary breakpoint (EB) that has undergone convergent breakpoint reuse and past centromere activity in marsupials. We found a dramatic enrichment for long interspersed nucleotide elements (LINE1s) and endogenous retroviruses (ERVs) and a depletion of short interspersed nucleotide elements (SINEs) shared between CEN and EBs. We analyzed the orthologous human EB (14q32.33), known to be associated with translocations in many cancers including multiple myelomas and plasma cell leukemias, and found a conserved distribution of similar repetitive elements. 
Conclusion Our data indicate that EBs tracked within the class Mammalia harbor sequence features retained since the divergence of marsupials and eutherians that may have predisposed these genomic regions to large-scale chromosomal instability. PMID:19630942
Joseph, Sneha; Poriya, Paresh; Kundu, Rahul
2016-11-01
The present study reports the phylogenetic relationship of six zoanthid species belonging to three genera, Isaurus, Palythoa, and Zoanthus, identified using systematic computational analysis of mtDNA gene sequences. All six species are first recorded from the coasts of Kathiawar Peninsula, India. Genus Isaurus is represented by Isaurus tuberculatus, genus Zoanthus is represented by Zoanthus kuroshio and Zoanthus sansibaricus, while genus Palythoa is represented by Palythoa tuberculosa, P. sp. JVK-2006 and Palythoa heliodiscus. Results of the present study revealed that among the various species observed along the coastline, sequence identity ranged from a maximum of 99% to a minimum of 96%. An interspecific divergence of 1-4% and negligible intraspecific divergence were observed. These results not only highlighted the efficiency of the COI gene region in species identification but also demonstrated the genetic variability of zoanthids along the Saurashtra coastline of the west coast of India.
Starrett, James; Hedin, Marshal; Ayoub, Nadia; Hayashi, Cheryl Y
2013-07-25
Hemocyanins are multimeric copper-containing hemolymph proteins involved in oxygen binding and transport in all major arthropod lineages. Most arachnids have seven primary subunits (encoded by paralogous genes a-g), which combine to form a 24-mer (4×6) quaternary structure. Within some spider lineages, however, hemocyanin evolution has been a dynamic process with extensive paralog duplication and loss. We have obtained hemocyanin gene sequences from numerous representatives of the spider infraorders Mygalomorphae and Araneomorphae in order to infer the evolution of the hemocyanin gene family and estimate spider relationships using these conserved loci. Our hemocyanin gene tree is largely consistent with the previous hypotheses of paralog relationships based on immunological studies, but reveals some discrepancies in which paralog types have been lost or duplicated in specific spider lineages. Analyses of concatenated hemocyanin sequences resolved deep nodes in the spider phylogeny and recovered a number of clades that are supported by other molecular studies, particularly for mygalomorph taxa. The concatenated data set is also used to estimate dates of higher-level spider divergences and suggests that the diversification of extant mygalomorphs preceded that of extant araneomorphs. Spiders are diverse in behavior and respiratory morphology, and our results are beneficial for comparative analyses of spider respiration. Lastly, the conserved hemocyanin sequences allow for the inference of spider relationships and ancient divergence dates. Copyright © 2013 Elsevier B.V. All rights reserved.
Berends Sexton, T; Jones, J T; Mullet, J E
1990-05-01
A 6.25 kbp barley plastid DNA region located between psbA and psbD-psbC was sequenced and RNAs produced from this DNA were analyzed. TrnK(UUU), rps16 and trnQ(UUG) were located upstream of psbA. These genes were transcribed from the same DNA strand as psbA and multiple RNAs hybridized to them. TrnK and rps16 contained introns; a 504 amino acid open reading frame (ORF504) was located within the trnK intron. Between trnQ and psbD-psbC was a 2.24 kbp region encoding psbK, psbI and trnS(GCU). PsbK and psbI are encoded on the same DNA strand as psbD-psbC whereas trnS(GCU) is transcribed from the opposite strand. Two large RNAs accumulate in barley etioplasts which contain psbK, psbI, anti-sense trnS(GCU) and psbD-psbC sequences. Other RNAs encode psbK and psbI only, or psbK only. The divergent trnS(GCU) located upstream of psbD-psbC and a second divergent trnS(UGA) located downstream of psbD-psbC were both expressed. Furthermore, RNA complementary to psbK and psbI mRNA was detected, suggesting that transcription from divergent overlapping transcription units may modulate expression from this DNA region.
Piggy: a rapid, large-scale pan-genome analysis tool for intergenic regions in bacteria.
Thorpe, Harry A; Bayliss, Sion C; Sheppard, Samuel K; Feil, Edward J
2018-04-01
The concept of the "pan-genome," which refers to the total complement of genes within a given sample or species, is well established in bacterial genomics. Rapid and scalable pipelines are available for managing and interpreting pan-genomes from large batches of annotated assemblies. However, despite overwhelming evidence that variation in intergenic regions in bacteria can directly influence phenotypes, most current approaches for analyzing pan-genomes focus exclusively on protein-coding sequences. To address this we present Piggy, a novel pipeline that emulates Roary except that it is based only on intergenic regions. A key utility provided by Piggy is the detection of highly divergent ("switched") intergenic regions (IGRs) upstream of genes. We demonstrate the use of Piggy on large datasets of clinically important lineages of Staphylococcus aureus and Escherichia coli. For S. aureus, we show that highly divergent (switched) IGRs are associated with differences in gene expression and we establish a multilocus reference database of IGR alleles (igMLST; implemented in BIGSdb).
Bewick, Adam J; Chain, Frédéric J J; Heled, Joseph; Evans, Ben J
2012-12-01
The estimation of phylogenetic relationships is an essential component of understanding evolution. Accurate phylogenetic estimation is difficult, however, when internodes are short and old, when genealogical discordance is common due to large ancestral effective population sizes or ancestral population structure, and when homoplasy is prevalent. Inference of divergence times is also hampered by unknown and uneven rates of evolution, the incomplete fossil record, uncertainty in relationships between fossil and extant lineages, and uncertainty in the age of fossils. Ideally, these challenges can be overcome by developing large "phylogenomic" data sets and by analyzing them with methods that accommodate features of the evolutionary process, such as genealogical discordance, recurrent substitution, recombination, ancestral population structure, gene flow after speciation among sampled and unsampled taxa, and variation in evolutionary rates. In some phylogenetic problems, it is possible to use information that is independent of fossils, such as the geological record, to identify putative triggers for diversification whose associated estimated divergence times can then be compared a posteriori with estimated relationships and ages of fossils. The history of diversification of pipid frog genera Pipa, Hymenochirus, Silurana, and Xenopus, for instance, is characterized by many of these evolutionary and analytical challenges. These frogs diversified dozens of millions of years ago, they have a relatively rich fossil record, their distributions span continental plates with a well characterized geological record of ancient connectivity, and there is considerable disagreement across studies in estimated evolutionary relationships. We used high throughput sequencing and public databases to generate a large phylogenomic data set with which we estimated evolutionary relationships using multilocus coalescence methods. 
We collected sequence data from Pipa, Hymenochirus, Silurana, and Xenopus and the outgroup taxon Rhinophrynus dorsalis from coding sequence of 113 autosomal regions, averaging ∼300 bp in length (range: 102-1695 bp) and also a portion of the mitochondrial genome. Analysis of these data using multiple approaches recovers strong support for the ((Xenopus, Silurana)(Pipa, Hymenochirus)) topology, and geologically calibrated divergence time estimates that are consistent with estimated ages and phylogenetic affinities of many fossils. These results provide new insights into the biogeography and chronology of pipid diversification during the breakup of Gondwanaland and illustrate how phylogenomic data may be necessary to tackle tough problems in molecular systematics. [Coalescence; gene tree; high-throughput sequencing; lineage sorting; pipid; species tree; Xenopus.]
Asaf, Sajjad; Khan, Abdul Latif; Khan, Muhammad Aaqil; Waqas, Muhammad; Kang, Sang-Mo; Yun, Byung-Wook; Lee, In-Jung
2017-08-08
We investigated the complete chloroplast (cp) genomes of non-model Arabidopsis halleri ssp. gemmifera and Arabidopsis lyrata ssp. petraea using Illumina paired-end sequencing to understand their genetic organization and structure. Detailed bioinformatics analysis revealed genome sizes of 154.4~154.5 kbp for the two subspecies, with a large single-copy region (84,197~84,158 bp), a small single-copy region (17,738~17,813 bp) and a pair of inverted repeats (IRa/IRb; 26,264~26,259 bp). Both cp genomes encode 130 genes, including 85 protein-coding genes, eight ribosomal RNA genes and 37 transfer RNA genes. Whole cp genome comparison of A. halleri ssp. gemmifera and A. lyrata ssp. petraea, along with ten other Arabidopsis species, showed an overall high degree of sequence similarity, with divergence among some intergenic spacers. The location and distribution of repeat sequences were determined, and sequence divergences of shared genes were calculated among related species. Comparative phylogenetic analysis of the entire genomic data set and 70 shared genes between both cp genomes confirmed the previous phylogeny and generated phylogenetic trees with the same topologies. The sister species of A. halleri ssp. gemmifera is A. umezawana, whereas the closest relative of A. lyrata ssp. petraea is A. arenicola.
Wang, Jing; Moore, Nicole E.; Murray, Zak L.; McInnes, Kate; White, Daniel J.; Tompkins, Daniel M.
2015-01-01
Bats harbour a diverse array of viruses, including significant human pathogens. Extensive metagenomic studies of material from bats, in particular guano, have revealed a large number of novel or divergent viral taxa that were previously unknown. New Zealand has only two extant indigenous terrestrial mammals, which are both bats, Mystacina tuberculata (the lesser short-tailed bat) and Chalinolobus tuberculatus (the long-tailed bat). Until the human introduction of exotic mammals, these species had been isolated from all other terrestrial mammals for over 1 million years (potentially over 16 million years for M. tuberculata). Four bat guano samples were collected from M. tuberculata roosts on the isolated offshore island of Whenua hou (Codfish Island) in New Zealand. Metagenomic analysis revealed that this species still hosts a plethora of divergent viruses. Whilst the majority of viruses detected were likely to be of dietary origin, some putative vertebrate virus sequences were identified. Papillomavirus, polyomavirus, calicivirus and hepevirus were found in the metagenomic data and subsequently confirmed using independent PCR assays and sequencing. The new hepevirus and calicivirus sequences may represent new genera within these viral families. Our findings may provide an insight into the origins of viral families, given their detection in an isolated host species. PMID:25900137
Wang, Jing; Moore, Nicole E; Murray, Zak L; McInnes, Kate; White, Daniel J; Tompkins, Daniel M; Hall, Richard J
2015-08-01
Bats harbour a diverse array of viruses, including significant human pathogens. Extensive metagenomic studies of material from bats, in particular guano, have revealed a large number of novel or divergent viral taxa that were previously unknown. New Zealand has only two extant indigenous terrestrial mammals, which are both bats, Mystacina tuberculata (the lesser short-tailed bat) and Chalinolobus tuberculatus (the long-tailed bat). Until the human introduction of exotic mammals, these species had been isolated from all other terrestrial mammals for over 1 million years (potentially over 16 million years for M. tuberculata). Four bat guano samples were collected from M. tuberculata roosts on the isolated offshore island of Whenua hou (Codfish Island) in New Zealand. Metagenomic analysis revealed that this species still hosts a plethora of divergent viruses. Whilst the majority of viruses detected were likely to be of dietary origin, some putative vertebrate virus sequences were identified. Papillomavirus, polyomavirus, calicivirus and hepevirus were found in the metagenomic data and subsequently confirmed using independent PCR assays and sequencing. The new hepevirus and calicivirus sequences may represent new genera within these viral families. Our findings may provide an insight into the origins of viral families, given their detection in an isolated host species.
Savard, L; Li, P; Strauss, S H; Chase, M W; Michaud, M; Bousquet, J
1994-01-01
We have estimated the time for the last common ancestor of extant seed plants by using molecular clocks constructed from the sequences of the chloroplastic gene coding for the large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (rbcL) and the nuclear gene coding for the small subunit of rRNA (Rrn18). Phylogenetic analyses of nucleotide sequences indicated that the earliest divergence of extant seed plants is likely represented by a split between conifer-cycad and angiosperm lineages. Relative-rate tests were used to assess homogeneity of substitution rates among lineages, and annual angiosperms were found to evolve at a faster rate than other taxa for rbcL and, thus, these sequences were excluded from construction of molecular clocks. Five distinct molecular clocks were calibrated using substitution rates for the two genes and four divergence times based on fossil and published molecular clock estimates. The five estimated times for the last common ancestor of extant seed plants were in agreement with one another, with an average of 285 million years and a range of 275-290 million years. This implies a substantially more recent ancestor of all extant seed plants than suggested by some theories of plant evolution. PMID:8197201
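The clock calibration described in the abstract above rests on simple arithmetic that a short sketch can make concrete. Assuming a strict molecular clock, a pairwise distance d (substitutions per site) between two lineages that split T million years ago gives a per-lineage rate r = d / (2T), and that rate then converts another distance into a time. The function names and all numbers below are hypothetical illustrations, not values from the study.

```python
def calibrate_rate(distance, divergence_time_my):
    """Per-lineage substitution rate (per site per million years).

    Assumes a strict clock: both lineages accumulate changes at the same
    rate, so the pairwise distance is spread over 2 * T of branch length.
    """
    if divergence_time_my <= 0:
        raise ValueError("divergence time must be positive")
    return distance / (2.0 * divergence_time_my)


def estimate_divergence_time(distance, rate):
    """Divergence time in million years for a pairwise distance and a rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate)


if __name__ == "__main__":
    # Hypothetical calibration: 0.12 substitutions/site over a 120 My split.
    rate = calibrate_rate(0.12, 120.0)
    # Hypothetical query pair with distance 0.20 substitutions/site.
    print(estimate_divergence_time(0.20, rate))
```

Relative-rate tests, as used in the study, matter because this calculation is only valid for lineages whose rates are homogeneous; a faster-evolving lineage would inflate the estimated time.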
Highly divergent cyclo-like virus in a great roundleaf bat (Hipposideros armiger) in Vietnam.
Kemenesi, Gábor; Kurucz, Kornélia; Zana, Brigitta; Tu, Vuong Tan; Görföl, Tamás; Estók, Péter; Földes, Fanni; Sztancsik, Katalin; Urbán, Péter; Fehér, Enikő; Jakab, Ferenc
2017-08-01
Members of the viral family Circoviridae are increasingly recognized worldwide. Bats seem to be natural reservoirs or dietary-related dispensers of these viruses. Here, we report a distantly related member of the genus Cyclovirus detected in the faeces of a great roundleaf bat (Hipposideros armiger). Interestingly, the novel virus lacks a Circoviridae-specific stem-loop structure, although a Geminiviridae-like nonamer sequence was detected in the large intergenic region. Based on these differences and its phylogenetic position, we propose that our new virus represents a distant and highly divergent member of the genus Cyclovirus. However, it lacks several characteristics of members of the genus, which poses a challenge for its taxonomic classification.
Van Belleghem, Steven M; Baquero, Margarita; Papa, Riccardo; Salazar, Camilo; McMillan, W Owen; Counterman, Brian A; Jiggins, Chris D; Martin, Simon H
2018-03-22
Sex chromosomes are disproportionately involved in reproductive isolation and adaptation. In support of such a "large-X" effect, genome scans between recently diverged populations and species pairs often identify distinct patterns of divergence on the sex chromosome compared to autosomes. When measures of divergence between populations are higher on the sex chromosome compared to autosomes, such patterns could be interpreted as evidence for faster divergence on the sex chromosome, that is "faster-X", or for barriers to gene flow on the sex chromosome. However, demographic changes can strongly skew divergence estimates and are not always taken into consideration. We used 224 whole-genome sequences representing 36 populations from two Heliconius butterfly clades (H. erato and H. melpomene) to explore patterns of Z chromosome divergence. We show that increased divergence compared to equilibrium expectations can in many cases be explained by demographic change. Among Heliconius erato populations, for instance, population size increase in the ancestral population can explain increased absolute divergence measures on the Z chromosome compared to the autosomes, as a result of increased ancestral Z chromosome genetic diversity. Nonetheless, we do identify increased divergence on the Z chromosome relative to the autosomes in parapatric or sympatric species comparisons that imply postzygotic reproductive barriers. Using simulations, we show that this is consistent with reduced gene flow on the Z chromosome, perhaps due to greater accumulation of incompatibilities. Our work demonstrates the importance of taking demography into account to interpret patterns of divergence on the Z chromosome, but nonetheless provides evidence to support the Z chromosome as a strong barrier to gene flow in incipient Heliconius butterfly species. © 2018 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.
Multiplex primer prediction software for divergent targets
Gardner, Shea N.; Hiddessen, Amy L.; Williams, Peter L.; Hara, Christine; Wagner, Mark C.; Colston, Bill W.
2009-01-01
We describe a Multiplex Primer Prediction (MPP) algorithm to build multiplex compatible primer sets to amplify all members of large, diverse and unalignable sets of target sequences. The MPP algorithm is scalable to larger target sets than other available software, and it does not require a multiple sequence alignment. We applied it to questions in viral detection, and demonstrated that there are no universally conserved priming sequences among viruses and that it could require an unfeasibly large number of primers (∼3700 18-mers or ∼2000 10-mers) to generate amplicons from all sequenced viruses. We then designed primer sets separately for each viral family, and for several diverse species such as foot-and-mouth disease virus (FMDV), hemagglutinin (HA) and neuraminidase (NA) segments of influenza A virus, Norwalk virus, and HIV-1. We empirically demonstrated the application of the software with a multiplex set of 16 short (10 nt) primers designed to amplify the Poxviridae family to produce a specific amplicon from vaccinia virus. PMID:19759213
TRStalker: an efficient heuristic for finding fuzzy tandem repeats.
Pellegrini, Marco; Renda, M Elena; Vecchio, Alessio
2010-06-15
Genomes in higher eukaryotic organisms contain a substantial amount of repeated sequences. Tandem Repeats (TRs) constitute a large class of repetitive sequences that originate via phenomena such as replication slippage and are characterized by close spatial contiguity. They play an important role in several molecular regulatory mechanisms, and also in several diseases (e.g. in the group of trinucleotide repeat disorders). While the current methods are rather effective for TRs with a low or medium level of divergence, the problem of detecting TRs with higher divergence (fuzzy TRs) is still open. The detection of fuzzy TRs is propaedeutic to enriching our view of their role in regulatory mechanisms and diseases. Fuzzy TRs are also important as tools to shed light on the evolutionary history of the genome, where higher divergence correlates with more remote duplication events. We have developed an algorithm (christened TRStalker) with the aim of efficiently detecting TRs that are hard to detect because of their inherent fuzziness, due to high levels of base substitutions, insertions and deletions. To attain this goal, we developed heuristics to solve a Steiner version of the problem for which the fuzziness is measured with respect to a motif string not necessarily present in the input string. This problem is akin to the 'generalized median string' that is known to be an NP-hard problem. Experiments with both synthetic and biological sequences demonstrate that our method performs better than the current state of the art for fuzzy TRs and that the fuzzy TRs of the type we detect are indeed present in important biological sequences. TRStalker will be integrated in the web-based TRs Discovery Service (TReaDS) at bioalgo.iit.cnr.it. Supplementary data are available at Bioinformatics online.
Xu, Jianpeng; Davis, C. Todd; Christman, Mary C.; Rivailler, Pierre; Zhong, Haizhen; Donis, Ruben O.; Lu, Guoqing
2012-01-01
Background Influenza neuraminidase (NA) is an important surface glycoprotein and plays a vital role in viral replication and drug development. The NA is found in influenza A and B viruses, with nine subtypes classified in influenza A. The complete knowledge of influenza NA evolutionary history and phylodynamics, although critical for the prevention and control of influenza epidemics and pandemics, remains lacking. Methodology/Principal findings Evolutionary and phylogenetic analyses of influenza NA sequences using Maximum Likelihood and Bayesian MCMC methods demonstrated that the divergence of influenza viruses into types A and B occurred earlier than the divergence of influenza A NA subtypes. Twenty-three lineages were identified within influenza A, two lineages were classified within influenza B, and most lineages were specific to host, subtype or geographical location. Interestingly, evolutionary rates vary not only among lineages but also among branches within lineages. The estimated tMRCAs of influenza lineages suggest that the viruses of different lineages emerge several months or even years before their initial detection. The dN/dS ratios ranged from 0.062 to 0.313 for influenza A lineages, and 0.257 to 0.259 for influenza B lineages. Structural analyses revealed that all positively selected sites are at the surface of the NA protein, with a number of sites found to be important for host antibody and drug binding. Conclusions/Significance The divergence into influenza type A and B from a putative ancestral NA was followed by the divergence of type A into nine NA subtypes, of which 23 lineages subsequently diverged. This study provides a better understanding of influenza NA lineages and their evolutionary dynamics, which may facilitate early detection of newly emerging influenza viruses and thus improve influenza surveillance. PMID:22808012
Structural Plasticity of Helical Nanotubes Based on Coiled-Coil Assemblies
Egelman, Edward H.; Xu, C.; DiMaio, F.; ...
2015-01-22
Numerous instances can be seen in evolution in which protein quaternary structures have diverged while the sequences of the building blocks have remained fairly conserved. However, the path through which such divergence has taken place is usually not known. We have designed two synthetic 29-residue α-helical peptides, based on the coiled-coil structural motif, that spontaneously self-assemble into helical nanotubes in vitro. Using electron cryomicroscopy with a newly available direct electron detection capability, we can achieve near-atomic resolution of these thin structures. We show how conservative changes of only one or two amino acids result in dramatic changes in quaternary structure, in which the assemblies can be switched between two very different forms. This system provides a framework for understanding how small sequence changes in evolution can translate into very large changes in supramolecular structure, a phenomenon that may have significant implications for the de novo design of synthetic peptide assemblies.
Nucleotide sequences of bovine alpha S1- and kappa-casein cDNAs.
Stewart, A F; Willis, I M; Mackinlay, A G
1984-01-01
The nucleotide sequences corresponding to bovine alpha S1- and kappa-casein mRNAs are presented. An unusual alpha S1-casein cDNA has been characterised whose 5' end commences upstream from its putative TATA box. The alpha S1-casein mRNA is compared to rat alpha-casein mRNA and two components of divergence are identified. Firstly, the two sequences have diverged at a high point mutation rate and the rate of amino acid replacement by this mechanism is at least as great as the rate of divergence of any other part of the mRNAs. Secondly, the protein coding sequence has been subjected to several insertion/deletion events, one of which may be an example of exon shuffling. The kappa-casein mRNA sequence verifies the proposition that it has arisen from a different ancestral gene to the other caseins. PMID:6328443
El-Sherry, Shiem; Ogedengbe, Mosun E; Hafeez, Mian A; Barta, John R
2013-07-01
Multiple 18S rDNA sequences were obtained from two single-oocyst-derived lines of each of Eimeria meleagrimitis and Eimeria adenoeides. After analysing the 15 new 18S rDNA sequences from two lines of E. meleagrimitis and 17 new sequences from two lines of E. adenoeides, there were clear indications that divergent, paralogous 18S rDNA copies existed within the nuclear genome of E. meleagrimitis. In contrast, mitochondrial cytochrome c oxidase subunit I (COI) partial sequences from all lines of a particular Eimeria sp. were identical and, in phylogenetic analyses, COI sequences clustered unambiguously in monophyletic and highly-supported clades specific to individual Eimeria sp. Phylogenetic analysis of the new 18S rDNA sequences from E. meleagrimitis showed that they formed two distinct clades: Type A with four new sequences; and Type B with nine new sequences; both Types A and B sequences were obtained from each of the single-oocyst-derived lines of E. meleagrimitis. Together these rDNA types formed a well-supported E. meleagrimitis clade. Types A and B 18S rDNA sequences from E. meleagrimitis had a mean sequence identity of only 97.4% whereas mean sequence identity within types was 99.1-99.3%. The observed intraspecific sequence divergence among E. meleagrimitis 18S rDNA sequence types was even higher (approximately 2.6%) than the interspecific sequence divergence present between some well-recognized species such as Eimeria tenella and Eimeria necatrix (1.1%). Our observations suggest that, unlike COI sequences, 18S rDNA sequences are not reliable molecular markers to be used alone for species identification with coccidia, although 18S rDNA sequences have clear utility for phylogenetic reconstruction of apicomplexan parasites at the genus and higher taxonomic ranks. Copyright © 2013. Published by Elsevier Ltd.
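The identity and divergence figures quoted in the abstract above are complementary: a mean identity of 97.4% corresponds to a divergence of about 2.6%. A minimal sketch of that calculation for two pre-aligned sequences follows. The function names are hypothetical, and the treatment of gaps (columns with a gap in either sequence are skipped) is an assumption; the study may have computed identity differently.

```python
def percent_identity(a, b, gap="-"):
    """Percent of matching positions between two aligned sequences.

    Assumption: columns containing a gap in either sequence are ignored,
    so identity is computed over ungapped columns only.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def percent_divergence(a, b, gap="-"):
    """Uncorrected divergence (p-distance) as a percentage."""
    return 100.0 - percent_identity(a, b, gap)


if __name__ == "__main__":
    # Toy aligned fragments, for illustration only.
    print(percent_identity("ACGTACGT", "ACGTACGA"))
    print(percent_divergence("ACGTACGT", "ACGTACGA"))
```

This is the uncorrected distance; for more divergent sequences a substitution model would normally be applied to account for multiple changes at the same site.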
Conservation and divergence of ADAM family proteins in the Xenopus genome
2010-01-01
Background: Members of the disintegrin metalloproteinase (ADAM) family play important roles in cellular and developmental processes through their functions as proteases and/or binding partners for other proteins. The amphibian Xenopus has long been used as a model for early vertebrate development, but genome-wide analyses for large gene families were not possible until the recent completion of the X. tropicalis genome sequence and the availability of large scale expressed sequence tag (EST) databases. In this study we carried out a systematic analysis of the X. tropicalis genome and uncovered several interesting features of ADAM genes in this species. Results: Based on the X. tropicalis genome sequence and EST databases, we identified Xenopus orthologues of mammalian ADAMs and obtained full-length cDNA clones for these genes. The deduced protein sequences, synteny and exon-intron boundaries are conserved between most human and X. tropicalis orthologues. The alternative splicing patterns of certain Xenopus ADAM genes, such as adams 22 and 28, are similar to those of their mammalian orthologues. However, we were unable to identify an orthologue for ADAM7 or 8. The Xenopus orthologue of ADAM15, an active metalloproteinase in mammals, does not contain the conserved zinc-binding motif and is hence considered proteolytically inactive. We also found evidence for gain of ADAM genes in Xenopus as compared to other species. There is a homologue of ADAM10 in Xenopus that is missing in most mammals. Furthermore, a single scaffold of the X. tropicalis genome contains four genes encoding ADAM28 homologues, suggesting genome duplication in this region. Conclusions: Our genome-wide analysis of ADAM genes in X. tropicalis revealed both conservation and evolutionary divergence of these genes in this amphibian species. On the one hand, all ADAMs implicated in normal development and health in other species are conserved in X. tropicalis. On the other hand, some ADAM genes and ADAM protease activities are absent, while other novel ADAM proteins in this species are predicted by this study. The conservation and unique divergence of ADAM genes in Xenopus probably reflect the particular selective pressures these amphibian species faced during evolution. PMID:20630080
Williamson, Scott; Fledel-Alon, Adi; Bustamante, Carlos D
2004-09-01
We develop a Poisson random-field model of polymorphism and divergence that allows arbitrary dominance relations in a diploid context. This model provides a maximum-likelihood framework for estimating both selection and dominance parameters of new mutations using information on the frequency spectrum of sequence polymorphisms. This is the first DNA sequence-based estimator of the dominance parameter. Our model also leads to a likelihood-ratio test for distinguishing nongenic from genic selection; simulations indicate that this test is quite powerful when a large number of segregating sites are available. We also use simulations to explore the bias in selection parameter estimates caused by unacknowledged dominance relations. When inference is based on the frequency spectrum of polymorphisms, genic selection estimates of the selection parameter can be very strongly biased even for minor deviations from the genic selection model. Surprisingly, however, when inference is based on polymorphism and divergence (McDonald-Kreitman) data, genic selection estimates of the selection parameter are nearly unbiased, even for completely dominant or recessive mutations. Further, we find that weak overdominant selection can increase, rather than decrease, the substitution rate relative to levels of polymorphism. This nonintuitive result has major implications for the interpretation of several popular tests of neutrality.
Phylogenetic shadowing of primate sequences to find functional regions of the human genome.
Boffelli, Dario; McAuliffe, Jon; Ovcharenko, Dmitriy; Lewis, Keith D; Ovcharenko, Ivan; Pachter, Lior; Rubin, Edward M
2003-02-28
Nonhuman primates represent the most relevant model organisms to understand the biology of Homo sapiens. The recent divergence and associated overall sequence conservation between individual members of this taxon have nonetheless largely precluded the use of primates in comparative sequence studies. We used sequence comparisons of an extensive set of Old World and New World monkeys and hominoids to identify functional regions in the human genome. Analysis of these data enabled the discovery of primate-specific gene regulatory elements and the demarcation of the exons of multiple genes. Much of the information content of the comprehensive primate sequence comparisons could be captured with a small subset of phylogenetically close primates. These results demonstrate the utility of intraprimate sequence comparisons to discover common mammalian as well as primate-specific functional elements in the human genome, which are unattainable through the evaluation of more evolutionarily distant species.
Ai, Ye; Joo, Sang W; Jiang, Yingtao; Xuan, Xiangchun; Qian, Shizhi
2009-07-01
Transient electrophoretic motion of a charged particle through a converging-diverging microchannel is studied by solving the coupled system of the Navier-Stokes equations for fluid flow and the Laplace equation for electrical field with an arbitrary Lagrangian-Eulerian finite-element method. A spatially non-uniform electric field is induced in the converging-diverging section, which gives rise to a direct current dielectrophoretic (DEP) force in addition to the electrostatic force acting on the charged particle. As a consequence, the symmetry of the particle velocity and trajectory with respect to the throat is broken. We demonstrate that the predicted particle trajectory shifts due to DEP show quantitative agreement with the existing experimental data. Although converging-diverging microchannels can be used for super fast electrophoresis due to the enhancement of the local electric field, it is shown that large particles may be blocked due to the induced DEP force, which thus must be taken into account in the study of electrophoresis in microfluidic devices where non-uniform electric fields are present.
The genomic substrate for adaptive radiation in African cichlid fish.
Brawand, David; Wagner, Catherine E; Li, Yang I; Malinsky, Milan; Keller, Irene; Fan, Shaohua; Simakov, Oleg; Ng, Alvin Y; Lim, Zhi Wei; Bezault, Etienne; Turner-Maier, Jason; Johnson, Jeremy; Alcazar, Rosa; Noh, Hyun Ji; Russell, Pamela; Aken, Bronwen; Alföldi, Jessica; Amemiya, Chris; Azzouzi, Naoual; Baroiller, Jean-François; Barloy-Hubler, Frederique; Berlin, Aaron; Bloomquist, Ryan; Carleton, Karen L; Conte, Matthew A; D'Cotta, Helena; Eshel, Orly; Gaffney, Leslie; Galibert, Francis; Gante, Hugo F; Gnerre, Sante; Greuter, Lucie; Guyon, Richard; Haddad, Natalie S; Haerty, Wilfried; Harris, Rayna M; Hofmann, Hans A; Hourlier, Thibaut; Hulata, Gideon; Jaffe, David B; Lara, Marcia; Lee, Alison P; MacCallum, Iain; Mwaiko, Salome; Nikaido, Masato; Nishihara, Hidenori; Ozouf-Costaz, Catherine; Penman, David J; Przybylski, Dariusz; Rakotomanga, Michaelle; Renn, Suzy C P; Ribeiro, Filipe J; Ron, Micha; Salzburger, Walter; Sanchez-Pulido, Luis; Santos, M Emilia; Searle, Steve; Sharpe, Ted; Swofford, Ross; Tan, Frederick J; Williams, Louise; Young, Sarah; Yin, Shuangye; Okada, Norihiro; Kocher, Thomas D; Miska, Eric A; Lander, Eric S; Venkatesh, Byrappa; Fernald, Russell D; Meyer, Axel; Ponting, Chris P; Streelman, J Todd; Lindblad-Toh, Kerstin; Seehausen, Ole; Di Palma, Federica
2014-09-18
Cichlid fishes are famous for large, diverse and replicated adaptive radiations in the Great Lakes of East Africa. To understand the molecular mechanisms underlying cichlid phenotypic diversity, we sequenced the genomes and transcriptomes of five lineages of African cichlids: the Nile tilapia (Oreochromis niloticus), an ancestral lineage with low diversity; and four members of the East African lineage: Neolamprologus brichardi/pulcher (older radiation, Lake Tanganyika), Metriaclima zebra (recent radiation, Lake Malawi), Pundamilia nyererei (very recent radiation, Lake Victoria), and Astatotilapia burtoni (riverine species around Lake Tanganyika). We found an excess of gene duplications in the East African lineage compared to tilapia and other teleosts, an abundance of non-coding element divergence, accelerated coding sequence evolution, expression divergence associated with transposable element insertions, and regulation by novel microRNAs. In addition, we analysed sequence data from sixty individuals representing six closely related species from Lake Victoria, and show genome-wide diversifying selection on coding and regulatory variants, some of which were recruited from ancient polymorphisms. We conclude that a number of molecular mechanisms shaped East African cichlid genomes, and that amassing of standing variation during periods of relaxed purifying selection may have been important in facilitating subsequent evolutionary diversification. PMID:25186727
Jennings, W Bryan; Wogel, Henrique; Bilate, Marcos; Salles, Rodrigo de O L; Buckup, Paulo A
2016-09-01
The microhylid frogs belonging to the genus Arcovomer have been reported from lowland Atlantic Rainforest in the Brazilian states of Espírito Santo, Rio de Janeiro, and São Paulo. Here, we use DNA barcoding to assess levels of genetic divergence between apparently isolated populations in Espírito Santo and Rio de Janeiro. Our mtDNA data, consisting of cytochrome oxidase subunit I (COI) nucleotide sequences, reveal 13.2% uncorrected and 30.4% TIM2 + I + Γ corrected genetic divergences between these two populations. This level of divergence exceeds the suggested 10% uncorrected divergence threshold for elevating amphibian populations to candidate species using this marker, which implies that the Espírito Santo population is a species distinct from Arcovomer passarellii. Calibration of our model-corrected sequence divergence estimates suggests that the time of population divergence falls between 12 and 29 million years ago.
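The gap between the uncorrected (13.2%) and model-corrected (30.4%) figures arises because a correction accounts for multiple substitutions at the same site. As a minimal sketch of that idea, the Python snippet below applies the Jukes-Cantor (JC69) correction. JC69 is a much simpler model than the TIM2 + I + Γ model used in the study and is substituted here purely for illustration, so it yields a smaller corrected value than the 30.4% reported. The function name is hypothetical.

```python
import math


def jc69_distance(p: float) -> float:
    """Jukes-Cantor corrected distance from an uncorrected proportion p
    of differing sites: d = -(3/4) * ln(1 - (4/3) * p)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 is defined only for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


uncorrected = 0.132  # 13.2% uncorrected COI divergence, from the abstract
corrected = jc69_distance(uncorrected)
print(f"uncorrected = {uncorrected:.3f}, JC69-corrected = {corrected:.3f}")
```

Even this simple model pushes the estimate above the raw proportion; models that also allow unequal substitution rates, invariant sites and among-site rate variation can raise it considerably further.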
Yamada, Kazuhiko; Kamimura, Eikichi; Kondo, Mariko; Tsuchiya, Kimiyuki; Nishida-Umehara, Chizuko; Matsuda, Yoichi
2006-02-01
We molecularly cloned new families of site-specific repetitive DNA sequences from BglII- and EcoRI-digested genomic DNA of the Syrian hamster (Mesocricetus auratus, Cricetinae, Rodentia) and characterized them by chromosome in situ hybridization and filter hybridization. They were classified into six different types of repetitive DNA sequence families according to chromosomal distribution and genome organization. The hybridization patterns of the sequences were consistent with the distribution of C-positive bands and/or Hoechst-stained heterochromatin. The centromeric major satellite DNA and sex chromosome-specific and telomeric region-specific repetitive sequences were conserved in the same genus (Mesocricetus) but divergent in different genera. The chromosome-2-specific sequence was conserved in two genera, Mesocricetus and Cricetulus, and a low copy number of repetitive sequences on the heterochromatic chromosome arms were conserved in the subfamily Cricetinae but not in the subfamily Calomyscinae. By contrast, the other type of repetitive sequences on the heterochromatic chromosome arms, which had sequence similarities to a LINE sequence of rodents, was conserved through the three subfamilies, Cricetinae, Calomyscinae and Murinae. The nucleotide divergence of the repetitive sequences of heterochromatin was well correlated with the phylogenetic relationships of the Cricetinae species, and each sequence has been independently amplified and diverged in the same genome.
DNA barcodes for dragonflies and damselflies (Odonata) of Mindanao, Philippines.
Casas, Princess Angelie S; Sing, Kong-Wah; Lee, Ping-Shin; Nuñeza, Olga M; Villanueva, Reagan Joseph T; Wilson, John-James
2018-03-01
Reliable species identification provides a sounder basis for use of species in the order Odonata as biological indicators and for their conservation, an urgent concern as many species are threatened with imminent extinction. We generated 134 COI barcodes from 36 morphologically identified species of Odonata collected from Mindanao Island, representing 10 families and 19 genera. Intraspecific sequence divergences ranged from 0 to 6.7% with four species showing more than 2%, while interspecific sequence divergences ranged from 0.5 to 23.3% with seven species showing less than 2%. Consequently, no distinct gap was observed between intraspecific and interspecific DNA barcode divergences. The numerous islands of the Philippine archipelago may have facilitated rapid speciation in the Odonata and resulted in low interspecific sequence divergences among closely related groups of species. This study contributes DNA barcodes for 36 morphologically identified species of Odonata reported from Mindanao including 31 species with no previous DNA barcode records.
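The "barcode gap" referred to above can be expressed as the smallest interspecific divergence minus the largest intraspecific divergence. A minimal Python sketch follows, using only the range endpoints quoted in the abstract; the function name is hypothetical, and the per-specimen distances from the study are not reproduced here.

```python
def barcode_gap(intraspecific, interspecific):
    """Smallest between-species divergence minus largest within-species
    divergence; a value <= 0 means the two distributions overlap."""
    return min(interspecific) - max(intraspecific)


# Range endpoints (in percent) reported in the abstract.
intra = [0.0, 6.7]
inter = [0.5, 23.3]
gap = barcode_gap(intra, inter)
has_gap = gap > 0  # False here: the intra- and interspecific ranges overlap
```

A negative value, as obtained from these endpoints, is consistent with the abstract's statement that no distinct gap separates the two sets of divergences.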
TaxI: a software tool for DNA barcoding using distance methods
Steinke, Dirk; Vences, Miguel; Salzburger, Walter; Meyer, Axel
2005-01-01
DNA barcoding is a promising approach to the diagnosis of biological diversity in which DNA sequences serve as the primary key for information retrieval. Most existing software for evolutionary analysis of DNA sequences was designed for phylogenetic analyses and, hence, those algorithms do not offer appropriate solutions for the rapid, but precise analyses needed for DNA barcoding, and are also unable to process the often large comparative datasets. We developed a flexible software tool for DNA taxonomy, named TaxI. This program calculates sequence divergences between a query sequence (taxon to be barcoded) and each sequence of a dataset of reference sequences defined by the user. Because the analysis is based on separate pairwise alignments, this software is also able to work with sequences characterized by multiple insertions and deletions that are difficult to align in large sequence sets (i.e. thousands of sequences) by multiple alignment algorithms because of computational restrictions. Here, we demonstrate the utility of this approach with two datasets of fish larvae and juveniles from Lake Constance and juvenile land snails under different models of sequence evolution. Sets of ribosomal 16S rRNA sequences, characterized by multiple indels, performed as well as or better than cox1 sequence sets in assigning sequences to species, demonstrating the suitability of rRNA genes for DNA barcoding. PMID:16214755
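As a rough illustration of the distance-based lookup described above, the following Python sketch computes uncorrected pairwise divergences (p-distances) between a query and each reference sequence and reports the closest reference. It is not TaxI's implementation: it assumes each query/reference pair has already been aligned to equal length (TaxI performs the pairwise alignment itself), and the function names and toy sequences are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Uncorrected divergence: fraction of differing sites among
    positions where neither aligned sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip indel positions
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def closest_reference(query: str, references: dict[str, str]) -> tuple[str, float]:
    """Return (name, divergence) of the reference nearest to the query."""
    distances = {name: p_distance(query, seq) for name, seq in references.items()}
    best = min(distances, key=distances.get)
    return best, distances[best]


# Hypothetical, pre-aligned toy sequences (not real barcode data).
refs = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTTCGTTC",
    "species_C": "AC-TACGAAC",
}
name, dist = closest_reference("ACGTACGTAA", refs)
```

Because every comparison is pairwise, an indel-rich reference such as the hypothetical species_C only loses its own gap columns rather than forcing gaps into the whole dataset, which is the practical advantage the abstract attributes to separate pairwise alignments.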
Ma, Ji; Yang, Bingxian; Zhu, Wei; Sun, Lianli; Tian, Jingkui; Wang, Xumin
2013-10-10
Mahonia bealei (Berberidaceae) is a frequently-used traditional Chinese medicinal plant with efficient anti-inflammatory ability. This plant is one of the sources of berberine, a new cholesterol-lowering drug with anti-diabetic activity. We have determined the complete nucleotide sequence of the chloroplast (cp) genome of M. bealei. The complete cp genome of M. bealei is 164,792 bp in length, and has a typical structure with large (LSC 73,052 bp) and small (SSC 18,591 bp) single-copy regions separated by a pair of inverted repeats (IRs 36,501 bp) of large size. The Mahonia cp genome contains 111 unique genes and 39 genes are duplicated in the IR regions. The gene order and content of M. bealei are almost unrearranged, which is consistent with the hypothesis that large IRs stabilize the cp genome and reduce gene loss-and-gain probabilities during the evolutionary process. A large IR expansion of over 12 kb has occurred in M. bealei: 15 genes (rps19, rpl22, rps3, rpl16, rpl14, rps8, infA, rpl36, rps11, petD, petB, psbH, psbN, psbT and psbB) have expanded to have an additional copy in the IRs. The IR expansion rearrangement occurred via a double-strand DNA break and subsequent repair, which is different from the ordinary gene conversion mechanism. Repeat analysis identified 39 direct/inverted repeats 30 bp or longer with a sequence identity ≥ 90%. Analysis also revealed 75 simple sequence repeat (SSR) loci and almost all are composed of A or T, contributing to a distinct bias in base composition. Comparison of protein-coding sequences with ESTs reveals 9 putative RNA edits and 5 of them resulted in non-synonymous modifications in rpoC1, rps2, rps19 and ycf1. Phylogenetic analysis using maximum parsimony (MP) and maximum likelihood (ML) was performed on a dataset composed of 65 protein-coding genes from 25 taxa, which yields a tree topology identical to that of previous plastid-based trees, and provides strong support for the sister relationship between Ranunculaceae and Berberidaceae. Molecular dating analyses suggest that Ranunculaceae and Berberidaceae diverged between 90 and 84 mya, which is congruent with the fossil records and with recent estimates of the divergence time of these two taxa. © 2013.
Llopart, Ana
2018-05-01
The hemizygosity of the X (Z) chromosome fully exposes the fitness effects of mutations on that chromosome and has evolutionary consequences on the relative rates of evolution of X and autosomes. Specifically, several population genetics models predict increased rates of evolution in X-linked loci relative to autosomal loci. This prediction of faster-X evolution has been evaluated and confirmed for both protein coding sequences and gene expression. In the case of faster-X evolution for gene expression divergence, it is often assumed that variation in 5' noncoding sequences is associated with variation in transcript abundance between species but a formal, genomewide test of this hypothesis is still missing. Here, I use whole genome sequence data in Drosophila yakuba and D. santomea to evaluate this hypothesis and report positive correlations between sequence divergence at 5' noncoding sequences and gene expression divergence. I also examine polymorphism and divergence in 9,279 noncoding sequences located at the 5' end of annotated genes and detect multiple signals of positive selection. Notably, I use the traditional synonymous sites as a neutral reference to test for adaptive evolution, but I also use bases 8-30 of introns <65 bp, which have been proposed to be a better neutral choice. X-linked genes with a high degree of male-biased expression show the most extreme adaptive pattern at 5' noncoding regions, in agreement with faster-X evolution for gene expression divergence and a higher incidence of positively selected recessive mutations. © 2018 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.
Lobo, Jorge; Ferreira, Maria S; Antunes, Ilisa C; Teixeira, Marcos A L; Borges, Luisa M S; Sousa, Ronaldo; Gomes, Pedro A; Costa, Maria Helena; Cunha, Marina R; Costa, Filipe O
2017-02-01
In this study we compared DNA barcode-suggested species boundaries with morphology-based species identifications in the amphipod fauna of the southern European Atlantic coast. DNA sequences of the cytochrome c oxidase subunit I barcode region (COI-5P) were generated for 43 morphospecies (178 specimens) collected along the Portuguese coast which, together with publicly available COI-5P sequences, produced a final dataset comprising 68 morphospecies and 295 sequences. Seventy-five BINs (Barcode Index Numbers) were assigned to these morphospecies, of which 48 were concordant (i.e., 1 BIN = 1 species), 8 were taxonomically discordant, and 19 were singletons. Twelve species had matching sequences (<2% distance) with conspecifics from distant locations (e.g., North Sea). Seven morphospecies were assigned to multiple, and highly divergent, BINs, including specimens of Corophium multisetosum (18% divergence) and Dexamine spiniventris (16% divergence), which originated from sampling locations on the west coast of Portugal (only about 36 and 250 km apart, respectively). We also found deep divergence (4%-22%) among specimens of seven species from Portugal compared to those from the North Sea and Italy. The detection of evolutionarily meaningful divergence among populations of several amphipod species from southern Europe reinforces the need for a comprehensive re-assessment of the diversity of this faunal group.
A Large Pseudoautosomal Region on the Sex Chromosomes of the Frog Silurana tropicalis
Bewick, Adam J.; Chain, Frédéric J.J.; Zimmerman, Lyle B.; Sesay, Abdul; Gilchrist, Michael J.; Owens, Nick D.L.; Seifertova, Eva; Krylov, Vladimir; Macha, Jaroslav; Tlapakova, Tereza; Kubickova, Svatava; Cernohorska, Halina; Zarsky, Vojtech; Evans, Ben J.
2013-01-01
Sex chromosome divergence has been documented across phylogenetically diverse species, with amphibians typically having cytologically nondiverged (“homomorphic”) sex chromosomes. With an aim of further characterizing sex chromosome divergence of an amphibian, we used “RAD-tags” and Sanger sequencing to examine sex specificity and heterozygosity in the Western clawed frog Silurana tropicalis (also known as Xenopus tropicalis). Our findings based on approximately 20 million genotype calls and approximately 200 polymerase chain reaction-amplified regions across multiple male and female genomes failed to identify a substantially sized genomic region with genotypic hallmarks of sex chromosome divergence, including in regions known to be tightly linked to the sex-determining region. We also found that expression and molecular evolution of genes linked to the sex-determining region did not differ substantially from genes in other parts of the genome. This suggests that the pseudoautosomal region, where recombination occurs, comprises a large portion of the sex chromosomes of S. tropicalis. These results may in part explain why African clawed frogs have such a high incidence of polyploidization, shed light on why amphibians have a high rate of sex chromosome turnover, and raise questions about why homomorphic sex chromosomes are so prevalent in amphibians. PMID:23666865
Chambers, E Anne; Hebert, Paul D N
2016-01-01
High rates of species discovery and loss have led to the urgent need for more rapid assessment of species diversity in the herpetofauna. DNA barcoding allows for the preliminary identification of species based on sequence divergence. Prior DNA barcoding work on reptiles and amphibians has revealed higher biodiversity counts than previously estimated due to cases of cryptic and undiscovered species. Past studies have provided DNA barcodes for just 14% of the North American herpetofauna, revealing the need for expanded coverage. This study extends the DNA barcode reference library for North American herpetofauna, assesses the utility of this approach in aiding species delimitation, and examines the correspondence between current species boundaries and sequence clusters designated by the BIN system. Sequences were obtained from 730 specimens, representing 274 species (43%) from the North American herpetofauna. Mean intraspecific divergences were 1% and 3%, while average congeneric sequence divergences were 16% and 14% in amphibians and reptiles, respectively. BIN assignments corresponded with current species boundaries in 79% of amphibians, 100% of turtles, and 60% of squamates. Deep divergences (>2%) were noted in 35% of squamate and 16% of amphibian species, and low divergences (<2%) occurred in 12% of reptiles and 23% of amphibians, patterns reflected in BIN assignments. Sequence recovery declined with specimen age, and variation in recovery success was noted among collections. Within collections, barcodes effectively flagged seven mislabeled tissues, and barcode fragments were recovered from five formalin-fixed specimens. This study demonstrates that DNA barcodes can effectively flag errors in museum collections, while BIN splits and merges reveal taxa belonging to deeply diverged or hybridizing lineages. This study is the first effort to compile a reference library of DNA barcodes for herpetofauna on a continental scale. PMID:27116180
Lim, K Yoong; Kovarik, Ales; Matyasek, Roman; Chase, Mark W; Knapp, Sandra; McCarthy, Elizabeth; Clarkson, James J; Leitch, Andrew R
2006-12-01
Combining phylogenetic reconstructions of species relationships with comparative genomic approaches is a powerful way to decipher evolutionary events associated with genome divergence. Here, we reconstruct the history of karyotype and tandem repeat evolution in species of diploid Nicotiana section Alatae. By analysis of plastid DNA, we resolved two clades with high bootstrap support, one containing N. alata, N. langsdorffii, N. forgetiana and N. bonariensis (called the n = 9 group) and another containing N. plumbaginifolia and N. longiflora (called the n = 10 group). Despite little plastid DNA sequence divergence, we observed, via fluorescent in situ hybridization, substantial chromosomal repatterning, including altered chromosome numbers, structure and distribution of repeats. Effort was focussed on 35S and 5S nuclear ribosomal DNA (rDNA) and the HRS60 satellite family of tandem repeats comprising the elements HRS60, NP3R and NP4R. We compared divergence of these repeats in diploids and polyploids of Nicotiana. There are dramatic shifts in the distribution of the satellite repeats and complete replacement of intergenic spacers (IGSs) of 35S rDNA associated with divergence of the species in section Alatae. We suggest that sequence homogenization has replaced HRS60 family repeats at sub-telomeric regions, but that this process may not occur, or occurs more slowly, when the repeats are found at intercalary locations. Sequence homogenization acts more rapidly (at least two orders of magnitude) on 35S rDNA than 5S rDNA and sub-telomeric satellite sequences. This rapid rate of divergence is analogous to that found in polyploid species, and is therefore, in plants, not only associated with polyploidy.
Tracking the origins of the cave bear (Ursus spelaeus) by mitochondrial DNA sequencing.
Hänni, C; Laudet, V; Stehelin, D; Taberlet, P
1994-01-01
The different European populations of Ursus arctos, the brown bear, were recently studied for mitochondrial DNA polymorphism. Two clearly distinct lineages (eastern and western) were found, which may have diverged approximately 850,000 years ago. In this context, it was interesting to study the cave bear, Ursus spelaeus, a species which became extinct 20,000 years ago. In this study, we have amplified and sequenced a 139-bp fragment of the mitochondrial DNA control region of a 40,000-year-old specimen of U. spelaeus. Phylogenetic reconstructions using this sequence and the European brown bear sequences already published suggest that U. spelaeus diverged from an early offshoot of U. arctos, i.e., approximately at the same time as the divergence of the two main lineages of U. arctos. This divergence probably took place at the earliest glaciation, likely due to geographic separation during the earlier Quaternary cold periods. This result is in agreement with the paleontological data available and suggests a good correspondence between molecular and morphological data. PMID:7991628
Zhang, Honghai; Chen, Lei
2011-03-01
The dhole (Cuon alpinus) is the only extant species in the genus Cuon (Carnivora: Canidae). In the present study, the complete mitochondrial genome of the dhole was sequenced. The total length is 16672 base pairs, which is the shortest in Canidae. Sequence analysis revealed that most mitochondrial genomic functional regions were highly consistent among canid animals except the CSB domain of the control region. The difference in length among the Canidae mitochondrial genome sequences is mainly due to the number of short tandem repeat segments in the CSB domain. Phylogenetic analysis was performed on the concatenated data set of 14 mitochondrial genes of 8 canid animals using maximum parsimony (MP), maximum likelihood (ML) and Bayesian inference (BI) methods. The genera Vulpes and Nyctereutes formed a sister group and split first within Canidae, followed by Cuon. The divergence in the genus Canis was the latest. The divergence of domestic dogs after that of Canis lupus laniger is fully supported by all three topologies. Pairwise sequence divergence data of different mitochondrial genes among canid animals were also determined. Except for the synonymous substitutions in protein-coding genes, the control region exhibits the highest sequence divergences. The synonymous rates are approximately two to six times higher than those of the non-synonymous sites except for a slightly higher rate in the non-synonymous substitution between Cuon alpinus and Vulpes vulpes. 16S rRNA genes have a slightly faster sequence divergence than 12S rRNA and tRNA genes. Based on nucleotide substitutions of tRNA genes and rRNA genes, the times since divergence between dhole and other canid animals, and between domestic dogs and three subspecies of wolves, were estimated. The result indicates that Vulpes and Nyctereutes have a close phylogenetic relationship and the divergence of Nyctereutes is a little earlier.
The Tibetan wolf may be an archaic pedigree within wolf subspecies. The genetic distance between wolves and domestic dogs is less than that among different subspecies of wolves. The domestication of dogs was about 1.56-1.92 million years ago or even earlier.
Bayesian estimation of post-Messinian divergence times in Balearic Island lizards.
Brown, R P; Terrasa, B; Pérez-Mellado, V; Castro, J A; Hoskisson, P A; Picornell, A; Ramon, M M
2008-07-01
Phylogenetic relationships and timings of major cladogenesis events are investigated in the Balearic Island lizards Podarcis lilfordi and P. pityusensis using 2675 bp of mitochondrial and nuclear DNA sequences. Partitioned Bayesian and Maximum Parsimony analyses provided a well-resolved phylogeny with high node-support values. Bayesian MCMC estimation of node dates was investigated by comparing means of posterior distributions from different subsets of the sequence against the most robust analysis which used multiple partitions and allowed for rate heterogeneity among branches under a rate-drift model. Evolutionary rates were systematically underestimated and thus divergence times overestimated when sequences containing lower numbers of variable sites were used (based on ingroup node constraints). The following analyses allowed the best recovery of node times under the constant-rate (i.e., perfect clock) model: (i) all cytochrome b sequence (partitioned by codon position), (ii) cytochrome b (codon position 3 alone), (iii) NADH dehydrogenase (subunits 1 and 2; partitioned by codon position), (iv) cytochrome b and NADH dehydrogenase sequence together (six gene-codon partitions), (v) all unpartitioned sequence, (vi) a full multipartition analysis (nine partitions). Of these, only (iv) and (vi) performed well under the rate-drift model. These findings have significant implications for dating of recent divergence times in other taxa. The earliest P. lilfordi cladogenesis event (divergence of Menorcan populations), occurred before the end of the Pliocene, some 2.6 Ma. Subsequent events led to a West Mallorcan lineage (2.0 Ma ago), followed 1.2 Ma ago by divergence of populations from the southern part of the Cabrera archipelago from a widely-distributed group from north Cabrera, northern and southern Mallorcan islets. Divergence within P. pityusensis is more recent with the main Ibiza and Formentera clades sharing a common ancestor at about 1.0 Ma ago.
Climatic and sea level changes are likely to have initiated cladogenesis, with lineages making secondary contact during periodic landbridge formation. This oscillating cross-archipelago pattern in which ancient divergence is followed by repeated contact resembles that seen between East-West refugia populations from mainland Europe.
Mining sequence variations in representative polyploid sugarcane germplasm accessions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Xiping; Song, Jian; You, Qian; ...
2017-08-09
Sugarcane (Saccharum spp.) is one of the most important economic crops because of its high sugar production and biofuel potential. Due to the high polyploid level and complex genome of sugarcane, it has been a huge challenge to investigate genomic sequence variations, which are critical for identifying alleles contributing to important agronomic traits. In order to mine the genetic variations in sugarcane, genotyping by sequencing (GBS), was used to genotype 14 representative Saccharum complex accessions. GBS is a method to generate a large number of markers, enabled by next generation sequencing (NGS) and the genome complexity reduction using restriction enzymes. To use GBS for high throughput genotyping highly polyploid sugarcane, the GBS analysis pipelines in 14 Saccharum complex accessions were established by evaluating different alignment methods, sequence variants callers, and sequence depth for single nucleotide polymorphism (SNP) filtering. By using the established pipeline, a total of 76,251 non-redundant SNPs, 5642 InDels, 6380 presence/absence variants (PAVs), and 826 copy number variations (CNVs) were detected among the 14 accessions. In addition, non-reference based universal network enabled analysis kit and Stacks de novo called 34,353 and 109,043 SNPs, respectively. In the 14 accessions, the percentages of single dose SNPs ranged from 38.3% to 62.3% with an average of 49.6%, much more than the portions of multiple dosage SNPs. Concordantly called SNPs were used to evaluate the phylogenetic relationship among the 14 accessions. The results showed that the divergence time between the Erianthus genus and the Saccharum genus was more than 10 million years ago (MYA). The Saccharum species separated from their common ancestors ranging from 0.19 to 1.65 MYA.
The GBS pipelines including the reference sequences, alignment methods, sequence variant callers, and sequence depth were recommended and discussed for the Saccharum complex and other related species. A large number of sequence variations were discovered in the Saccharum complex, including SNPs, InDels, PAVs, and CNVs. Genome-wide SNPs were further used to illustrate sequence features of polyploid species and demonstrated the divergence of different species in the Saccharum complex. The results of this study showed that GBS was an effective NGS-based method to discover genomic sequence variations in highly polyploid and heterozygous species.
Koloniuk, Igor; Fránová, Jana; Sarkisova, Tatiana; Přibylová, Jaroslava
2018-05-04
Strawberry crinkle disease is one of the major diseases that threatens strawberry production. Although the biological properties of the agent, strawberry crinkle virus (SCV), have been thoroughly investigated, its complete genome sequence has never been published. Existing RT-PCR-based detection relies on a partial sequence of the L protein gene, presumably the least expressed viral gene. Here, we present complete sequences of two divergent SCV isolates co-infecting a single plant, Fragaria x ananassa cv. Čačanská raná.
Lopez, Philippe; Halary, Sébastien; Bapteste, Eric
2015-10-26
Microbial genetic diversity is often investigated via the comparison of relatively similar 16S molecules through multiple alignments between reference sequences and novel environmental samples using phylogenetic trees, direct BLAST matches, or phylotypes counts. However, are we missing novel lineages in the microbial dark universe by relying on standard phylogenetic and BLAST methods? If so, how can we probe that universe using alternative approaches? We performed a novel type of multi-marker analysis of genetic diversity exploiting the topology of inclusive sequence similarity networks. Our protocol identified 86 ancient gene families, well distributed and rarely transferred across the 3 domains of life, and retrieved their environmental homologs among 10 million predicted ORFs from human gut samples and other metagenomic projects. Numerous highly divergent environmental homologs were observed in gut samples, although the most divergent genes were over-represented in non-gut environments. In our networks, most divergent environmental genes grouped exclusively with uncultured relatives, in maximal cliques. Sequences within these groups were under strong purifying selection and presented a range of genetic variation comparable to that of a prokaryotic domain. Many genes families included environmental homologs that were highly divergent from cultured homologs: in 79 gene families (including 18 ribosomal proteins), Bacteria and Archaea were less divergent than some groups of environmental sequences were to any cultured or viral homologs. Moreover, some groups of environmental homologs branched very deeply in phylogenetic trees of life, when they were not too divergent to be aligned. 
These results underline how limited our understanding of the most diverse elements of the microbial world remains, and encourage a deeper exploration of natural communities and their genetic resources, hinting at the possibility that still unknown yet major divisions of life have yet to be discovered.
Natural Allelic Variations in Highly Polyploidy Saccharum Complex
DOE Office of Scientific and Technical Information (OSTI.GOV)
Song, Jian; Yang, Xiping; Resende, Jr., Marcio F. R.; ...
2016-06-08
Sugarcane (Saccharum spp.) is an important sugar and biofuel crop with high polyploid and complex genomes. The Saccharum complex, comprised of Saccharum genus and a few related genera, are important genetic resources for sugarcane breeding. A large amount of natural variation exists within the Saccharum complex. Though understanding their allelic variation has been challenging, it is critical to dissect allelic structure and to identify the alleles controlling important traits in sugarcane. To characterize natural variations in Saccharum complex, a target enrichment sequencing approach was used to assay 12 representative germplasm accessions. In total, 55,946 highly efficient probes were designed based on the sorghum genome and sugarcane unigene set targeting a total of 6 Mb of the sugarcane genome. A pipeline specifically tailored for polyploid sequence variants and genotype calling was established. BWA-MEM and the sorghum genome proved to be an acceptable aligner and reference, respectively, for sugarcane target enrichment sequence analysis. Genetic variations including 1,166,066 non-redundant SNPs, 150,421 InDels, 919 gene copy number variations, and 1,257 gene presence/absence variations were detected. SNPs from three different callers (Samtools, Freebayes, and GATK) were compared and the validation rates were nearly 90%. Based on the SNP loci of each accession and their ploidy levels, 999,258 single dosage SNPs were identified and most loci were estimated as largely homozygotes. An average of 34,397 haplotype blocks for each accession was inferred. The highest divergence time among the Saccharum spp. was estimated as 1.2 million years ago (MYA). Saccharum spp. diverged from Erianthus and Sorghum approximately 5 and 6 MYA, respectively. Furthermore, the target enrichment sequencing approach provided an effective way to discover and catalog natural allelic variation in highly polyploid or heterozygous genomes.
Domain architecture conservation in orthologs
2011-01-01
Background As orthologous proteins are expected to retain function more often than other homologs, they are often used for functional annotation transfer between species. However, ortholog identification methods do not take into account changes in domain architecture, which are likely to modify a protein's function. By domain architecture we refer to the sequential arrangement of domains along a protein sequence. To assess the level of domain architecture conservation among orthologs, we carried out a large-scale study of such events between human and 40 other species spanning the entire evolutionary range. We designed a score to measure domain architecture similarity and used it to analyze differences in domain architecture conservation between orthologs and paralogs relative to the conservation of primary sequence. We also statistically characterized the extents of different types of domain swapping events across pairs of orthologs and paralogs. Results The analysis shows that orthologs exhibit greater domain architecture conservation than paralogous homologs, even when differences in average sequence divergence are compensated for, for homologs that have diverged beyond a certain threshold. We interpret this as an indication of a stronger selective pressure on orthologs than paralogs to retain the domain architecture required for the proteins to perform a specific function. In general, orthologs as well as the closest paralogous homologs have very similar domain architectures, even at large evolutionary separation. The most common domain architecture changes observed in both ortholog and paralog pairs involved insertion/deletion of new domains, while domain shuffling and segment duplication/deletion were very infrequent. Conclusions On the whole, our results support the hypothesis that function conservation between orthologs demands higher domain architecture conservation than other types of homologs, relative to primary sequence conservation. 
This supports the notion that orthologs are functionally more similar than other types of homologs at the same evolutionary distance. PMID:21819573
Lohse, Konrad; Clarke, Magnus; Ritchie, Michael G.; Etges, William J.
2015-01-01
Models of speciation‐with‐gene‐flow have shown that the reduction in recombination between alternative chromosome arrangements can facilitate the fixation of locally adaptive genes in the face of gene flow and contribute to speciation. However, it has proven frustratingly difficult to show empirically that inversions have reduced gene flow and arose during or shortly after the onset of species divergence rather than represent ancestral polymorphisms. Here, we present an analysis of whole genome data from a pair of cactophilic fruit flies, Drosophila mojavensis and D. arizonae, which are reproductively isolated in the wild and differ by several large inversions on three chromosomes. We found an increase in divergence at rearranged compared to colinear chromosomes. Using the density of divergent sites in short sequence blocks we fit a series of explicit models of species divergence in which gene flow is restricted to an initial period after divergence and may differ between colinear and rearranged parts of the genome. These analyses show that D. mojavensis and D. arizonae have experienced postdivergence gene flow that ceased around 270 KY ago and was significantly reduced in chromosomes with fixed inversions. Moreover, we show that these inversions most likely originated around the time of species divergence which is compatible with theoretical models that posit a role of inversions in speciation with gene flow. PMID:25824653
Marine, Rachel L; Nasko, Daniel J; Wray, Jeffrey; Polson, Shawn W; Wommack, K Eric
2017-01-01
Chaperonins are protein-folding machinery found in all cellular life. Chaperonin genes have been documented within a few viruses, yet, surprisingly, analysis of metagenome sequence data indicated that chaperonin-carrying viruses are common and geographically widespread in marine ecosystems. Also unexpected was the discovery of viral chaperonin sequences related to thermosome proteins of archaea, indicating the presence of virioplankton populations infecting marine archaeal hosts. Virioplankton large subunit chaperonin sequences (GroELs) were divergent from bacterial sequences, indicating that viruses have carried this gene over long evolutionary time. Analysis of viral metagenome contigs indicated that: the order of large and small subunit genes was linked to the phylogeny of GroEL; both lytic and temperate phages may carry group I chaperonin genes; and viruses carrying a GroEL gene likely have large double-stranded DNA (dsDNA) genomes (>70 kb). Given these connections, it is likely that chaperonins are critical to the biology and ecology of virioplankton populations that carry these genes. Moreover, these discoveries raise the intriguing possibility that viral chaperonins may more broadly alter the structure and function of viral and cellular proteins in infected host cells. PMID:28731469
Understanding the Origin of Species with Genome-Scale Data: the Role of Gene Flow
Sousa, Vitor; Hey, Jody
2017-01-01
As it becomes easier to sequence multiple genomes from closely related species, evolutionary biologists working on speciation are struggling to get the most out of very large population-genomic data sets. Such data hold the potential to resolve evolutionary biology’s long-standing questions about the role of gene exchange in species formation. In principle the new population genomic data can be used to disentangle the conflicting roles of natural selection and gene flow during the divergence process. However there are great challenges in taking full advantage of such data, especially with regard to including recombination in genetic models of the divergence process. Current data, models, methods and the potential pitfalls in using them will be considered here. PMID:23657479
Tseng, Shu-Ping; Li, Shou-Hsien; Hsieh, Chia-Hung; Wang, Hurng-Yi; Lin, Si-Min
2014-10-01
Dating the time of divergence and understanding speciation processes are central to the study of the evolutionary history of organisms but are notoriously difficult. The difficulty is largely rooted in variations in the ancestral population size or in the genealogy variation across loci. To depict the speciation processes and divergence histories of three monophyletic Takydromus species endemic to Taiwan, we sequenced 20 nuclear loci and combined with one mitochondrial locus published in GenBank. They were analysed by a multispecies coalescent approach within a Bayesian framework. Divergence dating based on the gene tree approach showed high variation among loci, and the divergence was estimated at an earlier date than when derived by the species-tree approach. To test whether variations in the ancestral population size accounted for the majority of this variation, we conducted computer inferences using isolation-with-migration (IM) and approximate Bayesian computation (ABC) frameworks. The results revealed that gene flow during the early stage of speciation was strongly favoured over the isolation model, and the initiation of the speciation process was far earlier than the dates estimated by gene- and species-based divergence dating. Due to their limited dispersal ability, it is suggested that geographical isolation may have played a major role in the divergence of these Takydromus species. Nevertheless, this study reveals a more complex situation and demonstrates that gene flow during the speciation process cannot be overlooked and may have a great impact on divergence dating. By using multilocus data and incorporating Bayesian coalescence approaches, we provide a more biologically realistic framework for delineating the divergence history of Takydromus. © 2014 John Wiley & Sons Ltd.
Ashfaq, Muhammad; Prosser, Sean; Nasir, Saima; Masood, Mariyam; Ratnasingham, Sujeevan; Hebert, Paul D. N.
2015-01-01
The study analyzes sequence variation of two mitochondrial genes (COI, cytb) in Pediculus humanus from three countries (Egypt, Pakistan, South Africa) that have received little prior attention, and integrates these results with prior data. Analysis indicates a maximum K2P distance of 10.3% among 960 COI sequences and 13.8% among 479 cytb sequences. Three analytical methods (BIN, PTP, ABGD) reveal five concordant OTUs for COI and cytb. Neighbor-Joining analysis of the COI sequences confirms five clusters; three corresponding to previously recognized mitochondrial clades A, B, C and two new clades, “D” and “E”, showing 2.3% and 2.8% divergence from their nearest neighbors (NN). Cytb data corroborate five clusters showing that clades “D” and “E” are both 4.6% divergent from their respective NN clades. Phylogenetic analysis supports the monophyly of all clusters recovered by NJ analysis. Divergence time estimates suggest that the earliest split of P. humanus clades occurred slightly more than one million years ago (MYa) and the latest about 0.3 MYa. Sequence divergences in COI and cytb among the five clades of P. humanus are 10X those in their human host, a difference that likely reflects both rate acceleration and the acquisition of lice clades from several archaic hominid lineages. PMID:26373806
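For readers unfamiliar with the K2P distances quoted in the abstract above, the following is a minimal sketch of the standard Kimura two-parameter formula, d = -0.5 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of sites differing by a transition and a transversion. The function name `k2p_distance` and the handling of gaps are illustrative assumptions; this is not the software the authors used.

```python
import math

# Purine<->purine and pyrimidine<->pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two pre-aligned DNA sequences.

    Hypothetical helper for illustration: sites where either sequence has a
    gap or an ambiguous base are skipped (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    # math.log raises ValueError if the sequences are too divergent
    # (saturated) for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

On this scale, the reported maximum of 10.3% for COI corresponds to a returned value of about 0.103 substitutions per site.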
A novel, highly divergent ssDNA virus identified in Brazil infecting apple, pear and grapevine.
Basso, Marcos Fernando; da Silva, José Cleydson Ferreira; Fajardo, Thor Vinícius Martins; Fontes, Elizabeth Pacheco Batista; Zerbini, Francisco Murilo
2015-12-02
Fruit trees of temperate and tropical climates are of great economical importance worldwide and several viruses have been reported affecting their productivity and longevity. Fruit trees of different Brazilian regions displaying virus-like symptoms were evaluated for infection by circular DNA viruses. Seventy-four fruit trees were sampled and a novel, highly divergent, monopartite circular ssDNA virus was cloned from apple, pear and grapevine trees. Forty-five complete viral genomes were sequenced, with a size of approx. 3.4 kb and organized into five ORFs. Deduced amino acid sequences showed identities in the range of 38% with unclassified circular ssDNA viruses, nanoviruses and alphasatellites (putative Replication-associated protein, Rep), and begomo-, curto- and mastreviruses (putative coat protein, CP, and movement protein, MP). A large intergenic region contains a short palindromic sequence capable of forming a hairpin-like structure with the loop sequence TAGTATTAC, identical to the conserved nonanucleotide of circoviruses, nanoviruses and alphasatellites. Recombination events were not detected and phylogenetic analysis showed a relationship with circo-, nano- and geminiviruses. PCR confirmed the presence of this novel ssDNA virus in field plants. Infectivity tests using the cloned viral genome confirmed its ability to infect apple and pear tree seedlings, but not Nicotiana benthamiana. The name "Temperate fruit decay-associated virus" (TFDaV) is proposed for this novel virus. Copyright © 2015 Elsevier B.V. All rights reserved.
Powers, T. O.; Harris, T. S.; Hyman, B. C.
1993-01-01
Mitochondrial DNA sequences were obtained from the NADH dehydrogenase subunit 3 (ND3), large rRNA, and cytochrome b genes from Meloidogyne incognita and Romanomermis culicivorax. Both species show considerable genetic distance within these same genes when compared with Caenorhabditis elegans or Ascaris suum, two species previously analyzed. Caenorhabditis, Ascaris, and Meloidogyne were selected as representatives of three subclasses in the nematode class Secernentea: Rhabditia, Spiruria, and Diplogasteria, respectively. Romanomermis served as a representative out-group of the class Adenophorea. The divergence between the phytoparasitic lineage (represented by Meloidogyne) and the three other species is so great that virtually every variable position in these genes appears to have accumulated multiple mutations, obscuring the phylogenetic information obtainable from these comparisons. The 39 and 42% amino acid similarity between the M. incognita and C. elegans ND3 and cytochrome b coding sequences, respectively, are approximately the same as those of C. elegans-mouse comparisons for the same genes (26 and 44%). This discovery calls into question the feasibility of employing cloned C. elegans probes as reagents to isolate phytoparasitic nematode genes. The genetic distance between the phytoparasitic nematode lineage and C. elegans markedly contrasts with the 79% amino acid similarity between C. elegans and A. suum for the same sequences. The molecular data suggest that Caenorhabditis and Ascaris belong to the same subclass. PMID:19279810
Bernard, Guillaume; Chan, Cheong Xin; Ragan, Mark A
2016-07-01
Alignment-free (AF) approaches have recently been highlighted as alternatives to methods based on multiple sequence alignment in phylogenetic inference. However, the sensitivity of AF methods to genome-scale evolutionary scenarios is little known. Here, using simulated microbial genome data we systematically assess the sensitivity of nine AF methods to three important evolutionary scenarios: sequence divergence, lateral genetic transfer (LGT) and genome rearrangement. Among these, AF methods are most sensitive to the extent of sequence divergence, less sensitive to low and moderate frequencies of LGT, and most robust against genome rearrangement. We describe the application of AF methods to three well-studied empirical genome datasets, and introduce a new application of the jackknife to assess node support. Our results demonstrate that AF phylogenomics is computationally scalable to multi-genome data and can generate biologically meaningful phylogenies and insights into microbial evolution.
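To make the idea of an alignment-free comparison concrete, here is a small sketch that compares two sequences by their k-mer count profiles using a cosine distance. The abstract above does not name its nine methods, so this generic k-mer approach, the function names `kmer_profile` and `cosine_distance`, and the default k = 4 are all assumptions chosen for illustration only.

```python
import math
from collections import Counter

DNA = set("ACGT")


def kmer_profile(seq: str, k: int = 4) -> Counter:
    """Count overlapping k-mers, ignoring any window with a non-ACGT symbol."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in windows if set(w) <= DNA)


def cosine_distance(p: Counter, q: Counter) -> float:
    """1 - cosine similarity of two k-mer count vectors (0 = same profile)."""
    dot = sum(p[m] * q[m] for m in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * \
        math.sqrt(sum(v * v for v in q.values()))
    if norm == 0:
        raise ValueError("at least one sequence has no valid k-mers")
    return 1.0 - dot / norm


def af_distance(seq_a: str, seq_b: str, k: int = 4) -> float:
    """Alignment-free distance between two unaligned sequences."""
    return cosine_distance(kmer_profile(seq_a, k), kmer_profile(seq_b, k))
```

Because only k-mer counts are compared, no alignment is needed and the sequences may differ in length. A matrix of such pairwise distances could then be passed to a distance-based tree-building method such as neighbor joining.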
Limborg, Morten T.; Larson, Wesley; Shedd, Kyle; Seeb, Lisa W.; Seeb, James E.
2017-01-01
Preservation of heritable ecological diversity within species and populations is a key challenge for managing natural resources and wild populations. Salmonid fish are iconic and socio-economically important species for commercial, aquaculture, and recreational fisheries across the globe. Many salmonids are known to exhibit ecological divergence within species, including distinct feeding ecotypes within the same lakes. Here we used 5559 SNPs, derived from RAD sequencing, to perform population genetic comparisons between two dietary ecotypes of sockeye salmon (Oncorhynchus nerka) in Jo-Jo Lake, Alaska (USA). We tested the standing hypothesis that these two ecotypes are currently diverging as a result of adaptation to distinct dietary niches; however, our results support earlier conclusions of a single panmictic population. The RAD sequence data revealed 40 new SNPs not previously detected in the species, and our sequence data can be used in future studies of ecotypic diversity in salmonid species.
Rapid rate of control-region evolution in Pacific butterflyfishes (Chaetodontidae).
McMillan, W O; Palumbi, S R
1997-11-01
Sequence differences in the tRNA-proline (tRNApro) end of the mitochondrial control-region of three species of Pacific butterflyfishes accumulated 33-43 times more rapidly than did changes within the mitochondrial cytochrome b gene (cytb). Rapid evolution in this region was accompanied by strong transition/transversion bias and large variation in the probability of a DNA substitution among sites. These substitution constraints placed an absolute ceiling on the magnitude of sequence divergence that could be detected between individuals. This divergence "ceiling" was reached rapidly and led to a decay in the relative rate of control-region/cytb evolution. A high rate of evolution in this section of the control-region of butterflyfishes stands in marked contrast to the patterns reported in some other fish lineages. Although the mechanism underlying rate variation remains unclear, all taxa with rapid evolution in the 5'-end of the control-region showed extreme transition biases. By contrast, in taxa with slower control-region evolution, transitions accumulated at nearly the same rate as transversions. More information is needed to understand the relationship between nucleotide bias and the rate of evolution in the 5'-end of the control-region. Despite strong constraints on sequence change, phylogenetic information was preserved in the group of recently differentiated species and supported the clustering of sequences into three major mtDNA groupings. Within these groups, very similar control-region sequences were widely distributed across the Pacific Ocean and were shared between recognized species, indicating a lack of mitochondrial sequence monophyly among species.
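The transition/transversion bias mentioned above can be made concrete with a small counting routine. The sketch below is not the authors' analysis; it only shows how differences between two aligned sequences might be classified as transitions (purine to purine, or pyrimidine to pyrimidine) or transversions. The function name is hypothetical, and sites containing gaps or ambiguity codes are skipped by assumption.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq_a, seq_b):
    """Return (transitions, transversions) between two aligned sequences.

    Illustrative helper: sites where either base is not A, C, G or T
    (gaps, ambiguity codes) are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue
        # A change within one chemical class is a transition.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions
```

Raw counts of this kind ignore multiple substitutions at the same site, which is the saturation effect behind the divergence "ceiling" described in the abstract.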
Irizarry, Kristopher J L; Downs, Eileen; Bryden, Randall; Clark, Jory; Griggs, Lisa; Kopulos, Renee; Boettger, Cynthia M; Carr, Thomas J; Keeler, Calvin L; Collisson, Ellen; Drechsler, Yvonne
2017-01-01
Discovering genetic biomarkers associated with disease resistance and enhanced immunity is critical to developing advanced strategies for controlling viral and bacterial infections in different species. Macrophages, important cells of innate immunity, are directly involved in cellular interactions with pathogens, the release of cytokines activating other immune cells and antigen presentation to cells of the adaptive immune response. IFNγ is a potent activator of macrophages and increased production has been associated with disease resistance in several species. This study characterizes the molecular basis for dramatically different nitric oxide production and immune function between the B2 and the B19 haplotype chicken macrophages. A large-scale RNA sequencing approach was employed to sequence the RNA of purified macrophages from each haplotype group (B2 vs. B19) during differentiation and after stimulation. Our results demonstrate that a large number of genes exhibit divergent expression between B2 and B19 haplotype cells both prior to and after stimulation. These differences in gene expression appear to be regulated by complex epigenetic mechanisms that need further investigation.
Molecular phylogenetic analysis of non-sexually transmitted strains of Haemophilus ducreyi.
Gaston, Jordan R; Roberts, Sally A; Humphreys, Tricia L
2015-01-01
Haemophilus ducreyi, the etiologic agent of chancroid, has been previously reported to show genetic variance in several key virulence factors, placing strains of the bacterium into two genetically distinct classes. Recent studies done in yaws-endemic areas of the South Pacific have shown that H. ducreyi is also a major cause of cutaneous limb ulcers (CLU) that are not sexually transmitted. To genetically assess CLU strains relative to the previously described class I, class II phylogenetic hierarchy, we examined nucleotide sequence diversity at 11 H. ducreyi loci, including virulence and housekeeping genes, which encompass approximately 1% of the H. ducreyi genome. Sequences for all 11 loci indicated that strains collected from leg ulcers exhibit DNA sequences homologous to class I strains of H. ducreyi. However, sequences for 3 loci, including a hemoglobin receptor (hgbA), serum resistance protein (dsrA), and a collagen adhesin (ncaA) contained informative amounts of variation. Phylogenetic analyses suggest that these non-sexually transmitted strains of H. ducreyi comprise a sub-clonal population within class I strains of H. ducreyi. Molecular dating suggests that CLU strains are the most recently developed, having diverged approximately 0.355 million years ago, fourteen times more recently than the class I/class II divergence. The CLU strains' divergence falls after the divergence of humans from chimpanzees, making it the first known H. ducreyi divergence event directly influenced by the selective pressures accompanying human hosts.
Kaliszewska, Zofia A; Seger, Jon; Rowntree, Victoria J; Barco, Susan G; Benegas, Rafael; Best, Peter B; Brown, Moira W; Brownell, Robert L; Carribero, Alejandro; Harcourt, Robert; Knowlton, Amy R; Marshall-Tilas, Kim; Patenaude, Nathalie J; Rivarola, Mariana; Schaeff, Catherine M; Sironi, Mariano; Smith, Wendy A; Yamada, Tadasu K
2005-10-01
Right whales carry large populations of three 'whale lice' (Cyamus ovalis, Cyamus gracilis, Cyamus erraticus) that have no other hosts. We used sequence variation in the mitochondrial COI gene to ask (i) whether cyamid population structures might reveal associations among right whale individuals and subpopulations, (ii) whether the divergences of the three nominally conspecific cyamid species on North Atlantic, North Pacific, and southern right whales (Eubalaena glacialis, Eubalaena japonica, Eubalaena australis) might indicate their times of separation, and (iii) whether the shapes of cyamid gene trees might contain information about changes in the population sizes of right whales. We found high levels of nucleotide diversity but almost no population structure within oceans, indicating large effective population sizes and high rates of transfer between whales and subpopulations. North Atlantic and Southern Ocean populations of all three species are reciprocally monophyletic, and North Pacific C. erraticus is well separated from North Atlantic and southern C. erraticus. Mitochondrial clock calibrations suggest that these divergences occurred around 6 million years ago (Ma), and that the Eubalaena mitochondrial clock is very slow. North Pacific C. ovalis forms a clade inside the southern C. ovalis gene tree, implying that at least one right whale has crossed the equator in the Pacific Ocean within the last 1-2 million years (Myr). Low-frequency polymorphisms are more common than expected under neutrality for populations of constant size, but there is no obvious signal of rapid, interspecifically congruent expansion of the kind that would be expected if North Atlantic or southern right whales had experienced a prolonged population bottleneck within the last 0.5 Myr.
Comparative analysis of gene regulatory networks: from network reconstruction to evolution.
Thompson, Dawn; Regev, Aviv; Roy, Sushmita
2015-01-01
Regulation of gene expression is central to many biological processes. Although reconstruction of regulatory circuits from genomic data alone is therefore desirable, this remains a major computational challenge. Comparative approaches that examine the conservation and divergence of circuits and their components across strains and species can help reconstruct circuits as well as provide insights into the evolution of gene regulatory processes and their adaptive contribution. In recent years, advances in genomic and computational tools have led to a wealth of methods for such analysis at the sequence, expression, pathway, module, and entire network level. Here, we review computational methods developed to study transcriptional regulatory networks using comparative genomics, from sequence to functional data. We highlight how these methods use evolutionary conservation and divergence to reliably detect regulatory components as well as estimate the extent and rate of divergence. Finally, we discuss the promise and open challenges in linking regulatory divergence to phenotypic divergence and adaptation.
Taylor, William R.; Gibbs, Melanie; Breuker, Casper J.; Holland, Peter W. H.
2014-01-01
Gene duplications within the conserved Hox cluster are rare in animal evolution, but in Lepidoptera an array of divergent Hox-related genes (Shx genes) has been reported between pb and zen. Here, we use genome sequencing of five lepidopteran species (Polygonia c-album, Pararge aegeria, Callimorpha dominula, Cameraria ohridella, Hepialus sylvina) plus a caddisfly outgroup (Glyphotaelius pellucidus) to trace the evolution of the lepidopteran Shx genes. We demonstrate that Shx genes originated by tandem duplication of zen early in the evolution of the large clade Ditrysia; Shx are not found in a caddisfly and a member of the basally diverging Hepialidae (swift moths). Four distinct Shx genes were generated early in ditrysian evolution, and were stably retained in all descendant Lepidoptera except the silkmoth, which has additional duplications. Despite extensive sequence divergence, molecular modelling indicates that all four Shx genes have the potential to encode stable homeodomains. The four Shx genes have distinct spatiotemporal expression patterns in early development of the Speckled Wood butterfly (Pararge aegeria), with ShxC demarcating the future sites of extraembryonic tissue formation via strikingly localised maternal RNA in the oocyte. All four genes are also expressed in presumptive serosal cells, prior to the onset of zen expression. Lepidopteran Shx genes represent an unusual example of Hox cluster expansion and integration of novel genes into ancient developmental regulatory networks. PMID:25340822
Determining the Effect of Natural Selection on Linked Neutral Divergence across Species
Phung, Tanya N.; Lohmueller, Kirk E.
2016-01-01
A major goal in evolutionary biology is to understand how natural selection has shaped patterns of genetic variation across genomes. Studies in a variety of species have shown that neutral genetic diversity (intra-species differences) has been reduced at sites linked to those under direct selection. However, the effect of linked selection on neutral sequence divergence (inter-species differences) remains ambiguous. While empirical studies have reported correlations between divergence and recombination, which is interpreted as evidence for natural selection reducing linked neutral divergence, theory argues otherwise, especially for species that have diverged long ago. Here we address these outstanding issues by examining whether natural selection can affect divergence between both closely and distantly related species. We show that neutral divergence between closely related species (e.g. human-primate) is negatively correlated with functional content and positively correlated with human recombination rate. We also find that neutral divergence between distantly related species (e.g. human-rodent) is negatively correlated with functional content and positively correlated with estimates of background selection from primates. These patterns persist after accounting for the confounding factors of hypermutable CpG sites, GC content, and biased gene conversion. Coalescent models indicate that even when the contribution of ancestral polymorphism to divergence is small, background selection in the ancestral population can still explain a large proportion of the variance in divergence across the genome, generating the observed correlations. Our findings reveal that, contrary to previous intuition, natural selection can indirectly affect linked neutral divergence between both closely and distantly related species. 
Though we cannot formally exclude the possibility that the direct effects of purifying selection drive some of these patterns, such a scenario would be possible only if more of the genome is under purifying selection than currently believed. Our work has implications for understanding the evolution of genomes and interpreting patterns of genetic variation. PMID:27508305
Determining the Effect of Natural Selection on Linked Neutral Divergence across Species.
Phung, Tanya N; Huber, Christian D; Lohmueller, Kirk E
2016-08-01
A major goal in evolutionary biology is to understand how natural selection has shaped patterns of genetic variation across genomes. Studies in a variety of species have shown that neutral genetic diversity (intra-species differences) has been reduced at sites linked to those under direct selection. However, the effect of linked selection on neutral sequence divergence (inter-species differences) remains ambiguous. While empirical studies have reported correlations between divergence and recombination, which is interpreted as evidence for natural selection reducing linked neutral divergence, theory argues otherwise, especially for species that have diverged long ago. Here we address these outstanding issues by examining whether natural selection can affect divergence between both closely and distantly related species. We show that neutral divergence between closely related species (e.g. human-primate) is negatively correlated with functional content and positively correlated with human recombination rate. We also find that neutral divergence between distantly related species (e.g. human-rodent) is negatively correlated with functional content and positively correlated with estimates of background selection from primates. These patterns persist after accounting for the confounding factors of hypermutable CpG sites, GC content, and biased gene conversion. Coalescent models indicate that even when the contribution of ancestral polymorphism to divergence is small, background selection in the ancestral population can still explain a large proportion of the variance in divergence across the genome, generating the observed correlations. Our findings reveal that, contrary to previous intuition, natural selection can indirectly affect linked neutral divergence between both closely and distantly related species. 
Though we cannot formally exclude the possibility that the direct effects of purifying selection drive some of these patterns, such a scenario would be possible only if more of the genome is under purifying selection than currently believed. Our work has implications for understanding the evolution of genomes and interpreting patterns of genetic variation.
Phylogenomic evidence for ancient hybridization in the genomes of living cats (Felidae)
Li, Gang; Davis, Brian W.; Eizirik, Eduardo; Murphy, William J.
2016-01-01
Inter-species hybridization has been recently recognized as potentially common in wild animals, but the extent to which it shapes modern genomes is still poorly understood. Distinguishing historical hybridization events from other processes leading to phylogenetic discordance among different markers requires a well-resolved species tree that considers all modes of inheritance and overcomes systematic problems due to rapid lineage diversification by sampling large genomic character sets. Here, we assessed genome-wide phylogenetic variation across a diverse mammalian family, Felidae (cats). We combined genotypes from a genome-wide SNP array with additional autosomal, X- and Y-linked variants to sample ∼150 kb of nuclear sequence, in addition to complete mitochondrial genomes generated using light-coverage Illumina sequencing. We present the first robust felid time tree that accounts for unique maternal, paternal, and biparental evolutionary histories. Signatures of phylogenetic discordance were abundant in the genomes of modern cats, in many cases indicating hybridization as the most likely cause. Comparison of big cat whole-genome sequences revealed a substantial reduction of X-linked divergence times across several large recombination cold spots, which were highly enriched for signatures of selection-driven post-divergence hybridization between the ancestors of the snow leopard and lion lineages. These results highlight the mosaic origin of modern felid genomes and the influence of sex chromosomes and sex-biased dispersal in post-speciation gene flow. A complete resolution of the tree of life will require comprehensive genomic sampling of biparental and sex-limited genetic variation to identify and control for phylogenetic conflict caused by ancient admixture and sex-biased differences in genomic transmission. PMID:26518481
Akiva, Eyal; Copp, Janine N.; Tokuriki, Nobuhiko; Babbitt, Patricia C.
2017-01-01
Insight regarding how diverse enzymatic functions and reactions have evolved from ancestral scaffolds is fundamental to understanding chemical and evolutionary biology, and for the exploitation of enzymes for biotechnology. We undertook an extensive computational analysis using a unique and comprehensive combination of tools that include large-scale phylogenetic reconstruction to determine the sequence, structural, and functional relationships of the functionally diverse flavin mononucleotide-dependent nitroreductase (NTR) superfamily (>24,000 sequences from all domains of life, 54 structures, and >10 enzymatic functions). Our results suggest an evolutionary model in which contemporary subgroups of the superfamily have diverged in a radial manner from a minimal flavin-binding scaffold. We identified the structural design principle for this divergence: Insertions at key positions in the minimal scaffold that, combined with the fixation of key residues, have led to functional specialization. These results will aid future efforts to delineate the emergence of functional diversity in enzyme superfamilies, provide clues for functional inference for superfamily members of unknown function, and facilitate rational redesign of the NTR scaffold. PMID:29078300
Trapnell, Cole; Davidson, Stuart; Pachter, Lior; Chu, Hou Cheng; Tonkin, Leath A.; Biggin, Mark D.; Eisen, Michael B.
2010-01-01
Changes in gene expression play an important role in evolution, yet the molecular mechanisms underlying regulatory evolution are poorly understood. Here we compare genome-wide binding of the six transcription factors that initiate segmentation along the anterior-posterior axis in embryos of two closely related species: Drosophila melanogaster and Drosophila yakuba. Where we observe binding by a factor in one species, we almost always observe binding by that factor to the orthologous sequence in the other species. Levels of binding, however, vary considerably. The magnitude and direction of the interspecies differences in binding levels of all six factors are strongly correlated, suggesting a role for chromatin or other factor-independent forces in mediating the divergence of transcription factor binding. Nonetheless, factor-specific quantitative variation in binding is common, and we show that it is driven to a large extent by the gain and loss of cognate recognition sequences for the given factor. We find only a weak correlation between binding variation and regulatory function. These data provide the first genome-wide picture of how modest levels of sequence divergence between highly morphologically similar species affect a system of coordinately acting transcription factors during animal development, and highlight the dominant role of quantitative variation in transcription factor binding over short evolutionary distances. PMID:20351773
Schönberg, Anna; Theunert, Christoph; Li, Mingkun; Stoneking, Mark; Nasidze, Ivan
2011-09-01
To investigate the demographic history of human populations from the Caucasus and surrounding regions, we used high-throughput sequencing to generate 147 complete mtDNA genome sequences from random samples of individuals from three groups from the Caucasus (Armenians, Azeri and Georgians), and one group each from Iran and Turkey. Overall diversity is very high, with 144 different sequences that fall into 97 different haplogroups found among the 147 individuals. Bayesian skyline plots (BSPs) of population size change through time show a population expansion around 40-50 kya, followed by a constant population size, and then another expansion around 15-18 kya for the groups from the Caucasus and Iran. The BSP for Turkey differs the most from the others, with an increase from 35 to 50 kya followed by a prolonged period of constant population size, and no indication of a second period of growth. An approximate Bayesian computation approach was used to estimate divergence times between each pair of populations; the oldest divergence times were between Turkey and the other four groups from the South Caucasus and Iran (~400-600 generations), while the divergence time of the three Caucasus groups from each other was comparable to their divergence time from Iran (average of ~360 generations). These results illustrate the value of random sampling of complete mtDNA genome sequences that can be obtained with high-throughput sequencing platforms.
Schwentner, Martin; Timms, Brian V; Richter, Stefan
2012-01-01
Temporary water bodies are important freshwater habitats in the arid zone of Australia. They harbor a distinct fauna and provide important feeding and breeding grounds for water birds. This paper assesses, on the basis of haplotype networks, analyses of molecular variation and relaxed molecular clock divergence time estimates, the phylogeographic history, and population structure of four common temporary water species of the Australian endemic clam shrimp taxon Limnadopsis in eastern and central Australia (an area of >1,350,000 km2). Mitochondrial cytochrome c oxidase subunit I sequences of 413 individuals and a subset of 63 nuclear internal transcribed spacer 2 sequences were analyzed. Genetic differentiation was observed between populations inhabiting southeastern and central Australia and those inhabiting the northern Lake Eyre Basin and Western Australia. However, over large parts of the study area and across river drainage systems in southeastern and central Australia (the Murray–Darling Basin, Bulloo River, and southern Lake Eyre Basin), no evidence of population subdivision was observed in any of the four Limnadopsis species. This indicates recent gene flow across an area of ∼800,000 km2. This finding contrasts with patterns observed in other Australian arid zone taxa, particularly freshwater species, whose populations are often structured according to drainage systems. The lack of genetic differentiation within the area in question may be linked to the huge number of highly nomadic water birds that potentially disperse the resting eggs of Limnadopsis among temporary water bodies. Genetically undifferentiated populations on a large geographic scale contrast starkly with findings for many other large branchiopods in other parts of the world, where pronounced genetic structure is often observed even in populations inhabiting pools separated by a few kilometers. 
Due to its divergent genetic lineages (up to 5.6% uncorrected p-distance) and the relaxed molecular clock divergence time estimates obtained, Limnadopsis parvispinus is assumed to have inhabited the Murray–Darling Basin continuously since the mid-Pliocene (∼4 million years ago). This means that suitable temporary water bodies would have existed in this area throughout the wet–dry cycles of the Pleistocene. PMID:22957166
Skoglund, Pontus; Götherström, Anders; Jakobsson, Mattias
2011-04-01
Despite recent technological advances in DNA sequencing, incomplete coverage remains an issue in population genomics, in particular for studies that include ancient samples. Here, we describe an approach to estimate population divergence times for non-overlapping sequence data that is based on probabilities of different genealogical topologies under a structured coalescent model. We show that the approach can be adapted to accommodate common problems such as sequencing errors and postmortem nucleotide misincorporations, and we use simulations to investigate biases involved with estimating genealogical topologies from empirical data. The approach relies on three reference genomes and should be particularly useful for future analysis of genomic data that consist of nonoverlapping sets of sequences, potentially from different points in time. We applied the method to shotgun sequence data from an ancient wolf together with extant dogs and wolves and found striking resemblance to previously described fine-scale population structure among dog breeds. When comparing modern dogs to four geographically distinct wolves, we find that the divergence time between dogs and an Indian wolf is smallest, followed by the divergence times to a Chinese wolf and a Spanish wolf, and a relatively long divergence time to an Alaskan wolf, suggesting that the origin of modern dogs is somewhere in Eurasia, potentially southern Asia. We find that less than two-thirds of all loci in the boxer and poodle genomes are more similar to each other than to a modern gray wolf and that--assuming complete isolation without gene flow--the divergence time between gray wolves and modern European dogs extends to 3,500 generations before the present, corresponding to approximately 10,000 years ago (95% confidence interval [CI]: 9,000-13,000). 
We explicitly study the effect of gene flow between dogs and wolves on our estimates and show that a low rate of gene flow is compatible with an even earlier domestication date ∼30,000 years ago (95% CI: 15,000-90,000). This observation is in agreement with recent archaeological findings and indicates that human behavior necessary for domestication of wild animals could have appeared much earlier than the development of agriculture.
Yasukochi, Yoshiki; Naka, Izumi; Patarapotikul, Jintana; Hananantachai, Hathairad; Ohashi, Jun
2015-08-01
The 175-kDa erythrocyte binding antigen (EBA-175) of Plasmodium falciparum plays a crucial role in merozoite invasion into human erythrocytes. EBA-175 is believed to have been under diversifying selection; however, there have been no studies investigating the effect of dispersal of humans out of Africa on the genetic variation of EBA-175 in P. falciparum. PCR-direct sequencing was performed for a part of the eba-175 gene (regions II and III) using DNA samples obtained from Thai patients infected with P. falciparum. The divergence times for the P. falciparum eba-175 alleles were estimated assuming that P. falciparum/Plasmodium reichenowi divergence occurred 6 million years ago (MYA). To examine the possibility of diversifying selection, nonsynonymous and synonymous substitution rates for Plasmodium species were also estimated. A total of 32 eba-175 alleles were identified from 131 Thai P. falciparum isolates. Their estimated divergence time was 0.13-0.14 MYA, before the exodus of humans from Africa. A phylogenetic tree for a large sequence dataset of P. falciparum eba-175 alleles from across the world showed the presence of a basal Asian-specific cluster for all P. falciparum sequences. Markedly more nonsynonymous than synonymous substitutions were also detected in region II in P. falciparum, but not within Plasmodium species parasitizing African apes, suggesting that diversifying selection has acted specifically on P. falciparum eba-175. Plasmodium falciparum eba-175 genetic diversity appeared to increase following the exodus of Asian ancestors from Africa. Diversifying selection may have played an important role in the diversification of eba-175 allelic lineages. The present results suggest that the dispersals of humans out of Africa significantly influenced the molecular evolution of P. falciparum EBA-175.
Divergent positive selection in rhodopsin from lake and riverine cichlid fishes.
Schott, Ryan K; Refvik, Shannon P; Hauser, Frances E; López-Fernández, Hernán; Chang, Belinda S W
2014-05-01
Studies of cichlid evolution have highlighted the importance of visual pigment genes in the spectacular radiation of the African rift lake cichlids. Recent work, however, has also provided strong evidence for adaptive diversification of riverine cichlids in the Neotropics, which inhabit environments of markedly different spectral properties from the African rift lakes. These ecological and/or biogeographic differences may have imposed divergent selective pressures on the evolution of the cichlid visual system. To test these hypotheses, we investigated the molecular evolution of the dim-light visual pigment, rhodopsin. We sequenced rhodopsin from Neotropical and African riverine cichlids and combined these data with published sequences from African cichlids. We found significant evidence for positive selection using random sites codon models in all cichlid groups, with the highest levels in African lake cichlids. Tests using branch-site and clade models that partitioned the data along ecological (lake, river) and/or biogeographic (African, Neotropical) boundaries found significant evidence of divergent selective pressures among cichlid groups. However, statistical comparisons among these models suggest that ecological, rather than biogeographic, factors may be responsible for divergent selective pressures that have shaped the evolution of the visual system in cichlids. We found that branch-site models did not perform as well as clade models for our data set, in which there was evidence for positive selection in the background. One of our most intriguing results is that the amino acid sites found to be under positive selection in Neotropical and African lake cichlids were largely nonoverlapping, despite falling into the same three functional categories: spectral tuning, retinal uptake/release, and rhodopsin dimerization. Taken together, these results would imply divergent selection across cichlid clades, but targeting similar functions. 
This study highlights the importance of molecular investigations of ecologically important groups and the flexibility of clade models in explicitly testing ecological hypotheses.
Nijman, Vincent; Aliabadian, Mansour
2013-11-01
The mitochondrial cytochrome c-oxidase subunit I (cox1) can serve as a fast and accurate marker for the identification of animal species, and for the discovery of new species across the tree of life. Distinguishing species using this universal molecular marker, a technique known as DNA barcoding, relies on identifying the gap between intra- and interspecific divergence. One of the difficulties could be wide-ranging, cosmopolitan species that show large amounts of morphological variation. The barn owl Tyto alba is a case in point. It occurs worldwide and varies morphologically, leading to the recognition of many subspecies or, more recently, species. We analysed data from the cox1 gene for 31 individuals of seven subspecies, and compared this with 214 sequences from 29 other owl species. Phylogenetic analysis of the T. alba samples gives very strong support for an Old World alba-clade (three subspecies) and a New World furcata-clade (four subspecies) that are genetically equidistant. The amount of intraspecific variation within each of these clades ranges from 0.66-0.99%, but variation among these clades ranges from 5.33-6.20%. Combined, these data suggest that the barn owl of the Old World is indeed best considered a separate species, different from that of the New World. For the combined dataset, sample size of owl species (n between 1 and 21 sequences) increased with geographic range size, but we did not find significant relationships between interspecific divergence and sample size or between interspecific divergence and geographic range. For 21/24 species of owls with sample sizes of n ≥4 the maximum interspecific divergence was ≤ 3.00%. However, similar to those found in barn owls, the largest amount of divergence (3.23-4.09%) was present in two other wide-ranging species (Strix nebulosa and Aegolius funereus), raising the possibility of multiple species in other wide-ranging owls as well.
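The barcoding gap described above can be illustrated with a short sketch. It assumes divergence is measured as an uncorrected p-distance (the proportion of differing sites between two aligned sequences); the abstract does not state which distance measure was used, and the function names are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected p-distance between two aligned sequences.

    Illustrative helper: sites where either sequence has a gap or an
    ambiguity code are excluded from the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def barcode_gap(max_within, min_between):
    """Gap between the largest within-group and smallest between-group divergence."""
    return min_between - max_within


# Percentages reported in the abstract for the two barn owl clades:
# at most 0.99% within a clade, at least 5.33% between clades.
gap = barcode_gap(max_within=0.99, min_between=5.33)
```

With the reported values the gap is about 4.3 percentage points. A positive gap means that every between-clade comparison exceeds every within-clade comparison.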
Richards, Stephen; Liu, Yue; Bettencourt, Brian R.; Hradecky, Pavel; Letovsky, Stan; Nielsen, Rasmus; Thornton, Kevin; Hubisz, Melissa J.; Chen, Rui; Meisel, Richard P.; Couronne, Olivier; Hua, Sujun; Smith, Mark A.; Zhang, Peili; Liu, Jing; Bussemaker, Harmen J.; van Batenburg, Marinus F.; Howells, Sally L.; Scherer, Steven E.; Sodergren, Erica; Matthews, Beverly B.; Crosby, Madeline A.; Schroeder, Andrew J.; Ortiz-Barrientos, Daniel; Rives, Catharine M.; Metzker, Michael L.; Muzny, Donna M.; Scott, Graham; Steffen, David; Wheeler, David A.; Worley, Kim C.; Havlak, Paul; Durbin, K. James; Egan, Amy; Gill, Rachel; Hume, Jennifer; Morgan, Margaret B.; Miner, George; Hamilton, Cerissa; Huang, Yanmei; Waldron, Lenée; Verduzco, Daniel; Clerc-Blankenburg, Kerstin P.; Dubchak, Inna; Noor, Mohamed A.F.; Anderson, Wyatt; White, Kevin P.; Clark, Andrew G.; Schaeffer, Stephen W.; Gelbart, William; Weinstock, George M.; Gibbs, Richard A.
2005-01-01
We have sequenced the genome of a second Drosophila species, Drosophila pseudoobscura, and compared this to the genome sequence of Drosophila melanogaster, a primary model organism. Throughout evolution the vast majority of Drosophila genes have remained on the same chromosome arm, but within each arm gene order has been extensively reshuffled, leading to a minimum of 921 syntenic blocks shared between the species. A repetitive sequence is found in the D. pseudoobscura genome at many junctions between adjacent syntenic blocks. Analysis of this novel repetitive element family suggests that recombination between offset elements may have given rise to many paracentric inversions, thereby contributing to the shuffling of gene order in the D. pseudoobscura lineage. Based on sequence similarity and synteny, 10,516 putative orthologs have been identified as a core gene set conserved over 25–55 million years (Myr) since the pseudoobscura/melanogaster divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome-wide average, consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than random and nearby sequences between the species, but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, a pattern of repeat-mediated chromosomal rearrangement and high coadaptation of both male genes and cis-regulatory sequences emerge as important themes of genome divergence between these species of Drosophila. PMID:15632085
2010-01-01
Background Cryptic species complexes are common among anophelines. Previous phylogenetic analysis based on the complete mtDNA COI gene sequences detected paraphyly in the Neotropical malaria vector Anopheles marajoara. The "Folmer region" detects a single taxon using a 3% divergence threshold. Methods To test the paraphyletic hypothesis and examine the utility of the Folmer region, genealogical trees based on a concatenated (white + 3' COI sequences) dataset and pairwise differentiation of COI fragments were examined. The population structure and demographic history were based on partial COI sequences for 294 individuals from 14 localities in Amazonian Brazil. 109 individuals from 12 localities were sequenced for the nDNA white gene, and 57 individuals from 11 localities were sequenced for the ribosomal DNA (rDNA) internal transcribed spacer 2 (ITS2). Results Distinct A. marajoara lineages were detected by combined genealogical analysis and were also supported among COI haplotypes using a median joining network and AMOVA, with time since divergence during the Pleistocene (<100,000 ya). COI sequences at the 3' end were more variable, demonstrating significant pairwise differentiation (3.82%) compared to the more moderate 2.92% detected by the Folmer region. Lineage 1 was present in all localities, whereas lineage 2 was restricted mainly to the west. Mismatch distributions for both lineages were bimodal, likely due to multiple colonization events and spatial expansion (~798 - 81,045 ya). There appears to be gene flow within, not between lineages, and a partial barrier was detected near Rio Jari in Amapá state, separating western and eastern populations. In contrast, both nDNA data sets (white gene sequences with or without the retention of the 4th intron, and ITS2 sequences and length) detected a single A. marajoara lineage. 
Conclusions Strong support for combined data with significant differentiation detected in the COI and absent in the nDNA suggest that the divergence is recent, and detectable only by the faster evolving mtDNA. A within subgenus threshold of >2% may be more appropriate among sister taxa in cryptic anopheline complexes than the standard 3%. Differences in demographic history and climatic changes may have contributed to mtDNA lineage divergence in A. marajoara. PMID:20929572
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sakoyama, Y.; Hong, K.J.; Byun, S.M.
To determine the phylogenetic relationships among hominoids and the dates of their divergence, the complete nucleotide sequences of the constant region of the immunoglobulin eta-chain (Cη1) genes from chimpanzee and orangutan have been determined. These sequences were compared with the human eta-chain constant-region sequence. A molecular clock (silent molecular clock), measured by the degree of sequence divergence at the synonymous (silent) positions of protein-encoding regions, was introduced for the present study. From the comparison of nucleotide sequences of α1-antitrypsin and β- and δ-globulin genes between humans and Old World monkeys, the silent molecular clock was calibrated: the mean evolutionary rate of silent substitution was determined to be 1.56 × 10^-9 substitutions per site per year. Using the silent molecular clock, the mean divergence dates of chimpanzee and orangutan from the human lineage were estimated as 6.4 ± 2.6 million years and 17.3 ± 4.5 million years, respectively. It was also shown that the evolutionary rate of primate genes is considerably slower than that of other mammalian genes.
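As a rough illustration of how such a clock converts silent-site divergence into a date, the sketch below assumes the standard relation K = 2rT, in which both lineages accumulate substitutions after the split. The abstract does not state the exact formula used, and the divergence value K below is hypothetical, chosen only to show the arithmetic with the reported rate.

```python
def divergence_time_years(k_silent: float, rate_per_site_per_year: float) -> float:
    """Estimate time since divergence from silent-site divergence.

    Assumes K = 2 * r * T, so T = K / (2 * r).
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return k_silent / (2.0 * rate_per_site_per_year)


RATE = 1.56e-9      # silent substitutions per site per year (from the abstract)
K_EXAMPLE = 0.02    # hypothetical silent-site divergence, for illustration only

t_years = divergence_time_years(K_EXAMPLE, RATE)
print(f"{t_years / 1e6:.2f} million years")
```

Under these assumptions a silent-site divergence of about 2% corresponds to a split of roughly 6.4 million years, the same order as the human-chimpanzee estimate quoted above.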
The sequence and de novo assembly of the giant panda genome
Li, Ruiqiang; Fan, Wei; Tian, Geng; Zhu, Hongmei; He, Lin; Cai, Jing; Huang, Quanfei; Cai, Qingle; Li, Bo; Bai, Yinqi; Zhang, Zhihe; Zhang, Yaping; Wang, Wen; Li, Jun; Wei, Fuwen; Li, Heng; Jian, Min; Li, Jianwen; Zhang, Zhaolei; Nielsen, Rasmus; Li, Dawei; Gu, Wanjun; Yang, Zhentao; Xuan, Zhaoling; Ryder, Oliver A.; Leung, Frederick Chi-Ching; Zhou, Yan; Cao, Jianjun; Sun, Xiao; Fu, Yonggui; Fang, Xiaodong; Guo, Xiaosen; Wang, Bo; Hou, Rong; Shen, Fujun; Mu, Bo; Ni, Peixiang; Lin, Runmao; Qian, Wubin; Wang, Guodong; Yu, Chang; Nie, Wenhui; Wang, Jinhuan; Wu, Zhigang; Liang, Huiqing; Min, Jiumeng; Wu, Qi; Cheng, Shifeng; Ruan, Jue; Wang, Mingwei; Shi, Zhongbin; Wen, Ming; Liu, Binghang; Ren, Xiaoli; Zheng, Huisong; Dong, Dong; Cook, Kathleen; Shan, Gao; Zhang, Hao; Kosiol, Carolin; Xie, Xueying; Lu, Zuhong; Zheng, Hancheng; Li, Yingrui; Steiner, Cynthia C.; Lam, Tommy Tsan-Yuk; Lin, Siyuan; Zhang, Qinghui; Li, Guoqing; Tian, Jing; Gong, Timing; Liu, Hongde; Zhang, Dejin; Fang, Lin; Ye, Chen; Zhang, Juanbin; Hu, Wenbo; Xu, Anlong; Ren, Yuanyuan; Zhang, Guojie; Bruford, Michael W.; Li, Qibin; Ma, Lijia; Guo, Yiran; An, Na; Hu, Yujie; Zheng, Yang; Shi, Yongyong; Li, Zhiqiang; Liu, Qing; Chen, Yanling; Zhao, Jing; Qu, Ning; Zhao, Shancen; Tian, Feng; Wang, Xiaoling; Wang, Haiyin; Xu, Lizhi; Liu, Xiao; Vinar, Tomas; Wang, Yajun; Lam, Tak-Wah; Yiu, Siu-Ming; Liu, Shiping; Zhang, Hemin; Li, Desheng; Huang, Yan; Wang, Xia; Yang, Guohua; Jiang, Zhi; Wang, Junyi; Qin, Nan; Li, Li; Li, Jingxiang; Bolund, Lars; Kristiansen, Karsten; Wong, Gane Ka-Shu; Olson, Maynard; Zhang, Xiuqing; Li, Songgang; Yang, Huanming; Wang, Jian; Wang, Jun
2013-01-01
Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility for using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes. PMID:20010809
Bloom DNA Helicase Facilitates Homologous Recombination between Diverged Homologous Sequences
Kikuchi, Koji; Abdel-Aziz, H. Ismail; Taniguchi, Yoshihito; Yamazoe, Mitsuyoshi; Takeda, Shunichi; Hirota, Kouji
2009-01-01
Bloom syndrome, caused by inactivation of the Bloom DNA helicase (Blm), is characterized by increases in the level of sister chromatid exchange, i.e. homologous recombination (HR) associated with cross-over. It is therefore believed that Blm works as an anti-recombinase. Meanwhile, in Drosophila, DmBlm is required specifically to promote synthesis-dependent strand annealing (SDSA), a type of HR not associated with cross-over. However, conservation of Blm function in SDSA through higher eukaryotes has been a matter of debate. Here, we demonstrate the function of Blm in SDSA-type HR in the chicken DT40 B lymphocyte line, where Ig gene conversion diversifies the immunoglobulin V gene through intragenic HR between diverged homologous segments. This reaction is initiated by activation-induced cytidine deaminase enzyme-mediated uracil formation at the V gene; the uracil is in turn converted into an abasic site, presumably leading to a single-strand gap. Ig gene conversion frequency was drastically reduced in BLM−/− cells. In addition, BLM−/− cells used a limited set of donor segments with higher identity than other segments in Ig gene conversion events, suggesting that Blm can promote HR between diverged sequences. To further understand the role of Blm in HR between diverged homologous sequences, we measured the frequency of gene targeting induced by an I-SceI-endonuclease-mediated double-strand break. BLM−/− cells showed a more severe defect in gene targeting frequency as the number of heterologous sequences increased at the double-strand break site. Conversely, the overexpression of Blm, even of an ATPase-defective mutant, strongly stimulated gene targeting. In summary, Blm promotes HR between diverged sequences through a novel ATPase-independent mechanism. PMID:19661064
Lauber, Chris
2012-01-01
The recent advent of genome sequences as the only source available to classify many newly discovered viruses challenges the development of virus taxonomy by expert virologists who traditionally rely on extensive virus characterization. In this proof-of-principle study, we address this issue by presenting a computational approach (DEmARC) to classify viruses of a family into groups at hierarchical levels using a sole criterion—intervirus genetic divergence. To quantify genetic divergence, we used pairwise evolutionary distances (PEDs) estimated by maximum likelihood inference on a multiple alignment of family-wide conserved proteins. PEDs were calculated for all virus pairs, and the resulting distribution was modeled via a mixture of probability density functions. The model enables the quantitative inference of regions of distance discontinuity in the family-wide PED distribution, which define the levels of hierarchy. For each level, a limit on genetic divergence, below which two viruses join the same group, was objectively selected among a set of candidates by minimizing violations of intragroup PEDs to the limit. In a case study, we applied the procedure to hundreds of genome sequences of picornaviruses and extensively evaluated it by modulating four key parameters. It was found that the genetics-based classification largely tolerates variations in virus sampling and multiple alignment construction but is affected by the choice of protein and the measure of genetic divergence. In an accompanying paper (C. Lauber and A. E. Gorbalenya, J. Virol. 86:3905–3915, 2012), we analyze the substantial insight gained with the genetics-based classification approach by comparing it with the expert-based picornavirus taxonomy. PMID:22278230
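The idea of choosing a divergence limit by counting intragroup violations can be sketched loosely in Python. This is not the DEmARC implementation: the single-linkage grouping, the function names, the toy distances, and the candidate limits are all assumptions made for illustration. In the study the candidates come from a mixture model fitted to the family-wide PED distribution.

```python
from itertools import combinations


def single_linkage_groups(names, dist, limit):
    """Group items so that any pair with distance <= limit shares a group."""
    parent = {n: n for n in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if dist[frozenset((a, b))] <= limit:
            parent[find(a)] = find(b)
    return {n: find(n) for n in names}


def count_violations(names, dist, limit):
    """Count same-group pairs whose distance still exceeds the limit."""
    group = single_linkage_groups(names, dist, limit)
    return sum(
        1
        for a, b in combinations(names, 2)
        if group[a] == group[b] and dist[frozenset((a, b))] > limit
    )


def best_limit(names, dist, candidates):
    """Pick the candidate limit with the fewest violations (first one on ties)."""
    return min(candidates, key=lambda lim: count_violations(names, dist, lim))


# Hypothetical pairwise evolutionary distances for four viruses.
names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 0.10,
    frozenset(("B", "C")): 0.25,
    frozenset(("A", "C")): 0.40,
    frozenset(("A", "D")): 1.00,
    frozenset(("B", "D")): 1.00,
    frozenset(("C", "D")): 1.00,
}
chosen = best_limit(names, dist, candidates=[0.3, 0.5])
```

With a limit of 0.3, viruses A and C end up in one group through B even though their own distance is 0.40, which counts as one violation. A limit of 0.5 produces none, so it is the one selected here.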
Kim, Min Jee; Hong, Eui Jeong; Kim, Iksoo
2016-01-01
We sequenced the complete mitochondrial (mt) genome of Camponotus atrox (Hymenoptera: Formicidae), which is only distributed in Korea. The genome was 16 540 bp in size and contained typical sets of genes (13 protein-coding genes, 22 tRNAs, and 2 rRNAs). The C. atrox A+T-rich region, at 1402 bp, was the longest of all sequenced ant genomes and was composed of an identical tandem repeat consisting of six 100-bp copies and one 96-bp copy. A total of 315 bp of intergenic spacer sequence was spread over 23 regions. An alignment of the spacer sequences in ants was largely feasible among congeneric species, and there was substantial sequence divergence, indicating their potential use as molecular markers for congeneric species. The A/T contents at the first and second codon positions of protein-coding genes (PCGs) were similar for ant species, including C. atrox (73.9% vs. 72.3%, on average). With increased taxon sampling among hymenopteran superfamilies, differences in the divergence rates (i.e., the non-synonymous substitution rates) between the suborders Symphyta and Apocrita were detected, consistent with previous results. The C. atrox mt genome had a unique gene arrangement, trnI-trnM-trnQ, at the A+T-rich region and ND2 junction (underline indicates inverted gene). This may have originated from a tandem duplication of trnM-trnI, resulting in trnM-trnI-trnM-trnI-trnQ, and the subsequent loss of the first trnM and second trnI, resulting in trnI-trnM-trnQ.
Pohl, Nélida; Sison-Mangus, Marilou P; Yee, Emily N; Liswi, Saif W; Briscoe, Adriana D
2009-05-13
The increase in availability of genomic sequences for a wide range of organisms has revealed gene duplication to be a relatively common event. Encounters with duplicate gene copies have consequently become almost inevitable in the context of collecting gene sequences for inferring species trees. Here we examine the effect of incorporating duplicate gene copies evolving at different rates on tree reconstruction and time estimation of recent and deep divergences in butterflies. Sequences from ultraviolet-sensitive (UVRh), blue-sensitive (BRh), and long-wavelength sensitive (LWRh) opsins, EF-1 and COI were obtained from 27 taxa representing the five major butterfly families (5535 bp total). Both BRh and LWRh are present in multiple copies in some butterfly lineages and the different copies evolve at different rates. Regardless of the phylogenetic reconstruction method used, we found that analyses of combined data sets using either slower or faster evolving copies of duplicate genes resulted in a single topology in agreement with our current understanding of butterfly family relationships based on morphology and molecules. Interestingly, individual analyses of BRh and LWRh sequences also recovered these family-level relationships. Two different relaxed clock methods resulted in similar divergence time estimates at the shallower nodes in the tree, regardless of whether faster or slower evolving copies were used, with larger discrepancies observed at deeper nodes in the phylogeny. The time of divergence between the monarch butterfly Danaus plexippus and the queen D. gilippus (15.3-35.6 Mya) was found to be much older than the time of divergence between monarch co-mimic Limenitis archippus and red-spotted purple L. arthemis (4.7-13.6 Mya), and overlapping with the time of divergence of the co-mimetic passionflower butterflies Heliconius erato and H. melpomene (13.5-26.1 Mya).
Our family-level results are congruent with recent estimates found in the literature and indicate an age of 84-113 million years for the divergence of all butterfly families. These results are consistent with diversification of the butterfly families following the radiation of angiosperms and suggest that some classes of opsin genes may be usefully employed for both phylogenetic reconstruction and divergence time estimation.
Comeau, André M; Arbiol, Christine; Krisch, Henry M
2014-06-19
The diverse T4-like phages (Tquatrovirinae) infect a wide array of gram-negative bacterial hosts. The genome architecture of these phages is generally well conserved, with most of the phylogenetically variable genes grouped together in a series of hyperplastic regions (HPRs) that are interspersed among large blocks of conserved core genes. Recent evidence from a pair of closely related T4-like phages has suggested that small, composite terminator/promoter sequences (promoter early stem loops [PeSLs]) are implicated in mediating the high levels of genetic plasticity by indels occurring within the HPRs. Here, we present the genome sequence analysis of two T4-like phages, PST (168 kb, 272 open reading frames [ORFs]) and nt-1 (248 kb, 405 ORFs). These two phages were chosen for comparative sequence analysis because, although they are closely related to phages that have been previously sequenced (T4 and KVP40, respectively), they have different host ranges. In each case, one member of the pair infects a bacterial strain that is a human pathogen, whereas the other phage's host is a nonpathogen. Despite belonging to phylogenetically distant branches of the T4-likes, these pairs of phages have diverged from each other in part by a mechanism apparently involving PeSL-mediated recombination. This analysis confirms a role of PeSL sequences in the generation of genomic diversity by serving as a point of genetic exchange between otherwise unrelated sequences within the HPRs. Finally, the palette of divergent genes swapped by PeSL-mediated homologous recombination is discussed in the context of the PeSLs' potentially important role in facilitating phage adaptation to new hosts and environments. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Horner, David S; Lefkimmiatis, Konstantinos; Reyes, Aurelio; Gissi, Carmela; Saccone, Cecilia; Pesole, Graziano
2007-01-01
Background Phylogenetic relationships between Lagomorpha, Rodentia and Primates and their allies (Euarchontoglires) have long been debated. While it is now generally agreed that Rodentia constitutes a monophyletic sister-group of Lagomorpha and that this clade (Glires) is sister to Primates and Dermoptera, higher-level relationships within Rodentia remain contentious. Results We have sequenced and performed extensive evolutionary analyses on the mitochondrial genome of the scaly-tailed flying squirrel Anomalurus sp., an enigmatic rodent whose phylogenetic affinities have been obscure and extensively debated. Our phylogenetic analyses of the coding regions of available complete mitochondrial genome sequences from Euarchontoglires suggest that Anomalurus is a sister taxon to the Hystricognathi, and that this clade represents the most basal divergence among sampled Rodentia. Bayesian dating methods incorporating a relaxed molecular clock provide divergence-time estimates which are consistently in agreement with the fossil record and which indicate a rapid radiation within Glires around 60 million years ago. Conclusion Taken together, the data presented provide a working hypothesis as to the phylogenetic placement of Anomalurus, underline the utility of mitochondrial sequences in the resolution of even relatively deep divergences and go some way to explaining the difficulty of conclusively resolving higher-level relationships within Glires with available data and methodologies. PMID:17288612
Pereira, J O P; Freitas, B M; Jorge, D M M; Torres, D C; Soares, C E A; Grangeiro, T B
2009-01-01
Melipona quinquefasciata is a ground-nesting South American stingless bee whose geographic distribution was believed to comprise only the central and southern states of Brazil. We obtained partial sequences (about 500-570 bp) of first internal transcribed spacer (ITS1) nuclear ribosomal DNA from Melipona specimens putatively identified as M. quinquefasciata collected from different localities in northeastern Brazil. To confirm the taxonomic identity of the northeastern samples, specimens from the state of Goiás (Central region of Brazil) were included for comparison. All sequences were deposited in GenBank (accession numbers EU073751-EU073759). The mean nucleotide divergence (excluding sites with insertions/deletions) in the ITS1 sequences was only 1.4%, ranging from 0 to 4.1%. When the sites with insertions/deletions were also taken into account, sequence divergences varied from 0 to 5.3%. In all pairwise comparisons, the ITS1 sequence from the specimens collected in Goiás was most divergent compared to the ITS1 sequences of the bees from the other locations. However, neighbor-joining phylogenetic analysis showed that all ITS1 sequences from northeastern specimens along with the sample of Goiás were resolved in a single clade with a bootstrap support of 100%. The ITS1 sequencing data thus support the occurrence of M. quinquefasciata in northeast Brazil.
2013-01-01
Background Cotton, one of the world’s leading crops, is important to the world’s textile and energy industries, and is a model species for studies of plant polyploidization, cellulose biosynthesis and cell wall biogenesis. Here, we report the construction of a plant-transformation-competent binary bacterial artificial chromosome (BIBAC) library and comparative genome sequence analysis of polyploid Upland cotton (Gossypium hirsutum L.) with one of its diploid putative progenitor species, G. raimondii Ulbr. Results We constructed the cotton BIBAC library in a vector competent for high-molecular-weight DNA transformation in different plant species through either Agrobacterium or particle bombardment. The library contains 76,800 clones with an average insert size of 135 kb, providing an approximate 99% probability of obtaining at least one positive clone from the library using a single-copy probe. The quality and utility of the library were verified by identifying BIBACs containing genes important for fiber development, fiber cellulose biosynthesis, seed fatty acid metabolism, cotton-nematode interaction, and bacterial blight resistance. In order to gain an insight into the Upland cotton genome and its relationship with G. raimondii, we sequenced nearly 10,000 BIBAC ends (BESs) randomly selected from the library, generating approximately one BES for every 250 kb along the Upland cotton genome. The retroelement Gypsy/DIRS1 family predominates in the Upland cotton genome, accounting for over 77% of all transposable elements. From the BESs, we identified 1,269 simple sequence repeats (SSRs), of which 1,006 were new, thus providing additional markers for cotton genome research. Surprisingly, comparative sequence analysis showed that Upland cotton is much more diverged from G. raimondii at the genomic sequence level than expected. There seems to be no significant difference between the relationships of the Upland cotton D- and A-subgenomes with the G. raimondii genome, even though G. raimondii contains a D genome (D5). Conclusions The library represents the first BIBAC library in cotton and related species, thus providing tools useful for integrative physical mapping, large-scale genome sequencing and large-scale functional analysis of the Upland cotton genome. Comparative sequence analysis provides insights into the Upland cotton genome, and a possible mechanism underlying the divergence and evolution of polyploid Upland cotton from its diploid putative progenitor species, G. raimondii. PMID:23537070
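The quoted probability of recovering at least one positive clone can be checked with the standard Clarke-Carbon relation, P = 1 - (1 - I/G)^N. The sketch below is a back-of-envelope check, not the authors' calculation. The genome size of about 2.5 Gb is an assumption inferred from the abstract's figures (roughly 10,000 BESs at one per 250 kb), and the function name is hypothetical.

```python
def prob_at_least_one_clone(n_clones: int, insert_bp: float, genome_bp: float) -> float:
    """Clarke-Carbon estimate: P = 1 - (1 - insert/genome) ** n_clones."""
    if not 0 < insert_bp < genome_bp:
        raise ValueError("insert size must be positive and smaller than the genome")
    return 1.0 - (1.0 - insert_bp / genome_bp) ** n_clones


N_CLONES = 76_800    # from the abstract
INSERT_BP = 135_000  # average insert size, from the abstract
GENOME_BP = 2.5e9    # assumed: ~10,000 BESs x 250 kb spacing

p = prob_at_least_one_clone(N_CLONES, INSERT_BP, GENOME_BP)
print(f"P(at least one positive clone) = {p:.3f}")
```

Under these assumptions the library gives a little over fourfold genome coverage and a probability of roughly 98%, in line with the approximate 99% reported.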
2012-01-01
Background A detailed knowledge about spatial and temporal gene expression is important for understanding both the function of genes and their evolution. For the vast majority of species, transcriptomes are still largely uncharacterized and even in those where substantial information is available it is often in the form of partially sequenced transcriptomes. With the development of next generation sequencing, a single experiment can now simultaneously identify the transcribed part of a species genome and estimate levels of gene expression. Results mRNA from actively growing needles of Norway spruce (Picea abies) was sequenced using next generation sequencing technology. In total, close to 70 million fragments with a length of 76 bp were sequenced resulting in 5 Gbp of raw data. A de novo assembly of these reads, together with publicly available expressed sequence tag (EST) data from Norway spruce, was used to create a reference transcriptome. Of the 38,419 PUTs (putative unique transcripts) longer than 150 bp in this reference assembly, 83.5% show similarity to ESTs from other spruce species and of the remaining PUTs, 3,704 show similarity to protein sequences from other plant species, leaving 4,167 PUTs with limited similarity to currently available plant proteins. By predicting coding frames and comparing not only the Norway spruce PUTs, but also PUTs from the close relatives Picea glauca and Picea sitchensis to both Pinus taeda and Taxus mairei, we obtained estimates of synonymous and non-synonymous divergence among conifer species. In addition, we detected close to 15,000 SNPs of high quality and estimated gene expression differences between samples collected under dark and light conditions. Conclusions Our study yielded a large number of single nucleotide polymorphisms as well as estimates of gene expression on transcriptome scale. 
In agreement with a recent study we find that the synonymous substitution rate per year (0.6 × 10^-9 and 1.1 × 10^-9) is an order of magnitude smaller than values reported for angiosperm herbs. However, if one takes generation time into account, most of this difference disappears. The estimates of the dN/dS ratio (non-synonymous over synonymous divergence) reported here are in general much lower than 1 and only a few genes showed a ratio larger than 1. PMID:23122049
NASA Astrophysics Data System (ADS)
Tang, Le; Zhu, Songling; Mastriani, Emilio; Fang, Xin; Zhou, Yu-Jie; Li, Yong-Guo; Johnston, Randal N.; Guo, Zheng; Liu, Gui-Rong; Liu, Shu-Lin
2017-03-01
Highly conserved short sequences help identify functional genomic regions and facilitate genomic annotation. We used Salmonella as the model to search the genome for evolutionarily conserved regions and focused on the tetranucleotide sequence CTAG for its potentially important functions. In Salmonella, CTAG is highly conserved across the lineages and large numbers of CTAG-containing short sequences fall in intergenic regions, strongly indicating their biological importance. Computer modeling demonstrated stable stem-loop structures in some of the CTAG-containing intergenic regions, and substitution of a nucleotide of the CTAG sequence would radically rearrange the free energy and disrupt the structure. The postulated degeneration of CTAG takes distinct patterns among Salmonella lineages and provides novel information about genomic divergence and evolution of these bacterial pathogens. Comparison of the vertically and horizontally transmitted genomic segments showed different CTAG distribution landscapes, with the genome amelioration process to remove CTAG taking place inward from both terminals of the horizontally acquired segment.
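A simple way to quantify how common CTAG is in a genomic segment is to compare its observed count with the count expected from base composition alone. The Python sketch below is illustrative only: the zero-order expectation, the function names, and the toy sequence are assumptions, not the method or data of the study. Because CTAG is its own reverse complement, counting on one strand is sufficient.

```python
def count_kmer(seq: str, kmer: str) -> int:
    """Count occurrences of `kmer` in `seq`, allowing overlaps."""
    seq, kmer = seq.upper(), kmer.upper()
    return sum(seq.startswith(kmer, i) for i in range(len(seq) - len(kmer) + 1))


def expected_count(seq: str, kmer: str) -> float:
    """Expected count if bases were independent with the observed frequencies."""
    seq = seq.upper()
    n = len(seq)
    prob = 1.0
    for base in kmer.upper():
        prob *= seq.count(base) / n
    return (n - len(kmer) + 1) * prob


toy_segment = "CTAGCTAGAACC"  # hypothetical 12-bp example
observed = count_kmer(toy_segment, "CTAG")
expected = expected_count(toy_segment, "CTAG")
ratio = observed / expected
```

An observed/expected ratio well below 1 across a segment would be consistent with CTAG depletion, while segments that retain the motif despite overall depletion are the conserved sites of interest.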
Divergent transcription is associated with promoters of transcriptional regulators
2013-01-01
Background Divergent transcription is a wide-spread phenomenon in mammals. For instance, short bidirectional transcripts are a hallmark of active promoters, while longer transcripts can be detected antisense from active genes in conditions where the RNA degradation machinery is inhibited. Moreover, many described long non-coding RNAs (lncRNAs) are transcribed antisense from coding gene promoters. However, the general significance of divergent lncRNA/mRNA gene pair transcription is still poorly understood. Here, we used strand-specific RNA-seq with high sequencing depth to thoroughly identify antisense transcripts from coding gene promoters in primary mouse tissues. Results We found that a substantial fraction of coding-gene promoters sustain divergent transcription of long non-coding RNA (lncRNA)/mRNA gene pairs. Strikingly, upstream antisense transcription is significantly associated with genes related to transcriptional regulation and development. Their promoters share several characteristics with those of transcriptional developmental genes, including very large CpG islands, high degree of conservation and epigenetic regulation in ES cells. In-depth analysis revealed a unique GC skew profile at these promoter regions, while the associated coding genes were found to have large first exons, two genomic features that might enforce bidirectional transcription. Finally, genes associated with antisense transcription harbor specific H3K79me2 epigenetic marking and RNA polymerase II enrichment profiles linked to an intensified rate of early transcriptional elongation. Conclusions We concluded that promoters of a class of transcription regulators are characterized by a specialized transcriptional control mechanism, which is directly coupled to relaxed bidirectional transcription. PMID:24365181
Intraspecific variation in Cryptocaryon irritans.
Diggles, B K; Adlard, R D
1997-01-01
Intraspecific variation in the ciliate Cryptocaryon irritans was examined using sequences of the first internal transcribed spacer region (ITS-1) of ribosomal DNA (rDNA) combined with developmental and morphological characters. Amplified rDNA sequences consisting of 151 bases of the flanking 18 S and 5.8 S regions, and the entire ITS-1 region (169 or 170 bases), were determined and compared for 16 isolates of C. irritans from Australia, Israel and the USA. There was one variable base between isolates in the 18 S region and 11 variable bases in the ITS-1 region. Despite their similar morphology, significant sequence variation (4.1% divergence) and developmental differences indicate that Australian C. irritans isolates from estuarine (Moreton Bay) and coral reef (Heron Island) environments are distinct. The Heron Island isolate was genetically closer to morphologically dissimilar isolates from Israel (1.8% divergence) and the USA (2.3% divergence) than it was to the Moreton Bay isolates. Three isolates maintained in our laboratory since February 1994 differed in sequence from earlier laboratory isolates (2.9% to 3.5% divergence), even though all were similar morphologically and originated from the same source. During this time the sequence of the isolates from wild fish in Moreton Bay remained unchanged. These genetic differences indicate the existence of a founder effect in laboratory populations of C. irritans. The genetic variation found here, combined with known morphological and developmental differences, is used to characterise four strains of C. irritans.
Chakona, Albert; Swartz, Ernst R.; Gouws, Gavin
2013-01-01
This study used phylogenetic analyses of mitochondrial cytochrome b sequences to investigate genetic diversity within three broadly co-distributed freshwater fish genera (Galaxias, Pseudobarbus and Sandelia) to shed some light on the processes that promoted lineage diversification and shaped geographical distribution patterns. A total of 205 sequences of Galaxias, 177 sequences of Pseudobarbus and 98 sequences of Sandelia from 146 localities across nine river systems in the south-western Cape Floristic Region (CFR, South Africa) were used. The data were analysed using phylogenetic and haplotype network methods and divergence times for the clades retrieved were estimated using *BEAST. Nine extremely divergent (3.5–25.3%) lineages were found within Galaxias. Similarly, deep phylogeographic divergence was evident within Pseudobarbus, with four markedly distinct (3.8–10.0%) phylogroups identified. Sandelia had two deeply divergent (5.5–5.9%) lineages, but seven minor lineages with strong geographical congruence were also identified. The Miocene-Pliocene major sea-level transgression and the resultant isolation of populations in upland refugia appear to have driven widespread allopatric divergence within the three genera. Subsequent coalescence of rivers during the Pleistocene major sea-level regression as well as intermittent drainage connections during wet periods are proposed to have facilitated range expansion of lineages that currently occur across isolated river systems. The high degree of genetic differentiation recovered in the present and previous studies suggests that freshwater fish diversity within the south-western CFR may be vastly underestimated, and that taxonomic revisions are required. PMID:23951050
Pitteloud, Camille; Arrigo, Nils; Suchan, Tomasz; Mastretta-Yanes, Alicia; Dincă, Vlad; Hernández-Roldán, Juan; Brockmann, Ernst; Chittaro, Yannick; Kleckova, Irena; Fumagalli, Luca; Buerki, Sven; Pellissier, Loïc
2017-01-01
Understanding how speciation relates to ecological divergence has long fascinated biologists. It is assumed that ecological divergence is essential to sympatric speciation, as a mechanism to avoid competition and eventually lead to reproductive isolation, while divergence in allopatry is not necessarily associated with niche differentiation. The impact of the spatial context of divergence on the evolutionary rates of abiotic dimensions of the ecological niche has rarely been explored for an entire clade. Here, we compare the magnitude of climatic niche shifts between sympatric versus allopatric divergence of lineages in butterflies. By combining next-generation sequencing, parametric biogeography and ecological niche analyses applied to a genus-wide phylogeny of Palaearctic Pyrgus butterflies, we compare evolutionary rates along eight climatic dimensions across sister lineages that diverged in large-scale sympatry versus allopatry. In order to examine the possible effects of the spatial scale at which sympatry is defined, we considered three sets of biogeographic assignments, ranging from narrow to broad definition. Our findings suggest higher rates of niche evolution along all climatic dimensions for sister lineages that diverge in sympatry, when using a narrow delineation of biogeographic areas. This result contrasts with significantly lower rates of climatic niche evolution found in cases of allopatric speciation, despite the biogeographic regions defined here being characterized by significantly different climates. Higher rates in allopatry are retrieved when biogeographic areas are too widely defined—in such a case allopatric events may be recorded as sympatric. Our results reveal the macro-evolutionary significance of abiotic niche differentiation involved in speciation processes within biogeographic regions, and illustrate the importance of the spatial scale chosen to define areas when applying parametric biogeographic analyses. PMID:28404781
Antell, Gregory C.; Zhong, Wen; Kercher, Katherine; Passic, Shendra; Williams, Jean; Liu, Yucheng; James, Tony; Jacobson, Jeffrey M.; Szep, Zsofia
2017-01-01
Vpr is an HIV-1 accessory protein that plays numerous roles during viral replication, some of which are cell type dependent. To test the hypothesis that HIV-1 tropism extends beyond the envelope into the vpr gene, studies were performed to identify the associations between coreceptor usage and Vpr variation in HIV-1-infected patients. Colinear HIV-1 Env-V3 and Vpr amino acid sequences were obtained from the LANL HIV-1 sequence database and from well-suppressed patients in the Drexel/Temple Medicine CNS AIDS Research and Eradication Study (CARES) Cohort. Genotypic classification of Env-V3 sequences as X4 (CXCR4-utilizing) or R5 (CCR5-utilizing) was used to group colinear Vpr sequences. To reveal the sequences associated with a specific coreceptor usage genotype, Vpr amino acid sequences were assessed for amino acid diversity and Jensen-Shannon divergence between the two groups. Five amino acid alphabets were used to comprehensively examine the impact of amino acid substitutions involving side chains with similar physicochemical properties. Positions 36, 37, 41, 89, and 96 of Vpr were characterized by statistically significant divergence across multiple alphabets when X4 and R5 sequence groups were compared. In addition, consensus amino acid switches were found at positions 37 and 41 in comparisons of the R5 and X4 sequence populations. These results suggest an evolutionary link between Vpr and gp120 in HIV-1-infected patients. PMID:28620613
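The per-position Jensen-Shannon divergence mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names (`residue_frequencies`, `position_jsd`), the use of base-2 logarithms, and the toy residue lists are hypothetical. It does not reproduce the authors' pipeline, their five reduced alphabets, or their significance testing.

```python
import math
from collections import Counter

# Standard 20-letter amino acid alphabet; a reduced alphabet could be
# passed instead (the groupings used in the study are not given here).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_frequencies(residues, alphabet=AMINO_ACIDS):
    """Relative frequency of each alphabet symbol among the residues."""
    counts = Counter(r for r in residues if r in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues from the alphabet at this position")
    return [counts[a] / total for a in alphabet]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def position_jsd(group_a, group_b, alphabet=AMINO_ACIDS):
    """JSD between two sequence groups at one alignment position."""
    return jensen_shannon_divergence(
        residue_frequencies(group_a, alphabet),
        residue_frequencies(group_b, alphabet),
    )


# Hypothetical residues observed at one Vpr position in each group.
x4_residues = list("RRRQ")
r5_residues = list("QQQR")
score = position_jsd(x4_residues, r5_residues)
```

With base-2 logarithms the score is bounded between 0 and 1, which makes positions comparable with one another. Deciding which scores are statistically significant would need a separate step, such as a permutation test, that this sketch omits.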
2004-01-01
Flagellar genes: Present [b], Present [c], Present [c]; Tagatose utilization genes: Absent, Present, Partial [d]; Functional PlcR: Absent [e], Present [e], Present [e]; Mobile genetic...closely related and one that is divergent (Supplementary Fig. S3). [d] There are similar tagatose utilization genes in B. cereus ATCC 14579; however, they...replacement responsible for the transport and utilization of the carbohydrate tagatose (BCE1896–BCE1912). The corresponding 5.0 kb region in
Toledo, G; Palenik, B
1997-01-01
Because they are ubiquitous in a range of aquatic environments and culture methods are relatively advanced, cyanobacteria may be useful models for understanding the extent of evolutionary adaptation of prokaryotes in general to environmental gradients. The roles of environmental variables such as light and nutrients in influencing cyanobacterial genetic diversity are still poorly characterized, however. In this study, a total of 15 Synechococcus strains were isolated from the oligotrophic edge of the California Current from two depths (5 and 95 m) with large differences in light intensity, light quality, and nutrient concentrations. RNA polymerase gene (rpoC1) fragment sequences of the strains revealed two major genetic lineages, distinct from other marine or freshwater cyanobacterial isolates or groups seen in shotgun-cloned sequences from the oligotrophic Atlantic Ocean. The California Current low-phycourobilin (CCLPUB) group represented by six isolates in a single lineage was less diverse than the California Current high-phycourobilin (CCHPUB) group with nine isolates in three relatively divergent lineages. The former was found to be the closest known genetic group to Prochlorococcus spp., a chlorophyll b-containing cyanobacterial group. Having an isolate from this group will be valuable for looking at the molecular changes necessary for the transition from the use of phycobiliproteins to chlorophyll b as light-harvesting pigments. Both of the CCHPUB and CCLPUB groups included strains obtained from surface (5 m) and deep (95 m) samples. Thus, contrary to expectations, there was no clear correlation between sampling depth and isolation of genetic groups, despite the large environmental gradients present. To our knowledge, this is the first demonstration with isolates that genetically divergent Synechococcus groups coexist in the same seawater sample. PMID:9361417
Follin, Elna; Karlsson, Maria; Lundegaard, Claus; Nielsen, Morten; Wallin, Stefan; Paulsson, Kajsa; Westerdahl, Helena
2013-04-01
The major histocompatibility complex (MHC) genes are the most polymorphic genes found in the vertebrate genome, and they encode proteins that play an essential role in the adaptive immune response. Many songbirds (passerines) have been shown to have a large number of transcribed MHC class I genes compared to most mammals. To elucidate the reason for this large number of genes, we compared 14 MHC class I alleles (α1-α3 domains), from great reed warbler, house sparrow and tree sparrow, via phylogenetic analysis, homology modelling and in silico peptide-binding predictions to investigate their functional and genetic relationships. We found more pronounced clustering of the MHC class I allomorphs (allele specific proteins) in regards to their function (peptide-binding specificities) compared to their genetic relationships (amino acid sequences), indicating that the high number of alleles is of functional significance. The MHC class I allomorphs from house sparrow and tree sparrow, species that diverged 10 million years ago (MYA), had overlapping peptide-binding specificities, and these similarities across species were also confirmed in phylogenetic analyses based on amino acid sequences. Notably, there were also overlapping peptide-binding specificities in the allomorphs from house sparrow and great reed warbler, although these species diverged 30 MYA. This overlap was not found in a tree based on amino acid sequences. Our interpretation is that convergent evolution on the level of the protein function, possibly driven by selection from shared pathogens, has resulted in allomorphs with similar peptide-binding repertoires, although trans-species evolution in combination with gene conversion cannot be ruled out.
Burgar, Joanna M; Murray, Daithi C; Craig, Michael D; Haile, James; Houston, Jayne; Stokes, Vicki; Bunce, Michael
2014-08-01
Effective management and conservation of biodiversity requires understanding of predator-prey relationships to ensure the continued existence of both predator and prey populations. Gathering dietary data from predatory species, such as insectivorous bats, often presents logistical challenges, further exacerbated in biodiversity hot spots because prey items are highly speciose, yet their taxonomy is largely undescribed. We used high-throughput sequencing (HTS) and bioinformatic analyses to phylogenetically group DNA sequences into molecular operational taxonomic units (MOTUs) to examine predator-prey dynamics of three sympatric insectivorous bat species in the biodiversity hotspot of south-western Australia. We could only assign between 4% and 20% of MOTUs to known genera or species, depending on the method used, underscoring the importance of examining dietary diversity irrespective of taxonomic knowledge in areas lacking a comprehensive genetic reference database. MOTU analysis confirmed that resource partitioning occurred, with dietary divergence positively related to the ecomorphological divergence of the three bat species. We predicted that bat species' diets would converge during times of high energetic requirements, that is, the maternity season for females and the mating season for males. There was an interactive effect of season on female, but not male, bat species' diets, although small sample sizes may have limited our findings. Contrary to our predictions, females of two ecomorphologically similar species showed dietary convergence during the mating season rather than the maternity season. HTS-based approaches can help elucidate complex predator-prey relationships in highly speciose regions, which should facilitate the conservation of biodiversity in genetically uncharacterized areas, such as biodiversity hotspots. © 2013 John Wiley & Sons Ltd.
Zou, Hong; Zhang, Jin; Li, Wenxiang; Wu, Shangong; Wang, Guitang
2012-01-01
The 17,922 base pair (bp) nucleotide sequence of the linear mitochondrial DNA (mtDNA) molecule of the freshwater jellyfish Craspedacusta sowerbyi (Hydrozoa, Trachylina, Limnomedusae) has been determined. This sequence exhibits surprisingly low A+T content (57.1%) and contains genes for 13 energy pathway proteins, small and large subunit rRNAs, and methionine and tryptophan tRNAs. The mitochondrial ancestral medusozoan gene order (AMGO) was found in C. sowerbyi, as in Cubaia aphrodite (Hydrozoa, Trachylina, Limnomedusae), discomedusan Scyphozoa and Staurozoa. The genes of C. sowerbyi mtDNA are arranged in two clusters with opposite transcriptional polarities, whereby transcription proceeds toward the ends of the DNA molecule. Identical inverted terminal repeats (ITRs) flank the ends of the mitochondrial DNA molecule, a characteristic typical of medusozoans. In addition, two open reading frames (ORFs) of 354 and 1611 bp in length were found downstream of the large subunit rRNA gene, similar to the two ORFs (ORF314 and polB) discovered in the linear mtDNA of C. aphrodite, discomedusan Scyphozoa and Staurozoa. Phylogenetic analyses of C. sowerbyi and other cnidarians were carried out based on both nucleotide and inferred amino acid sequences of the 13 mitochondrial energy pathway genes. Our working hypothesis supports the monophyletic Medusozoa being a sister group to Octocorallia (Cnidaria, Anthozoa). Within Medusozoa, the phylogenetic analysis suggests that Staurozoa may be the earliest diverging class and the sister group of all other medusozoans. Cubozoa and coronate Scyphozoa form a clade that is the sister group of Hydrozoa plus discomedusan Scyphozoa. Hydrozoa is the sister group of discomedusan Scyphozoa. Semaeostomeae is a paraphyletic clade with Rhizostomeae, while Limnomedusae (Trachylina) is the sister group of hydroidolinans and may be the earliest diverging lineage among Hydrozoa. PMID:23240028
NASA Astrophysics Data System (ADS)
Nallaseth, Ferez Soli
The Y-chromosome presents a unique cytogenetic framework for the evolution of nucleotide sequences. Alignment of nine Y-chromosomal fragments in their increasing Y-specific/non Y-specific (male/female) sequence divergence ratios was directly and inversely related to their interspersion on these two respective genomic fractions. Sequence analysis confirmed a direct relationship between divergence ratios and the Alu, LINE-1, Satellite and their derivative oligonucleotide contents. Thus their relocation on the Y-chromosome is followed by sequence divergence rather than the well documented concerted evolution of these non-coding progenitor repeated sequences. Five of the nine Y-chromosomal fragments are non-pseudoautosomal and transcribed into heterogeneous PolyA^{+} RNA and thus can be retrotransposed. Evolutionary and computer analysis identified homologous oligonucleotide tracts in several human loci suggesting common and random mechanistic origins. Dysgenic genomes represent the accelerated evolution driving sequence divergence (McClintock, 1984). Sex reversal and sterility characterizing dysgenesis occurs in C57BL/6JY^{Pos} but not in 129/SvY^{Pos} derivative strains. High frequency, random, multi-locus deletion products of the feral Y^{Pos}-chromosome are generated in the germlines of F1(C57BL/6J X 129/SvY^{Pos})(male) and C57BL/6JY^{Pos}(male) but not in 129/SvY^{Pos}(male). Equal, 10^{-1}, 10^{-2}, and 0 copies (relative to males) of Y^{Pos}-specific deletion products respectively characterize C57BL/6JY^{Pos} (HC), (LC), (T) and (F) females. The testes determining loci of inactive Y^{Pos}-chromosomes in C57BL/6JY^{Pos} HC females are the preferentially deleted/rearranged Y^{Pos}-sequences. Disruption of regulation of plasma testosterone and hepatic MUP-A mRNA levels, TRD of a 4.7 Kbp EcoR1 fragment suggest disruption of autosomal/X-chromosomal sequences.
These data and the highly repeated progenitor (Alu, GATA, LINE-1) sequence content of deletion products confirmed the previously unidentified loss of genetic control of mammalian chromosome biology and hybrid dysgenesis.
Extreme Quantum Memory Advantage for Rare-Event Sampling
NASA Astrophysics Data System (ADS)
Aghamohammadi, Cina; Loomis, Samuel P.; Mahoney, John R.; Crutchfield, James P.
2018-02-01
We introduce a quantum algorithm for memory-efficient biased sampling of rare events generated by classical memoryful stochastic processes. Two efficiency metrics are used to compare quantum and classical resources for rare-event sampling. For a fixed stochastic process, the first is the classical-to-quantum ratio of required memory. We show for two example processes that there exists an infinite number of rare-event classes for which the memory ratio for sampling is larger than r, for any large real number r. Then, for a sequence of processes each labeled by an integer size N, we compare how the classical and quantum required memories scale with N. In this setting, since both memories can diverge as N → ∞, the efficiency metric tracks how fast they diverge. An extreme quantum memory advantage exists when the classical memory diverges in the limit N → ∞, but the quantum memory has a finite bound. We then show that finite-state Markov processes and spin chains exhibit memory advantage for sampling of almost all of their rare-event classes.
Gaitán-Espitia, Juan Diego; Nespolo, Roberto F.; Opazo, Juan C.
2013-01-01
The complete sequences of three mitochondrial genomes from the land snail Cornu aspersum were determined. The mitogenome has a length of 14050 bp, and it encodes 13 protein-coding genes, 22 transfer RNA genes and two ribosomal RNA genes. It also includes nine small intergenic spacers and a large AT-rich intergenic spacer. The intra-specific divergence analysis revealed that COX1 has the lowest genetic differentiation, while the most divergent genes were NADH1, NADH3 and NADH4. With the exception of Euhadra herklotsi, the structural comparisons showed the same gene order within the family Helicidae, and a gene organization nearly identical to that found in the order Pulmonata. Phylogenetic reconstruction recovered Basommatophora as a polyphyletic group, and Eupulmonata and Pulmonata as paraphyletic groups. Bayesian and Maximum Likelihood analyses showed that C. aspersum is a close relative of Cepaea nemoralis and that, together with the other Helicidae species, it forms a sister group of Albinaria caerulea, supporting the monophyly of the Stylommatophora clade. PMID:23826260
Chang, Chin-Feng; Lee, Ching-Fu; Liu, Shiu-Mei
2010-01-01
A new ascomycetous yeast species, Candida neustonensis, is proposed in this study based on four strains (SN92(T), SN47, SJ22, SJ25) isolated from the sea surface microlayer in Taiwan. These four yeast strains were morphologically, physiologically and phylogenetically identical to each other. No sexual reproduction was observed on 5% malt extract agar, corn meal agar, V8 agar, McClary's acetate agar or potato-dextrose agar. Phylogenetic analysis of the sequences of the D1/D2 domain of the large subunit (LSU) rRNA gene places C. neustonensis as a member of the Pichia guilliermondii clade; it also reveals that the phylogenetically closest relatives of C. neustonensis are C. fukuyamaensis (4.4% divergence), C. xestobii (4.4% divergence) and P. guilliermondii (4.5% divergence). C. neustonensis is also clearly distinguished from other known species in the P. guilliermondii clade based on the results of physiology tests. Based on these comparative analyses, the following novel yeast species is proposed: Candida neustonensis sp. nov., with strain SN92(T) (= BCRC 23108(T) = JCM 14892(T) = CBS 11061(T)) as the type strain.
Comparing COI and ITS as DNA Barcode Markers for Mushrooms and Allies (Agaricomycotina)
Dentinger, Bryn T. M.; Didukh, Maryna Y.; Moncalvo, Jean-Marc
2011-01-01
DNA barcoding is an approach to rapidly identify species using short, standard genetic markers. The mitochondrial cytochrome oxidase I gene (COI) has been proposed as the universal barcode locus, but its utility for barcoding in mushrooms (ca. 20,000 species) has not been established. We succeeded in generating 167 partial COI sequences (∼450 bp) representing ∼100 morphospecies from ∼650 collections of Agaricomycotina using several sets of new primers. Large introns (∼1500 bp) at variable locations were detected in ∼5% of the sequences we obtained. We suspect that widespread presence of large introns is responsible for our low PCR success (∼30%) with this locus. We also sequenced the nuclear internal transcribed spacer rDNA regions (ITS) to compare with COI. Among the small proportion of taxa for which COI could be sequenced, COI and ITS perform similarly as a barcode. However, in a densely sampled set of closely related taxa, COI was less divergent than ITS and failed to distinguish all terminal clades. Given our results and the wealth of ITS data already available in public databases, we recommend that COI be abandoned in favor of ITS as the primary DNA barcode locus in mushrooms. PMID:21966418
2011-01-01
Background Freshwater harbors approximately 12,000 fish species accounting for 43% of the diversity of all modern fish. A single ancestral lineage evolved into about two-thirds of this enormous biodiversity (≈ 7900 spp.) and is currently distributed throughout the world's continents except Antarctica. Despite such remarkable species diversity and ubiquity, the evolutionary history of this major freshwater fish clade, Otophysi, remains largely unexplored. To gain insight into the history of otophysan diversification, we constructed a timetree based on whole mitogenome sequences across 110 species representing 55 of the 64 families. Results Partitioned maximum likelihood analysis based on unambiguously aligned sequences (9923 bp) confidently recovered the monophyly of Otophysi and the two constituent subgroups (Cypriniformes and Characiphysi). The latter clade comprised three orders (Gymnotiformes, Characiformes, Siluriformes), and Gymnotiformes was sister to the latter two groups. One of the two suborders in Characiformes (Characoidei) was more closely related to Siluriformes than to its own suborder (Citharinoidei), rendering the characiforms paraphyletic. Although this novel relationship did not receive strong statistical support, it was supported by analyzing independent nuclear markers. A relaxed molecular clock Bayesian analysis of the divergence times and reconstruction of ancestral habitats on the timetree suggest a Pangaean origin and Mesozoic radiation of otophysans. Conclusions The present timetree demonstrates that survival of the ancestral lineages through the two consecutive mass extinctions on Pangaea, and subsequent radiations during the Jurassic through early Cretaceous shaped the modern familial diversity of otophysans. This evolutionary scenario is consistent with recent arguments based on biogeographic inferences and molecular divergence time estimates. 
No fossil otophysan, however, has been recorded before the Albian, the early Cretaceous 100-112 Ma, creating an over 100 million year time span without fossil evidence. This formidable ghost range partially reflects a genuine difference between the estimated ages of stem group origin (molecular divergence time) and crown group morphological diversification (fossil divergence time); the ghost range, however, would be filled with discoveries of older fossils that can be used as more reasonable time constraints as well as with developments of more realistic models that capture the rates of molecular sequences accurately. PMID:21693066
Nakatani, Masanori; Miya, Masaki; Mabuchi, Kohji; Saitoh, Kenji; Nishida, Mutsumi
2011-06-22
Freshwater harbors approximately 12,000 fish species accounting for 43% of the diversity of all modern fish. A single ancestral lineage evolved into about two-thirds of this enormous biodiversity (≈ 7900 spp.) and is currently distributed throughout the world's continents except Antarctica. Despite such remarkable species diversity and ubiquity, the evolutionary history of this major freshwater fish clade, Otophysi, remains largely unexplored. To gain insight into the history of otophysan diversification, we constructed a timetree based on whole mitogenome sequences across 110 species representing 55 of the 64 families. Partitioned maximum likelihood analysis based on unambiguously aligned sequences (9923 bp) confidently recovered the monophyly of Otophysi and the two constituent subgroups (Cypriniformes and Characiphysi). The latter clade comprised three orders (Gymnotiformes, Characiformes, Siluriformes), and Gymnotiformes was sister to the latter two groups. One of the two suborders in Characiformes (Characoidei) was more closely related to Siluriformes than to its own suborder (Citharinoidei), rendering the characiforms paraphyletic. Although this novel relationship did not receive strong statistical support, it was supported by analyzing independent nuclear markers. A relaxed molecular clock Bayesian analysis of the divergence times and reconstruction of ancestral habitats on the timetree suggest a Pangaean origin and Mesozoic radiation of otophysans. The present timetree demonstrates that survival of the ancestral lineages through the two consecutive mass extinctions on Pangaea, and subsequent radiations during the Jurassic through early Cretaceous shaped the modern familial diversity of otophysans. This evolutionary scenario is consistent with recent arguments based on biogeographic inferences and molecular divergence time estimates. 
No fossil otophysan, however, has been recorded before the Albian, the early Cretaceous 100-112 Ma, creating an over 100 million year time span without fossil evidence. This formidable ghost range partially reflects a genuine difference between the estimated ages of stem group origin (molecular divergence time) and crown group morphological diversification (fossil divergence time); the ghost range, however, would be filled with discoveries of older fossils that can be used as more reasonable time constraints as well as with developments of more realistic models that capture the rates of molecular sequences accurately.
Chen, Chao; Wang, Huihua; Liu, Zhiguang; Chen, Xiao; Tang, Jiao; Meng, Fanming; Shi, Wei
2018-06-20
The mechanisms by which organisms adapt to variable environments are a fundamental question in evolutionary biology, and understanding them is important for protecting key species in a changing climate. An interesting candidate for studying this question is the honey bee Apis cerana, a keystone pollinator that is distributed across a wide variety of climates and exhibits rapid dispersal. Here, we re-sequenced the genomes of 180 A. cerana individuals from eighteen populations throughout China. Using a population genomics approach, we observed considerable genetic variation in A. cerana. Patterns of genetic differentiation indicate high divergence at the subspecies level, and physical barriers rather than distance are the driving force for population divergence. Estimations of divergence time suggested that the main branches diverged between 300 and 500 ka. Analyses of the population history revealed a substantial influence of the Earth's climate on the effective population size of A. cerana, as increased population sizes were observed during warmer periods. Further analyses identified candidate genes under natural selection that are potentially related to honey bee cognition, temperature adaptation, and olfaction. Based on our results, A. cerana may have great potential to respond to climate change. Our study provides fundamental knowledge of the evolution and adaptation of A. cerana.
On the origin of smallpox: correlating variola phylogenics with historical smallpox records.
Li, Yu; Carroll, Darin S; Gardner, Shea N; Walsh, Matthew C; Vitalis, Elizabeth A; Damon, Inger K
2007-10-02
Human disease likely attributable to variola virus (VARV), the etiologic agent of smallpox, has been reported in human populations for >2,000 years. VARV is unique among orthopoxviruses in that it is an exclusively human pathogen. Because VARV has a large, slowly evolving DNA genome, we were able to construct a robust phylogeny of VARV by analyzing concatenated single nucleotide polymorphisms (SNPs) from genome sequences of 47 VARV isolates with broad geographic distributions. Our results show two primary VARV clades, which likely diverged from an ancestral African rodent-borne variola-like virus either approximately 16,000 or approximately 68,000 years before present (YBP), depending on which historical records (East Asian or African) are used to calibrate the molecular clock. One primary clade was represented by the Asian VARV major strains, the more clinically severe form of smallpox, which spread from Asia either 400 or 1,600 YBP. Another primary clade included both alastrim minor, a phenotypically mild smallpox described from the American continents, and isolates from West Africa. This clade diverged from an ancestral VARV either 1,400 or 6,300 YBP, and then further diverged into two subclades at least 800 YBP. All of these analyses indicate that the divergence of alastrim and variola major occurred earlier than previously believed.
Iftikhar, Romana; Ashfaq, Muhammad; Rasool, Akhtar; Hebert, Paul D N
2016-01-01
Although thrips are globally important crop pests and vectors of viral disease, species identifications are difficult because of their small size and inconspicuous morphological differences. Sequence variation in the mitochondrial COI-5' (DNA barcode) region has proven effective for the identification of species in many groups of insect pests. We analyzed barcode sequence variation among 471 thrips from various plant hosts in north-central Pakistan. The Barcode Index Number (BIN) system assigned these sequences to 55 BINs, while the Automatic Barcode Gap Discovery detected 56 partitions, a count that coincided with the number of monophyletic lineages recognized by Neighbor-Joining analysis and Bayesian inference. Congeneric species showed an average of 19% sequence divergence (range = 5.6% - 27%) at COI, while intraspecific distances averaged 0.6% (range = 0.0% - 7.6%). BIN analysis suggested that all intraspecific divergence >3.0% actually involved a species complex. In fact, sequences for three major pest species (Haplothrips reuteri, Thrips palmi, Thrips tabaci), and one predatory thrips (Aeolothrips intermedius) showed deep intraspecific divergences, providing evidence that each is a cryptic species complex. The study compiles the first barcode reference library for the thrips of Pakistan, and examines global haplotype diversity in four important pest thrips.
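The observation that intraspecific divergence above 3.0% pointed to a species complex suggests a simple screening step, sketched below. This is only an illustration: it is not the BIN algorithm or Automatic Barcode Gap Discovery, and the function names, the uncorrected p-distance, and the toy sequences and labels are assumptions. Only the 3.0% cut-off is taken from the abstract.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected percent difference between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return 100.0 * sum(a != b for a, b in pairs) / len(pairs)


def flag_deep_intraspecific(records, threshold=3.0):
    """Return {species: max pairwise distance} for species above threshold.

    `records` maps a species label to a list of aligned barcode sequences.
    Species with fewer than two sequences cannot be assessed and are skipped.
    """
    flagged = {}
    for species, sequences in records.items():
        distances = [p_distance(a, b) for a, b in combinations(sequences, 2)]
        if distances and max(distances) > threshold:
            flagged[species] = max(distances)
    return flagged


# Toy data with made-up 20-base "barcodes" and hypothetical labels.
toy_records = {
    "species_A": ["ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGA"],
    "species_B": ["TTGTACGTACGTACGTACGT", "TTGTACGTACGTACGTACGT"],
}
candidates = flag_deep_intraspecific(toy_records)
```

A species flagged this way is only a candidate for closer study; the abstract's conclusions rest on BIN assignments and tree-based analyses rather than on a single distance threshold.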
Extensive concerted evolution of rice paralogs and the road to regaining independence.
Wang, Xiyin; Tang, Haibao; Bowers, John E; Feltus, Frank A; Paterson, Andrew H
2007-11-01
Many genes duplicated by whole-genome duplications (WGDs) are more similar to one another than expected. We investigated whether concerted evolution through conversion and crossing over, well-known to affect tandem gene clusters, also affects dispersed paralogs. Genome sequences for two Oryza subspecies reveal appreciable gene conversion in the approximately 0.4 MY since their divergence, with a gradual progression toward independent evolution of older paralogs. Since divergence from subspecies indica, approximately 8% of japonica paralogs produced 5-7 MYA on chromosomes 11 and 12 have been affected by gene conversion and several reciprocal exchanges of chromosomal segments, while approximately 70-MY-old "paleologs" resulting from a genome duplication (GD) show much less conversion. Sequence similarity analysis in proximal gene clusters also suggests more conversion between younger paralogs. About 8% of paleologs may have been converted since rice-sorghum divergence approximately 41 MYA. Domain-encoding sequences are more frequently converted than nondomain sequences, suggesting a sort of circularity--that sequences conserved by selection may be further conserved by relatively frequent conversion. The higher level of concerted evolution in the 5-7 MY-old segmental duplication may reflect the behavior of many genomes within the first few million years after duplication or polyploidization.
Comparative sequence analyses of sixteen reptilian paramyxoviruses
Ahne, W.; Batts, W.N.; Kurath, G.; Winton, J.R.
1999-01-01
Viral genomic RNA of Fer-de-Lance virus (FDLV), a paramyxovirus highly pathogenic for reptiles, was reverse transcribed and cloned. Plasmids with significant sequence similarities to the hemagglutinin-neuraminidase (HN) and polymerase (L) genes of mammalian paramyxoviruses were identified by BLAST search. Partial sequences of the FDLV genes were used to design primers for amplification by nested polymerase chain reaction (PCR) and sequencing of 518-bp L gene and 352-bp HN gene fragments from a collection of 15 previously uncharacterized reptilian paramyxoviruses. Phylogenetic analyses of the partial L and HN sequences produced similar trees in which there were two distinct subgroups of isolates that were supported with maximum bootstrap values, and several intermediate isolates. Within each subgroup the nucleotide divergence values were less than 2.5%, while the divergence between the two subgroups was 20-22%. This indicated that the two subgroups represent distinct virus species containing multiple virus strains. The five intermediate isolates had nucleotide divergence values of 11-20% and may represent additional distinct species. In addition to establishing diversity among reptilian paramyxoviruses, the phylogenetic groupings showed some correlation with geographic location, and clearly demonstrated a low level of host species-specificity within these viruses.
srRNA evolution and phylogenetic relationships of the genus Naegleria (Protista: Rhizopoda).
Baverstock, P R; Illana, S; Christy, P E; Robinson, B S; Johnson, A M
1989-05-01
A rapid RNA sequencing technique was used to partially sequence the small-subunit ribosomal RNA (srRNA) of four species of the amoeboid genus Naegleria. The extent of nucleotide sequence divergence between the two most divergent species was roughly similar to that found between mammals and frogs. However, the pattern of variation among the Naegleria species was quite different from that found for those species of tetrapods characterized to date. A phylogenetic analysis of the consensus Naegleria sequence showed that Naegleria was not monophyletic with either Acanthamoeba castellanii or Dictyostelium discoideum, two other amoebas for which sequences were available. It was shown that the semiconserved regions of the srRNA molecule evolve in a clocklike fashion and that the clock is time dependent rather than generation dependent.
Deep phylogeographic divergence and cytonuclear discordance in the grasshopper Oedaleus decorus.
Kindler, Eveline; Arlettaz, Raphaël; Heckel, Gerald
2012-11-01
The grasshopper Oedaleus decorus is a thermophilic insect with a large, mostly south-Palaearctic distribution range, stretching from the Mediterranean regions in Europe to Central Asia and China. In this study, we analyzed the extent of phylogenetic divergence and the recent evolutionary history of the species based on 274 specimens from 26 localities across the distribution range in Europe. Phylogenetic relationships were determined using sequences of two mitochondrial loci (ctr, ND2) with neighbour-joining and Bayesian methods. Additionally, genetic differentiation was analyzed based on mitochondrial DNA and 11 microsatellite markers using F-statistics, model-free multivariate and model-based Bayesian clustering approaches. Phylogenetic analyses consistently detected two highly divergent, allopatrically distributed lineages within O. decorus. The divergence between these Western and Eastern lineages, which meet in the region of the Alps, was similar to the divergence of each lineage from the sister species O. asiaticus. Genetic differentiation for ctr was extremely high between Western and Eastern grasshopper populations (F(ct)=0.95). Microsatellite markers detected much lower but nevertheless very significant genetic structure among population samples. The nuclear data also demonstrated a case of cytonuclear discordance because the affiliation with mitochondrial lineages was incongruent in Northern Italy. Taken together, these results provide evidence of an ancient separation within Oedaleus and either historical introgression of mtDNA among lineages and/or ongoing sex-specific gene flow in this grasshopper. Our study stresses the importance of multilocus approaches for unravelling the history and status of taxa of uncertain evolutionary divergence.
Approximate likelihood calculation on a phylogeny for Bayesian estimation of divergence times.
dos Reis, Mario; Yang, Ziheng
2011-07-01
The molecular clock provides a powerful way to estimate species divergence times. If information on some species divergence times is available from the fossil or geological record, it can be used to calibrate a phylogeny and estimate divergence times for all nodes in the tree. The Bayesian method provides a natural framework to incorporate different sources of information concerning divergence times, such as information in the fossil and molecular data. Current models of sequence evolution are intractable in a Bayesian setting, and Markov chain Monte Carlo (MCMC) is used to generate the posterior distribution of divergence times and evolutionary rates. This method is computationally expensive, as it involves the repeated calculation of the likelihood function. Here, we explore the use of Taylor expansion to approximate the likelihood during MCMC iteration. The approximation is much faster than conventional likelihood calculation. However, the approximation is expected to be poor when the proposed parameters are far from the likelihood peak. We explore the use of parameter transforms (square root, logarithm, and arcsine) to improve the approximation to the likelihood curve. We found that the new methods, particularly the arcsine-based transform, provided very good approximations under relaxed clock models and also under the global clock model when the global clock is not seriously violated. The approximation is poorer for analysis under the global clock when the global clock is seriously wrong and should thus not be used. The results suggest that the approximate method may be useful for Bayesian dating analysis using large data sets.
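The core idea of this abstract, replacing repeated exact likelihood evaluations with a Taylor expansion around the likelihood peak, can be sketched on a toy one-parameter problem. The example below is not the authors' implementation: it assumes a single pairwise distance under the JC69 model, obtains the curvature by a numerical finite difference, and omits the parameter transforms (square root, logarithm, arcsine) that the abstract describes. All function names are hypothetical.

```python
import math

def jc69_loglik(d: float, n_sites: int, n_diff: int) -> float:
    """Log-likelihood (up to an additive constant) of distance d under JC69
    for two aligned sequences with n_diff differences over n_sites sites."""
    e = math.exp(-4.0 * d / 3.0)
    p_diff = 0.75 * (1.0 - e)   # probability that a site differs
    p_same = 0.25 + 0.75 * e    # probability that a site is identical
    return n_diff * math.log(p_diff) + (n_sites - n_diff) * math.log(p_same)

def jc69_mle(n_sites: int, n_diff: int) -> float:
    """Closed-form maximum likelihood estimate of the JC69 distance."""
    p = n_diff / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def taylor_loglik(d: float, d_hat: float, n_sites: int, n_diff: int,
                  h: float = 1e-4) -> float:
    """Second-order Taylor approximation of the log-likelihood around d_hat.

    The gradient is zero at the MLE, so only the constant and quadratic
    terms remain. The second derivative is a central finite difference.
    """
    f = lambda x: jc69_loglik(x, n_sites, n_diff)
    second_deriv = (f(d_hat + h) - 2.0 * f(d_hat) + f(d_hat - h)) / (h * h)
    return f(d_hat) + 0.5 * second_deriv * (d - d_hat) ** 2

n_sites, n_diff = 1000, 100          # hypothetical alignment summary
d_hat = jc69_mle(n_sites, n_diff)

for d in (d_hat, d_hat + 0.01, 0.5):
    exact = jc69_loglik(d, n_sites, n_diff)
    approx = taylor_loglik(d, d_hat, n_sites, n_diff)
    print(f"d={d:.4f}  exact={exact:.3f}  approx={approx:.3f}  "
          f"error={abs(exact - approx):.3f}")
```

In this toy setting the approximation is close near the peak and degrades sharply far from it, which is the behaviour the abstract gives as the motivation for transforming the parameters before expanding.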
Wan, Tsai-Wen; Higuchi, Wataru; Hung, Wei-Chun; Reva, Ivan V.; Singur, Olga A.; Gostev, Vladimir V.; Sidorenko, Sergey V.; Peryanova, Olga V.; Salmina, Alla B.; Reva, Galina V.; Teng, Lee-Jene; Yamamoto, Tatsuo
2016-01-01
ST8/SCCmecIV community-associated methicillin-resistant Staphylococcus aureus (CA-MRSA) has been a common threat, with large USA300 epidemics in the United States. The global geographical structure of ST8/SCCmecIV has not yet been fully elucidated. We herein determined the complete circular genome sequence of ST8/SCCmecIVc strain OC8 from Siberian Russia. We found that 36.0% of the genome was inverted relative to USA300. Two IS256, oppositely oriented, at IS256-enriched hot spots were implicated in the one-megabase genomic inversion (MbIN) and vSaβ split. The behavior of IS256 was flexible: its insertion site (att) sequences on the genome and junction sequences of extrachromosomal circular DNA were all divergent, albeit with fixed sizes. A similar multi-IS256 system was detected, even in prevalent ST239 healthcare-associated MRSA in Russia, suggesting IS256’s strong transmission potential and advantage in evolution. Regarding epidemiology, all ST8/SCCmecIVc strains examined from European, Siberian, and Far Eastern Russia had MbIN, and geographical expansion accompanied divergent spa types and resistance to fluoroquinolones, chloramphenicol, and often rifampicin. Russia ST8/SCCmecIVc has been associated with life-threatening infections such as pneumonia and sepsis in both community and hospital settings. Regarding virulence, the OC8 genome carried a series of toxin and immune evasion genes, a truncated giant surface protein gene, and IS256 insertion adjacent to a pan-regulatory gene. These results suggest that a unique, single ST8/spa1(t008)/SCCmecIVc CA-MRSA clade (Russia ST8-IVc) emerged in Russia, and this was followed by large geographical expansion, with MbIN as an epidemiological marker, and fluoroquinolone resistance, multiple virulence factors, and possibly a multi-IS256 system as selective advantages. PMID:27741255
Wan, Tsai-Wen; Khokhlova, Olga E; Iwao, Yasuhisa; Higuchi, Wataru; Hung, Wei-Chun; Reva, Ivan V; Singur, Olga A; Gostev, Vladimir V; Sidorenko, Sergey V; Peryanova, Olga V; Salmina, Alla B; Reva, Galina V; Teng, Lee-Jene; Yamamoto, Tatsuo
2016-01-01
ST8/SCCmecIV community-associated methicillin-resistant Staphylococcus aureus (CA-MRSA) has been a common threat, with large USA300 epidemics in the United States. The global geographical structure of ST8/SCCmecIV has not yet been fully elucidated. We herein determined the complete circular genome sequence of ST8/SCCmecIVc strain OC8 from Siberian Russia. We found that 36.0% of the genome was inverted relative to USA300. Two IS256, oppositely oriented, at IS256-enriched hot spots were implicated in the one-megabase genomic inversion (MbIN) and vSaβ split. The behavior of IS256 was flexible: its insertion site (att) sequences on the genome and junction sequences of extrachromosomal circular DNA were all divergent, albeit with fixed sizes. A similar multi-IS256 system was detected, even in prevalent ST239 healthcare-associated MRSA in Russia, suggesting IS256's strong transmission potential and advantage in evolution. Regarding epidemiology, all ST8/SCCmecIVc strains examined from European, Siberian, and Far Eastern Russia had MbIN, and geographical expansion accompanied divergent spa types and resistance to fluoroquinolones, chloramphenicol, and often rifampicin. Russia ST8/SCCmecIVc has been associated with life-threatening infections such as pneumonia and sepsis in both community and hospital settings. Regarding virulence, the OC8 genome carried a series of toxin and immune evasion genes, a truncated giant surface protein gene, and IS256 insertion adjacent to a pan-regulatory gene. These results suggest that a unique, single ST8/spa1(t008)/SCCmecIVc CA-MRSA clade (Russia ST8-IVc) emerged in Russia, and this was followed by large geographical expansion, with MbIN as an epidemiological marker, and fluoroquinolone resistance, multiple virulence factors, and possibly a multi-IS256 system as selective advantages.
No evidence for MHC class II-based non-random mating at the gametic haplotype in Atlantic salmon.
Promerová, M; Alavioon, G; Tusso, S; Burri, R; Immler, S
2017-06-01
Genes of the major histocompatibility complex (MHC) are a likely target of mate choice because of their role in inbreeding avoidance and potential benefits for offspring immunocompetence. Evidence for female choice for complementary MHC alleles among competing males exists both for the pre- and the postmating stages. However, it remains unclear whether the latter may involve non-random fusion of gametes depending on gametic haplotypes resulting in transmission ratio distortion or non-random sequence divergence among fused gametes. We tested whether non-random gametic fusion of MHC-II haplotypes occurs in Atlantic salmon Salmo salar. We performed in vitro fertilizations that excluded interindividual sperm competition using a split family design with large clutch sample sizes to test for a possible role of the gametic haplotype in mate choice. We sequenced two MHC-II loci in 50 embryos per clutch to assess allelic frequencies and sequence divergence. We found no evidence for transmission ratio distortion at two linked MHC-II loci, nor for non-random gamete fusion with respect to MHC-II alleles. Our findings suggest that the gametic MHC-II haplotypes play no role in gamete association in Atlantic salmon and that earlier findings of MHC-based mate choice most likely reflect choice among diploid genotypes. We discuss possible explanations for these findings and how they differ from findings in mammals.
High levels of Y-chromosome nucleotide diversity in the genus Pan
Stone, Anne C.; Griffiths, Robert C.; Zegura, Stephen L.; Hammer, Michael F.
2002-01-01
Although some mitochondrial, X chromosome, and autosomal sequence diversity data are available for our closest relatives, Pan troglodytes and Pan paniscus, data from the nonrecombining portion of the Y chromosome (NRY) are more limited. We examined ≈3 kb of NRY DNA from 101 chimpanzees, seven bonobos, and 42 humans to investigate: (i) relative levels of intraspecific diversity; (ii) the degree of paternal lineage sorting among species and subspecies of the genus Pan; and (iii) the date of the chimpanzee/bonobo divergence. We identified 10 informative sequence-tagged sites associated with 23 polymorphisms on the NRY from the genus Pan. Nucleotide diversity was significantly higher on the NRY of chimpanzees and bonobos than on the human NRY. Similar to mtDNA, but unlike X-linked and autosomal loci, lineages defined by mutations on the NRY were not shared among subspecies of P. troglodytes. Comparisons with mtDNA ND2 sequences from some of the same individuals revealed a larger female versus male effective population size for chimpanzees. The NRY-based divergence time between chimpanzees and bonobos was estimated at ≈1.8 million years ago. In contrast to human populations who appear to have had a low effective size and a recent origin with subsequent population growth, some taxa within the genus Pan may be characterized by large populations of relatively constant size, more ancient origins, and high levels of subdivision. PMID:11756656
2015-01-01
Culex pipiens, an invasive mosquito and vector of West Nile virus in the US, has two morphologically indistinguishable forms that differ dramatically in behavior and physiology. Cx. pipiens form pipiens is primarily a bird-feeding temperate mosquito, while the sub-tropical Cx. pipiens form molestus thrives in sewers and feeds on mammals. Because the feral form can diapause during the cold winters but the domestic form cannot, the two Cx. pipiens forms are allopatric in northern Europe and, although viable, hybrids are rare. Cx. pipiens form molestus has spread across all inhabited continents and hybrids of the two forms are common in the US. Here we elucidate the genes and gene families with the greatest divergence rates between these phenotypically diverged mosquito populations, and discuss them in light of their potential biological and ecological effects. After generating and assembling novel transcriptome data for each population, we performed pairwise tests for nonsynonymous divergence (Ka) of homologous coding sequences and examined gene ontology terms that were statistically over-represented in those sequences with the greatest divergence rates. We identified genes involved in digestion (serine endopeptidases), innate immunity (fibrinogens and α-macroglobulins), hemostasis (D7 salivary proteins), olfaction (odorant binding proteins) and chitin binding (peritrophic matrix proteins). By examining molecular divergence between closely related yet phenotypically divergent forms of the same species, our results provide insights into the identity of rapidly-evolving genes between incipient species. Additionally, we found that families of signal transducers, ATP synthases and transcription regulators remained identical at the amino acid level, thus constituting conserved components of the Cx. pipiens proteome. 
We provide a reference with which to gauge the divergence reported in this analysis by performing a comparison of transcriptome sequences from conspecific (yet allopatric) populations of another member of the Cx. pipiens complex, Cx. quinquefasciatus. PMID:25755934
[Hepatitis C virus: sequence homology of a European isolate and divergence from the prototype].
Seelig, R; Seelig, H P; Renz, M
1991-08-01
The polymerase chain reaction (PCR) detected specific hepatitis C viral (HCV) RNA sequences in liver biopsies from two patients with chronic hepatitis, in the tissue of a liver implant, in plasma from four chronic non-A, non-B hepatitis (NANBH) patients and, for the first time, in an infectious anti-D-immunoglobulin preparation. A comparison of the viral sequences coding for a region of the nonstructural NS3 protein from the liver tissues revealed only a very small degree of sequence divergence at the cDNA as well as the amino acid level (between 0 and 5%). The sequence similarities of the RNA isolated from plasma of the four chronic NANBH patients and the anti-D-immunoglobulin preparation were in some cases somewhat lower but still high overall (between 90 and 100%). In contrast, all eight cDNA and amino acid sequences exhibited a significantly higher degree of divergence in comparison with the HCV prototype sequence (between 29 and 32%) than among themselves (between 0 and 10%). This unexpectedly high sequence similarity among the eight European isolates and their low homology to the North American prototype sequence indicate the existence of different types of HCV. This will be important not only for epidemiological studies but also for the development of effective diagnostic procedures and vaccines. Concerning the pathogenesis of NANBH, a double infection or a helper mechanism has to be considered: in addition to the C virus, sequences of another virus particle were found in the infectious IgG preparation as well as in the liver biopsies.
Horai, S; Hayasaka, K; Kondo, R; Tsugane, K; Takahata, N
1995-01-01
We analyzed the complete mitochondrial DNA (mtDNA) sequences of three humans (African, European, and Japanese), three African apes (common and pygmy chimpanzees, and gorilla), and one orangutan in an attempt to estimate most accurately the substitution rates and divergence times of hominoid mtDNAs. Nonsynonymous substitutions and substitutions in RNA genes have accumulated with an approximately clock-like regularity. From these substitutions and under the assumption that the orangutan and African apes diverged 13 million years ago, we obtained a divergence time for humans and chimpanzees of 4.9 million years. This divergence time permitted calibration of the synonymous substitution rate (3.89 × 10^-8/site per year). To obtain the substitution rate in the displacement (D)-loop region, we compared the three human mtDNAs and measured the relative abundance of substitutions in the D-loop region and at synonymous sites. The estimated substitution rate in the D-loop region was 7.00 × 10^-8/site per year. Using both synonymous and D-loop substitutions, we inferred the age of the last common ancestor of the human mtDNAs as 143,000 ± 18,000 years. The shallow ancestry of human mtDNAs, together with the observation that the African sequence is the most diverged among humans, strongly supports the recent African origin of modern humans, Homo sapiens sapiens. PMID:7530363
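The clock arithmetic behind these figures can be sketched briefly. Under a strict molecular clock, the expected distance between two lineages that split T years ago is d = 2rT, where r is the per-lineage substitution rate. The sketch below only plugs in the rate and divergence time quoted in the abstract; treating the quoted rate as a per-lineage rate is an assumption, the implied distance is not a value reported by the authors, and the function names are hypothetical.

```python
def pairwise_distance(rate_per_site_per_year: float, divergence_time_years: float) -> float:
    """Expected substitutions per site separating two lineages: d = 2 * r * T.

    The factor 2 reflects that both lineages accumulate changes
    independently after the split (rate assumed to be per lineage).
    """
    return 2.0 * rate_per_site_per_year * divergence_time_years

def divergence_time(distance_per_site: float, rate_per_site_per_year: float) -> float:
    """Invert d = 2 * r * T to recover the divergence time in years."""
    return distance_per_site / (2.0 * rate_per_site_per_year)

# Values quoted in the abstract.
synonymous_rate = 3.89e-8      # substitutions per site per year
human_chimp_split = 4.9e6      # years

implied_distance = pairwise_distance(synonymous_rate, human_chimp_split)
print(f"implied synonymous distance: {implied_distance:.5f} substitutions/site")
print(f"round-trip divergence time: {divergence_time(implied_distance, synonymous_rate):.3e} years")
```

The same inversion, applied with a calibrated rate to the distances among the three human mtDNAs, is the kind of calculation that yields an age for their last common ancestor.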
Population genomics of parallel hybrid zones in the mimetic butterflies, H. melpomene and H. erato
Ruiz, Mayté; Salazar, Patricio; Counterman, Brian; Medina, Jose Alejandro; Ortiz-Zuazaga, Humberto; Morrison, Anna; Papa, Riccardo
2014-01-01
Hybrid zones can be valuable tools for studying evolution and identifying genomic regions responsible for adaptive divergence and underlying phenotypic variation. Hybrid zones between subspecies of Heliconius butterflies can be very narrow and are maintained by strong selection acting on color pattern. The comimetic species, H. erato and H. melpomene, have parallel hybrid zones in which both species undergo a change from one color pattern form to another. We use restriction-associated DNA sequencing to obtain several thousand genome-wide sequence markers and use these to analyze patterns of population divergence across two pairs of parallel hybrid zones in Peru and Ecuador. We compare two approaches for analysis of this type of data—alignment to a reference genome and de novo assembly—and find that alignment gives the best results for species both closely (H. melpomene) and distantly (H. erato, ∼15% divergent) related to the reference sequence. Our results confirm that the color pattern controlling loci account for the majority of divergent regions across the genome, but we also detect other divergent regions apparently unlinked to color pattern differences. We also use association mapping to identify previously unmapped color pattern loci, in particular the Ro locus. Finally, we identify a new cryptic population of H. timareta in Ecuador, which occurs at relatively low altitude and is mimetic with H. melpomene malleti. PMID:24823669
Chloroplast Genome Evolution in Early Diverged Leptosporangiate Ferns
Kim, Hyoung Tae; Chung, Myong Gi; Kim, Ki-Joong
2014-01-01
In this study, the chloroplast (cp) genome sequences from three early diverged leptosporangiate ferns were completed and analyzed in order to understand the evolution of the genome of the fern lineages. The complete cp genome sequence of Osmunda cinnamomea (Osmundales) was 142,812 base pairs (bp). The cp genome structure was similar to that of eusporangiate ferns. The gene/intron losses that frequently occurred in the cp genome of leptosporangiate ferns were not found in the cp genome of O. cinnamomea. In addition, putative RNA editing sites in the cp genome were rare in O. cinnamomea, even though the sites were frequently predicted to be present in leptosporangiate ferns. The complete cp genome sequence of Diplopterygium glaucum (Gleicheniales) was 151,007 bp and has a 9.7 kb inversion between the trnL-CAA and trnV-GCA genes when compared to O. cinnamomea. Several repeated sequences were detected around the inversion break points. The complete cp genome sequence of Lygodium japonicum (Schizaeales) was 157,142 bp and a deletion of the rpoC1 intron was detected. This intron loss was shared by all of the studied species of the genus Lygodium. The GC contents and the effective numbers of codons (ENCs) in ferns varied significantly when compared to seed plants. The ENC values of the early diverged leptosporangiate ferns showed intermediate levels between eusporangiate and core leptosporangiate ferns. However, our phylogenetic tree based on all of the cp gene sequences clearly indicated that the cp genome similarities between O. cinnamomea (Osmundales) and eusporangiate ferns are symplesiomorphies, rather than synapomorphies. Therefore, our data are in agreement with the view that Osmundales is a distinct early diverged lineage in the leptosporangiate ferns. PMID:24823358
Chloroplast genome evolution in early diverged leptosporangiate ferns.
Kim, Hyoung Tae; Chung, Myong Gi; Kim, Ki-Joong
2014-05-01
In this study, the chloroplast (cp) genome sequences from three early diverged leptosporangiate ferns were completed and analyzed in order to understand the evolution of the genome of the fern lineages. The complete cp genome sequence of Osmunda cinnamomea (Osmundales) was 142,812 base pairs (bp). The cp genome structure was similar to that of eusporangiate ferns. The gene/intron losses that frequently occurred in the cp genome of leptosporangiate ferns were not found in the cp genome of O. cinnamomea. In addition, putative RNA editing sites in the cp genome were rare in O. cinnamomea, even though the sites were frequently predicted to be present in leptosporangiate ferns. The complete cp genome sequence of Diplopterygium glaucum (Gleicheniales) was 151,007 bp and has a 9.7 kb inversion between the trnL-CAA and trnV-GCA genes when compared to O. cinnamomea. Several repeated sequences were detected around the inversion break points. The complete cp genome sequence of Lygodium japonicum (Schizaeales) was 157,142 bp and a deletion of the rpoC1 intron was detected. This intron loss was shared by all of the studied species of the genus Lygodium. The GC contents and the effective numbers of codons (ENCs) in ferns varied significantly when compared to seed plants. The ENC values of the early diverged leptosporangiate ferns showed intermediate levels between eusporangiate and core leptosporangiate ferns. However, our phylogenetic tree based on all of the cp gene sequences clearly indicated that the cp genome similarities between O. cinnamomea (Osmundales) and eusporangiate ferns are symplesiomorphies, rather than synapomorphies. Therefore, our data are in agreement with the view that Osmundales is a distinct early diverged lineage in the leptosporangiate ferns.
von Beeren, Christoph; Stoeckle, Mark Y.; Xia, Joyce; Burke, Griffin; Kronauer, Daniel J. C.
2015-02-01
DNA barcoding promises to be a useful tool to identify pest species assuming adequate representation of genetic variants in a reference library. Here we examined mitochondrial DNA barcodes in a global urban pest, the American cockroach (Periplaneta americana). Our sampling effort generated 284 cockroach specimens, most from New York City, plus 15 additional U.S. states and six other countries, enabling the first large-scale survey of P. americana barcode variation. Periplaneta americana barcode sequences (n = 247, including 24 GenBank records) formed a monophyletic lineage separate from other Periplaneta species. We found three distinct P. americana haplogroups with relatively small differences within (≤0.6%) and larger differences among groups (2.4%-4.7%). This could be interpreted as indicative of multiple cryptic species. However, nuclear DNA sequences (n = 77 specimens) revealed extensive gene flow among mitochondrial haplogroups, confirming a single species. This unusual genetic pattern likely reflects multiple introductions from genetically divergent source populations, followed by interbreeding in the invasive range. Our findings highlight the need for comprehensive reference databases in DNA barcoding studies, especially when dealing with invasive populations that might be derived from multiple genetically distinct source populations.
Fourment, Mathieu; Holmes, Edward C
2014-07-24
Early methods for estimating divergence times from gene sequence data relied on the assumption of a molecular clock. More sophisticated methods were created to model rate variation and used auto-correlation of rates, local clocks, or the so called "uncorrelated relaxed clock" where substitution rates are assumed to be drawn from a parametric distribution. In the case of Bayesian inference methods the impact of the prior on branching times is not clearly understood, and if the amount of data is limited the posterior could be strongly influenced by the prior. We develop a maximum likelihood method--Physher--that uses local or discrete clocks to estimate evolutionary rates and divergence times from heterochronous sequence data. Using two empirical data sets we show that our discrete clock estimates are similar to those obtained by other methods, and that Physher outperformed some methods in the estimation of the root age of an influenza virus data set. A simulation analysis suggests that Physher can outperform a Bayesian method when the real topology contains two long branches below the root node, even when evolution is strongly clock-like. These results suggest it is advisable to use a variety of methods to estimate evolutionary rates and divergence times from heterochronous sequence data. Physher and the associated data sets used here are available online at http://code.google.com/p/physher/.
Lashbrook, C C; Gonzalez-Bosch, C; Bennett, A B
1994-01-01
Two structurally divergent endo-beta-1,4-glucanase (EGase) cDNAs were cloned from tomato. Although both cDNAs (Cel1 and Cel2) encode potentially glycosylated, basic proteins of 51 to 53 kD and possess multiple amino acid domains conserved in both plant and microbial EGases, Cel1 and Cel2 exhibit only 50% amino acid identity at the overall sequence level. Amino acid sequence comparisons to other plant EGases indicate that tomato Cel1 is most similar to bean abscission zone EGase (68%), whereas Cel2 exhibits greatest sequence identity to avocado fruit EGase (57%). Sequence comparisons suggest the presence of at least two structurally divergent EGase families in plants. Unlike ripening avocado fruit and bean abscission zones in which a single EGase mRNA predominates, EGase expression in tomato reflects the overlapping accumulation of both Cel1 and Cel2 transcripts in ripening fruit and in plant organs undergoing cell separation. Cel1 mRNA contributes significantly to total EGase mRNA accumulation within plant organs undergoing cell separation (abscission zones and mature anthers), whereas Cel2 mRNA is most abundant in ripening fruit. The overlapping expression of divergent EGase genes within a single species may suggest that multiple activities are required for the cooperative disassembly of cell wall components during fruit ripening, floral abscission, and anther dehiscence. PMID:7994180
A DNA Barcode Library for North American Ephemeroptera: Progress and Prospects
Webb, Jeffrey M.; Jacobus, Luke M.; Funk, David H.; Zhou, Xin; Kondratieff, Boris; Geraci, Christy J.; DeWalt, R. Edward; Baird, Donald J.; Richard, Barton; Phillips, Iain; Hebert, Paul D. N.
2012-01-01
DNA barcoding of aquatic macroinvertebrates holds much promise as a tool for taxonomic research and for providing the reliable identifications needed for water quality assessment programs. A prerequisite for identification using barcodes is a reliable reference library. We gathered 4165 sequences from the barcode region of the mitochondrial cytochrome c oxidase subunit I gene representing 264 nominal and 90 provisional species of mayflies (Insecta: Ephemeroptera) from Canada, Mexico, and the United States. No species shared barcode sequences and all can be identified with barcodes with the possible exception of some Caenis. Minimum interspecific distances ranged from 0.3–24.7% (mean: 12.5%), while the average intraspecific divergence was 1.97%. The latter value was inflated by the presence of very high divergences in some taxa. In fact, nearly 20% of the species included two or three haplotype clusters showing greater than 5.0% sequence divergence and some values are as high as 26.7%. Many of the species with high divergences are polyphyletic and likely represent species complexes. Indeed, many of these polyphyletic species have numerous synonyms and individuals in some barcode clusters show morphological attributes characteristic of the synonymized species. In light of our findings, it is imperative that type or topotype specimens be sequenced to correctly associate barcode clusters with morphological species concepts and to determine the status of currently synonymized species. PMID:22666447
Lischer, Heidi E L; Excoffier, Laurent; Heckel, Gerald
2014-04-01
Phylogenetic reconstruction of the evolutionary history of closely related organisms may be difficult because of the presence of unsorted lineages and of a relatively high proportion of heterozygous sites that are usually not handled well by phylogenetic programs. Genomic data may provide enough fixed polymorphisms to resolve phylogenetic trees, but the diploid nature of sequence data remains analytically challenging. Here, we performed a phylogenomic reconstruction of the evolutionary history of the common vole (Microtus arvalis) with a focus on the influence of heterozygosity on the estimation of intraspecific divergence times. We used genome-wide sequence information from 15 voles distributed across the European range. We provide a novel approach to integrate heterozygous information in existing phylogenetic programs by repeated random haplotype sampling from sequences with multiple unphased heterozygous sites. We evaluated the impact of the use of full, partial, or no heterozygous information for tree reconstructions on divergence time estimates. All results consistently showed four deep and strongly supported evolutionary lineages in the vole data. These lineages undergoing divergence processes split only at the end or after the last glacial maximum based on calibration with radiocarbon-dated paleontological material. However, the incorporation of information from heterozygous sites had a significant impact on absolute and relative branch length estimations. Ignoring heterozygous information led to an overestimation of divergence times between the evolutionary lineages of M. arvalis. We conclude that the exclusion of heterozygous sites from evolutionary analyses may cause biased and misleading divergence time estimates in closely related taxa.
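The repeated random haplotype sampling described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes heterozygous sites are encoded with the standard two-base IUPAC ambiguity codes, and the names `sample_haplotype` and `sample_replicates` are hypothetical. Each replicate resolves every heterozygous site independently at random, which is what makes the approach usable when phase is unknown.

```python
import random

# Standard IUPAC codes for sites that are heterozygous for two bases.
IUPAC_HET = {
    "R": "AG", "Y": "CT", "S": "CG",
    "W": "AT", "K": "GT", "M": "AC",
}


def sample_haplotype(seq, rng):
    """Return one haplotype: each heterozygous code becomes one of its two bases."""
    return "".join(
        rng.choice(IUPAC_HET[base]) if base in IUPAC_HET else base
        for base in seq
    )


def sample_replicates(seq, n, seed=0):
    """Draw n independent random haplotypes from one unphased diploid sequence."""
    rng = random.Random(seed)  # seeded so that replicate sets are reproducible
    return [sample_haplotype(seq, rng) for _ in range(n)]
```

Each replicate set could then be passed to an existing phylogenetic program, and the spread of the resulting branch lengths would indicate how sensitive the estimates are to the heterozygous sites.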
Design and construction of 2A peptide-linked multicistronic vectors.
Szymczak-Workman, Andrea L; Vignali, Kate M; Vignali, Dario A A
2012-02-01
The need for reliable, multicistronic vectors for multigene delivery is at the forefront of biomedical technology. This article describes the design and construction of 2A peptide-linked multicistronic vectors, which can be used to express multiple proteins from a single open reading frame (ORF). The small 2A peptide sequences, when cloned between genes, allow for efficient, stoichiometric production of discrete protein products within a single vector through a novel "cleavage" event within the 2A peptide sequence. Expression of more than two genes using conventional approaches has several limitations, most notably imbalanced protein expression and large size. The use of 2A peptide sequences alleviates these concerns. They are small (18-22 amino acids) and have divergent amino-terminal sequences, which minimizes the chance for homologous recombination and allows for multiple, different 2A peptide sequences to be used within a single vector. Importantly, separation of genes placed between 2A peptide sequences is nearly 100%, which allows for stoichiometric and concordant expression of the genes, regardless of the order of placement within the vector.
Hirata, Daisuke; Mano, Tsutomu; Abramov, Alexei V; Baryshnikov, Gennady F; Kosintsev, Pavel A; Vorobiev, Alexandr A; Raichev, Evgeny G; Tsunoda, Hiroshi; Kaneko, Yayoi; Murata, Koichi; Fukui, Daisuke; Masuda, Ryuichi
2013-07-01
To further elucidate the migration history of the brown bears (Ursus arctos) on Hokkaido Island, Japan, we analyzed the complete mitochondrial DNA (mtDNA) sequences of 35 brown bears from Hokkaido, the southern Kuril Islands (Etorofu and Kunashiri), Sakhalin Island, and the Eurasian Continent (continental Russia, Bulgaria, and Tibet), and those of four polar bears. Based on these sequences, we reconstructed the maternal phylogeny of the brown bear and estimated divergence times to investigate the timing of brown bear migrations, especially in northeastern Eurasia. Our gene tree showed the mtDNA haplotypes of all 73 brown and polar bears to be divided into eight divergent lineages. The brown bear on Hokkaido was divided into three lineages (central, eastern, and southern). The Sakhalin brown bear grouped with eastern European and western Alaskan brown bears. Etorofu and Kunashiri brown bears were closely related to eastern Hokkaido brown bears and could have diverged from the eastern Hokkaido lineage after formation of the channel between Hokkaido and the southern Kuril Islands. Tibetan brown bears diverged early in the eastern lineage. Southern Hokkaido brown bears were closely related to North American brown bears.
Han, Xiang Y; Sizer, Kurt C; Thompson, Erika J; Kabanja, Juma; Li, Jun; Hu, Peter; Gómez-Valero, Laura; Silva, Francisco J
2009-10-01
Mycobacterium lepromatosis is a newly discovered leprosy-causing organism. Preliminary phylogenetic analysis of its 16S rRNA gene and a few other gene segments revealed significant divergence from Mycobacterium leprae, a well-known cause of leprosy, that justifies the status of M. lepromatosis as a new species. In this study we analyzed the sequences of 20 genes and pseudogenes (22,814 nucleotides). Overall, the level of matching of these sequences with M. leprae sequences was 90.9%, which substantiated the species-level difference; the levels of matching for the 16S rRNA genes and 14 protein-encoding genes were 98.0% and 93.1%, respectively, but the level of matching for five pseudogenes was only 79.1%. Five conserved protein-encoding genes were selected to construct phylogenetic trees and to calculate the numbers of synonymous substitutions (dS values) and nonsynonymous substitutions (dN values) in the two species. Robust phylogenetic trees constructed using concatenated alignment of these genes placed M. lepromatosis and M. leprae in a tight cluster with long terminal branches, implying that the divergence occurred long ago. The dS and dN values were also much higher than those for other closest pairs of mycobacteria. The dS values were 14 to 28% of the dS values for M. leprae and Mycobacterium tuberculosis, a more divergent pair of species. These results thus indicate that M. lepromatosis and M. leprae diverged approximately 10 million years ago. The M. lepromatosis pseudogenes analyzed that were also pseudogenes in M. leprae showed nearly neutral evolution, and their relative ages were similar to those of M. leprae pseudogenes, suggesting that they were pseudogenes before divergence. Taken together, the results described above indicate that M. lepromatosis and M. leprae diverged from a common ancestor after the massive gene inactivation event described previously for M. leprae.
The genome sequence of the model ascomycete fungus Podospora anserina.
Espagne, Eric; Lespinet, Olivier; Malagnac, Fabienne; Da Silva, Corinne; Jaillon, Olivier; Porcel, Betina M; Couloux, Arnaud; Aury, Jean-Marc; Ségurens, Béatrice; Poulain, Julie; Anthouard, Véronique; Grossetete, Sandrine; Khalili, Hamid; Coppin, Evelyne; Déquard-Chablat, Michelle; Picard, Marguerite; Contamine, Véronique; Arnaise, Sylvie; Bourdais, Anne; Berteaux-Lecellier, Véronique; Gautheret, Daniel; de Vries, Ronald P; Battaglia, Evy; Coutinho, Pedro M; Danchin, Etienne Gj; Henrissat, Bernard; Khoury, Riyad El; Sainsard-Chanet, Annie; Boivin, Antoine; Pinan-Lucarré, Bérangère; Sellem, Carole H; Debuchy, Robert; Wincker, Patrick; Weissenbach, Jean; Silar, Philippe
2008-01-01
The dung-inhabiting ascomycete fungus Podospora anserina is a model used to study various aspects of eukaryotic and fungal biology, such as ageing, prions and sexual development. We present a 10X draft sequence of the P. anserina genome, linked to the sequences of a large expressed sequence tag collection. As in higher eukaryotes, the P. anserina transcription/splicing machinery generates numerous non-conventional transcripts. Comparison of the P. anserina genome and orthologous gene set with those of its close relative, Neurospora crassa, shows that synteny is poorly conserved, the main result of evolution being gene shuffling within the same chromosome. The P. anserina genome contains fewer repeated sequences and has evolved new genes by duplication since its separation from N. crassa, despite the presence of the repeat-induced point mutation mechanism that mutates duplicated sequences. We also provide evidence that frequent gene loss took place in the lineages leading to P. anserina and N. crassa. P. anserina contains a large and highly specialized set of genes involved in the utilization of carbon sources commonly found in its natural biotope. It includes genes potentially involved in lignin degradation and efficient cellulose breakdown. The features of the P. anserina genome indicate a highly dynamic evolution since the divergence of P. anserina and N. crassa, leading to the ability of the former to use specific complex carbon sources that match its needs in its natural biotope.
Cenci, Albero; Guignon, Valentin; Roux, Nicolas; Rouard, Mathieu
2014-05-01
Identifying the molecular mechanisms underlying tolerance to abiotic stresses is important in crop breeding. A comprehensive understanding of the gene families associated with drought tolerance is therefore highly relevant. NAC transcription factors form a large plant-specific gene family involved in the regulation of tissue development and responses to biotic and abiotic stresses. The main goal of this study was to set up a framework of orthologous groups determined by an expert sequence comparison of NAC genes from both monocots and dicots. In order to clarify the orthologous relationships among NAC genes of different species, we performed an in-depth comparative study of four divergent taxa, in dicots and monocots, whose genomes have already been completely sequenced: Arabidopsis thaliana, Vitis vinifera, Musa acuminata and Oryza sativa. Due to independent evolution, NAC copy number is highly variable in these plant genomes. Based on an expert NAC sequence comparison, we propose forty orthologous groups of NAC sequences that were probably derived from an ancestor gene present in the most recent common ancestor of dicots and monocots. These orthologous groups provide a curated resource for large-scale protein sequence annotation of NAC transcription factors. The established orthology relationships also provide a useful reference for NAC function studies in newly sequenced genomes such as M. acuminata and other plant species.
Pohl, Nélida; Sison-Mangus, Marilou P; Yee, Emily N; Liswi, Saif W; Briscoe, Adriana D
2009-01-01
Background The increase in availability of genomic sequences for a wide range of organisms has revealed gene duplication to be a relatively common event. Encounters with duplicate gene copies have consequently become almost inevitable in the context of collecting gene sequences for inferring species trees. Here we examine the effect of incorporating duplicate gene copies evolving at different rates on tree reconstruction and time estimation of recent and deep divergences in butterflies. Results Sequences from ultraviolet-sensitive (UVRh), blue-sensitive (BRh), and long-wavelength sensitive (LWRh) opsins, EF-1α and COI were obtained from 27 taxa representing the five major butterfly families (5535 bp total). Both BRh and LWRh are present in multiple copies in some butterfly lineages and the different copies evolve at different rates. Regardless of the phylogenetic reconstruction method used, we found that analyses of combined data sets using either slower or faster evolving copies of duplicate genes resulted in a single topology in agreement with our current understanding of butterfly family relationships based on morphology and molecules. Interestingly, individual analyses of BRh and LWRh sequences also recovered these family-level relationships. Two different relaxed clock methods resulted in similar divergence time estimates at the shallower nodes in the tree, regardless of whether faster or slower evolving copies were used, with larger discrepancies observed at deeper nodes in the phylogeny. The time of divergence between the monarch butterfly Danaus plexippus and the queen D. gilippus (15.3–35.6 Mya) was found to be much older than the time of divergence between monarch co-mimic Limenitis archippus and red-spotted purple L. arthemis (4.7–13.6 Mya), and overlapping with the time of divergence of the co-mimetic passionflower butterflies Heliconius erato and H. melpomene (13.5–26.1 Mya). Our family-level results are congruent with recent estimates found in the literature and indicate an age of 84–113 million years for the divergence of all butterfly families. Conclusion These results are consistent with diversification of the butterfly families following the radiation of angiosperms and suggest that some classes of opsin genes may be usefully employed for both phylogenetic reconstruction and divergence time estimation. PMID:19439087
Chan, Yvonne H.; Venev, Sergey V.; Zeldovich, Konstantin B.; Matthews, C. Robert
2017-01-01
Sequence divergence of orthologous proteins enables adaptation to environmental stresses and promotes evolution of novel functions. Limits on evolution imposed by constraints on sequence and structure were explored using a model TIM barrel protein, indole-3-glycerol phosphate synthase (IGPS). Fitness effects of point mutations in three phylogenetically divergent IGPS proteins during adaptation to temperature stress were probed by auxotrophic complementation of yeast with prokaryotic, thermophilic IGPS. Analysis of beneficial mutations pointed to an unexpected, long-range allosteric pathway towards the active site of the protein. Significant correlations between the fitness landscapes of distant orthologues implicate both sequence and structure as primary forces in defining the TIM barrel fitness landscape and suggest that fitness landscapes can be translocated in sequence space. Exploration of fitness landscapes in the context of a protein fold provides a strategy for elucidating the sequence-structure-fitness relationships in other common motifs. PMID:28262665
Transcription Start Site Evolution in Drosophila
Main, Bradley J.; Smith, Andrew D.; Jang, Hyosik; Nuzhdin, Sergey V.
2013-01-01
Transcription start site (TSS) evolution remains largely undescribed in Drosophila, likely due to limited annotations in non-melanogaster species. In this study, we introduce a concise new method that selectively sequences from the 5′-end of mRNA and used it to identify TSS in four Drosophila species, including Drosophila melanogaster, D. simulans, D. sechellia, and D. pseudoobscura. For verification, we compared our results in D. melanogaster with known annotations, published 5′-rapid amplification of cDNA ends data, and with RNAseq from the same mRNA pool. Then, we paired 2,849 D. melanogaster TSS with its closest equivalent TSS in each species (likely to be its true ortholog) using the available multiple sequence alignments. Most of the D. melanogaster TSSs were successfully paired with an ortholog in each species (83%, 86%, and 55% for D. simulans, D. sechellia, and D. pseudoobscura, respectively). On the basis of the number and distribution of reads mapped at each TSS, we also estimated promoter-specific expression (PSE) and TSS peak shape, respectively. Among paired TSS orthologs, the location and promoter activity were largely conserved. TSS location appears important as PSE, and TSS peak shape was more frequently divergent among TSS that had moved. Unpaired TSS were surprisingly common in D. pseudoobscura. An increased mutation rate upstream of TSS might explain this pattern. We found an enrichment of ribosomal protein genes among diverged TSS, suggesting that TSS evolution is not uniform across the genome. PMID:23649539
Jeukens, Julie; Bernatchez, Louis
2012-01-01
While gene expression divergence is known to be involved in adaptive phenotypic divergence and speciation, the relative importance of regulatory and structural evolution of genes is poorly understood. A recent next-generation sequencing experiment identified candidate genes potentially involved in the ongoing speciation of sympatric dwarf and normal lake whitefish (Coregonus clupeaformis), such as cytosolic malate dehydrogenase (MDH1), which showed both significant expression and sequence divergence. The main goal of this study was to investigate in more detail the signatures of natural selection in the regulatory and coding sequences of MDH1 in lake whitefish and to test for parallelism of these signatures with other coregonine species. Sequencing of the two regions in 118 fish from four sympatric pairs of whitefish and two cisco species revealed a total of 35 single nucleotide polymorphisms (SNPs), with more genetic diversity in European than in North American coregonine species. While the coding region was found to be under purifying selection, an SNP in the proximal promoter exhibited significant allele frequency divergence in a parallel manner among independent sympatric pairs of North American lake whitefish and European whitefish (C. lavaretus). According to transcription factor binding simulation for 22 regulatory haplotypes of MDH1, putative binding profiles were fairly conserved among species, except for the region around this SNP. Moreover, we found evidence for the role of this SNP in the regulation of MDH1 expression level. Overall, these results provide further evidence for the role of natural selection in gene regulation evolution among whitefish species pairs and suggest its possible link with patterns of phenotypic diversity observed in coregonine species. PMID:22408741
Patterns and rates of intron divergence between humans and chimpanzees
Gazave, Elodie; Marqués-Bonet, Tomàs; Fernando, Olga; Charlesworth, Brian; Navarro, Arcadi
2007-01-01
Background Introns, which constitute the largest fraction of eukaryotic genes and which had been considered to be neutral sequences, are increasingly acknowledged as having important functions. Several studies have investigated levels of evolutionary constraint along introns and across classes of introns of different length and location within genes. However, thus far these studies have yielded contradictory results. Results We present the first analysis of human-chimpanzee intron divergence, in which differences in the number of substitutions per intronic site (Ki) can be interpreted as the footprint of different intensities and directions of the pressures of natural selection. Our main findings are as follows: there was a strong positive correlation between intron length and divergence; there was a strong negative correlation between intron length and GC content; and divergence rates vary along introns and depending on their ordinal position within genes (for instance, first introns are more GC rich, longer and more divergent, and divergence is lower at the 3' and 5' ends of all types of introns). Conclusion We show that the higher divergence of first introns is related to their larger size. Also, the lower divergence of short introns suggests that they may harbor a relatively greater proportion of regulatory elements than long introns. Moreover, our results are consistent with the presence of functionally relevant sequences near the 5' and 3' ends of introns. Finally, our findings suggest that other parts of introns may also be under selective constraints. PMID:17309804
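The abstract above measures divergence as the number of substitutions per intronic site (Ki). The abstract does not state how multiple substitutions at the same site were corrected for, so the sketch below is only an illustration that assumes the Jukes-Cantor model, the simplest standard correction, in which an observed proportion of differing sites p is converted to K = -(3/4) ln(1 - 4p/3). The function name `jukes_cantor` is hypothetical.

```python
import math


def jukes_cantor(p):
    """Estimate substitutions per site from the observed proportion of differences.

    Assumes the Jukes-Cantor model (equal base frequencies, equal rates);
    this choice is an assumption for illustration, not the paper's stated method.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)
```

For the low divergence typical of human-chimpanzee comparisons the correction is small, so the corrected value stays close to the raw proportion of differing sites.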
Lexer, C; Wüest, R O; Mangili, S; Heuertz, M; Stölting, K N; Pearman, P B; Forest, F; Salamin, N; Zimmermann, N E; Bossolini, E
2014-09-01
Understanding the drivers of population divergence, speciation and species persistence is of great interest to molecular ecology, especially for species-rich radiations inhabiting the world's biodiversity hotspots. The toolbox of population genomics holds great promise for addressing these key issues, especially if genomic data are analysed within a spatially and ecologically explicit context. We have studied the earliest stages of the divergence continuum in the Restionaceae, a species-rich and ecologically important plant family of the Cape Floristic Region (CFR) of South Africa, using the widespread CFR endemic Restio capensis (L.) H.P. Linder & C.R. Hardy as an example. We studied diverging populations of this morphotaxon for plastid DNA sequences and >14 400 nuclear DNA polymorphisms from Restriction site Associated DNA (RAD) sequencing and analysed the results jointly with spatial, climatic and phytogeographic data, using a Bayesian generalized linear mixed modelling (GLMM) approach. The results indicate that population divergence across the extreme environmental mosaic of the CFR is mostly driven by isolation by environment (IBE) rather than isolation by distance (IBD) for both neutral and non-neutral markers, consistent with genome hitchhiking or coupling effects during early stages of divergence. Mixed modelling of plastid DNA and single divergent outlier loci from a Bayesian genome scan confirmed the predominant role of climate and pointed to additional drivers of divergence, such as drift and ecological agents of selection captured by phytogeographic zones. Our study demonstrates the usefulness of population genomics for disentangling the effects of IBD and IBE along the divergence continuum often found in species radiations across heterogeneous ecological landscapes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hamilton, A T; Huntley, S; Tran-Gyamfi, M
Although most genes are conserved as one-to-one orthologs in different mammalian orders, certain gene families have evolved to comprise different numbers and types of protein-coding genes through independent series of gene duplications, divergence and gene loss in each evolutionary lineage. One such family encodes KRAB-zinc finger (KRAB-ZNF) genes, which are likely to function as transcriptional repressors. One KRAB-ZNF subfamily, the ZNF91 clade, has expanded specifically in primates to comprise more than 110 loci in the human genome, yielding large gene clusters in human chromosomes 19 and 7 and smaller clusters or isolated copies at other chromosomal locations. Although phylogenetic analysis indicates that many of these genes arose before the split between old world monkeys and new world monkeys, the ZNF91 subfamily has continued to expand and diversify throughout the evolution of apes and humans. The paralogous loci are distinguished by sequence divergence within their zinc finger arrays indicating a selection for proteins with different DNA binding specificities. RT-PCR and in situ hybridization data show that some of these ZNF genes can have tissue-specific expression patterns, however many KRAB-ZNFs that are near-ubiquitous could also be playing very specific roles in halting target pathways in all tissues except for a few, where the target is released by the absence of its repressor. The number of variant KRAB-ZNF proteins is increased not only because of the large number of loci, but also because many loci can produce multiple splice variants, which because of the modular structure of these genes may have separate and perhaps even conflicting regulatory roles. The lineage-specific duplication and rapid divergence of this family of transcription factor genes suggests a role in determining species-specific biological differences and the evolution of novel primate traits.
Ribeiro, José R de A; Carvalho, Patrícia M B de; Cabral, Anderson de S; Macrae, Andrew; Mendonça-Hagler, Leda C S; Berbara, Ricardo L L; Hagler, Allen N
2011-10-01
A novel yeast species within the Metschnikowiaceae is described based on a strain from the sugarcane (Saccharum sp.) rhizoplane of an organically managed farm in Rio de Janeiro, Brazil. The D1/D2 domain of the large subunit ribosomal RNA gene sequence analysis showed that the closest related species were Candida tsuchiyae with 86.2% and Candida thailandica with 86.7% of sequence identity. All three are anamorphs in the Clavispora opuntiae clade. The name Candida middelhoveniana sp. nov. is proposed to accommodate this highly divergent organism with the type strain Instituto de Microbiologia, Universidade Federal do Rio de Janeiro (IMUFRJ) 51965(T) (=Centraalbureau voor Schimmelcultures (CBS) 12306(T), Universidade Federal de Minas Gerais (UFMG)-70(T), DBVPG 8031(T)) and the GenBank/EMBL/DDBJ accession number for the D1/D2 domain LSU rDNA sequence is FN428871. The Mycobank deposit number is MB 519801.
Palopoli, M. F.; Wu, C. I.
1994-01-01
To study the genetic differences responsible for the sterility of their male hybrids, we introgressed small segments of an X chromosome from Drosophila simulans into a pure Drosophila mauritiana genetic background, then assessed the fertility of males carrying heterospecific introgressions of varying size. Although this analysis examined less than 20% of the X chromosome (roughly 5% of the euchromatic portion of the D. simulans genome), and the segments were introgressed in only one direction, a minimum of four factors that contribute to hybrid male sterility were revealed. At least two of the factors exhibited strong epistasis: males carrying either factor alone were consistently fertile, whereas males carrying both factors together were always sterile. Distinct spermatogenic phenotypes were observed for sterile introgressions of different lengths, and it appeared that an interaction between introgressed segments also influenced the stage of spermatogenic defect. Males with one category of introgression often produced large quantities of motile sperm and were observed copulating, but never inseminated females. Evidently these two species have diverged at a large number of loci which have varied effects on hybrid male fertility. By extrapolation, we estimate that there are at least 40 such loci on the X chromosome alone. Because these species exhibit little DNA-sequence divergence at arbitrarily chosen loci, it seems unlikely that the extensive functional divergence observed could be due mainly to random genetic drift. Significant epistasis between conspecific genes appears to be a common component of hybrid sterility between recently diverged species of Drosophila. The linkage relationships of interacting factors could shed light on the role played by epistatic selection in the dynamics of the allele substitutions responsible for reproductive barriers between species. PMID:7828817
Singhal, Dinesh K; Singhal, Raxita; Malik, Hruda N; Kumar, Surender; Kumar, Sudarshan; Mohanty, Ashok K; Kaushik, Jai K; Malakar, Dhruba
2014-01-01
Nanog is a homeodomain-containing protein which plays important roles in the regulation of signaling pathways for maintenance and induction of pluripotency in stem cells. Because of its unique expression in stem cells it is also regarded as a pluripotency marker. In this study, the goat Nanog (gNanog) gene was amplified, cloned and characterized at the sequence level, and successfully over-expressed in the CHO-K1 cell line using a lentiviral-based system. The gNanog ORF is 903 bp long and codes for a Nanog protein of 300 amino acids (aa). The complete nucleotide sequence shows some evolutionary mutations in goat in comparison to other species. The protein sequence of goat is highly similar to that of other species. Overall, the gNanog nucleotide sequence and predicted protein sequence showed high similarity and minimum divergence with cattle (96% identity/4% divergence) and buffalo (94/5%), but low similarity and high divergence with pig (84/15%), human (81/23%) and mouse (69/40%), indicating evolutionary closeness of gNanog to cattle and buffalo. A gNanog lentiviral expression construct was prepared for over-expression of the Nanog gene in adult goat fibroblast cells. The lentiviral expression construct of Nanog enabled continuous protein expression for induction and maintenance of pluripotency. Western blotting confirmed expression of Nanog at the protein level, supporting the lentiviral expression system as highly promising for Nanog protein expression in differentiated goat cells.
Shahin, Arwa; Smulders, Marinus J. M.; van Tuyl, Jaap M.; Arens, Paul; Bakker, Freek T.
2014-01-01
Next Generation Sequencing (NGS) may enable estimating relationships among genotypes using allelic variation of multiple nuclear genes simultaneously. We explored the potential and caveats of this strategy in four genetically distant Lilium cultivars to estimate their genetic divergence from transcriptome sequences using three approaches: POFAD (Phylogeny of Organisms from Allelic Data, uses allelic information of sequence data), RAxML (Randomized Accelerated Maximum Likelihood, tree building based on concatenated consensus sequences) and Consensus Network (constructing a network summarizing among gene tree conflicts). Twenty six gene contigs were chosen based on the presence of orthologous sequences in all cultivars, seven of which also had an orthologous sequence in Tulipa, used as out-group. The three approaches generated the same topology. Although the resolution offered by these approaches is high, in this case there was no extra benefit in using allelic information. We conclude that these 26 genes can be widely applied to construct a species tree for the genus Lilium. PMID:25368628
NASA Astrophysics Data System (ADS)
Takada, Yoshitake; Sakuma, Kay; Fujii, Tetsuo; Kojima, Shigeaki
2018-01-01
Recent findings of genetic breaks within apparently continuous marine populations challenge the traditional vicariance paradigm in population genetics. Such "invisible" boundaries are sometimes associated with potential geographic barriers that have forced divergence of an ancestral population, habitat discontinuities, biogeographic disjunctions due to environmental gradients, or a combination of these factors. To explore the factors that influence the genetic population structure of apparently continuous populations along the Sea of Japan, the sandy beach amphipod Haustorioides japonicus was examined. We sampled a total of 300 individuals of H. japonicus from the coast of Japan, and obtained partial sequences of the mitochondrial COI gene. The sequences from 19 local populations were clustered into five groups (Northwestern Pacific, Northern, Central, Southern Sea of Japan, and East China Sea) based on a spatial genetic mixture analysis and a minimum-spanning network. AMOVA and pairwise Fst tests further supported the significant divergence of the five groups. Phylogenetic analysis revealed the relationship among the haplotypes of H. japonicus and outgroups, from which a northward range expansion of the species was inferred. A relaxed molecular-clock Bayesian analysis inferred an early- to middle-Pleistocene divergence of the populations. Among the five clusters, the Central Sea of Japan showed the highest values for genetic diversity indices, indicating the existence of a relatively stable and large population there. This hypothesis is also supported by Bayesian Skyline Plots that showed sudden population expansion for all the clusters except the Central Sea of Japan. The present study shows genetic boundaries between the Sea of Japan and the neighboring seas, probably due to geographic isolation during the Pleistocene glacial periods. We further found divergence between the populations along the apparently continuous coast of the Sea of Japan.
Historical changes in the geographic range of H. japonicus in relation to sandy beach habitat availability account for the genetic breaks among the three populations in the Sea of Japan. The present results suggest that past geographic events influenced the population formation of H. japonicus.
Roger, Andrew J; Hug, Laura A
2006-01-01
Determining the relationships among and divergence times for the major eukaryotic lineages remains one of the most important and controversial outstanding problems in evolutionary biology. The sequencing and phylogenetic analyses of ribosomal RNA (rRNA) genes led to the first nearly comprehensive phylogenies of eukaryotes in the late 1980s, and supported a view where cellular complexity was acquired during the divergence of extant unicellular eukaryote lineages. More recently, however, refinements in analytical methods coupled with the availability of many additional genes for phylogenetic analysis showed that much of the deep structure of early rRNA trees was artefactual. Recent phylogenetic analyses of multiple genes and the discovery of important molecular and ultrastructural phylogenetic characters have resolved eukaryotic diversity into six major hypothetical groups. Yet relationships among these groups remain poorly understood because of saturation of sequence changes on the billion-year time-scale, possible rapid radiations of major lineages, phylogenetic artefacts and endosymbiotic or lateral gene transfer among eukaryotes. Estimating the divergence dates between the major eukaryote lineages using molecular analyses is even more difficult than phylogenetic estimation. Error in such analyses comes from a myriad of sources including: (i) calibration fossil dates, (ii) the assumed phylogenetic tree, (iii) the nucleotide or amino acid substitution model, (iv) substitution number (branch length) estimates, (v) the model of how rates of evolution change over the tree, (vi) error inherent in the time estimates for a given model and (vii) how multiple gene data are treated. By reanalysing datasets from recently published molecular clock studies, we show that when errors from these various sources are properly accounted for, the confidence intervals on inferred dates can be very large.
Furthermore, estimated dates of divergence vary hugely depending on the methods used and their assumptions. Accurate dating of divergence times among the major eukaryote lineages will require a robust tree of eukaryotes, a much richer Proterozoic fossil record of microbial eukaryotes assignable to extant groups for calibration, more sophisticated relaxed molecular clock methods and many more genes sampled from the full diversity of microbial eukaryotes. PMID:16754613
Extensive Concerted Evolution of Rice Paralogs and the Road to Regaining Independence
Wang, Xiyin; Tang, Haibao; Bowers, John E.; Feltus, Frank A.; Paterson, Andrew H.
2007-01-01
Many genes duplicated by whole-genome duplications (WGDs) are more similar to one another than expected. We investigated whether concerted evolution through conversion and crossing over, well-known to affect tandem gene clusters, also affects dispersed paralogs. Genome sequences for two Oryza subspecies reveal appreciable gene conversion in the ∼0.4 MY since their divergence, with a gradual progression toward independent evolution of older paralogs. Since divergence from subspecies indica, ∼8% of japonica paralogs produced 5–7 MYA on chromosomes 11 and 12 have been affected by gene conversion and several reciprocal exchanges of chromosomal segments, while ∼70-MY-old “paleologs” resulting from a genome duplication (GD) show much less conversion. Sequence similarity analysis in proximal gene clusters also suggests more conversion between younger paralogs. About 8% of paleologs may have been converted since rice–sorghum divergence ∼41 MYA. Domain-encoding sequences are more frequently converted than nondomain sequences, suggesting a sort of circularity—that sequences conserved by selection may be further conserved by relatively frequent conversion. The higher level of concerted evolution in the 5–7 MY-old segmental duplication may reflect the behavior of many genomes within the first few million years after duplication or polyploidization. PMID:18039882
A Generalized Least-Squares Estimate for the Origin of Sporophytic Self-Incompatibility
Uyenoyama, M. K.
1995-01-01
Analysis of nucleotide sequences that regulate the expression of self-incompatibility in flowering plants affords a direct means of examining classical hypotheses for the origin and evolution of this major feature of mating systems. Departing from the classical view of monophyly of all forms of self-incompatibility, the current paradigm for the origin of self-incompatibility postulates multiple episodes of recruitment and modification of preexisting genes. In Brassica, the S locus, which regulates sporophytic self-incompatibility, shows homology to a multigene family present both in self-compatible congeners and in groups for which this form of self-incompatibility is atypical. A phylogenetic analysis of S-allele sequences together with homologous sequences that do not cosegregate with self-incompatibility permits dating the change of function that marked the origin of self-incompatibility. A generalized least-squares method is introduced that provides closed-form expressions for estimates and standard errors for function-specific divergence rates and times of divergence among sequences. This analysis suggests that the age of the sporophytic self-incompatibility system expressed in Brassica exceeds species divergence within the genus by four- to fivefold. The extraordinarily high levels of sequence diversity exhibited by S alleles appear to reflect their ancient derivation, with the alternative hypothesis of hypermutability rejected by the analysis. PMID:7713446
LeDuc, Richard G; Robertson, Kelly M; Pitman, Robert L
2008-08-23
Recently, three visually distinct forms of killer whales (Orcinus orca) were described from Antarctic waters and designated as types A, B and C. Based on consistent differences in prey selection and habitat preferences, morphological divergence and apparent lack of interbreeding among these broadly sympatric forms, it was suggested that they may represent separate species. To evaluate this hypothesis, we compared complete sequences of the mitochondrial control region from 81 Antarctic killer whale samples, including 9 type A, 18 type B, 47 type C and 7 type-undetermined individuals. We found three fixed differences that separated type A from B and C, and a single fixed difference that separated type C from A and B. These results are consistent with reproductive isolation among the different forms, although caution is needed in drawing further conclusions. Despite dramatic differences in morphology and ecology, the relatively low levels of sequence divergence in Antarctic killer whales indicate that these evolutionary changes occurred relatively rapidly and recently.
Özdemir, Ebru; Altındağ, Ahmet; Kandemir, İrfan
2017-05-01
Daphnia is a freshwater zooplankton genus with controversial taxonomy owing to high morphological variation linked to environmental factors, and to inter-specific hybridization and polyploidy in some groups. The aim of the present study was to examine the molecular diversity of some Daphnia species in Turkey and to establish DNA barcodes for Turkish Daphnia species. Sequence analysis was performed using a 540 bp region of the cytochrome oxidase subunit I gene of mitochondrial DNA. A total of 34 haplotypes were identified for Turkey. The Daphnia pulex complex was divided into two clades with 16.1% sequence divergence according to molecular taxonomy based on the Kimura 2-parameter model. The clade that diverged from Daphnia pulex by 16.1% showed 99% similarity with Daphnia cf. pulicaria (sensu Alonso 1996) rather than Daphnia pulicaria Forbes, 1893. Furthermore, this study has contributed to Turkish zoogeography by demonstrating the distribution of Daphnia species in Turkey.
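The Kimura 2-parameter divergence mentioned above has a simple closed form: with P the proportion of sites showing transitions and Q the proportion showing transversions, d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)]. The following is a minimal sketch of that formula for two pre-aligned sequences. The function name `k2p_distance` and the handling of ambiguous bases are assumptions made for illustration, not the study's actual pipeline.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Hypothetical helper: columns where either base is not A, C, G or T
    (gaps, ambiguity codes) are ignored.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        same_class = (a in PURINES and b in PURINES) or (
            a in PYRIMIDINES and b in PYRIMIDINES
        )
        if same_class:
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term = (1 - 2 * p - q) * math.sqrt(1 - 2 * q) if q < 0.5 else 0.0
    if term <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term)
```

The correction always gives a value at least as large as the raw proportion of differing sites, because it accounts for multiple substitutions at the same site.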
Archaebacterial rhodopsin sequences: Implications for evolution
NASA Technical Reports Server (NTRS)
Lanyi, J. K.
1991-01-01
It was proposed over 10 years ago that the archaebacteria represent a separate kingdom which diverged very early from the eubacteria and eukaryotes. It follows that investigations of archaebacterial characteristics might reveal features of early evolution. So far, two genes, one for bacteriorhodopsin and another for halorhodopsin, both from Halobacterium halobium, have been sequenced. We cloned and sequenced the gene coding for the polypeptide of another one of these rhodopsins, a halorhodopsin in Natronobacterium pharaonis. Peptide sequencing of cyanogen bromide fragments, and immuno-reactions of the protein and synthetic peptides derived from the C-terminal gene sequence, confirmed that the open reading frame was the structural gene for the pharaonis halorhodopsin polypeptide. The flanking DNA sequences of this gene, as well as those of other bacterial rhodopsins, were compared to previously proposed archaebacterial consensus sequences. In pairwise comparisons of the open reading frame with DNA sequences for bacterio-opsin and halo-opsin from Halobacterium halobium, silent divergences were calculated. These indicate very considerable evolutionary distance between each pair of genes, even in the same organism. In spite of this, the three protein sequences show extensive similarities, indicating strong selective pressures.
Divergence with gene flow within the recent chipmunk radiation (Tamias)
Sullivan, J; Demboski, J R; Bell, K C; Hird, S; Sarver, B; Reid, N; Good, J M
2014-01-01
Increasing data have supported the importance of divergence with gene flow (DGF) in the generation of biological diversity. In such cases, lineage divergence occurs on a shorter timescale than does the completion of reproductive isolation. Although it is critical to explore the mechanisms driving divergence and preventing homogenization by hybridization, it is equally important to document cases of DGF in nature. Here we synthesize data that have accumulated over the last dozen or so years on DGF in the chipmunk (Tamias) radiation with new data that quantify very high rates of mitochondrial DNA (mtDNA) introgression among para- and sympatric species in the T. quadrivittatus group in the central and southern Rocky Mountains. These new data (188 cytochrome b sequences) bring the total number of sequences up to 1871; roughly 16% (298) of the chipmunks we have sequenced exhibit introgressed mtDNA. This includes ongoing introgression between subspecies and between both closely related and distantly related taxa. In addition, we have identified several taxa that are apparently fixed for ancient introgressions and in which there is no evidence of ongoing introgression. A recurrent observation is that these introgressions occur between ecologically and morphologically diverged, sometimes non-sister taxa that engage in well-documented niche partitioning. Thus, the chipmunk radiation in western North America represents an excellent mammalian example of speciation in the face of recurrent gene flow among lineages and where biogeography, habitat differentiation and mating systems suggest important roles for both ecological and sexual selection. PMID:24781803
Generation of 2A-linked multicistronic cassettes by recombinant PCR.
Szymczak-Workman, Andrea L; Vignali, Kate M; Vignali, Dario A A
2012-02-01
The need for reliable, multicistronic vectors for multigene delivery is at the forefront of biomedical technology. It is now possible to express multiple proteins from a single open reading frame (ORF) using 2A peptide-linked multicistronic vectors. These small sequences, when cloned between genes, allow for efficient, stoichiometric production of discrete protein products within a single vector through a novel "cleavage" event within the 2A peptide sequence. Expression of more than two genes using conventional approaches has several limitations, most notably imbalanced protein expression and large size. The use of 2A peptide sequences alleviates these concerns. They are small (18-22 amino acids) and have divergent amino-terminal sequences, which minimizes the chance for homologous recombination and allows for multiple, different 2A peptide sequences to be used within a single vector. Importantly, separation of genes placed between 2A peptide sequences is nearly 100%, which allows for stoichiometric and concordant expression of the genes, regardless of the order of placement within the vector. This protocol describes the use of recombinant polymerase chain reaction (PCR) to connect multiple 2A-linked protein sequences. The final construct is subcloned into an expression vector.
Mitochondrial divergence between slow- and fast-aging garter snakes.
Schwartz, Tonia S; Arendsee, Zebulun W; Bronikowski, Anne M
2015-11-01
Mitochondrial function has long been hypothesized to be intimately involved in aging processes--either directly through declining efficiency of mitochondrial respiration and ATP production with advancing age, or indirectly, e.g., through increased mitochondrial production of damaging free radicals with age. Yet we lack a comprehensive understanding of the evolution of mitochondrial genotypes and phenotypes across diverse animal models, particularly in species that have extremely labile physiology. Here, we measure mitochondrial genome-types and transcription in ecotypes of garter snakes (Thamnophis elegans) that are adapted to disparate habitats and have diverged in aging rates and lifespans despite residing in close proximity. Using two RNA-seq datasets, we (1) reconstruct the garter snake mitochondrial genome sequence and bioinformatically identify regulatory elements, (2) test for divergence of mitochondrial gene expression between the ecotypes and in response to heat stress, and (3) test for sequence divergence in mitochondrial protein-coding regions in these slow-aging (SA) and fast-aging (FA) naturally occurring ecotypes. At the nucleotide sequence level, we confirmed two (duplicated) mitochondrial control regions one of which contains a glucocorticoid response element (GRE). Gene expression of protein-coding genes was higher in FA snakes relative to SA snakes for most genes, but was neither affected by heat stress nor an interaction between heat stress and ecotype. SA and FA ecotypes had unique mitochondrial haplotypes with amino acid substitutions in both CYTB and ND5. The CYTB amino acid change (Isoleucine → Threonine) was highly segregated between ecotypes. This divergence of mitochondrial haplotypes between SA and FA snakes contrasts with nuclear gene-flow estimates, but correlates with previously reported divergence in mitochondrial function (mitochondrial oxygen consumption, ATP production, and reactive oxygen species consequences). 
Chromosomal Speciation in the Genomics Era: Disentangling Phylogenetic Evolution of Rock-wallabies.
Potter, Sally; Bragg, Jason G; Blom, Mozes P K; Deakin, Janine E; Kirkpatrick, Mark; Eldridge, Mark D B; Moritz, Craig
2017-01-01
The association of chromosome rearrangements (CRs) with speciation is well established, and there is a long history of theory and evidence relating to "chromosomal speciation." Genomic sequencing has the potential to provide new insights into how reorganization of genome structure promotes divergence, and in model systems has demonstrated reduced gene flow in rearranged segments. However, there are limits to what we can understand from a small number of model systems, which each only tell us about one episode of chromosomal speciation. Progressing from patterns of association between chromosome (and genic) change, to understanding processes of speciation requires both comparative studies across diverse systems and integration of genome-scale sequence comparisons with other lines of evidence. Here, we showcase a promising example of chromosomal speciation in a non-model organism, the endemic Australian marsupial genus Petrogale. We present initial phylogenetic results from exon-capture that resolve a history of divergence associated with extensive and repeated CRs. Yet it remains challenging to disentangle gene tree heterogeneity caused by recent divergence and gene flow in this and other such recent radiations. We outline a way forward for better integration of comparative genomic sequence data with evidence from molecular cytogenetics, and analyses of shifts in the recombination landscape and potential disruption of meiotic segregation and epigenetic programming. In all likelihood, CRs impact multiple cellular processes and these effects need to be considered together, along with effects of genic divergence. Understanding the effects of CRs together with genic divergence will require development of more integrative theory and inference methods. Together, new data and analysis tools will combine to shed light on long standing questions of how chromosome and genic divergence promote speciation.
Kim, Young Kyun; Kim, Seung Hyeon; Yi, Joo Mi; Kang, Chang-Keun; Short, Frederick; Lee, Kun-Seop
2017-01-01
Although seagrass species in the genus Halophila are generally distributed in tropical or subtropical regions, H. nipponica has been reported to occur in temperate coastal waters of the northwestern Pacific. Because H. nipponica occurs only in the warm temperate areas influenced by the Kuroshio Current and shows a tropical seasonal growth pattern, such as severely restricted growth in low water temperatures, it was hypothesized that this temperate Halophila species diverged from tropical species in the relatively recent evolutionary past. We used a phylogenetic analysis of internal transcribed spacer (ITS) regions to examine the genetic variability and evolutionary trend of H. nipponica. ITS sequences of H. nipponica from various locations in Korea and Japan were identical or showed very low sequence divergence (less than 3-base pair, bp, difference), confirming that H. nipponica from Japan and Korea are the same species. Halophila species in the section Halophila, which have simple phyllotaxy (a pair of petiolate leaves at the rhizome node), were separated into five well-supported clades by maximum parsimony analysis. H. nipponica grouped with H. okinawensis and H. gaudichaudii from the subtropical regions in the same clade, the latter two species having quite low ITS sequence divergence from H. nipponica (7–15-bp). H. nipponica in Clade I diverged 2.95 ± 1.08 million years ago from species in Clade II, which includes H. ovalis. According to geographical distribution and genetic similarity, H. nipponica appears to have diverged from a tropical species like H. ovalis and adapted to warm temperate environments. The results of divergence time estimates suggest that the temperate H. nipponica is an older species than the subtropical H. okinawensis and H. gaudichaudii and they may have different evolutionary histories. PMID:28505209
Yang, Zujun; Zhang, Tao; Bolshoy, Alexander; Beharav, Alexander; Nevo, Eviatar
2009-05-01
'Evolution Canyon' (ECI) at Lower Nahal Oren, Mount Carmel, Israel, is an optimal natural microscale model for unravelling evolution in action, highlighting the twin evolutionary processes of adaptation and speciation. A major model organism in ECI is wild barley, Hordeum spontaneum, the progenitor of cultivated barley, which displays dramatic interslope adaptive and speciational divergence on the 'African' dry slope (AS) and the 'European' humid slope (ES), separated on average by 200 m. Here we examined interslope single nucleotide polymorphism (SNP) sequences and the expression diversity of the drought resistant dehydrin 1 gene (Dhn1) between the opposite slopes. We analysed 47 plants (genotypes), 4-10 individuals in each of seven stations (populations) in an area of 7000 m², for Dhn1 sequence diversity located in the 5' upstream flanking region of the gene. We found significant levels of Dhn1 genic diversity represented by 29 haplotypes, derived from 45 SNPs in a total of 708 bp sites. Most of the haplotypes, 25 out of 29 (86.2%), were represented by one genotype and were hence unique to one population. Only a single haplotype was common to both slopes. Sequence and haplotype diversity differed significantly among the populations and slopes. Nucleotide diversity was higher on the AS, whereas haplotype diversity was higher on the ES. Interslope divergence was significantly higher than intraslope divergence. Tajima's D test rejected neutrality of the SNP diversity. Dhn1 expression under dehydration diverged between AS and ES genotypes, reinforcing the association of Dhn1 with drought resistance of wild barley at 'Evolution Canyon'. These results are inexplicable by mutation, gene flow, or chance effects, and support adaptive natural microclimatic selection as the major evolutionary divergent driving force.
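The two diversity measures contrasted above can be sketched as follows, assuming the standard estimators: Nei's haplotype diversity, h = n/(n-1) (1 - sum of squared haplotype frequencies), and nucleotide diversity, the mean number of pairwise differences per site. The abstract does not state which estimators or software were used, so the function names and the toy inputs are purely illustrative.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Nei's haplotype (gene) diversity for a sample of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    counts = Counter(haplotypes)
    sum_sq_freq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq_freq)


def nucleotide_diversity(sequences):
    """Mean pairwise differences per site for aligned, equal-length sequences."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be non-empty and of equal length")
    total_diffs = sum(
        sum(a != b for a, b in zip(s, t)) for s, t in combinations(sequences, 2)
    )
    n_pairs = n * (n - 1) / 2
    return total_diffs / n_pairs / length
```

The two measures can move in opposite directions, as reported for the two slopes: many distinct but closely related haplotypes raise haplotype diversity, while a few deeply divergent haplotypes raise nucleotide diversity.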
Morin, Phillip A.; Archer, Frederick I.; Foote, Andrew D.; Vilstrup, Julia; Allen, Eric E.; Wade, Paul; Durban, John; Parsons, Kim; Pitman, Robert; Li, Lewyn; Bouffard, Pascal; Abel Nielsen, Sandra C.; Rasmussen, Morten; Willerslev, Eske; Gilbert, M. Thomas P.; Harkins, Timothy
2010-01-01
Killer whales (Orcinus orca) currently comprise a single, cosmopolitan species with a diverse diet. However, studies over the last 30 yr have revealed populations of sympatric “ecotypes” with discrete prey preferences, morphology, and behaviors. Although these ecotypes avoid social interactions and are not known to interbreed, genetic studies to date have found extremely low levels of diversity in the mitochondrial control region, and few clear phylogeographic patterns worldwide. This low level of diversity is likely due to low mitochondrial mutation rates that are common to cetaceans. Using killer whales as a case study, we have developed a method to readily sequence, assemble, and analyze complete mitochondrial genomes from large numbers of samples to more accurately assess phylogeography and estimate divergence times. This represents an important tool for wildlife management, not only for killer whales but for many marine taxa. We used high-throughput sequencing to survey whole mitochondrial genome variation of 139 samples from the North Pacific, North Atlantic, and southern oceans. Phylogenetic analysis indicated that each of the known ecotypes represents a strongly supported clade with divergence times ranging from ∼150,000 to 700,000 yr ago. We recommend that three named ecotypes be elevated to full species, and that the remaining types be recognized as subspecies pending additional data. Establishing appropriate taxonomic designations will greatly aid in understanding the ecological impacts and conservation needs of these important marine predators. We predict that phylogeographic mitogenomics will become an important tool for improved statistical phylogeography and more precise estimates of divergence times. PMID:20413674
Resnyk, C W; Carré, W; Wang, X; Porter, T E; Simon, J; Le Bihan-Duval, E; Duclos, M J; Aggrey, S E; Cogburn, L A
2017-08-16
Decades of intensive genetic selection in the domestic chicken (Gallus gallus domesticus) have enabled the remarkably rapid growth of today's broiler (meat-type) chickens. However, this enhanced growth rate was accompanied by several unfavorable traits (i.e., increased visceral fatness, leg weakness, and disorders of metabolism and reproduction). The present descriptive analysis of the abdominal fat transcriptome aimed to identify functional genes and biological pathways that likely contribute to an extreme difference in visceral fatness of divergently selected broiler chickens. We used the Del-Mar 14 K Chicken Integrated Systems microarray to take time-course snapshots of global gene transcription in abdominal fat of juvenile [1-11 weeks of age (wk)] chickens divergently selected on bodyweight at two ages (8 and 36 wk). Further, an RNA sequencing analysis was completed on the same abdominal fat samples taken from high-growth (HG) and low-growth (LG) cockerels at 7 wk, the age with the greatest divergence in body weight (3.2-fold) and visceral fatness (19.6-fold). Time-course microarray analysis revealed 312 differentially expressed genes (FDR ≤ 0.05) as the main effect of genotype (HG versus LG), 718 genes in the interaction of age and genotype, and 2918 genes as the main effect of age. The RNA sequencing analysis identified 2410 differentially expressed genes in abdominal fat of HG versus LG chickens at 7 wk. The HG chickens are fatter and over-express numerous genes that support higher rates of visceral adipogenesis and lipogenesis. In abdominal fat of LG chickens, we found higher expression of many genes involved in hemostasis, energy catabolism and endocrine signaling, which likely contribute to their leaner phenotype and slower growth. Many transcription factors and their direct target genes identified in HG and LG chickens could be involved in their divergence in adiposity and growth rate.
The present analyses of the visceral fat transcriptome in chickens divergently selected for a large difference in growth rate and abdominal fatness clearly demonstrate that abdominal fat is a very dynamic metabolic and endocrine organ in the chicken. The HG chickens overexpress many transcription factors and their direct target genes, which should enhance in situ lipogenesis and ultimately adiposity. Our observation of enhanced expression of hemostasis and endocrine-signaling genes in diminished abdominal fat of LG cockerels provides insight into genetic mechanisms involved in divergence of abdominal fatness and somatic growth in avian and perhaps mammalian species, including humans.
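The FDR ≤ 0.05 threshold quoted for the microarray analysis can be illustrated with a short sketch. The abstract does not say which FDR procedure was used, so the Benjamini-Hochberg step-up procedure below is an assumption, and the p-values are invented purely for illustration.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the indices of hypotheses rejected at FDR level alpha.

    Assumption: Benjamini-Hochberg step-up procedure; the abstract does not
    name the method actually applied to the microarray data.
    """
    m = len(pvalues)
    # Rank hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    largest_passing_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Compare each ordered p-value with its rank-scaled threshold.
        if pvalues[idx] <= rank / m * alpha:
            largest_passing_rank = rank
    # Reject every hypothesis up to the largest passing rank.
    return sorted(order[:largest_passing_rank])


# Hypothetical per-gene p-values (not taken from the study).
example_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example_pvalues))
```

With these made-up values only the two smallest p-values pass; the step-up rule rejects all hypotheses ranked at or below the largest rank whose p-value meets its threshold.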
Singh, Prashant; Singh, Satya Shila; Elster, Josef; Mishra, Arun Kumar
2013-06-01
To assess the phylogeny, population genetics, and likely future course of cyanobacterial evolution based on nifH gene sequences, 41 heterocystous cyanobacterial strains collected from all over India were used in the present study. NifH gene sequence analysis data confirm that the heterocystous cyanobacteria are monophyletic while the Stigonematales show polyphyletic origin with grave intermixing. Further, analysis of nifH gene sequence data using intricate mathematical extrapolations revealed that the nucleotide diversity and recombination frequency are much greater in the Nostocales than in the Stigonematales. Similarly, DNA divergence studies showed significant values of divergence with greater gene conversion tracts in the unbranched (Nostocales) than in the branched (Stigonematales) strains. Our data strongly support the origin of true branching cyanobacterial strains from the unbranched strains.
Fontanesi, Luca; Bertolini, Francesca; Scotti, Emilio; Schiavo, Giuseppina; Colombo, Michela; Trevisi, Paolo; Ribani, Anisa; Buttazzoni, Luca; Russo, Vincenzo; Dall'Olio, Stefania
2015-01-01
The GPR120 gene (also known as FFAR4 or O3FAR1) encodes a functional omega-3 fatty acid receptor/sensor that mediates potent insulin sensitizing effects by repressing macrophage-induced tissue inflammation. For its functional role, GPR120 could be considered a potential target gene in animal nutrigenetics. In this work we resequenced the porcine GPR120 gene by high throughput Ion Torrent semiconductor sequencing of amplified fragments obtained from 8 DNA pools derived, on the whole, from 153 pigs of different breeds/populations (two Italian Large White pools, Italian Duroc, Italian Landrace, Casertana, Pietrain, Meishan, and wild boars). Three single nucleotide polymorphisms (SNPs), two synonymous substitutions and one in the putative 3'-untranslated region (g.114765469C > T), were identified and their allele frequencies were estimated by sequencing read counts. The g.114765469C > T SNP was also genotyped by PCR-RFLP, confirming the estimated frequency in Italian Large White pools. Then, this SNP was analyzed in two Italian Large White cohorts using a selective genotyping approach based on extreme and divergent pigs for back fat thickness (BFT) estimated breeding value (EBV) and average daily gain (ADG) EBV. Significant differences in allele and genotype frequency distributions were observed between the extreme ADG-EBV groups (P < 0.001), whereas this marker was not associated with BFT-EBV.
USDA-ARS's Scientific Manuscript database
The complete nucleotide sequence of a recently discovered Florida (FL) isolate of Hibiscus infecting Cilevirus (HiCV) was determined by Sanger sequencing. The movement- and coat- protein gene sequences of the HiCV-FL isolate are more divergent than other genes of the previously sequenced HiCV-HA (Ha...
Accurate read-based metagenome characterization using a hierarchical suite of unique signatures
Freitas, Tracey Allen K.; Li, Po-E; Scholz, Matthew B.; Chain, Patrick S. G.
2015-01-01
A major challenge in the field of shotgun metagenomics is the accurate identification of organisms present within a microbial community, based on classification of short sequence reads. Though existing microbial community profiling methods have attempted to rapidly classify the millions of reads output from modern sequencers, the combination of incomplete databases, similarity among otherwise divergent genomes, errors and biases in sequencing technologies, and the large volumes of sequencing data required for metagenome sequencing has led to unacceptably high false discovery rates (FDR). Here, we present the application of a novel, gene-independent and signature-based metagenomic taxonomic profiling method with significantly and consistently smaller FDR than any other available method. Our algorithm circumvents false positives using a series of non-redundant signature databases and examines Genomic Origins Through Taxonomic CHAllenge (GOTTCHA). GOTTCHA was tested and validated on 20 synthetic and mock datasets ranging in community composition and complexity, was applied successfully to data generated from spiked environmental and clinical samples, and robustly demonstrates superior performance compared with other available tools. PMID:25765641
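The false discovery rate discussed above is the fraction of reported taxa that are not actually present in the sample. A minimal sketch of that calculation follows, assuming a mock community whose true composition is known; the function name and the taxon lists are hypothetical and are not part of GOTTCHA.

```python
def false_discovery_rate(predicted, truth):
    """Return FP / (FP + TP) for predicted taxa against a known truth set."""
    predicted, truth = set(predicted), set(truth)
    if not predicted:
        return 0.0  # no taxa reported, so no false discoveries
    false_positives = len(predicted - truth)
    return false_positives / len(predicted)


# Hypothetical mock community and profiler output (illustrative names only).
truth_taxa = {"Escherichia coli", "Bacillus subtilis", "Staphylococcus aureus"}
predicted_taxa = {"Escherichia coli", "Bacillus subtilis",
                  "Shigella flexneri", "Bacillus cereus"}
print(f"FDR = {false_discovery_rate(predicted_taxa, truth_taxa):.2f}")
```

In this toy case two of the four reported taxa are close relatives of true members, the kind of "similarity among otherwise divergent genomes" that the abstract names as a source of false positives.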
Dhatt, Sharmistha; Bhattacharyya, Kamal
2012-08-01
Appropriate constructions of Padé approximants are believed to provide reasonable estimates of the asymptotic (large-coupling) amplitude and exponent of an observable, given its weak-coupling expansion to some desired order. In many instances, however, sequences of such approximants are seen to converge very poorly. We outline here a strategy that exploits the idea of fractional calculus to considerably improve the convergence behavior. Pilot calculations on the ground-state perturbative energy series of quartic, sextic, and octic anharmonic oscillators reveal clearly the worth of our endeavor.
Miniprimer PCR, a New Lens for Viewing the Microbial World
Isenbarger, Thomas A.; Finney, Michael; Ríos-Velázquez, Carlos; Handelsman, Jo; Ruvkun, Gary
2008-01-01
Molecular methods based on the 16S rRNA gene sequence are used widely in microbial ecology to reveal the diversity of microbial populations in environmental samples. Here we show that a new PCR method using an engineered polymerase and 10-nucleotide “miniprimers” expands the scope of detectable sequences beyond those detected by standard methods using longer primers and Taq polymerase. After testing the method in silico to identify divergent ribosomal genes in previously cloned environmental sequences, we applied the method to soil and microbial mat samples, which revealed novel 16S rRNA gene sequences that would not have been detected with standard primers. Deeply divergent sequences were discovered with high frequency and included representatives that define two new division-level taxa, designated CR1 and CR2, suggesting that miniprimer PCR may reveal new dimensions of microbial diversity. PMID:18083877
Plastome data reveal multiple geographic origins of Quercus Group Ilex
Grimm, Guido W.; Papini, Alessio; Vessella, Federico; Cardoni, Simone; Tordoni, Enrico; Piredda, Roberta; Franc, Alain; Denk, Thomas
2016-01-01
Nucleotide sequences from the plastome are currently the main source for assessing taxonomic and phylogenetic relationships in flowering plants and their historical biogeography at all hierarchical levels. One major exception is the large and economically important genus Quercus (oaks). Whereas differentiation patterns of the nuclear genome are in agreement with morphology and the fossil record, diversity patterns in the plastome are at odds with established taxonomic and phylogenetic relationships. However, the extent and evolutionary implications of this incongruence have yet to be fully uncovered. The DNA sequence divergence of four Euro-Mediterranean Group Ilex oak species (Quercus ilex L., Q. coccifera L., Q. aucheri Jaub. & Spach., Q. alnifolia Poech.) was explored at three chloroplast markers (rbcL, trnK/matK, trnH-psbA). Phylogenetic relationships were reconstructed including worldwide members of an additional 55 species representing all Quercus subgeneric groups. Family and order sequence data were harvested from gene banks to better frame the observed divergence in larger taxonomic contexts. We found a strong geographic sorting in the focal group and the genus in general that is entirely decoupled from species boundaries. High plastid divergence in members of Quercus Group Ilex, including haplotypes shared with related, but long isolated oak lineages, points towards multiple geographic origins of this group of oaks. The results suggest that incomplete lineage sorting and repeated phases of asymmetrical introgression among ancestral lineages of Group Ilex and two other main Groups of Eurasian oaks (Cyclobalanopsis and Cerris) caused this complex pattern. Comparison with the current phylogenetic synthesis also suggests an initial high- versus mid-latitude biogeographic split within Quercus.
High plastome plasticity of Group Ilex reflects geographic area disruptions, possibly linked with high tectonic activity of past and modern distribution ranges, that did not leave imprints in the nuclear genome of modern species and infrageneric lineages. PMID:27123376
Single sample resolution of rare microbial dark matter in a marine invertebrate metagenome
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, Ian J.; Weyna, Theodore R.; Fong, Stephen S.
Direct, untargeted sequencing of environmental samples (metagenomics) and de novo genome assembly enable the study of uncultured and phylogenetically divergent organisms. However, separating individual genomes from a mixed community has often relied on the differential-coverage analysis of multiple, deeply sequenced samples. In the metagenomic investigation of the marine bryozoan Bugula neritina, we uncovered seven bacterial genomes associated with a single B. neritina individual that appeared to be transient associates, two of which were unique to one individual and undetectable using certain “universal” 16S rRNA primers and probes. We recovered high quality genome assemblies for several rare instances of “microbial dark matter,” or phylogenetically divergent bacteria lacking genomes in reference databases, from a single tissue sample that was not subjected to any physical or chemical pre-treatment. One of these rare, divergent organisms has a small (593 kbp), poorly annotated genome with low GC content (20.9%) and a 16S rRNA gene with just 65% sequence similarity to the closest reference sequence. Lastly, our findings illustrate the importance of sampling strategy and de novo assembly of metagenomic reads to understand the extent and function of bacterial biodiversity.
Single sample resolution of rare microbial dark matter in a marine invertebrate metagenome
Miller, Ian J.; Weyna, Theodore R.; Fong, Stephen S.; ...
2016-09-29
Direct, untargeted sequencing of environmental samples (metagenomics) and de novo genome assembly enable the study of uncultured and phylogenetically divergent organisms. However, separating individual genomes from a mixed community has often relied on the differential-coverage analysis of multiple, deeply sequenced samples. In the metagenomic investigation of the marine bryozoan Bugula neritina, we uncovered seven bacterial genomes associated with a single B. neritina individual that appeared to be transient associates, two of which were unique to one individual and undetectable using certain “universal” 16S rRNA primers and probes. We recovered high quality genome assemblies for several rare instances of “microbial dark matter,” or phylogenetically divergent bacteria lacking genomes in reference databases, from a single tissue sample that was not subjected to any physical or chemical pre-treatment. One of these rare, divergent organisms has a small (593 kbp), poorly annotated genome with low GC content (20.9%) and a 16S rRNA gene with just 65% sequence similarity to the closest reference sequence. Lastly, our findings illustrate the importance of sampling strategy and de novo assembly of metagenomic reads to understand the extent and function of bacterial biodiversity.
Evolutionary history of Mexican domesticated and wild Meleagris gallopavo.
Padilla-Jacobo, Gabriela; Cano-Camacho, Horacio; López-Zavala, Rigoberto; Cornejo-Pérez, María E; Zavala-Páramo, María G
2018-04-17
The distribution of the wild turkey (Meleagris gallopavo) extends from Mexico to southeastern Canada and to the eastern and southern regions of the USA. Six subspecies have been described based on morphological characteristics and/or geographical variations in wild and domesticated populations. In this paper, based on DNA sequence data from the mitochondrial D-loop, we investigated the genetic diversity and structure, genealogical relationships, divergence time and demographic history of M. gallopavo populations including domesticated individuals. Analyses of 612 wild and domesticated turkey mitochondrial D-loop sequences, including 187 that were collected for this study and 425 from databases, revealed 64 haplotypes with few mutations, some of which are shared between domesticated and wild turkeys. We found a high level of haplotype and nucleotide diversity, which suggests that the total population of this species is large and stable with an old evolutionary history. The results of genetic differentiation, haplotype network, and genealogical relationships analyses revealed three main genetic groups within the species: mexicana as a population relict (C1), merriami (C2), and mexicana/intermedia/silvestris/osceola (C3). Haplotypes detected in domesticated turkeys belong to group C3. Estimates of divergence times agree with range expansion and diversification events of the relict population of M. gallopavo in northwestern Mexico during the Pliocene-Pleistocene and Pleistocene-Holocene boundaries. Demographic reconstruction showed that an expansion of the population occurred 110,000 to 130,000 years ago (Kya), followed by a stable period 100 Kya and finally a decline ~ 10 Kya (Pleistocene-Holocene boundary). In Mexico, the Trans-Mexican Volcanic Belt may be responsible for the range expansion of the C3 group. Two haplotypes with different divergence times, MGMDgoB/MICH1 and MICH2, are dominant in domesticated and commercial turkeys. 
During the Pleistocene, a large and stable population of M. gallopavo covered a wide geographic distribution from the north to the center of America (USA and Mexico). The mexicana, merriami, and mexicana/intermedia/silvestris/osceola genetic groups originated after divergence and range expansion from northwestern Mexico during the Pliocene-Pleistocene and Pleistocene-Holocene boundaries. Old and new maternal lines of the mexicana/intermedia/silvestris/osceola genetic group were distributed within the Trans-Mexican Volcanic Belt where individuals were captured for domestication. Two haplotypes are the main founder maternal lines of domesticated turkeys.
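The haplotype diversity reported for the D-loop sequences can be made concrete with a small sketch. The abstract does not state which estimator was used, so Nei's unbiased estimator, h = n/(n-1) * (1 - sum of squared haplotype frequencies), is an assumption here, and the sample labels are invented.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a list of haplotype labels.

    Assumption: this estimator is used only for illustration; the study's
    own software and settings are not given in the abstract.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical sample of four sequences carrying three haplotypes.
sample = ["H1", "H1", "H2", "H3"]
print(round(haplotype_diversity(sample), 3))
```

The value is 0 when every sequence carries the same haplotype and 1 when every sequence is distinct.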
Liu, Qing; Triplett, Jimmy K; Wen, Jun; Peterson, Paul M
2011-11-01
Eleusine (Poaceae) is a small genus of the subfamily Chloridoideae exhibiting considerable morphological and ecological diversity in East Africa and the Americas. The interspecific phylogenetic relationships of Eleusine are investigated in order to identify its allotetraploid origin, and a chronogram is estimated to infer temporal relationships between palaeoenvironment changes and divergence of Eleusine in East Africa. Two low-copy nuclear (LCN) markers, Pepc4 and EF-1α, were analysed using parsimony, likelihood and Bayesian approaches. A chronogram of Eleusine was inferred from a combined data set of six plastid DNA markers (ndhA intron, ndhF, rps16-trnK, rps16 intron, rps3, and rpl32-trnL) using the Bayesian dating method. The monophyly of Eleusine is strongly supported by sequence data from two LCN markers. In the cpDNA phylogeny, three tetraploid species (E. africana, E. coracana and E. kigeziensis) share a common ancestor with the E. indica-E. tristachya clade, which is considered a source of maternal parents for allotetraploids. Two homoeologous loci are isolated from three tetraploid species in the Pepc4 phylogeny, and the maternal parents receive further support. The A-type EF-1α sequences possess three characters, i.e. a large number of variations of intron 2; clade E-A distantly diverged from clade E-B and other diploid species; and seven deletions in intron 2, implying a possible derivation through a gene duplication event. The crown age of Eleusine and the allotetraploid lineage are 3·89 million years ago (mya) and 1·40 mya, respectively. The molecular data support independent allotetraploid origins for E. kigeziensis and the E. africana-E. coracana clade. Both events may have involved diploids E. indica and E. tristachya as the maternal parents, but the paternal parents remain unidentified. The habitat-specific hypothesis is proposed to explain the divergence of Eleusine and its allotetraploid lineage.
Liu, Qing; Triplett, Jimmy K.; Wen, Jun; Peterson, Paul M.
2011-01-01
Background and Aims: Eleusine (Poaceae) is a small genus of the subfamily Chloridoideae exhibiting considerable morphological and ecological diversity in East Africa and the Americas. The interspecific phylogenetic relationships of Eleusine are investigated in order to identify its allotetraploid origin, and a chronogram is estimated to infer temporal relationships between palaeoenvironment changes and divergence of Eleusine in East Africa. Methods: Two low-copy nuclear (LCN) markers, Pepc4 and EF-1α, were analysed using parsimony, likelihood and Bayesian approaches. A chronogram of Eleusine was inferred from a combined data set of six plastid DNA markers (ndhA intron, ndhF, rps16-trnK, rps16 intron, rps3, and rpl32-trnL) using the Bayesian dating method. Key Results: The monophyly of Eleusine is strongly supported by sequence data from two LCN markers. In the cpDNA phylogeny, three tetraploid species (E. africana, E. coracana and E. kigeziensis) share a common ancestor with the E. indica–E. tristachya clade, which is considered a source of maternal parents for allotetraploids. Two homoeologous loci are isolated from three tetraploid species in the Pepc4 phylogeny, and the maternal parents receive further support. The A-type EF-1α sequences possess three characters, i.e. a large number of variations of intron 2; clade E-A distantly diverged from clade E-B and other diploid species; and seven deletions in intron 2, implying a possible derivation through a gene duplication event. The crown age of Eleusine and the allotetraploid lineage are 3·89 million years ago (mya) and 1·40 mya, respectively. Conclusions: The molecular data support independent allotetraploid origins for E. kigeziensis and the E. africana–E. coracana clade. Both events may have involved diploids E. indica and E. tristachya as the maternal parents, but the paternal parents remain unidentified.
The habitat-specific hypothesis is proposed to explain the divergence of Eleusine and its allotetraploid lineage. PMID:21880659
Lavinia, Pablo D; Núñez Bustos, Ezequiel O; Kopuchian, Cecilia; Lijtmaer, Darío A; García, Natalia C; Hebert, Paul D N; Tubaro, Pablo L
2017-01-01
Because the tropical regions of America harbor the highest concentration of butterfly species, its fauna has attracted considerable attention. Much less is known about the butterflies of southern South America, particularly Argentina, where over 1,200 species occur. To advance understanding of this fauna, we assembled a DNA barcode reference library for 417 butterfly species of Argentina, focusing on the Atlantic Forest, a biodiversity hotspot. We tested the efficacy of this library for specimen identification, used it to assess the frequency of cryptic species, and examined geographic patterns of genetic variation, making this study the first large-scale genetic assessment of the butterflies of southern South America. The average sequence divergence to the nearest neighbor (i.e. minimum interspecific distance) was 6.91%, ten times larger than the mean distance to the furthest conspecific (0.69%), with a clear barcode gap present in all but four of the species represented by two or more specimens. As a consequence, the DNA barcode library was extremely effective in the discrimination of these species, allowing a correct identification in more than 95% of the cases. Singletons (i.e. species represented by a single sequence) were also distinguishable in the gene trees since they all had unique DNA barcodes, divergent from those of the closest non-conspecific. The clustering algorithms implemented recognized from 416 to 444 barcode clusters, suggesting that the actual diversity of butterflies in Argentina is 3%-9% higher than currently recognized. Furthermore, our survey added three new records of butterflies for the country (Eurema agave, Mithras hannelore, Melanis hillapana). In summary, this study not only supported the utility of DNA barcoding for the identification of the butterfly species of Argentina, but also highlighted several cases of both deep intraspecific and shallow interspecific divergence that should be studied in more detail.
Núñez Bustos, Ezequiel O.; Kopuchian, Cecilia; Lijtmaer, Darío A.; García, Natalia C.; Hebert, Paul D. N.; Tubaro, Pablo L.
2017-01-01
Because the tropical regions of America harbor the highest concentration of butterfly species, its fauna has attracted considerable attention. Much less is known about the butterflies of southern South America, particularly Argentina, where over 1,200 species occur. To advance understanding of this fauna, we assembled a DNA barcode reference library for 417 butterfly species of Argentina, focusing on the Atlantic Forest, a biodiversity hotspot. We tested the efficacy of this library for specimen identification, used it to assess the frequency of cryptic species, and examined geographic patterns of genetic variation, making this study the first large-scale genetic assessment of the butterflies of southern South America. The average sequence divergence to the nearest neighbor (i.e. minimum interspecific distance) was 6.91%, ten times larger than the mean distance to the furthest conspecific (0.69%), with a clear barcode gap present in all but four of the species represented by two or more specimens. As a consequence, the DNA barcode library was extremely effective in the discrimination of these species, allowing a correct identification in more than 95% of the cases. Singletons (i.e. species represented by a single sequence) were also distinguishable in the gene trees since they all had unique DNA barcodes, divergent from those of the closest non-conspecific. The clustering algorithms implemented recognized from 416 to 444 barcode clusters, suggesting that the actual diversity of butterflies in Argentina is 3%–9% higher than currently recognized. Furthermore, our survey added three new records of butterflies for the country (Eurema agave, Mithras hannelore, Melanis hillapana). In summary, this study not only supported the utility of DNA barcoding for the identification of the butterfly species of Argentina, but also highlighted several cases of both deep intraspecific and shallow interspecific divergence that should be studied in more detail. PMID:29049373
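The barcode gap described above compares, for each species, the largest distance among conspecifics with the smallest distance to any other species. The sketch below shows that comparison on toy data. It assumes aligned sequences and an uncorrected p-distance, which may differ from the distance model used in the study; all sequences and species labels are invented.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gaps(records):
    """Map each species to (max intraspecific, min interspecific) distance.

    records: list of (species, aligned_sequence) tuples.
    Singletons get a maximum intraspecific distance of 0.0 by convention.
    """
    max_intra, min_inter = {}, {}
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        d = p_distance(s1, s2)
        if sp1 == sp2:
            max_intra[sp1] = max(max_intra.get(sp1, 0.0), d)
        else:
            # The pair contributes to the nearest-neighbour distance of both species.
            for sp in (sp1, sp2):
                min_inter[sp] = min(min_inter.get(sp, 1.0), d)
    return {sp: (max_intra.get(sp, 0.0), min_inter[sp]) for sp in min_inter}


# Toy alignment with two hypothetical species, A and B.
toy_records = [
    ("A", "ACGTACGTAC"),
    ("A", "ACGTACGTAT"),
    ("B", "TCGAACGTTC"),
    ("B", "TCGAACGTTC"),
]
for species, (intra, inter) in sorted(barcode_gaps(toy_records).items()):
    print(species, intra, inter, "gap present" if inter > intra else "no gap")
```

A species shows a barcode gap when its minimum interspecific distance exceeds its maximum intraspecific distance, which is the condition the abstract reports for all but four of the species with two or more specimens.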
Shih, Kai-Ming; Chang, Chung-Te; Chung, Jeng-Der; Chiang, Yu-Chung; Hwang, Shih-Ying
2018-01-01
Double digest restriction site-associated DNA sequencing (ddRADseq) is a tool for delivering genome-wide single nucleotide polymorphism (SNP) markers for non-model organisms useful in resolving fine-scale population structure and detecting signatures of selection. This study performs population genetic analysis, based on ddRADseq data, of a coniferous species, Keteleeria davidiana var. formosana, disjunctly distributed in northern and southern Taiwan, for investigation of population adaptive divergence in response to environmental heterogeneity. A total of 13,914 SNPs were detected and used to assess genetic diversity, FST outlier detection, population genetic structure, and individual assignments of five populations (62 individuals) of K. davidiana var. formosana. Principal component analysis (PCA), individual assignments, and the neighbor-joining tree were successful in differentiating individuals between northern and southern populations of K. davidiana var. formosana, but apparent gene flow between the southern DW30 population and northern populations was also revealed. Fifteen of 23 highly differentiated SNPs identified were found to be strongly associated with environmental variables, suggesting isolation-by-environment (IBE). However, multiple matrix regression with randomization analysis revealed strong IBE as well as significant isolation-by-distance. Environmental impacts on divergence were found between populations of the North and South regions and also between the two southern neighboring populations. BLASTN annotation of the sequences flanking outlier SNPs gave significant hits for three of 23 markers that might have biological relevance to mitochondrial homeostasis involved in the survival of locally adapted lineages. Species delimitation between K. davidiana var. formosana and its ancestor, K. davidiana, was also examined (72 individuals). 
This study has produced highly informative population genomic data for the understanding of population attributes, such as diversity, connectivity, and adaptive divergence associated with large- and small-scale environmental heterogeneity in K. davidiana var. formosana. PMID:29449860
Cheng, Ji-Hong; Liu, Wen-Chun; Chang, Ting-Tsung; Hsieh, Sun-Yuan; Tseng, Vincent S
2017-10-01
Many studies have suggested that deletions in the hepatitis B virus (HBV) genome are associated with the development of progressive liver diseases, ultimately even resulting in hepatocellular carcinoma (HCC). Among the methods for detecting deletions from next-generation sequencing (NGS) data, few consider the characteristics of viruses, such as high evolution rates and high divergence among different HBV genomes. Sequencing highly divergent HBV genomes with NGS technology outputs millions of reads, so detecting the exact breakpoints of deletions in these large and complex data incurs a very high computational cost. We propose a novel analytical method named VirDelect (Virus Deletion Detect), which uses a split-read alignment-based approach to detect exact breakpoints and a diversity variable to account for high divergence in single-end read data, such that the computational cost can be reduced without losing accuracy. We used four simulated read datasets and two real paired-end read datasets of HBV genome sequences to verify the accuracy of VirDelect by score functions. The experimental results show that VirDelect outperforms the state-of-the-art method Pindel in terms of accuracy score for all simulated datasets, and that VirDelect had only two base errors even in the real datasets. VirDelect is also shown to deliver high accuracy in analyzing single-end read data as well as paired-end data. VirDelect can serve as an effective and efficient bioinformatics tool for physiologists, with high accuracy and efficient performance, and is applicable to further analysis of genomes with characteristics similar to HBV in genome length and high divergence. The software program of VirDelect can be downloaded at https://sourceforge.net/projects/virdelect/. Copyright © 2017. Published by Elsevier Inc.
Guillet-Claude, Carine; Isabel, Nathalie; Pelgas, Betty; Bousquet, Jean
2004-12-01
Class I knox genes code for transcription factors that play an essential role in plant growth and development as central regulators of meristem cell identity. Based on the analysis of new cDNA sequences from various tissues and genomic DNA sequences, we identified a highly diversified group of class I knox genes in conifers. Phylogenetic analyses of complete amino acid sequences from various seed plants indicated that all conifer sequences formed a monophyletic group. Within conifers, four subgroups here named genes KN1 to KN4 were well delineated, each regrouping pine and spruce sequences. KN4 was sister group to KN3, which was sister group to KN1 and KN2. Genetic mapping on the genomes of two divergent Picea species indicated that KN1 and KN2 are located close to each other on the same linkage group, whereas KN3 and KN4 mapped on different linkage groups, correlating the more ancient divergence of these two genes. The proportion of synonymous and nonsynonymous substitutions suggested intense purifying selection for the four genes. However, rates of substitution per year indicated an evolution in two steps: faster rates were noted after gene duplications, followed subsequently by lower rates. Positive directional selection was detected for most of the internal branches harboring an accelerated rate of evolution. In addition, many sites with highly significant amino acid rate shift were identified between these branches. However, the tightly linked KN1 and KN2 did not diverge as much from each other. The implications of the correlation between phylogenetic, structural, and functional information are discussed in relation to the diversification of the knox-I gene family in conifers.
Pombert, Jean-François; Lemieux, Claude; Turmel, Monique
2006-01-01
Background: The phylum Chlorophyta contains the majority of the green algae and is divided into four classes. The basal position of the Prasinophyceae has been well documented, but the divergence order of the Ulvophyceae, Trebouxiophyceae and Chlorophyceae is currently debated. The four complete chloroplast DNA (cpDNA) sequences presently available for representatives of these classes have revealed extensive variability in overall structure, gene content, intron composition and gene order. The chloroplast genome of Pseudendoclonium (Ulvophyceae), in particular, is characterized by an atypical quadripartite architecture that deviates from the ancestral type by a large inverted repeat (IR) featuring an inverted rRNA operon and a small single-copy (SSC) region containing 14 genes normally found in the large single-copy (LSC) region. To gain insights into the nature of the events that led to the reorganization of the chloroplast genome in the Ulvophyceae, we have determined the complete cpDNA sequence of Oltmannsiellopsis viridis, a representative of a distinct, early diverging lineage. Results: The 151,933 bp IR-containing genome of Oltmannsiellopsis differs considerably from Pseudendoclonium and other chlorophyte cpDNAs in intron content and gene order, but shares close similarities with its ulvophyte homologue at the levels of quadripartite architecture, gene content and gene density. Oltmannsiellopsis cpDNA encodes 105 genes, contains five group I introns, and features many short dispersed repeats. As in Pseudendoclonium cpDNA, the rRNA genes in the IR are transcribed toward the single copy region featuring the genes typically found in the ancestral LSC region, and the opposite single copy region harbours genes characteristic of both the ancestral SSC and LSC regions. The 52 genes that were transferred from the ancestral LSC to SSC region include 12 of those observed in Pseudendoclonium cpDNA.
Surprisingly, the overall gene organization of Oltmannsiellopsis cpDNA more closely resembles that of Chlorella (Trebouxiophyceae) cpDNA. Conclusion: The chloroplast genome of the last common ancestor of Oltmannsiellopsis and Pseudendoclonium contained a minimum of 108 genes, carried only a few group I introns, and featured a distinctive quadripartite architecture. Numerous changes were experienced by the chloroplast genome in the lineages leading to Oltmannsiellopsis and Pseudendoclonium. Our comparative analyses of chlorophyte cpDNAs support the notion that the Ulvophyceae is sister to the Chlorophyceae. PMID:16472375
Low X/Y divergence in four pairs of papaya sex-linked genes.
Yu, Qingyi; Hou, Shaobin; Feltus, F Alex; Jones, Meghan R; Murray, Jan E; Veatch, Olivia; Lemke, Cornelia; Saw, Jimmy H; Moore, Richard C; Thimmapuram, Jyothi; Liu, Lei; Moore, Paul H; Alam, Maqsudul; Jiang, Jiming; Paterson, Andrew H; Ming, Ray
2008-01-01
Sex chromosomes in flowering plants, in contrast to those in animals, evolved relatively recently and only a few are heteromorphic. The homomorphic sex chromosomes of papaya show features of incipient sex chromosome evolution. We investigated the features of paired X- and Y-specific bacterial artificial chromosomes (BACs), and estimated the time of divergence in four pairs of sex-linked genes. We report the results of a comparative analysis of long contiguous genomic DNA sequences between the X and hermaphrodite Y (Y(h)) chromosomes. Numerous chromosomal rearrangements were detected in the male-specific region of the Y chromosome (MSY), including inversions, deletions, insertions, duplications and translocations, showing the dynamic evolutionary process on the MSY after recombination ceased. DNA sequence expansion was documented in the two regions of the MSY, demonstrating that the cytologically homomorphic sex chromosomes are heteromorphic at the molecular level. Analysis of sequence divergence between four X and Y(h) gene pairs resulted in an estimated age of divergence of between 0.5 and 2.2 million years, supporting a recent origin of the papaya sex chromosomes. Our findings indicate that sex chromosomes did not evolve at the family level in Caricaceae, and reinforce the theory that sex chromosomes evolve at the species level in some lineages.
Mohandesan, Elmira; Fitak, Robert R; Corander, Jukka; Yadamsuren, Adiya; Chuluunbat, Battsetseg; Abdelhadi, Omer; Raziq, Abdul; Nagy, Peter; Stalder, Gabrielle; Walzer, Chris; Faye, Bernard; Burger, Pamela A
2017-08-30
The genus Camelus is an interesting model to study adaptive evolution in the mitochondrial genome, as the three extant Old World camel species inhabit hot and low-altitude as well as cold and high-altitude deserts. We sequenced 24 camel mitogenomes and combined them with three previously published sequences to study the role of natural selection under different environmental pressure, and to advance our understanding of the evolutionary history of the genus Camelus. We confirmed the heterogeneity of divergence across different components of the electron transport system. Lineage-specific analysis of mitochondrial protein evolution revealed a significant effect of purifying selection in the concatenated protein-coding genes in domestic Bactrian camels. The estimated dN/dS < 1 in the concatenated protein-coding genes suggested purifying selection as driving force for shaping mitogenome diversity in camels. Additional analyses of the functional divergence in amino acid changes between species-specific lineages indicated fixed substitutions in various genes, with radical effects on the physicochemical properties of the protein products. The evolutionary time estimates revealed a divergence between domestic and wild Bactrian camels around 1.1 [0.58-1.8] million years ago (mya). This has major implications for the conservation and management of the critically endangered wild species, Camelus ferus.
Thompson, Claudia E; Freitas, Loreta B; Salzano, Francisco M
2018-01-01
Alcohol dehydrogenases belong to the large superfamily of medium-chain dehydrogenases/reductases, which occur throughout the biological world and are involved with many important metabolic routes. We considered the phylogeny of 190 ADH sequences of animals, fungi, and plants. Non-class III Caenorhabditis elegans ADHs were seen closely related to tetrameric fungal ADHs. ADH3 forms a sister group to amphibian, reptilian, avian and mammalian non-class III ADHs. In fishes, two main forms are identified: ADH1 and ADH3, whereas in amphibians there is a new ADH form (ADH8). ADH2 is found in Mammalia and Aves, and they formed a monophyletic group. Additionally, mammalian ADH4 seems to result from an ADH1 duplication, while in Fungi, ADH formed clusters based on types and genera. The plant ADH isoforms constitute a basal clade in relation to ADHs from animals. We identified amino acid residues responsible for functional divergence between ADH types in fungi, mammals, and fishes. In mammals, these differences occur mainly between ADH1/ADH4 and ADH3/ADH5, whereas functional divergence occurred in fungi between ADH1/ADH5, ADH5/ADH4, and ADH5/ADH3. In fishes, the forms also seem to be functionally divergent. The ADH family expansion exemplifies a neofunctionalization process where reiterative duplication events are related to new activities.
Efficient high-throughput sequencing of a laser microdissected chromosome arm
2013-01-01
Background Genomic sequence assemblies are key tools for a broad range of gene function and evolutionary studies. The diploid amphibian Xenopus tropicalis plays a pivotal role in these fields due to its combination of experimental flexibility, diploid genome, and early-branching tetrapod taxonomic position, having diverged from the amniote lineage ~360 million years ago. A genome assembly and a genetic linkage map have recently been made available. Unfortunately, large gaps in the linkage map attenuate long-range integrity of the genome assembly. Results We laser dissected the short arm of X. tropicalis chromosome 7 for next generation sequencing and computational mapping to the reference genome. This arm is of particular interest as it encodes the sex determination locus, but its genetic map contains large gaps which undermine available genome assemblies. Whole genome amplification of 15 laser-microdissected 7p arms followed by next generation sequencing yielded ~35 million reads, over four million of which uniquely mapped to the X. tropicalis genome. Our analysis placed more than 200 previously unmapped scaffolds on the analyzed chromosome arm, providing valuable low-resolution physical map information for de novo genome assembly. Conclusion We present a new approach for improving and validating genetic maps and sequence assemblies. Whole genome amplification of 15 microdissected chromosome arms provided sufficient high-quality material for localizing previously unmapped scaffolds and genes as well as recognizing mislocalized scaffolds. PMID:23714049
Ghouila, Amel; Florent, Isabelle; Guerfali, Fatma Zahra; Terrapon, Nicolas; Laouini, Dhafer; Yahia, Sadok Ben; Gascuel, Olivier; Bréhélin, Laurent
2014-01-01
Identification of protein domains is a key step for understanding protein function. Hidden Markov Models (HMMs) have proved to be a powerful tool for this task. The Pfam database notably provides a large collection of HMMs which are widely used for the annotation of proteins in sequenced organisms. This is done via sequence/HMM comparisons. However, this approach may lack sensitivity when searching for domains in divergent species. Recently, methods for HMM/HMM comparisons have been proposed and proved to be more sensitive than sequence/HMM approaches in certain cases. However, these approaches are usually not used for protein domain discovery at a genome scale, and the benefit that could be expected from their utilization for this problem has not been investigated. Using proteins of P. falciparum and L. major as examples, we investigate the extent to which HMM/HMM comparisons can identify new domain occurrences not already identified by sequence/HMM approaches. We show that although HMM/HMM comparisons are much more sensitive than sequence/HMM comparisons, they are not sufficiently accurate to be used as a standalone complement of sequence/HMM approaches at the genome scale. Hence, we propose to use domain co-occurrence — the general domain tendency to preferentially appear along with some favorite domains in the proteins — to improve the accuracy of the approach. We show that the combination of HMM/HMM comparisons and co-occurrence domain detection boosts protein annotations. At an estimated False Discovery Rate of 5%, it revealed 901 and 1098 new domains in Plasmodium and Leishmania proteins, respectively. Manual inspection of part of these predictions shows that it contains several domain families that were missing in the two organisms. All new domain occurrences have been integrated in the EuPathDomains database, along with the GO annotations that can be deduced. PMID:24901648
Meiotic drive impacts expression and evolution of X-linked genes in stalk-eyed flies.
Reinhardt, Josephine A; Brand, Cara L; Paczolt, Kimberly A; Johns, Philip M; Baker, Richard H; Wilkinson, Gerald S
2014-01-01
Although sex chromosome meiotic drive has been observed in a variety of species for over 50 years, the genes causing drive are only known in a few cases, and none of these cases cause distorted sex-ratios in nature. In stalk-eyed flies (Teleopsis dalmanni), driving X chromosomes are commonly found at frequencies approaching 30% in the wild, but the genetic basis of drive has remained elusive due to reduced recombination between driving and non-driving X chromosomes. Here, we used RNAseq to identify transcripts that are differentially expressed between males carrying either a driving X (XSR) or a standard X chromosome (XST), and found hundreds of these, the majority of which are X-linked. Drive-associated transcripts show increased levels of sequence divergence (dN/dS) compared to a control set, and are predominantly expressed either in testes or in the gonads of both sexes. Finally, we confirmed that XSR and XST are highly divergent by estimating sequence differentiation between the RNAseq pools. We found that X-linked transcripts were often strongly differentiated (whereas most autosomal transcripts were not), supporting the presence of a relatively large region of recombination suppression on XSR presumably caused by one or more inversions. We have identified a group of genes that are good candidates for further study into the causes and consequences of sex-chromosome drive, and demonstrated that meiotic drive has had a profound effect on sequence evolution and gene expression of X-linked genes in this species.
Two new species of shovel-jaw carp Onychostoma (Teleostei: Cyprinidae) from southern Vietnam.
Hoang, Huy Duc; Pham, Hung Manh; Tran, Ngan Trong
2015-05-22
Two new species of large shovel-jaw carps in the genus Onychostoma are described from the upper Krong No and middle Dong Nai drainages of the Langbiang Plateau in southern Vietnam. These new species are known from streams in montane mixed pine and evergreen forests between 140 and 1112 m. Their populations are isolated in the headwaters of the upper Sre Pok River of the Mekong basin and in the middle of the Dong Nai basin. Both species are differentiated from their congeners by a combination of the following characters: transverse mouth opening width greater than head width, 14-17 predorsal scales, caudal-peduncle length 3.9-4.2 times in SL, no barbels in adults and juveniles, a strong serrated last simple ray of the dorsal fin, and small eye diameter (20.3-21.5% HL). Onychostoma krongnoensis sp. nov. is differentiated from Onychostoma dongnaiensis sp. nov. by body depth (4.0 vs. 3.2 times in SL), predorsal scale number (14-17 vs. 14-15), dorsal-fin length (4.5 vs. 4.2 times in SL), caudal-peduncle length (3.9 vs. 4.2 times in SL), colour in life (dark vs. bright), and by mitochondrial DNA (0.2% sequence divergence). Molecular evidence indicates that both species are members of Onychostoma and are distinct from all congeners sampled (uncorrected sequence divergences at the 16S rRNA gene of >2.0% for all Onychostoma for which homologous 16S rRNA sequences are available).
2013-01-01
Background Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RNA viruses of the Western honey bee (Apis mellifera), deformed wing virus (DWV) and Israel acute paralysis virus (IAPV). All viral RNA was extracted from North American samples of honey bees or, in one case, the ectoparasitic mite Varroa destructor. Results Coverage depth was generally lower for IAPV than DWV, and marked gaps in coverage occurred in several narrow regions (< 50 bp) of IAPV. These coverage gaps occurred across sequencing runs and were virtually unchanged when reads were re-mapped with greater permissiveness (up to 8% divergence), suggesting a recurrent sequencing artifact rather than strain divergence. Consensus sequences of DWV for each sample showed little phylogenetic divergence, low nucleotide diversity, and strongly negative values of Fu and Li’s D statistic, suggesting a recent population bottleneck and/or purifying selection. The Kakugo strain of DWV fell outside of all other DWV sequences at 100% bootstrap support. IAPV consensus sequences supported the existence of multiple clades as had been previously reported, and Fu and Li’s D was closer to neutral expectation overall, although a sliding-window analysis identified a significantly positive D within the protease region, suggesting selection maintains diversity in that region. Within-sample mean diversity was comparable between the two viruses on average, although for both viruses there was substantial variation among samples in mean diversity at third codon positions and in the number of high-diversity sites. 
FST values were bimodal for DWV, likely reflecting neutral divergence in two low-diversity populations, whereas IAPV had several sites that were strong outliers with very low FST. Conclusions This initial survey of genetic variation within honey bee RNA viruses suggests future directions for studies examining the underlying causes of population-genetic structure in these economically important pathogens. PMID:23497218
Cornman, Robert Scott; Boncristiani, Humberto; Dainat, Benjamin; Chen, Yanping; vanEngelsdorp, Dennis; Weaver, Daniel; Evans, Jay D
2013-03-07
Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RNA viruses of the Western honey bee (Apis mellifera), deformed wing virus (DWV) and Israel acute paralysis virus (IAPV). All viral RNA was extracted from North American samples of honey bees or, in one case, the ectoparasitic mite Varroa destructor. Coverage depth was generally lower for IAPV than DWV, and marked gaps in coverage occurred in several narrow regions (< 50 bp) of IAPV. These coverage gaps occurred across sequencing runs and were virtually unchanged when reads were re-mapped with greater permissiveness (up to 8% divergence), suggesting a recurrent sequencing artifact rather than strain divergence. Consensus sequences of DWV for each sample showed little phylogenetic divergence, low nucleotide diversity, and strongly negative values of Fu and Li's D statistic, suggesting a recent population bottleneck and/or purifying selection. The Kakugo strain of DWV fell outside of all other DWV sequences at 100% bootstrap support. IAPV consensus sequences supported the existence of multiple clades as had been previously reported, and Fu and Li's D was closer to neutral expectation overall, although a sliding-window analysis identified a significantly positive D within the protease region, suggesting selection maintains diversity in that region. Within-sample mean diversity was comparable between the two viruses on average, although for both viruses there was substantial variation among samples in mean diversity at third codon positions and in the number of high-diversity sites. FST values were bimodal for DWV, likely reflecting neutral divergence in two low-diversity populations, whereas IAPV had several sites that were strong outliers with very low FST. 
This initial survey of genetic variation within honey bee RNA viruses suggests future directions for studies examining the underlying causes of population-genetic structure in these economically important pathogens.
Ghedotti, Michael J; Davis, Matthew P
2017-04-10
The fossil species †Fundulus detillae, †F. lariversi, and †F. nevadensis from localities in the western United States are represented by well-preserved material with date estimations. We combined morphological data for these fossil taxa with morphological and DNA-sequence data to conduct a phylogenetic analysis and a tip-based divergence-time estimation for the family Fundulidae. The resultant phylogeny is largely concordant with the prior total-evidence phylogeny. The fossil species do not form a monophyletic group, and do not represent a discrete western radiation of Fundulus as previously proposed. The genus Fundulus diverged into subgeneric clades likely in the Eocene or Oligocene (mean age 34.6 mya, 53-23 mya), and all subgeneric and most species-group clades had evolved by the middle Miocene. †Fundulus lariversi is a member of subgenus Fundulus in which all extant species are found only in eastern North America, demonstrating that fundulids had a complicated biogeographic history. We confirmed †Fundulus detillae as a member of the subgenus Plancterus. †F. nevadensis is not classified in a subgenus but likely is related to the subgenera Plancterus and Wileyichthys.
Variable sexually dimorphic gene expression in laboratory strains of Drosophila melanogaster.
Baker, Dean A; Meadows, Lisa A; Wang, Jing; Dow, Julian At; Russell, Steven
2007-12-10
Wild-type laboratory strains of model organisms are typically kept in isolation for many years, with the action of genetic drift and selection on mutational variation causing lineages to diverge with time. Natural populations from which such strains are established show that gender-specific interactions in particular drive many aspects of sequence-level and transcriptional-level variation. Here, our goal was to identify genes that display transcriptional variation between laboratory strains of Drosophila melanogaster, and to explore evidence of gender-biased interactions underlying that variability. Transcriptional variation among the laboratory genotypes studied occurs more frequently in males than in females. Qualitative differences are also apparent, suggesting that genes within particular functional classes disproportionately display variation in gene expression. Our analysis indicates that genes with reproductive functions are most often divergent between genotypes in both sexes; however, a large proportion of female variation can also be attributed to genes without expression in the ovaries. The present study clearly shows that transcriptional variation between common laboratory strains of Drosophila can differ dramatically due to sexual dimorphism. Much of this variation reflects sex-specific challenges associated with divergent physiological trade-offs, morphology and regulatory pathways operating within males and females.
Genomic Sequence around Butterfly Wing Development Genes: Annotation and Comparative Analysis
Conceição, Inês C.; Long, Anthony D.; Gruber, Jonathan D.; Beldade, Patrícia
2011-01-01
Background Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. Methodology/Principal Findings We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). Conclusions The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. 
mori orthologs, and 2) the high conservation of non-coding sequence around the genes wingless and Ecdysone receptor, both involved in multiple developmental processes including wing pattern formation. PMID:21909358
Sequence analysis of MHC class I α2 from sockeye salmon (Oncorhynchus nerka).
McClelland, Erin K; Ming, Tobi J; Tabata, Amy; Miller, Kristina M
2011-09-01
Most studies assessing adaptive MHC diversity in salmon populations have focused on the classical class II DAB or DAA loci, as these have been most amenable to single PCR amplifications due to their relatively low level of sequence divergence. Herein, we report the characterization of the classical class I UBA α2 locus based on collections taken throughout the species range of sockeye salmon (Oncorhynchus nerka). Through use of multiple lineage-specific primer sets, denaturing gradient gel electrophoresis and sequencing, we identified thirty-four alleles from three highly divergent lineages. Sequence identity between lineages ranged from 30.0% to 56.8% but was relatively high within lineages. Allelic identity within the antigen recognition site (ARS) was greater than for the longer sequence. Global positive selection on UBA was seen at the sequence level (dN:dS = 1.012) with four codons under positive selection and 12 codons under negative selection. Crown Copyright © 2011. Published by Elsevier Ltd. All rights reserved.
Bass, David; Moureau, Gregory; Tang, Shuoya; McAlister, Erica; Culverwell, C. Lorna; Glücksman, Edvard; Wang, Hui; Brown, T. David K.; Gould, Ernest A.; Harbach, Ralph E.; de Lamballerie, Xavier; Firth, Andrew E.
2013-01-01
We investigated whether small RNA (sRNA) sequenced from field-collected mosquitoes and chironomids (Diptera) can be used as a proxy signature of viral prevalence within a range of species and viral groups, using sRNAs sequenced from wild-caught specimens, to inform total RNA deep sequencing of samples of particular interest. Using this strategy, we sequenced from adult Anopheles maculipennis s.l. mosquitoes the apparently nearly complete genome of one previously undescribed virus related to chronic bee paralysis virus, and, from a pool of Ochlerotatus caspius and Oc. detritus mosquitoes, a nearly complete entomobirnavirus genome. We also reconstructed long sequences (1503-6557 nt) related to at least nine other viruses. Crucially, several of the sequences detected were reconstructed from host organisms highly divergent from those in which related viruses have been previously isolated or discovered. It is clear that viral transmission and maintenance cycles in nature are likely to be significantly more complex and taxonomically diverse than previously expected. PMID:24260463
Estimation of primate speciation dates using local molecular clocks.
Yoder, A D; Yang, Z
2000-07-01
Protein-coding genes of the mitochondrial genomes from 31 mammalian species were analyzed to estimate the speciation dates within primates and also between rats and mice. Three calibration points were used based on paleontological data: one at 20-25 MYA for the hominoid/cercopithecoid divergence, one at 53-57 MYA for the cetacean/artiodactyl divergence, and the third at 110-130 MYA for the metatherian/eutherian divergence. Both the nucleotide and the amino acid sequences were analyzed, producing conflicting results. The global molecular clock was clearly violated for both the nucleotide and the amino acid data. Models of local clocks were implemented using maximum likelihood, allowing different evolutionary rates for some lineages while assuming rate constancy in others. Surprisingly, the highly divergent third codon positions appeared to contain phylogenetic information and produced more sensible estimates of primate divergence dates than did the amino acid sequences. Estimated dates varied considerably depending on the data type, the calibration point, and the substitution model but differed little among the four tree topologies used. We conclude that the calibration derived from the primate fossil record is too recent to be reliable; we also point out a number of problems in date estimation when the molecular clock does not hold. Despite these obstacles, we derived estimates of primate divergence dates that were well supported by the data and were generally consistent with the paleontological record. Estimation of the mouse-rat divergence date, however, was problematic.
Loewenstein, Yaniv; Portugaly, Elon; Fromer, Menachem; Linial, Michal
2008-07-01
UPGMA (average linking) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Due to this prohibitive requirement, UPGMA is not scalable to very large datasets. We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any practical memory size constraint, this framework guarantees the correct clustering solution without explicitly requiring all dissimilarities in memory. The algorithms are general and are applicable to any dataset. We present a data-dependent characterization of hardness and clustering efficiency. The presented concepts are applicable to any agglomerative clustering formulation. We apply our algorithm to the entire collection of protein sequences, to automatically build a comprehensive evolutionary-driven hierarchy of proteins from sequence alone. The newly created tree captures protein families better than state-of-the-art large-scale methods such as CluSTr, ProtoNet4 or single-linkage clustering. We demonstrate that leveraging the entire mass embodied in all sequence similarities allows significant improvement on current protein family clusterings, which are unable to directly tackle the sheer mass of this data. Furthermore, we argue that non-metric constraints are an inherent complexity of the sequence space and should not be overlooked. The robustness of UPGMA allows significant improvement, especially for multidomain proteins, and for large or divergent families. A comprehensive tree built from all UniProt sequence similarities, together with navigation and classification tools, will be made available as part of the ProtoNet service. A C++ implementation of the algorithm is available on request.
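For orientation, the memory requirement mentioned above can be seen in a minimal sketch of standard UPGMA, which keeps every pairwise dissimilarity in a dictionary. This is not the MC-UPGMA algorithm of the abstract; the function name `upgma`, the input layout and the toy distances are assumptions made purely for illustration.

```python
from itertools import combinations

def upgma(labels, dist):
    """Naive UPGMA with the full dissimilarity table held in memory.

    labels: list of hashable leaf names.
    dist:   dict mapping (a, b) pairs to a dissimilarity, one entry per pair.
    Returns the list of merges as (left, right, height) tuples.
    """
    sizes = {label: 1 for label in labels}           # cluster -> number of leaves
    d = {frozenset(pair): value for pair, value in dist.items()}
    merges = []
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2
        new = (a, b)
        # Size-weighted average distance from the merged cluster to the others.
        for c in sizes:
            if c in (a, b):
                continue
            d[frozenset((new, c))] = (
                sizes[a] * d[frozenset((a, c))] + sizes[b] * d[frozenset((b, c))]
            ) / (sizes[a] + sizes[b])
        sizes[new] = sizes.pop(a) + sizes.pop(b)
        merges.append((a, b, height))
    return merges

# Hypothetical toy distances, not data from the study.
toy = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
       ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}
merge_heights = [h for _, _, h in upgma(["A", "B", "C", "D"], toy)]
```

The dictionary `d` grows with the square of the number of sequences, which is the scaling problem the memory-constrained variants are designed to avoid.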
The genome sequence of the model ascomycete fungus Podospora anserina
Espagne, Eric; Lespinet, Olivier; Malagnac, Fabienne; Da Silva, Corinne; Jaillon, Olivier; Porcel, Betina M; Couloux, Arnaud; Aury, Jean-Marc; Ségurens, Béatrice; Poulain, Julie; Anthouard, Véronique; Grossetete, Sandrine; Khalili, Hamid; Coppin, Evelyne; Déquard-Chablat, Michelle; Picard, Marguerite; Contamine, Véronique; Arnaise, Sylvie; Bourdais, Anne; Berteaux-Lecellier, Véronique; Gautheret, Daniel; de Vries, Ronald P; Battaglia, Evy; Coutinho, Pedro M; Danchin, Etienne GJ; Henrissat, Bernard; Khoury, Riyad EL; Sainsard-Chanet, Annie; Boivin, Antoine; Pinan-Lucarré, Bérangère; Sellem, Carole H; Debuchy, Robert; Wincker, Patrick; Weissenbach, Jean; Silar, Philippe
2008-01-01
Background The dung-inhabiting ascomycete fungus Podospora anserina is a model used to study various aspects of eukaryotic and fungal biology, such as ageing, prions and sexual development. Results We present a 10X draft sequence of P. anserina genome, linked to the sequences of a large expressed sequence tag collection. Similar to higher eukaryotes, the P. anserina transcription/splicing machinery generates numerous non-conventional transcripts. Comparison of the P. anserina genome and orthologous gene set with the one of its close relatives, Neurospora crassa, shows that synteny is poorly conserved, the main result of evolution being gene shuffling in the same chromosome. The P. anserina genome contains fewer repeated sequences and has evolved new genes by duplication since its separation from N. crassa, despite the presence of the repeat induced point mutation mechanism that mutates duplicated sequences. We also provide evidence that frequent gene loss took place in the lineages leading to P. anserina and N. crassa. P. anserina contains a large and highly specialized set of genes involved in utilization of natural carbon sources commonly found in its natural biotope. It includes genes potentially involved in lignin degradation and efficient cellulose breakdown. Conclusion The features of the P. anserina genome indicate a highly dynamic evolution since the divergence of P. anserina and N. crassa, leading to the ability of the former to use specific complex carbon sources that match its needs in its natural biotope. PMID:18460219
Loeza-Quintana, Tzitziki; Adamowicz, Sarah J
2018-02-01
During the past 50 years, the molecular clock has become one of the main tools for providing a time scale for the history of life. In the era of robust molecular evolutionary analysis, clock calibration is still one of the most basic steps needing attention. When fossil records are limited, well-dated geological events are the main resource for calibration. However, biogeographic calibrations have often been used in a simplistic manner, for example assuming simultaneous vicariant divergence of multiple sister lineages. Here, we propose a novel iterative calibration approach to define the most appropriate calibration date by seeking congruence between the dates assigned to multiple allopatric divergences and the geological history. Exploring patterns of molecular divergence in 16 trans-Bering sister clades of echinoderms, we demonstrate that the iterative calibration is predominantly advantageous when using complex geological or climatological events, such as the opening/reclosure of the Bering Strait, providing a powerful tool for clock dating that can be applied to other biogeographic calibration systems and further taxa. Using Bayesian analysis, we observed that evolutionary rate variability in the COI-5P gene is generally distributed in a clock-like fashion for Northern echinoderms. The results reveal a large range of genetic divergences, consistent with multiple pulses of trans-Bering migrations. A resulting rate of 2.8% pairwise Kimura-2-parameter sequence divergence per million years is suggested for the COI-5P gene in Northern echinoderms. Given that molecular rates may vary across latitudes and taxa, this study provides a new context for dating the evolutionary history of Arctic marine life.
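As a rough illustration of how a pairwise rate such as the one quoted above translates a distance into a date, the sketch below computes the Kimura 2-parameter distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions, and divides it by the pairwise rate. The function names and the example sequences are hypothetical; the study itself used Bayesian analysis and an iterative calibration, not this simple division.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or ambiguity code are skipped.
    """
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    differences = sum(1 for a, b in pairs if a != b)
    # A transition stays within purines or within pyrimidines.
    transitions = sum(1 for a, b in pairs
                      if a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES))
    p = transitions / n
    q = (differences - transitions) / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

def divergence_time_my(distance, pairwise_rate_per_my=0.028):
    """Time since divergence in million years, assuming a strict clock.

    The default is the 2.8% pairwise rate quoted in the abstract.
    """
    return distance / pairwise_rate_per_my

# Hypothetical 10-bp alignment with a single A<->G transition.
d = k2p_distance("AAAAACCCCC", "GAAAACCCCC")
t = divergence_time_my(d)
```

Because the rate is expressed per pair of lineages, no factor of two is applied when converting distance to time.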
Genomic variation among populations of threatened coral: Acropora cervicornis.
Drury, C; Dale, K E; Panlilio, J M; Miller, S V; Lirman, D; Larson, E A; Bartels, E; Crawford, D L; Oleksiak, M F
2016-04-13
Acropora cervicornis, a threatened, keystone reef-building coral has undergone severe declines (>90 %) throughout the Caribbean. These declines could reduce genetic variation and thus hamper the species' ability to adapt. Active restoration strategies are a common conservation approach to mitigate species' declines and require genetic data on surviving populations to efficiently respond to declines while maintaining the genetic diversity needed to adapt to changing conditions. To evaluate active restoration strategies for the staghorn coral, the genetic diversity of A. cervicornis within and among populations was assessed in 77 individuals collected from 68 locations along the Florida Reef Tract (FRT) and in the Dominican Republic. Genotyping by Sequencing (GBS) identified 4,764 single nucleotide polymorphisms (SNPs). Pairwise nucleotide differences (π) within a population are large (~37 %) and similar to π across all individuals. This high level of genetic diversity along the FRT is similar to the diversity within a small, isolated reef. Much of the genetic diversity (>90 %) exists within a population, yet GBS analysis shows significant variation along the FRT, including 300 SNPs with significant FST values and significant divergence relative to distance. There are also significant differences in SNP allele frequencies over small spatial scales, exemplified by the large FST values among corals collected within Miami-Dade county. Large standing diversity was found within each population even after recent declines in abundance, including significant, potentially adaptive divergence over short distances. The data here inform conservation and management actions by uncovering population structure and high levels of diversity maintained within coral collections among sites previously shown to have little genetic divergence. 
More broadly, this approach demonstrates the power of GBS to resolve differences among individuals and identify subtle genetic structure, informing conservation goals with evolutionary implications.
Senerchia, Natacha; Wicker, Thomas; Felber, François; Parisod, Christian
2013-01-01
Transposable elements (TEs) represent a major fraction of plant genomes and drive their evolution. An improved understanding of genome evolution requires the dynamics of a large number of TE families to be considered. We put forward an approach bypassing the required step of a complete reference genome to assess the evolutionary trajectories of high copy number TE families from genome snapshots obtained with high-throughput sequencing. Low coverage sequencing of the complex genomes of Aegilops cylindrica and Ae. geniculata using 454 identified more than 70% of the sequences as known TEs, mainly long terminal repeat (LTR) retrotransposons. Comparing the abundance of reads as well as patterns of sequence diversity and divergence within and among genomes assessed the dynamics of 44 major LTR retrotransposon families of the 165 identified. In particular, molecular population genetics on individual TE copies distinguished recently active from quiescent families and highlighted different evolutionary trajectories of retrotransposons among related species. This work presents a suite of tools suitable for current sequencing data, making it possible to address the genome-wide evolutionary dynamics of TEs at the family level and advancing our understanding of the evolution of nonmodel genomes.
Burgess, Diane; Freeling, Michael
2014-01-01
In vertebrates, conserved noncoding elements (CNEs) are functionally constrained sequences that can show striking conservation over >400 million years of evolutionary distance and frequently are located megabases away from target developmental genes. Conserved noncoding sequences (CNSs) in plants are much shorter, and it has been difficult to detect conservation among distantly related genomes. In this article, we show not only that CNS sequences can be detected throughout the eudicot clade of flowering plants, but also that a subset of 37 CNSs can be found in all flowering plants (diverging ∼170 million years ago). These CNSs are functionally similar to vertebrate CNEs, being highly associated with transcription factor and development genes and enriched in transcription factor binding sites. Some of the most highly conserved sequences occur in genes encoding RNA binding proteins, particularly the RNA splicing–associated SR genes. Differences in sequence conservation between plants and animals are likely to reflect differences in the biology of the organisms, with plants being much more able to tolerate genomic deletions and whole-genome duplication events due, in part, to their far greater fecundity compared with vertebrates. PMID:24681619
Picard, François J.; Ke, Danbing; Boudreau, Dominique K.; Boissinot, Maurice; Huletsky, Ann; Richard, Dave; Ouellette, Marc; Roy, Paul H.; Bergeron, Michel G.
2004-01-01
A 761-bp portion of the tuf gene (encoding the elongation factor Tu) from 28 clinically relevant streptococcal species was obtained by sequencing amplicons generated using broad-range PCR primers. These tuf sequences were used to select Streptococcus-specific PCR primers and to perform phylogenetic analysis. The specificity of the PCR assay was verified using 102 different bacterial species, including the 28 streptococcal species. Genomic DNA purified from all streptococcal species was efficiently detected, whereas there was no amplification with DNA from 72 of the 74 nonstreptococcal bacterial species tested. There was cross-amplification with DNAs from Enterococcus durans and Lactococcus lactis. However, the 15 to 31% nucleotide sequence divergence in the 761-bp tuf portion of these two species compared to any streptococcal tuf sequence provides ample sequence divergence to allow the development of internal probes specific to streptococci. The Streptococcus-specific assay was highly sensitive for all 28 streptococcal species tested (i.e., detection limit of 1 to 10 genome copies per PCR). The tuf sequence data was also used to perform extensive phylogenetic analysis, which was generally in agreement with phylogeny determined on the basis of 16S rRNA gene data. However, the tuf gene provided a better discrimination at the streptococcal species level that should be particularly useful for the identification of very closely related species. In conclusion, tuf appears more suitable than the 16S ribosomal RNA gene for the development of diagnostic assays for the detection and identification of streptococcal species because of its higher level of species-specific genetic divergence. PMID:15297518
Marques, A C P B; Franco, A C S; Salgueiro, F; García-Berthou, E; Santos, L N
2016-12-01
This study used the hypervariable domain of the mitochondrial DNA (mtDNA) control region (CR) to assess the genetic divergence among native and invasive populations of Cichla kelberi, which is considered the first peacock cichlid introduced and established throughout Brazil and is among the most invasive populations of this genus worldwide. The maximum likelihood tree based on 53 CR sequences with strong bootstrap support revealed that C. kelberi forms a monophyletic clade, confirming that all 30 C. kelberi studied belong to this morphotype. Additionally, the haplotype analysis of the C. kelberi sequences from 11 sampling sites revealed that invasive populations are much less diverse than native ones and largely dominated by a single haplotype that prevailed in reservoirs at the Paraíba do Sul River basin. Two haplotypes were recorded exclusively in an invasive population at Porto Rico, southern Brazil, and one private haplotype was detected in two reservoirs from Paraíba do Sul (Pereira Passos and Paracambi), suggesting more than one introduction event and that native populations should be better evaluated to encompass the entire genetic diversity of native C. kelberi. The possible route and pathways of C. kelberi introduction are also briefly discussed. © 2016 The Fisheries Society of the British Isles.
Genome structure and primitive sex chromosome revealed in Populus
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tuskan, Gerald A; Yin, Tongming; Gunter, Lee E
We constructed a comprehensive genetic map for Populus and ordered 332 Mb of sequence scaffolds along the 19 haploid chromosomes in order to compare chromosomal regions among diverse members of the genus. These efforts led us to conclude that chromosome XIX in Populus is evolving into a sex chromosome. Consistent segregation distortion in favor of the sub-genera Tacamahaca alleles provided evidence of divergent selection among species, particularly at the proximal end of chromosome XIX. A large microsatellite marker (SSR) cluster was detected in the distorted region even though the genome-wide distribution of SSR sites was uniform across the physical map. The differences between the genetic map and physical sequence data suggested recombination suppression was occurring in the distorted region. A gender-determination locus and an overabundance of NBS-LRR genes were also co-located to the distorted region and were put forth as the cause for divergent selection and recombination suppression. This hypothesis was verified by using fine-scale mapping of an integrated scaffold in the vicinity of the gender-determination locus. As such it appears that chromosome XIX in Populus is in the process of evolving from an autosome into a sex chromosome and that NBS-LRR genes may play an important role in the chromosomal diversification process in Populus.
Saito, Yuichi; Mekuchi, Miyuki; Kobayashi, Noriaki; Kimura, Makoto; Aoki, Yasuhiro; Masuda, Tomohiro; Azuma, Teruo; Fukami, Motohiro; Iigo, Masayuki; Yanagisawa, Tadashi
2011-11-01
Molecular cloning of thyrotropin-releasing hormone receptors (TRHR) was performed in a teleost, the sockeye salmon (Oncorhynchus nerka). Four different TRHR cDNAs were cloned and named TRHR1, TRHR2a, TRHR2b and TRHR3 based on their similarity to known TRHR subtypes in vertebrates. Important residues for TRH binding were conserved in deduced amino acid sequences of the three TRHR subtypes except for the TRHR2b. Seven transmembrane domains were predicted for TRHR1, TRHR2a and TRHR3 proteins but only five for TRHR2b which appears to be truncated. In silico database analysis identified putative TRHR sequences including invertebrate TRHR and reptilian, avian and mammalian TRHR3. Phylogenetic analyses predicted the molecular evolution of TRHR in vertebrates: from the common ancestral TRHR (i.e. invertebrate TRHR), the TRHR2 subtype diverged first and then TRHR1 and TRHR3 diverged. Reverse transcription-polymerase chain reaction analyses revealed TRHR1 transcripts in the brain (hypothalamus), retina, pituitary gland and large intestine; TRHR2a in the brain (telencephalon and hypothalamus); and TRHR3 in the brain (olfactory bulbs) and retina. Copyright © 2011 Elsevier Inc. All rights reserved.
[Divergence of paralogous growth-hormone-encoding genes and their promoters in Salmonidae].
Kamenskaya, D N; Pankova, M V; Atopkin, D M; Brykov, V A
2017-01-01
In many fish species, including salmonids, growth hormone is encoded by two duplicated paralogous genes, gh1 and gh2. Both genes were already in place at the time of divergence of species in this group. A comparison of the entire sequence of these genes of salmonids has shown that their conserved regions are associated with exons, while their most variable regions correspond to introns. Introns C and D include putative regulatory elements (sites Pit-1, CRE, and ERE) that are also conserved. In chars, the degree of polymorphism of the gh2 gene is 2-3 times as large as that of the gh1 gene. However, a comparison across all Salmonidae species does not extend this observation to other species. In both of these genes in chars, the promoters are conserved mainly because they correspond to putative regulatory sequences (TATA box, binding sites for the pituitary transcription factor Pit-1 (F1-F4), CRE, GRE and RAR/RXR elements). The promoter of the gh2 gene has a greater degree of polymorphism than the gh1 gene promoter in all investigated species of salmonids. The observed differences in the rates of accumulation of changes in growth hormone encoding paralogs could be explained by differences in the intensity of selection.
Xie, Yan-Ping; Meng, Ying; Sun, Hang; Nie, Ze-Long
2016-01-01
Tibetia and Gueldenstaedtia are two morphologically similar and small genera in Fabaceae, with distributions largely corresponding to the Sino-Himalayan and Sino-Japanese subkingdoms in eastern Asia, respectively. These two genera have confusing relationships based on morphology; therefore, we aimed to provide a clear understanding of their phylogenetic and biogeographic evolution within eastern Asia. We sequenced 88 samples representing five Gueldenstaedtia species, five Tibetia species, and outgroup species using five markers (nuclear: ITS; chloroplast: matK, trnL-F, psbA-trnH and rbcL). Our phylogenetic results support (1) the monophyly of Tibetia and of Gueldenstaedtia, respectively; and (2) that Tibetia and Gueldenstaedtia are sister genera. Additionally, our data identified that Tibetia species had much higher sequence variation than Gueldenstaedtia species. Our results suggest that the two genera were separated from each other about 17.23 million years ago, which is congruent with the Himalayan orogeny and the uplift of the Tibetan Plateau in the mid Miocene. The divergence of Tibetia and Gueldenstaedtia is strongly supported by the separation of the Sino-Himalayan and Sino-Japanese regions within eastern Asia. In addition, the habitat heterogeneity may accelerate the molecular divergence of Tibetia in the Sino-Himalayan region. PMID:27632535
Spatially restricted G protein-coupled receptor activity via divergent endocytic compartments.
Jean-Alphonse, Frederic; Bowersox, Shanna; Chen, Stanford; Beard, Gemma; Puthenveedu, Manojkumar A; Hanyaloglu, Aylin C
2014-02-14
Postendocytic sorting of G protein-coupled receptors (GPCRs) is driven by interactions between highly diverse receptor sequence motifs and their interacting proteins, such as postsynaptic density protein (PSD95)/Drosophila disc large tumor suppressor (Dlg1)/zonula occludens-1 protein (zo-1) (PDZ) domain proteins. However, whether these diverse interactions provide an underlying functional specificity, in addition to driving sorting, is unknown. Here we identify GPCRs that recycle via distinct PDZ ligand/PDZ protein pairs that exploit their recycling machinery primarily for targeted endosomal localization and signaling specificity. The luteinizing hormone receptor (LHR) and β2-adrenergic receptor (B2AR), two GPCRs sorted to the regulated recycling pathway, underwent divergent trafficking to distinct endosomal compartments. Unlike B2AR, which traffics to early endosomes (EE), LHR internalizes to distinct pre-early endosomes (pre-EEs) for its recycling. Pre-EE localization required interactions of the LHR C-terminal tail with the PDZ protein GAIP-interacting protein C terminus, inhibiting its traffic to EEs. Rerouting the LHR to EEs, or EE-localized GPCRs to pre-EEs, spatially reprograms MAPK signaling. Furthermore, LHR-mediated activation of MAPK signaling requires internalization and is maintained upon loss of the EE compartment. We propose that combinatorial specificity between GPCR sorting sequences and interacting proteins dictates an unprecedented spatiotemporal control in GPCR signal activity.
Resolving Evolutionary Relationships in Closely Related Species with Whole-Genome Sequencing Data
Nater, Alexander; Burri, Reto; Kawakami, Takeshi; Smeds, Linnéa; Ellegren, Hans
2015-01-01
Using genetic data to resolve the evolutionary relationships of species is of major interest in evolutionary and systematic biology. However, reconstructing the sequence of speciation events, the so-called species tree, in closely related and potentially hybridizing species is very challenging. Processes such as incomplete lineage sorting and interspecific gene flow result in local gene genealogies that differ in their topology from the species tree, and analyses of few loci with a single sequence per species are likely to produce conflicting or even misleading results. To study these phenomena on a full phylogenomic scale, we use whole-genome sequence data from 200 individuals of four black-and-white flycatcher species with so far unresolved phylogenetic relationships to infer gene tree topologies and visualize genome-wide patterns of gene tree incongruence. Using phylogenetic analysis in nonoverlapping 10-kb windows, we show that gene tree topologies are extremely diverse and change on a very small physical scale. Moreover, we find strong evidence for gene flow among flycatcher species, with distinct patterns of reduced introgression on the Z chromosome. To resolve species relationships on the background of widespread gene tree incongruence, we used four complementary coalescent-based methods for species tree reconstruction, including complex modeling approaches that incorporate post-divergence gene flow among species. This allowed us to infer the most likely species tree with high confidence. Based on this finding, we show that regions of reduced effective population size, which have been suggested as particularly useful for species tree inference, can produce positively misleading species tree topologies. 
Our findings disclose the pitfalls of using loci potentially under selection as phylogenetic markers and highlight the potential of modeling approaches to disentangle species relationships in systems with large effective population sizes and post-divergence gene flow. PMID:26187295
Ekblom, Robert; Farrell, Lindsay L; Lank, David B; Burke, Terry
2012-01-01
By next generation transcriptome sequencing, it is possible to obtain data on both nucleotide sequence variation and gene expression. We have used this approach (RNA-Seq) to investigate the genetic basis for differences in plumage coloration and mating strategies in a non-model bird species, the ruff (Philomachus pugnax). Ruff males show enormous variation in the coloration of ornamental feathers, used for individual recognition. This polymorphism is linked to reproductive strategies, with dark males (Independents) defending territories on leks against other Independents, whereas white morphs (Satellites) co-occupy Independents' courts without agonistic interactions. Previous work found a strong genetic component for mating strategy, but the genes involved were not identified. We present feather transcriptome data of more than 6,000 de-novo sequenced ruff genes (although with limited coverage for many of them). None of the identified genes showed significant expression divergence between males, but many genetic markers showed nucleotide differentiation between different color morphs and mating strategies. These include several feather keratin genes, splicing factors, and the Xg blood-group gene. Many of the genes with significant genetic structure between mating strategies have not yet been annotated and their functions remain to be elucidated. We also conducted in-depth investigations of 28 pre-identified coloration candidate genes. Two of these (EDNRB and TYR) were specifically expressed in black- and rust-colored males, respectively. We have demonstrated the utility of next generation transcriptome sequencing for identifying and genotyping large numbers of genetic markers in a non-model species without previous genomic resources, and highlight the potential of this approach for addressing the genetic basis of ecologically important variation. PMID:23145334
Chimpanzee and human Y chromosomes are remarkably divergent in structure and gene content.
Hughes, Jennifer F; Skaletsky, Helen; Pyntikova, Tatyana; Graves, Tina A; van Daalen, Saskia K M; Minx, Patrick J; Fulton, Robert S; McGrath, Sean D; Locke, Devin P; Friedman, Cynthia; Trask, Barbara J; Mardis, Elaine R; Warren, Wesley C; Repping, Sjoerd; Rozen, Steve; Wilson, Richard K; Page, David C
2010-01-28
The human Y chromosome began to evolve from an autosome hundreds of millions of years ago, acquiring a sex-determining function and undergoing a series of inversions that suppressed crossing over with the X chromosome. Little is known about the recent evolution of the Y chromosome because only the human Y chromosome has been fully sequenced. Prevailing theories hold that Y chromosomes evolve by gene loss, the pace of which slows over time, eventually leading to a paucity of genes, and stasis. These theories have been buttressed by partial sequence data from newly emergent plant and animal Y chromosomes, but they have not been tested in older, highly evolved Y chromosomes such as that of humans. Here we finished sequencing of the male-specific region of the Y chromosome (MSY) in our closest living relative, the chimpanzee, achieving levels of accuracy and completion previously reached for the human MSY. By comparing the MSYs of the two species we show that they differ radically in sequence structure and gene content, indicating rapid evolution during the past 6 million years. The chimpanzee MSY contains twice as many massive palindromes as the human MSY, yet it has lost large fractions of the MSY protein-coding genes and gene families present in the last common ancestor. We suggest that the extraordinary divergence of the chimpanzee and human MSYs was driven by four synergistic factors: the prominent role of the MSY in sperm production, 'genetic hitchhiking' effects in the absence of meiotic crossing over, frequent ectopic recombination within the MSY, and species differences in mating behaviour. Although genetic decay may be the principal dynamic in the evolution of newly emergent Y chromosomes, wholesale renovation is the paramount theme in the continuing evolution of chimpanzee, human and perhaps other older MSYs.
Fungal Genes in Context: Genome Architecture Reflects Regulatory Complexity and Function
Noble, Luke M.; Andrianopoulos, Alex
2013-01-01
Gene context determines gene expression, with local chromosomal environment most influential. Comparative genomic analysis is often limited in scope to conserved or divergent gene and protein families, and fungi are well suited to this approach with low functional redundancy and relatively streamlined genomes. We show here that one aspect of gene context, the amount of potential upstream regulatory sequence maintained through evolution, is highly predictive of both molecular function and biological process in diverse fungi. Orthologs with large upstream intergenic regions (UIRs) are strongly enriched in information processing functions, such as signal transduction and sequence-specific DNA binding, and, in the genus Aspergillus, include the majority of experimentally studied, high-level developmental and metabolic transcriptional regulators. Many uncharacterized genes are also present in this class and, by implication, may be of similar importance. Large intergenic regions also share two novel sequence characteristics, currently of unknown significance: they are enriched for plus-strand polypyrimidine tracts and an information-rich, putative regulatory motif that was present in the last common ancestor of the Pezizomycotina. Systematic consideration of gene UIR in comparative genomics, particularly for poorly characterized species, could help reveal organisms’ regulatory priorities. PMID:23699226
Phylotranscriptomic consolidation of the jawed vertebrate timetree.
Irisarri, Iker; Baurain, Denis; Brinkmann, Henner; Delsuc, Frédéric; Sire, Jean-Yves; Kupfer, Alexander; Petersen, Jörn; Jarek, Michael; Meyer, Axel; Vences, Miguel; Philippe, Hervé
2017-09-01
Phylogenomics is extremely powerful but introduces new challenges as no agreement exists on "standards" for data selection, curation and tree inference. We use jawed vertebrates (Gnathostomata) as a model to address these issues. Despite considerable efforts in resolving their evolutionary history and macroevolution, few studies have included the full phylogenetic diversity of gnathostomes and some relationships remain controversial. We tested a novel bioinformatic pipeline to assemble large and accurate phylogenomic datasets from RNA sequencing and find this phylotranscriptomic approach successful and highly cost-effective. Increased sequencing effort up to ca. 10Gbp allows recovering more genes, but shallower sequencing (1.5Gbp) is sufficient to obtain thousands of full-length orthologous transcripts. We reconstruct a robust and strongly supported timetree of jawed vertebrates using 7,189 nuclear genes from 100 taxa, including 23 new transcriptomes from previously unsampled key species. Gene jackknifing of genomic data corroborates the robustness of our tree and allows calculating genome-wide divergence times by overcoming gene sampling bias. Mitochondrial genomes prove insufficient to resolve the deepest relationships because of limited signal and among-lineage rate heterogeneity. Our analyses emphasize the importance of large curated nuclear datasets to increase the accuracy of phylogenomics and provide a reference framework for the evolutionary history of jawed vertebrates.
Wang, Xihong; Zheng, Zhuqing; Cai, Yudong; Chen, Ting; Li, Chao; Fu, Weiwei; Jiang, Yu
2017-12-01
The increasing amount of sequencing data available for a wide variety of species can be theoretically used for detecting copy number variations (CNVs) at the population level. However, the growing sample sizes and the divergent complexity of nonhuman genomes challenge the efficiency and robustness of current human-oriented CNV detection methods. Here, we present CNVcaller, a read-depth method for discovering CNVs in population sequencing data. The computational speed of CNVcaller was 1-2 orders of magnitude faster than CNVnator and Genome STRiP for complex genomes with thousands of unmapped scaffolds. CNV detection of 232 goats required only 1.4 days on a single compute node. Additionally, the Mendelian consistency of sheep trios indicated that CNVcaller mitigated the influence of high proportions of gaps and misassembled duplications in the nonhuman reference genome assembly. Furthermore, multiple evaluations using real sheep and human data indicated that CNVcaller achieved the best accuracy and sensitivity for detecting duplications. The fast generalized detection algorithms included in CNVcaller overcome prior computational barriers for detecting CNVs in large-scale sequencing data with complex genomic structures. Therefore, CNVcaller promotes population genetic analyses of functional CNVs in more species. © The Authors 2017. Published by Oxford University Press.
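As a rough illustration of what a read-depth approach means in general, the sketch below normalizes per-window read depth against the sample median and labels windows as duplicated or deleted when the ratio crosses a threshold. This is a generic toy example, not the CNVcaller algorithm, whose details are not given in the abstract; the function name, window depths, and the 1.5 and 0.5 thresholds are all assumptions chosen for illustration.

```python
from statistics import median


def call_cnv_windows(depths, dup_ratio=1.5, del_ratio=0.5):
    """Label each genomic window from its read depth.

    depths    : read depth per fixed-size window (hypothetical input)
    dup_ratio : depth/median ratio at or above which a window is "dup"
    del_ratio : depth/median ratio at or below which a window is "del"
    """
    baseline = median(depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    calls = []
    for depth in depths:
        ratio = depth / baseline  # normalized copy-number proxy
        if ratio >= dup_ratio:
            calls.append("dup")
        elif ratio <= del_ratio:
            calls.append("del")
        else:
            calls.append("normal")
    return calls


window_depths = [30, 31, 29, 62, 60, 30, 14, 30]
window_calls = call_cnv_windows(window_depths)
```

A production tool would additionally correct for GC content and mappability, merge adjacent windows into segments, and compare across a population; none of that is attempted here.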
Wang, Xihong; Zheng, Zhuqing; Cai, Yudong; Chen, Ting; Li, Chao; Fu, Weiwei
2017-01-01
Abstract Background The increasing amount of sequencing data available for a wide variety of species can be theoretically used for detecting copy number variations (CNVs) at the population level. However, the growing sample sizes and the divergent complexity of nonhuman genomes challenge the efficiency and robustness of current human-oriented CNV detection methods. Results Here, we present CNVcaller, a read-depth method for discovering CNVs in population sequencing data. The computational speed of CNVcaller was 1–2 orders of magnitude faster than CNVnator and Genome STRiP for complex genomes with thousands of unmapped scaffolds. CNV detection of 232 goats required only 1.4 days on a single compute node. Additionally, the Mendelian consistency of sheep trios indicated that CNVcaller mitigated the influence of high proportions of gaps and misassembled duplications in the nonhuman reference genome assembly. Furthermore, multiple evaluations using real sheep and human data indicated that CNVcaller achieved the best accuracy and sensitivity for detecting duplications. Conclusions The fast generalized detection algorithms included in CNVcaller overcome prior computational barriers for detecting CNVs in large-scale sequencing data with complex genomic structures. Therefore, CNVcaller promotes population genetic analyses of functional CNVs in more species. PMID:29220491
Maruyama, Sandra Regina; Castro-Jorge, Luiza Antunes; Ribeiro, José Marcos Chaves; Gardinassi, Luiz Gustavo; Garcia, Gustavo Rocha; Brandão, Lucinda Giampietro; Rodrigues, Aline Rezende; Okada, Marcos Ituo; Abrão, Emiliana Pereira; Ferreira, Beatriz Rossetti; da Fonseca, Benedito Antonio Lopes; de Miranda-Santos, Isabel Kinney Ferreira
2013-01-01
Transcripts similar to those that encode the nonstructural (NS) proteins NS3 and NS5 from flaviviruses were found in a salivary gland (SG) complementary DNA (cDNA) library from the cattle tick Rhipicephalus microplus. Tick extracts were cultured with cells to enable the isolation of viruses capable of replicating in cultured invertebrate and vertebrate cells. Deep sequencing of the viral RNA isolated from culture supernatants provided the complete coding sequences for the NS3 and NS5 proteins and their molecular characterisation confirmed similarity with the NS3 and NS5 sequences from other flaviviruses. Despite this similarity, phylogenetic analyses revealed that this potentially novel virus may be a highly divergent member of the genus Flavivirus. Interestingly, we detected the divergent NS3 and NS5 sequences in ticks collected from several dairy farms widely distributed throughout three regions of Brazil. This is the first report of flavivirus-like transcripts in R. microplus ticks. This novel virus is a potential arbovirus because it replicated in arthropod and mammalian cells; furthermore, it was detected in a cDNA library from tick SGs and therefore may be present in tick saliva. It is important to determine whether and by what means this potential virus is transmissible and to monitor the virus as a potential emerging tick-borne zoonotic pathogen. PMID:24626302
LinkFinder: An expert system that constructs phylogenic trees
NASA Technical Reports Server (NTRS)
Inglehart, James; Nelson, Peter C.
1991-01-01
An expert system has been developed using the C Language Integrated Production System (CLIPS) that automates the process of constructing DNA sequence based phylogenies (trees or lineages) that indicate evolutionary relationships. LinkFinder takes as input homologous DNA sequences from distinct individual organisms. It measures variations between the sequences, selects appropriate proportionality constants, and estimates the time that has passed since each pair of organisms diverged from a common ancestor. It then designs and outputs a phylogenic map summarizing these results. LinkFinder can find genetic relationships between different species, and between individuals of the same species, including humans. It was designed to take advantage of the vast amount of sequence data being produced by the Genome Project, and should be of value to evolution theorists who wish to utilize this data, but who have no formal training in molecular genetics. Evolutionary theory holds that distinct organisms carrying a common gene inherited that gene from a common ancestor. Homologous genes vary from individual to individual and species to species, and the amount of variation is now believed to be directly proportional to the time that has passed since divergence from a common ancestor. The proportionality constant must be determined experimentally; it varies considerably with the types of organisms and DNA molecules under study. Given an appropriate constant, and the variation between two DNA sequences, a simple linear equation gives the divergence time.
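The "simple linear equation" mentioned above can be sketched as t = d / k, where d is the measured variation between two homologous sequences and k is the experimentally determined proportionality constant. The code below is a minimal illustration under that assumption, not a reconstruction of LinkFinder itself (which was written in CLIPS): variation is taken to be the proportion of differing aligned sites, and the function names and the value of k are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def pairwise_divergence_times(sequences, k):
    """Estimate divergence time for every pair of organisms as t = d / k.

    sequences : dict mapping organism name -> aligned homologous sequence
    k         : pairwise variation accumulated per unit time; a
                hypothetical constant that must be calibrated for the
                organisms and DNA molecule under study
    """
    times = {}
    for (name_a, seq_a), (name_b, seq_b) in combinations(sorted(sequences.items()), 2):
        times[(name_a, name_b)] = p_distance(seq_a, seq_b) / k
    return times


example = {
    "x": "AAAAAAAAAA",
    "y": "AAAAAAAAAT",
    "z": "AAAAAAAATT",
}
example_times = pairwise_divergence_times(example, k=0.02)
```

With k expressed as variation per million years, the resulting times are in million years; a matrix of such pairwise times is the kind of input from which a tree could then be drawn.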
Govindarajulu, Rajanikanth; Hughes, Colin E; Alexander, Patrick J; Bailey, C Donovan
2011-12-01
The evolutionary history of Leucaena has been impacted by polyploidy, hybridization, and divergent allopatric species diversification, suggesting that this is an ideal group to investigate the evolutionary tempo of polyploidy and the complexities of reticulation and divergence in plant diversification. Parsimony- and ML-based phylogenetic approaches were applied to 105 accessions sequenced for six sequence characterized amplified region-based nuclear encoded loci, nrDNA ITS, and four cpDNA regions. Hypotheses for the origin of tetraploid species were inferred using results derived from a novel species tree and established gene tree methods and from data on genome sizes and geographic distributions. The combination of comprehensively sampled multilocus DNA sequence data sets and a novel methodology provide strong resolution and support for the origins of all five tetraploid species. A minimum of four allopolyploidization events are required to explain the origins of these species. The origin(s) of one tetraploid pair (L. involucrata/L. pallida) can be equally explained by two unique allopolyploidizations or a single event followed by divergent speciation. Alongside other recent findings, a comprehensive picture of the complex evolutionary dynamics of polyploidy in Leucaena is emerging that includes paleotetraploidization, diploidization of the last common ancestor to Leucaena, allopatric divergence among diploids, and recent allopolyploid origins for tetraploid species likely associated with human translocation of seed. These results provide insights into the role of divergence and reticulation in a well-characterized angiosperm lineage and into traits of diploid parents and derived tetraploids (particularly self-compatibility and year-round flowering) favoring the formation and establishment of novel tetraploids combinations.
Stenglein, Mark D.; Sanders, Chris; Kistler, Amy L.; Ruby, J. Graham; Franco, Jessica Y.; Reavill, Drury R.; Dunker, Freeland; DeRisi, Joseph L.
2012-01-01
ABSTRACT Inclusion body disease (IBD) is an infectious fatal disease of snakes typified by behavioral abnormalities, wasting, and secondary infections. At a histopathological level, the disease is identified by the presence of large eosinophilic cytoplasmic inclusions in multiple tissues. To date, no virus or other pathogen has been definitively characterized or associated with the disease. Using a metagenomic approach to search for candidate etiologic agents in snakes with confirmed IBD, we identified and de novo assembled the complete genomic sequences of two viruses related to arenaviruses, and a third arenavirus-like sequence was discovered by screening an additional set of samples. A continuous boa constrictor cell line was established and used to propagate and isolate one of the viruses in culture. Viral nucleoprotein was localized and concentrated within large cytoplasmic inclusions in infected cells in culture and tissues from diseased snakes. In total, viral RNA was detected in 6/8 confirmed IBD cases and 0/18 controls. These viruses have a typical arenavirus genome organization but are highly divergent, belonging to a lineage separate from that of the Old and New World arenaviruses. Furthermore, these viruses encode envelope glycoproteins that are more similar to those of filoviruses than to those of other arenaviruses. These findings implicate these viruses as candidate etiologic agents of IBD. The presence of arenaviruses outside mammals reveals that these viruses infect an unexpectedly broad range of species and represent a new reservoir of potential human pathogens. PMID:22893382
Hornok, Sándor; Wang, Yuanzhi; Otranto, Domenico; Keskin, Adem; Lia, Riccardo Paolo; Kontschán, Jenő; Takács, Nóra; Farkas, Róbert; Sándor, Attila D
2016-12-15
Haemaphysalis erinacei is one of the few ixodid tick species for which valid names of subspecies exist. Despite their disputed taxonomic status in the literature, these subspecies have not yet been compared with molecular methods. The aim of the present study was to investigate the phylogenetic relationships of H. erinacei subspecies, in the context of the first finding of this tick species in Romania. After morphological identification, DNA was extracted from five adults of H. e. taurica (from Romania and Turkey), four adults of H. e. erinacei (from Italy) and 17 adults of H. e. turanica (from China). From these samples fragments of the cytochrome c oxidase subunit 1 (cox1) and 16S rRNA genes were amplified via PCR and sequenced. Results showed that cox1 and 16S rRNA gene sequence divergences between H. e. taurica from Romania and H. e. erinacei from Italy were below 2%. However, the sequence divergences between H. e. taurica from Romania and H. e. turanica from China were high (up to 7.3% difference for the 16S rRNA gene), exceeding the reported level of sequence divergence between closely related tick species. At the same time, two adults of H. e. taurica from Turkey had higher 16S rRNA gene similarity to H. e. turanica from China (up to 97.5%) than to H. e. taurica from Romania (96.3%), but phylogenetically clustered more closely to H. e. taurica than to H. e. turanica. This is the first finding of H. erinacei in Romania, and the first (although preliminary) phylogenetic comparison of H. erinacei subspecies. Phylogenetic analyses did not support that the three H. erinacei subspecies evaluated here are of equal taxonomic rank, because the genetic divergence between H. e. turanica from China and H. e. taurica from Romania exceeded the usual level of sequence divergence between closely related tick species, suggesting that they might represent different species. Therefore, the taxonomic status of the subspecies of H. erinacei needs to be revised based on a larger number of specimens collected throughout its geographical range.
Phylogenetic analysis of Demodex caprae based on mitochondrial 16S rDNA sequence.
Zhao, Ya-E; Hu, Li; Ma, Jun-Xian
2013-11-01
Demodex caprae infests the hair follicles and sebaceous glands of goats worldwide, which not only seriously impairs goat farming but also causes large economic losses. However, few DNA-level studies of D. caprae have been reported. To reveal the taxonomic position of D. caprae within the genus Demodex, the present study conducted a phylogenetic analysis of D. caprae based on mt16S rDNA sequence data. D. caprae adults and eggs were obtained from a skin nodule of a goat suffering from demodicidosis. The mt16S rDNA sequences of individual mites were amplified using specific primers, and then cloned, sequenced, and aligned. The sequence divergence, genetic distance, and transition/transversion rate were computed, and the phylogenetic trees in Demodex were reconstructed. Partial sequences of 339 bp were obtained from six D. caprae isolates, and the sequence identity among isolates was 100%. The pairwise divergences between D. caprae and Demodex canis, Demodex folliculorum, and Demodex brevis were 22.2-24.0%, 24.0-24.9%, and 22.9-23.2%, respectively. The corresponding average genetic distances were 2.840, 2.926, and 2.665, and the average transition/transversion rates were 0.70, 0.55, and 0.54, respectively. The divergences, genetic distances, and transition/transversion rates of D. caprae versus the other three species all reached the interspecies level. All five phylogenetic trees showed that D. caprae clustered with D. brevis first, and then with D. canis, D. folliculorum, and Demodex injai in sequence. In conclusion, D. caprae is an independent species, and it is closer to D. brevis than to D. canis, D. folliculorum, or D. injai.
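The transition/transversion rate mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes two pre-aligned nucleotide sequences and simply counts purine-purine or pyrimidine-pyrimidine changes (transitions) against purine-pyrimidine changes (transversions); the function name and the toy sequences are hypothetical and are not the software or data used in the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ts_tv_counts(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Count transitions and transversions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    transitions = 0
    transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Ignore identical sites, gaps, and ambiguous bases.
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    return transitions, transversions


# Toy example with two transitions and two transversions.
ts, tv = ts_tv_counts("AACCGGTT", "GACTGTTA")
ratio = ts / tv if tv else float("inf")
print(ts, tv, ratio)
```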
Xu, Qifang; Dunbrack, Roland L
2012-11-01
Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are not as long as the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often erroneously result in two copies of the domain assigned to the query. Divergent repeat sequences in proteins are often missed. We have developed a multilevel procedure to produce nearly complete assignments of protein families of an existing classification system to a large set of sequences. We apply this to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, thus, leading to erroneous assignment at the Pfam family level. A greedy algorithm allowing for partial overlaps was, thus, applied first to sequence/HMM alignments, then HMM-HMM alignments and then structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignment of repeat Pfams with weaker E-values was allowed after stronger assignments of the repeat HMM. Our database of assignments, presented in a database called PDBfam, contains Pfams for 99.4% of chains >50 residues. The Pfam assignment data in PDBfam are available at http://dunbrack2.fccc.edu/ProtCid/PDBfam, which can be searched by PDB codes and Pfam identifiers. They will be updated regularly.
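The greedy step that tolerates partial overlaps can be sketched as follows. This is a simplified illustration, not the PDBfam pipeline: the `Hit` record, the family names, and the 10-residue overlap allowance are assumptions introduced here, and the abstract does not state the actual threshold or data structures used.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    family: str    # hypothetical family identifier
    start: int     # 1-based, inclusive
    end: int       # 1-based, inclusive
    evalue: float


def overlap(a: Hit, b: Hit) -> int:
    """Number of residues shared by two hits on the same chain."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def greedy_assign(hits: list[Hit], max_overlap: int = 10) -> list[Hit]:
    """Accept hits from best to worst E-value, skipping any hit that overlaps
    an already accepted hit by more than max_overlap residues."""
    accepted: list[Hit] = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if all(overlap(hit, kept) <= max_overlap for kept in accepted):
            accepted.append(hit)
    return sorted(accepted, key=lambda h: h.start)


hits = [
    Hit("FAM_A", 1, 100, 1e-30),
    Hit("FAM_B", 95, 200, 1e-20),   # overlaps FAM_A by 6 residues: kept
    Hit("FAM_C", 50, 150, 1e-5),    # overlaps FAM_A by 51 residues: dropped
]
print([h.family for h in greedy_assign(hits)])
```

In the procedure the abstract describes, a pass of this kind would be applied in turn to sequence/HMM, HMM-HMM, and structure alignments, with split alignments joined beforehand; those stages are outside the scope of this sketch.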
Full-genome sequence and analysis of a novel human rhinovirus strain within a divergent HRV-A clade.
Rathe, Jennifer A; Liu, Xinyue; Tallon, Luke J; Gern, James E; Liggett, Stephen B
2010-01-01
Genome sequences of human rhinoviruses (HRV) have primarily been from stocks collected in the 1960s, with genomes and phylogeny of modern HRVs remaining undefined. Here, two modern isolates (hrv-A101 and hrv-A101-v1) collected approximately 8 years apart were sequenced in their entirety. Incorporation into our full-genome HRV alignment with subsequent phylogenetic network inference indicated that these represent a unique HRV-A, localized within a distinct divergent clade. They appear to have resulted from recombination of the hrv-65 and hrv-78 lineages. These results support our contention that there are unrecognized distinct HRV-A strains, and that recombination is evident in currently circulating strains.
Laskar, Boni A.; Bhattacharjee, Maloyjo J.; Dhar, Bishal; Mahadani, Pradosh; Kundu, Shantanu; Ghosh, Sankar K.
2013-01-01
Background The taxonomic validity of the Northeast Indian endemic mahseer species, Tor progeneius and Neolissochilus hexastichus, has been disputed repeatedly. This is mainly due to disagreements in recognizing the species based on morphological characters. Consequently, both the species have been concealed for many decades. DNA barcoding has become a promising and independent technique for accurate species-level identification. Therefore, using this technique in association with the traditional morphotaxonomic description can resolve the species dilemma of this important group of sport fishes. Methodology/Principal Findings Altogether, 28 mahseer specimens including paratypes were studied from different locations in Northeast India, and 24 morphometric characters were measured invariably. Principal Component Analysis of the morphometric data revealed five distinct groups of samples that were taxonomically categorized into 4 species, viz., Tor putitora, T. progeneius, Neolissochilus hexagonolepis and N. hexastichus. Analysis of a dataset of 76 DNA barcode sequences of different mahseer species showed that the queries of T. putitora and N. hexagonolepis clustered cohesively with the respective conspecific database sequences, maintaining a maximum K2P divergence of 0.8%. The closest congeneric divergence was 3 times higher than the mean conspecific divergence and was considered the barcode gap. The maximum divergence among the samples of T. progeneius and T. putitora was 0.8%, much below the barcode gap, indicating that they are synonymous. The query sequences of N. hexastichus invariably formed a discrete congeneric clade with the database sequences and maintained an interspecific divergence that supported its distinct species status. Notably, N. hexastichus was encountered at a single site and seemed to be under threat. Conclusion This study substantiated the identification of N. hexastichus as a true species, and tentatively regarded T. progeneius as a synonym of T. putitora. It would guide conservationists to initiate priority conservation of N. hexastichus and T. putitora. PMID:23341979
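The K2P (Kimura two-parameter) divergence reported in this abstract follows a standard closed-form expression, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of sites showing transitions and transversions. The Python sketch below applies that formula to two pre-aligned sequences; the function name and the toy input are hypothetical, and the study's own software is not specified here.

```python
import math

TRANSITION_PAIRS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned nucleotide sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    sites = 0
    transitions = 0
    transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Only unambiguous bases contribute to the comparison.
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if (a, b) in TRANSITION_PAIRS:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    term_ts = 1 - 2 * p - q
    term_tv = 1 - 2 * q
    if term_ts <= 0 or term_tv <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term_ts) - 0.25 * math.log(term_tv)


# Toy example: one transition in 100 aligned sites.
d = k2p_distance("A" * 100, "G" + "A" * 99)
print(f"K2P divergence: {d * 100:.2f}%")
```

A barcode-gap check of the kind described would then compare the smallest congeneric distance with the mean conspecific distance computed this way.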
Nilsson, Maria A; Härlid, Anna; Kullberg, Morgan; Janke, Axel
2010-05-01
The native rodents are the most species-rich placental mammal group on the Australian continent. Fossils of native Australian rodents belonging to the group Conilurini are known from Northern Australia at 4.5 Ma. These fossil assemblages already display a rich diversity of rodents, but the exact timing of their arrival on the Australian continent is not yet established. The complete mitochondrial genomes of two native Australian rodents, Leggadina lakedownensis (Lakeland Downs mouse) and Pseudomys chapmani (Western Pebble-mound mouse), were sequenced to investigate their evolutionary history. The molecular data were used to study the phylogenetic position and divergence times of the Australian rodents, using 12 calibration points and various methods. Phylogenetic analyses place the native Australian rodents as the sister-group to the genus Mus. The Mus-Conilurini calibration point (7.3-11.0 Ma) is highly critical for estimating rodent divergence times, while the influence of the different algorithms on estimating divergence times is negligible. The influence of the data type was investigated, indicating that amino acid data are more likely to reflect the correct divergence times than nucleotide sequences. This study of the problems related to estimating divergence times in fast-evolving lineages such as rodents emphasizes that the choice of data and calibration points is critical. Furthermore, it is essential to include accurate calibration points for fast-evolving groups, because the divergence times can otherwise be estimated to be significantly older. The divergence times of the Australian rodents are highly congruent and are estimated at 6.5-7.2 Ma, a date that is compatible with their fossil record.
2009-01-01
Background The full power of modern genetics has been applied to the study of speciation in only a small handful of genetic model species - all of which speciated allopatrically. Here we report the first large expressed sequence tag (EST) study of a candidate for ecological sympatric speciation, the apple maggot Rhagoletis pomonella, using massively parallel pyrosequencing on the Roche 454-FLX platform. To maximize transcript diversity we created and sequenced separate libraries from larvae, pupae, adult heads, and headless adult bodies. Results We obtained 239,531 sequences which assembled into 24,373 contigs. A total of 6810 unique protein coding genes were identified among the contigs and long singletons, corresponding to 48% of all known Drosophila melanogaster protein-coding genes. Their distribution across GO classes suggests that we have obtained a representative sample of the transcriptome. Among these sequences are many candidates for potential R. pomonella "speciation genes" (or "barrier genes") such as those controlling chemosensory and life-history timing processes. Furthermore, we identified important marker loci including more than 40,000 single nucleotide polymorphisms (SNPs) and over 100 microsatellites. An initial search for SNPs at which the apple and hawthorn host races differ suggested at least 75 loci warranting further work. We also determined that developmental expression differences remained even after normalization; transcripts expected to show different expression levels between larvae and pupae in D. melanogaster also did so in R. pomonella. Preliminary comparative analysis of transcript presences and absences revealed evidence of gene loss in Drosophila and gain in the higher dipteran clade Schizophora. Conclusions These data provide a much needed resource for exploring mechanisms of divergence in this important model for sympatric ecological speciation. Our description of ESTs from a substantial portion of the R. pomonella transcriptome will facilitate future functional studies of candidate genes for olfaction and diapause-related life history timing, and will enable large scale expression studies. Similarly, the identification of new SNP and microsatellite markers will facilitate future population and quantitative genetic studies of divergence between the apple and hawthorn-infesting host races. PMID:20035631
Redwan, R M; Saidin, A; Kumar, S V
2015-08-12
Pineapple (Ananas comosus var. comosus) is known as the king of fruits for its crown and is the third most important tropical fruit after banana and citrus. The plant, which is indigenous to South America, is the most important species in the Bromeliaceae family and is largely traded for fresh fruit consumption. Here, we report the complete chloroplast sequence of the MD-2 pineapple, which was sequenced using PacBio sequencing technology. In this study, the high error rate of PacBio long sequence reads of A. comosus total genomic DNA was reduced by leveraging highly accurate but short Illumina reads for error correction via the latest error correction module from Novocraft. Error-corrected long PacBio reads were assembled using a single tool to produce a contig representing the pineapple chloroplast genome. The genome, 159,636 bp in length, features the conserved quadripartite structure of chloroplasts, containing a large single copy region (LSC) of 87,482 bp, a small single copy region (SSC) of 18,622 bp and two inverted repeat regions (IRA and IRB) of 26,766 bp each. Overall, the genome contained 117 unique coding regions, 30 of which were repeated in the IR region, with gene content, structure and arrangement similar to those of its sister taxon, Typha latifolia. A total of 35 repeat structures were detected in both the coding and non-coding regions, the majority being tandem repeats. In addition, 205 SSRs were detected in the genome, with six protein-coding genes containing more than two SSRs. Comparison of chloroplast genomes from the subclass Commelinidae revealed a conserved protein-coding gene, albeit located in a highly divergent region. Analysis of selection pressure on protein-coding genes using the Ka/Ks ratio showed significant positive selection exerted on the rps7 gene of the pineapple chloroplast, with P less than 0.05. 
Phylogenetic analysis confirmed the recent taxonomic relationships among the members of the commelinids, which support the monophyletic relationships between Arecales and Dasypogonaceae and between Zingiberales and the Poales, the latter including A. comosus. The complete sequence of the pineapple chloroplast provides insights into the divergence of genic chloroplast sequences among the members of the subclass Commelinidae. The complete pineapple chloroplast genome will serve as a reference for in-depth taxonomic studies in the Bromeliaceae family when more species in the family are sequenced in the future. The genetic sequence information will also make feasible other molecular applications of the pineapple chloroplast for plant genetic improvement.
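The region sizes reported in this abstract can be checked against the stated genome length, since a quadripartite chloroplast genome consists of one LSC, one SSC, and two inverted repeat copies. The short Python check below uses only the figures given in the abstract; the variable names are illustrative.

```python
# Region sizes in base pairs, as reported in the abstract.
lsc = 87_482          # large single copy region
ssc = 18_622          # small single copy region
inverted_repeat = 26_766  # size of each of IRA and IRB

# One LSC + one SSC + two inverted repeats should give the full genome length.
total = lsc + ssc + 2 * inverted_repeat
print(total)

reported_genome_length = 159_636
assert total == reported_genome_length
```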
Fugong virus, a novel hantavirus harbored by the small oriental vole (Eothenomys eleusis) in China.
Ge, Xing-Yi; Yang, Wei-Hong; Pan, Hong; Zhou, Ji-Hua; Han, Xi; Zhu, Guang-Jian; Desmond, James S; Daszak, Peter; Shi, Zheng-Li; Zhang, Yun-Zhi
2016-02-16
Rodents are natural reservoirs of hantaviruses, which cause two disease types: hemorrhagic fever with renal syndrome in Eurasia and hantavirus pulmonary syndrome in North America. Hantavirus-related human cases have been observed throughout Asia, Europe, Africa, and North America. To date, 23 distinct species of hantaviruses, hosted by reservoir species, have been identified. However, the diversity and number of hantaviruses are likely underestimated in China, and the hantavirus species that cause disease in many regions, including Yunnan province, are unknown. In August 2012, we collected tissue samples from 189 captured animals, including 15 species belonging to 10 genera, 5 families, and 4 orders in Fugong county, Yunnan province, China. Seven species were positive for hantavirus: Eothenomys eleusis (42/94), Apodemus peninsulae (3/25), Niviventer eha (3/27), Cryptotis montivaga (2/8), Anourosorex squamipes (1/1), Sorex araneus (1/1), and Mustela sibirica (1/2). We characterized one full-length genomic sequence of the virus (named Fugong virus, FUGV) from a small oriental vole (Eothenomys eleusis). The full-length sequences of the small, medium, and large segments of FUGV were 1813, 3630, and 6531 nt, respectively. FUGV was most closely related to hantavirus LX309, a previously reported species detected in the red-backed vole in Luxi county, Yunnan province, China. However, the amino acid sequences of the nucleocapsid (N), glycoprotein (G), and large protein (L) were highly divergent from those of hantavirus LX309, with amino acid differences of 11.2, 15.3, and 12.7%, respectively. In phylogenetic trees, FUGV clustered in the lineage corresponding to hantaviruses carried by rodents in the subfamily Arvicolinae. A high prevalence of hantavirus infection in small mammals was found in Fugong county, Yunnan province, China. A novel hantavirus species, FUGV, was identified from the small oriental vole. 
This virus clusters phylogenetically with another hantavirus, LX309, but shows high genomic divergence.
Ancient wolf lineages in India.
Sharma, Dinesh K; Maldonado, Jesus E; Jhala, Yadrendradev V; Fleischer, Robert C
2004-01-01
All previously obtained wolf (Canis lupus) and dog (Canis familiaris) mitochondrial (mt) DNA sequences fall within an intertwined and shallow clade (the 'wolf-dog' clade). We sequenced mtDNA of recent and historical samples from 45 wolves from throughout lowland peninsular India and 23 wolves from the Himalayas and Tibetan Plateau and compared these sequences with all available wolf and dog sequences. All 45 lowland Indian wolves have one of four closely related haplotypes that form a well-supported, divergent sister lineage to the wolf-dog clade. This unique lineage may have been independent for more than 400,000 years. Although seven Himalayan wolves from western and central Kashmir fall within the widespread wolf-dog clade, one from Ladakh in eastern Kashmir, nine from Himachal Pradesh, four from Nepal and two from Tibet form a very different basal clade. This lineage contains five related haplotypes that probably diverged from other canids more than 800,000 years ago, but we find no evidence of current barriers to admixture. Thus, the Indian subcontinent has three divergent, ancient and apparently parapatric mtDNA lineages within the morphologically delineated wolf. No haplotypes of either novel lineage are found within a sample of 37 Indian (or other) dogs. Thus, we find no evidence that these two taxa played a part in the domestication of canids. PMID:15101402
NASA Astrophysics Data System (ADS)
Xu, Jiajie; Jiang, Bo; Chai, Sanming; He, Yuan; Zhu, Jianyi; Shen, Zonggen; Shen, Songdong
2016-09-01
Filamentous Bangia, which are distributed extensively throughout the world, have simple and similar morphological characteristics. Scientists can classify these organisms using molecular markers in combination with morphology. We successfully sequenced the complete nuclear ribosomal DNA, approximately 13 kb in length, from a marine Bangia population. We further analyzed the small subunit ribosomal DNA gene (nrSSU) and the internal transcribed spacer (ITS) sequence regions along with nine other marine and two freshwater Bangia samples from China. Pairwise distances of the nrSSU and 5.8S ribosomal DNA gene sequences show the marine samples grouping together with low divergences (0-0.003; 0-0.006, respectively) from each other, but high divergences (0.123-0.126; 0.198, respectively) from freshwater samples. An exception is the marine sample collected from Weihai, which shows high divergence from both other marine samples (0.063-0.065; 0.129, respectively) and the freshwater samples (0.097; 0.120, respectively). A maximum likelihood phylogenetic tree based on a combined SSU-ITS dataset shows the samples divided into three clades: two marine clades containing Bangia spp. from North America, Europe, Asia, and Australia, and one freshwater clade containing Bangia atropurpurea from North America and China.
Ren, Jindong; Du, Xue; Zeng, Tao; Chen, Li; Shen, Junda; Lu, Lizhi; Hu, Jianhong
2017-10-01
Long noncoding RNAs (lncRNAs) and divergently expressed genes exist widely in different tissues of mammals and birds, in which they are involved in various biological processes. However, there is limited information on their role in the regulation of normal biological processes during differentiation, development, and reproduction in birds. In this study, whole transcriptome strand-specific RNA sequencing was performed on ovaries from young ducks (60 days), first-laying ducks (160 days), and old ducks, i.e., ducks that had stopped laying eggs (490 days). The lncRNAs and mRNAs from these ducks were systematically analyzed and identified by duck genome sequencing in the three study groups. The transcriptome from the duck ovary comprised 15,011 protein-coding genes and 2905 lncRNAs; all the lncRNAs were identified as novel long noncoding transcripts. The comparison of transcriptome data from the different study groups identified 2240 divergent transcription genes and 135 divergently expressed lncRNAs, which differed among the groups; most of them were significantly downregulated with age. Among the divergent genes, 38 genes were related to the reproductive process and 6 genes were upregulated. Further prediction analysis revealed that 52 lncRNAs were closely correlated with divergent reproductive mRNAs. More importantly, 6 remarkable lncRNAs were correlated significantly with the conversion of the ovary in different phases. Our results aid in the understanding of the divergent transcriptome of duck ovary in different phases and the underlying mechanisms that drive the specificity of protein-coding genes and lncRNAs in duck ovary.
USDA-ARS's Scientific Manuscript database
Porcine reproductive and respiratory syndrome virus (PRRSV) is widespread, with high variation in sequence and virulence among the divergent strains, and causes an economically destructive disease. A viral ovarian tumor domain protease (vOTU) has been previously identified within the nonstructural protein...
New genes from old: asymmetric divergence of gene duplicates and the evolution of development.
Holland, Peter W H; Marlétaz, Ferdinand; Maeso, Ignacio; Dunwell, Thomas L; Paps, Jordi
2017-02-05
Gene duplications and gene losses have been frequent events in the evolution of animal genomes, with the balance between these two dynamic processes contributing to major differences in gene number between species. After gene duplication, it is common for both daughter genes to accumulate sequence change at approximately equal rates. In some cases, however, the accumulation of sequence change is highly uneven, with one copy radically diverging from its paralogue. Such 'asymmetric evolution' seems commoner after tandem gene duplication than after whole-genome duplication, and can generate substantially novel genes. We describe examples of asymmetric evolution in duplicated homeobox genes of moths, molluscs and mammals, in each case generating new homeobox genes that were recruited to novel developmental roles. The prevalence of asymmetric divergence of gene duplicates has been underappreciated, in part, because the origin of highly divergent genes can be difficult to resolve using standard phylogenetic methods. This article is part of the themed issue 'Evo-devo in the genomics era, and the origins of morphological diversity'.
Xu, Jianping; Yan, Zhun; Guo, Hong
2009-06-01
The inheritance of mitochondrial genes and genomes is uniparental in most sexual eukaryotes. This pattern of inheritance makes mitochondrial genomes in natural populations effectively clonal. Here, we examined the mitochondrial population genetics of the emerging human pathogenic fungus Cryptococcus gattii. The DNA sequences for five mitochondrial DNA fragments were obtained from each of 50 isolates belonging to two evolutionarily divergent lineages, VGI and VGII. Our analyses revealed greater sequence diversity within VGI than within VGII, consistent with observations of the nuclear genes. The combined analyses of all five gene fragments indicated significant divergence between VGI and VGII. However, the five individual genealogies showed different relationships among the isolates, consistent with recent hybridization and mitochondrial gene transfer between the two lineages. Population genetic analyses of the multilocus data identified evidence for predominantly clonal mitochondrial population structures within both lineages. Interestingly, there were clear signatures of recombination among mitochondrial genes within the VGII lineage. Our analyses suggest historical mitochondrial genome divergence within C. gattii, but there is evidence for recent hybridization and recombination in the mitochondrial genome of this important human yeast pathogen.
Evaluating, Comparing, and Interpreting Protein Domain Hierarchies
2014-01-01
Abstract Arranging protein domain sequences hierarchically into evolutionarily divergent subgroups is important for investigating evolutionary history, for speeding up web-based similarity searches, for identifying sequence determinants of protein function, and for genome annotation. However, whether or not a particular hierarchy is optimal is often unclear, and independently constructed hierarchies for the same domain can often differ significantly. This article describes methods for statistically evaluating specific aspects of a hierarchy, for probing the criteria underlying its construction and for direct comparisons between hierarchies. Information theoretical notions are used to quantify the contributions of specific hierarchical features to the underlying statistical model. Such features include subhierarchies, sequence subgroups, individual sequences, and subgroup-associated signature patterns. Underlying properties are graphically displayed in plots of each specific feature's contributions, in heat maps of pattern residue conservation, in “contrast alignments,” and through cross-mapping of subgroups between hierarchies. Together, these approaches provide a deeper understanding of protein domain functional divergence, reveal uncertainties caused by inconsistent patterns of sequence conservation, and help resolve conflicts between competing hierarchies. PMID:24559108
Genome Sequences of Akhmeta Virus, an Early Divergent Old World Orthopoxvirus.
Gao, Jinxin; Gigante, Crystal; Khmaladze, Ekaterine; Liu, Pengbo; Tang, Shiyuyun; Wilkins, Kimberly; Zhao, Kun; Davidson, Whitni; Nakazawa, Yoshinori; Maghlakelidze, Giorgi; Geleishvili, Marika; Kokhreidze, Maka; Carroll, Darin S; Emerson, Ginny; Li, Yu
2018-05-12
Annotated whole genome sequences of three isolates of the Akhmeta virus (AKMV), a novel species of orthopoxvirus (OPXV), isolated from the Akhmeta and Vani regions of the country Georgia, are presented and discussed. The AKMV genome is similar in genomic content and structure to that of the cowpox virus (CPXV), but a lower sequence identity was found between AKMV and Old World OPXVs than between other known species of Old World OPXVs. Phylogenetic analysis showed that AKMV diverged prior to other Old World OPXV. AKMV isolates formed a monophyletic clade in the OPXV phylogeny, yet the sequence variability between AKMV isolates was higher than between the monkeypox virus strains in the Congo basin and West Africa. An AKMV isolate from Vani contained approximately six kb sequence in the left terminal region that shared a higher similarity with CPXV than with other AKMV isolates, whereas the rest of the genome was most similar to AKMV, suggesting recombination between AKMV and CPXV in a region containing several host range and virulence genes.
Evolution of nuclear rDNA ITS sequences in the Cladophora albida/sericea clade (Chlorophyta).
Bakker, F T; Olsen, J L; Stam, W T
1995-06-01
Ribosomal DNA ITS sequences were compared among 13 different species and biogeographic isolates from the monophyletic "albida/sericea clade" in the green algal genus Cladophora. Six distinct ITS sequence types were found, characterized by multiple insertions and deletions and high levels of nucleotide substitution. Conserved domains within the ITS regions indicate the presence of ITS secondary structure. Low transition/transversion ratios among the six types and nearly symmetrical tree-length frequency distributions indicate some saturation and low phylogenetic signal. Although branching order among five of the six ITS sequence types could not be resolved, estimates of ITS sequence divergence as compared with 18S divergence in a subset of the taxa suggest that the origin of the different ITS types is probably in the mid-Miocene (12 Ma ago) but that biogeographic isolates within a single ITS type (including both Pacific and Atlantic representatives) have probably dispersed on a time scale of thousands rather than millions of years.
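The transition/transversion ratio mentioned in the abstract above can be illustrated with a minimal sketch. It assumes two already-aligned DNA sequences of equal length; the function name `ti_tv_counts` and the toy sequences are hypothetical and are not taken from the study, which does not describe its software.

```python
# Minimal sketch: count transitions and transversions between two aligned sequences.
# Assumption: sequences are pre-aligned, equal length, upper-case DNA.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ti_tv_counts(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Return (transitions, transversions) between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for x, y in zip(seq_a, seq_b):
        # Skip identical sites, gaps, and ambiguous bases.
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    return transitions, transversions


# Toy example (hypothetical data): two transitions and two transversions.
ti, tv = ti_tv_counts("AACCGGTT", "GATCGCTA")
ratio = ti / tv if tv else float("inf")
```

A low ratio between divergent sequences is the pattern the abstract interprets as a sign of substitution saturation; this sketch only does the counting and applies no correction for multiple hits.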
Catalano, Sarah R; Whittington, Ian D; Donnellan, Stephen C; Bertozzi, Terry; Gillanders, Bronwyn M
2015-07-01
Dicyemids, poorly known parasites of benthic cephalopods, are one of the few phyla in which mitochondrial (mt) genome architecture departs from the typical ~16 kb circular metazoan genome. In addition to a putative circular genome, a series of mt minicircles that each comprises the mt encoded units (I-III) of the cytochrome c oxidase complex have been reported. Whether the structure of the mt minicircles is a consistent feature among dicyemid species is unknown. Here we analyse the complete cytochrome c oxidase subunit I (COI) minicircle molecule, containing the COI gene and an associated non-coding region (NCR), for ten dicyemid species, allowing, for the first time, comparisons between species of minicircle architecture, NCR function and inferences of minicircle replication. Divergence in COI nucleotide sequences between dicyemid species was high (average net divergence = 31.6%) while within-species diversity was lower (average net divergence = 0.2%). The NCR and putative 5' section of the COI gene were highly divergent between dicyemid species (average net nucleotide divergence of putative 5' COI section = 61.1%). No tRNA genes were found in the NCR, although palindrome sequences with the potential to form stem-loop structures were identified in some species, which may play a role in transcription or other biological processes.
CNL Disease Resistance Genes in Soybean and Their Evolutionary Divergence
Nepal, Madhav P; Benson, Benjamin V
2015-01-01
Disease resistance genes (R-genes) encode proteins involved in detecting pathogen attack and activating downstream defense molecules. Recent availability of soybean genome sequences makes it possible to examine the diversity of gene families including disease resistance genes. The objectives of this study were to identify coiled-coil NBS-LRR (= CNL) R-genes in soybean, infer their evolutionary relationships, and assess structural as well as functional divergence of the R-genes. Profile hidden Markov models were used for sequence identification and model-based maximum likelihood was used for phylogenetic analysis, and variation in chromosomal positioning, gene clustering, and functional divergence were assessed. We identified 188 soybean CNL genes nested into four clades consistent with their orthologs in Arabidopsis. Gene clustering analysis revealed the presence of 41 gene clusters located on 13 different chromosomes. Analyses of the Ks-values and chromosomal positioning suggest duplication events occurring at varying timescales, and an extrapericentromeric positioning may have facilitated their rapid evolution. Each of the four CNL clades exhibited distinct patterns of gene expression. Phylogenetic analysis further supported the extrapericentromeric positioning effect on the divergence and retention of the CNL genes. The results are important for understanding the diversity and divergence of CNL genes in soybean, which would have implications for soybean crop improvement in the future. PMID:25922568
Horn, T; Chang, C A; Urdea, M S
1997-12-01
The divergent synthesis of bDNA structures is described. This new type of branched DNA contains one unique oligonucleotide, the primary sequence, covalently attached through a comb-like branching network to many identical copies of a different oligonucleotide, the secondary sequence. The bDNA comb molecules were assembled on a solid support using parameters optimized for bDNA synthesis. The chemistry was used to synthesize bDNA comb molecules containing 15 secondary sequences. The bDNA comb molecules were elaborated by enzymatic ligation into branched amplification multimers, large bDNA molecules (a total of 1068 nt) containing an average of 36 repeated DNA oligomer sequences, each capable of hybridizing specifically to an alkaline phosphatase-labeled oligonucleotide. The bDNA comb molecules were characterized by electrophoretic methods and by controlled cleavage at periodate-cleavable moieties incorporated during synthesis. The branched amplification multimers have been used as signal amplifiers in nucleic acid quantification assays for detection of viral infection. It is possible to detect as few as 50 molecules with bDNA technology. PMID:9365266
Young, Robert S
2016-07-01
Frequent evolutionary birth and death events have created a large quantity of biologically important, lineage-specific DNA within mammalian genomes. The birth and death of DNA sequences is so frequent that the total number of these insertions and deletions in the human population remains unknown, although there are differences between these groups, e.g. transposable elements contribute predominantly to sequence insertion. Functional turnover - where the activity of a locus is specific to one lineage, but the underlying DNA remains conserved - can also drive birth and death. However, this does not appear to be a major driver of divergent transcriptional regulation. Both sequence and functional turnover have contributed to the birth and death of thousands of functional promoters in the human and mouse genomes. These findings reveal the pervasive nature of evolutionary birth and death and suggest that lineage-specific regions may play an important but previously underappreciated role in human biology and disease. © 2016 The Authors BioEssays Published by WILEY Periodicals, Inc.
PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data
NASA Astrophysics Data System (ADS)
Deneke, Carlus; Rentzsch, Robert; Renard, Bernhard Y.
2017-01-01
The reliable detection of novel bacterial pathogens from next-generation sequencing data is a key challenge for microbial diagnostics. Current computational tools usually rely on sequence similarity and often fail to detect novel species when closely related genomes are unavailable or missing from the reference database. Here we present the machine learning-based approach PaPrBaG (Pathogenicity Prediction for Bacterial Genomes). PaPrBaG overcomes genetic divergence by training on a wide range of species with known pathogenicity phenotype. To that end we compiled a comprehensive list of pathogenic and non-pathogenic bacteria with human host, using various genome metadata in conjunction with a rule-based protocol. A detailed comparative study reveals that PaPrBaG has several advantages over sequence similarity approaches. Most importantly, it always provides a prediction whereas other approaches discard a large number of sequencing reads with low similarity to currently known reference genomes. Furthermore, PaPrBaG remains reliable even at very low genomic coverages. Combining PaPrBaG with existing approaches further improves prediction results.
A unique chromatin complex occupies young α-satellite arrays of human centromeres
Henikoff, Jorja G.; Thakur, Jitendra; Kasinathan, Sivakanthan; Henikoff, Steven
2015-01-01
The intractability of homogeneous α-satellite arrays has impeded understanding of human centromeres. Artificial centromeres are produced from higher-order repeats (HORs) present at centromere edges, although the exact sequences and chromatin conformations of centromere cores remain unknown. We use high-resolution chromatin immunoprecipitation (ChIP) of centromere components followed by clustering of sequence data as an unbiased approach to identify functional centromere sequences. We find that specific dimeric α-satellite units shared by multiple individuals dominate functional human centromeres. We identify two recently homogenized α-satellite dimers that are occupied by precisely positioned CENP-A (cenH3) nucleosomes with two ~100–base pair (bp) DNA wraps in tandem separated by a CENP-B/CENP-C–containing linker, whereas pericentromeric HORs show diffuse positioning. Precise positioning is largely maintained, whereas abundance decreases exponentially with divergence, which suggests that young α-satellite dimers with paired ~100-bp particles mediate evolution of functional human centromeres. Our unbiased strategy for identifying functional centromeric sequences should be generally applicable to tandem repeat arrays that dominate the centromeres of most eukaryotes. PMID:25927077
Functionally conserved enhancers with divergent sequences in distant vertebrates
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Song; Oksenberg, Nir; Takayama, Sachiko
2015-10-30
To examine the contributions of sequence and function conservation in the evolution of enhancers, we systematically identified enhancers whose sequences are not conserved among distant groups of vertebrate species, but have homologous function and are likely to be derived from a common ancestral sequence. In conclusion, our approach combined comparative genomics and epigenomics to identify potential enhancer sequences in the genomes of three groups of distantly related vertebrate species.
Gutierrez-Gonzalez, Juan J; Garvin, David F
2016-11-01
Vitamin E is essential for humans and thus must be a component of a healthy diet. Among the cereal grains, hexaploid oats (Avena sativa L.) have high vitamin E content. To date, no gene sequences in the vitamin E biosynthesis pathway have been reported for oats. Using deep sequencing and orthology-guided assembly, coding sequences of genes for each step in vitamin E synthesis in oats were reconstructed, including resolution of the sequences of homeologs. Three homeologs, presumably representing each of the three oat subgenomes, were identified for the main steps of the pathway. Partial sequences, likely representing pseudogenes, were recovered in some instances as well. Pairwise comparisons among homeologs revealed that two of the three putative subgenome-specific homeologs are almost identical for each gene. Synonymous substitution rates indicate the time of divergence of the two more similar subgenomes from the distinct one at 7.9-8.7 MYA, and a divergence between the similar subgenomes from a common ancestor 1.1 MYA. A new proposed evolutionary model for hexaploid oat formation is discussed. Homeolog-specific gene expression was quantified during oat seed development and compared with vitamin E accumulation. Homeolog expression largely appears to be similar for most genes; however, for some genes, homeolog-specific transcriptional bias was observed. The expression of HPPD, as well as certain homeologs of VTE2 and VTE4, is highly correlated with seed vitamin E accumulation. Our findings expand our understanding of oat genome evolution and will assist efforts to modify vitamin E content and composition in oats. Published 2016. This article is a U.S. Government work and is in the public domain in the USA. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Sun, Cheng; Wyngaard, Grace; Walton, D Brian; Wichman, Holly A; Mueller, Rachel Lockridge
2014-03-11
Chromatin diminution is the programmed deletion of DNA from presomatic cell or nuclear lineages during development, producing single organisms that contain two different nuclear genomes. Phylogenetically diverse taxa undergo chromatin diminution, including some ciliates, nematodes, copepods, and vertebrates. In cyclopoid copepods, chromatin diminution occurs in taxa with massively expanded germline genomes; depending on species, germline genome sizes range from 15 to 75 Gb, 12 to 74 Gb of which are lost from presomatic cell lineages at germline-soma differentiation. This is more than an order of magnitude more sequence than is lost from other taxa. To date, the sequences excised from copepods have not been analyzed using large-scale genomic datasets, and the processes underlying germline genomic gigantism in this clade, as well as the functional significance of chromatin diminution, have remained unknown. Here, we used high-throughput genomic sequencing and qPCR to characterize the germline and somatic genomes of Mesocyclops edax, a freshwater cyclopoid copepod with a germline genome of ~15 Gb and a somatic genome of ~3 Gb. We show that most of the excised DNA consists of repetitive sequences that are either 1) verifiable transposable elements (TEs), or 2) non-simple repeats of likely TE origin. Repeat elements in both genomes are skewed towards younger (i.e. less divergent) elements. Excised DNA is a non-random sample of the germline repeat element landscape; younger elements, and high frequency DNA transposons and LINEs, are disproportionately eliminated from the somatic genome. Our results suggest that germline genome expansion in M. edax reflects explosive repeat element proliferation, and that billions of base pairs of such repeats are deleted from the somatic genome every generation. Thus, we hypothesize that chromatin diminution is a mechanism that controls repeat element load, and that this load can evolve to be divergent between tissue types within single organisms. PMID:24618421
Buhler, Stéphane; Sanchez-Mazas, Alicia
2011-01-01
Molecular differences between HLA alleles vary up to 57 nucleotides within the peptide binding coding region of human Major Histocompatibility Complex (MHC) genes, but it is still unclear whether this variation results from a stochastic process or from selective constraints related to functional differences among HLA molecules. Although HLA alleles are generally treated as equidistant molecular units in population genetic studies, DNA sequence diversity among populations is also crucial to interpret the observed HLA polymorphism. In this study, we used a large dataset of 2,062 DNA sequences defined for the different HLA alleles to analyze nucleotide diversity of seven HLA genes in 23,500 individuals of about 200 populations spread worldwide. We first analyzed the HLA molecular structure and diversity of these populations in relation to geographic variation and we further investigated possible departures from selective neutrality through Tajima's tests and mismatch distributions. All results were compared to those obtained by classical approaches applied to HLA allele frequencies. Our study shows that the global patterns of HLA nucleotide diversity among populations are significantly correlated to geography, although in some specific cases the molecular information reveals unexpected genetic relationships. At all loci except HLA-DPB1, populations have accumulated a high proportion of very divergent alleles, suggesting an advantage of heterozygotes expressing molecularly distant HLA molecules (asymmetric overdominant selection model). However, both different intensities of selection and unequal levels of gene conversion may explain the heterogeneous mismatch distributions observed among the loci. Also, distinctive patterns of sequence divergence observed at the HLA-DPB1 locus suggest current neutrality but old selective pressures on this gene. 
We conclude that HLA DNA sequences advantageously complement HLA allele frequencies as a source of data used to explore the genetic history of human populations, and that their analysis allows a more thorough investigation of human MHC molecular evolution. PMID:21408106
Horn, Susanne; Durka, Walter; Wolf, Ronny; Ermala, Aslak; Stubbe, Annegret; Stubbe, Michael; Hofreiter, Michael
2011-01-01
Background: Beavers are one of the largest and ecologically most distinct rodent species. Little is known about their evolution and even their closest phylogenetic relatives have not yet been identified with certainty. Similarly, little is known about the timing of divergence events within the genus Castor. Methodology/Principal Findings: We sequenced complete mitochondrial genomes from both extant beaver species and used these sequences to place beavers in the phylogenetic tree of rodents and date their divergence from other rodents as well as the divergence events within the genus Castor. Our analyses support the phylogenetic position of beavers as a sister lineage to the scaly-tailed squirrel Anomalurus within the mouse-related clade. Molecular dating places the divergence time of the lineages leading to beavers and Anomalurus as early as around 54 million years ago (mya). The living beaver species, Castor canadensis from North America and Castor fiber from Eurasia, although similar in appearance, appear to have diverged from a common ancestor more than seven mya. This result is consistent with the hypothesis that a migration of Castor from Eurasia to North America as early as 7.5 mya could have initiated their speciation. We date the common ancestor of the extant Eurasian beaver relict populations to around 210,000 years ago, much earlier than previously thought. Finally, the substitution rate of Castor mitochondrial DNA is considerably lower than that of other rodents. We found evidence that this is correlated with the longer life span of beavers compared to other rodents. Conclusions/Significance: A phylogenetic analysis of mitochondrial genome sequences suggests a sister-group relationship between Castor and Anomalurus, and allows molecular dating of species divergence in congruence with paleontological data. 
The implementation of a relaxed molecular clock enabled us to estimate mitochondrial substitution rates and to evaluate the effect of life history traits on it. PMID:21307956
Field, Mark C.; Adung’a, Vincent; Obado, Samson; Chait, Brian T.; Rout, Michael P.
2014-01-01
Trypanosomatids represent the causative agents of major diseases in humans, livestock and plants, with inevitable suffering and economic hardship as a result. They are also evolutionarily highly divergent organisms, and the many unique aspects of trypanosome biology provide opportunities in terms of identification of drug targets, the challenge of exploiting these putative targets, and at the same time significant scope for exploration of novel and divergent cell biology. We can estimate from genome sequences that the degree of divergence of trypanosomes from animals and fungi is extreme, with perhaps one third to one half of predicted trypanosome proteins having no known function based on homology or recognizable protein domains/architecture. Two highly important aspects of trypanosome biology are the flagellar pocket and the nuclear envelope, where in silico analysis clearly suggests great potential divergence in the proteome. The flagellar pocket is the sole site of endo- and exocytosis in trypanosomes and plays important roles in immune evasion via variant surface glycoprotein (VSG) trafficking and providing a location for sequestration of various invariant receptors. The trypanosome nuclear envelope has been largely unexplored, but by analogy with higher eukaryotes, roles in the regulation of chromatin and most significantly, in controlling VSG gene expression are expected. Here we discuss recent successful proteomics-based approaches towards characterization of the nuclear envelope and the endocytic apparatus, the identification of conserved and novel trypanosomatid-specific features, and the implications of these findings. PMID:22309600
Fernandes, Noemi M; Vizzoni, Vinicius F; Borges, Bárbara do N; A G Soares, Carlos; Silva-Neto, Inácio D da; S Paiva, Thiago da
2018-04-18
The odontostomatids are among the least studied ciliates, possibly due to their small sizes, restriction to anaerobic environments and difficulty in culturing. Consequently, their phylogenetic affinities to other ciliate taxa are still poorly understood. In the present study, we analyzed newly obtained ribosomal gene sequences of the odontostomatids Discomorphella pedroeneasi and Saprodinium dentatum, together with sequences from the literature, including Epalxella antiquorum and a large assemblage of ciliate sequences representing the major recognized classes. The results show that D. pedroeneasi and S. dentatum form a deep-diverging branch related to metopid and clevelandellid armophoreans, corroborating the old literature. However E. antiquorum clustered with the morphologically discrepant plagiopylids, indicating that either the complex odontostomatid body architecture evolved convergently, or the positioning of E. antiquorum as a plagiopylid is artifactual. A new ciliate class, Odontostomatea n. cl., is proposed based on molecular analyses and comparative morphology of odontostomatids with related taxa. Copyright © 2018. Published by Elsevier Inc.
Complete genome sequence of a divergent strain of Japanese yam mosaic virus from China
USDA-ARS's Scientific Manuscript database
A novel strain of Japanese yam mosaic virus (JYMV-CN) was identified in a yam plant with foliar mottle symptoms in China. The complete genomic sequence of JYMV-CN was determined. Its genomic sequence of 9701 nucleotides encodes a polyprotein of 3247 amino acids. Its organization was virtually identi...
DNA barcoding for molecular identification of Demodex based on mitochondrial genes.
Hu, Li; Yang, YuanJun; Zhao, YaE; Niu, DongLing; Yang, Rui; Wang, RuiLing; Lu, Zhaohui; Li, XiaoQi
2017-12-01
There has been no widely accepted DNA barcode for species identification of Demodex. In this study, we attempted to solve this issue. First, mitochondrial cox1-5' and 12S gene fragments of Demodex folliculorum, D. brevis, D. canis, and D. caprae were amplified, cloned, and sequenced for the first time; intra/interspecific divergences were computed and phylogenetic trees were reconstructed. Then, divergence frequency distribution plots of those two gene fragments were drawn together with mtDNA cox1-middle region and 16S obtained in previous studies. Finally, their identification efficiency was evaluated by comparing barcoding gap. Results indicated that 12S had the higher identification efficiency. Specifically, for cox1-5' region of the four Demodex species, intraspecific divergences were less than 2.0%, and interspecific divergences were 21.1-31.0%; for 12S, intraspecific divergences were less than 1.4%, and interspecific divergences were 20.8-26.9%. The phylogenetic trees demonstrated that the four Demodex species clustered separately, and divergence frequency distribution plot showed that the largest intraspecific divergence of 12S (1.4%) was less than cox1-5' region (2.0%), cox1-middle region (3.1%), and 16S (2.8%). The barcoding gap of 12S was 19.4%, larger than cox1-5' region (19.1%), cox1-middle region (11.3%), and 16S (13.0%); the interspecific divergence span of 12S was 6.2%, smaller than cox1-5' region (10.0%), cox1-middle region (14.1%), and 16S (11.4%). Moreover, 12S has a moderate length (517 bp) for sequencing at once. Therefore, we proposed mtDNA 12S was more suitable than cox1 and 16S to be a DNA barcode for classification and identification of Demodex at lower category level.
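The barcoding-gap comparison in the abstract above can be reproduced with simple arithmetic. The sketch below assumes the gap is defined as the smallest interspecific divergence minus the largest intraspecific divergence, which matches the reported figures for cox1-5' and 12S. The function and variable names are hypothetical, and only the two markers whose divergence ranges are stated in the abstract are included.

```python
# Minimal sketch: barcoding gap = min interspecific - max intraspecific divergence.
# Values are percentages quoted in the abstract; the definition is an assumption
# that happens to reproduce the reported gaps for these two markers.

def barcoding_gap(max_intra: float, min_inter: float) -> float:
    """Return the barcoding gap in percentage points, rounded to one decimal."""
    return round(min_inter - max_intra, 1)


markers = {
    "cox1-5'": {"max_intra": 2.0, "min_inter": 21.1},
    "12S": {"max_intra": 1.4, "min_inter": 20.8},
}

gaps = {
    name: barcoding_gap(v["max_intra"], v["min_inter"])
    for name, v in markers.items()
}

# The marker with the widest gap separates species most cleanly.
best = max(gaps, key=gaps.get)
```

Under this definition 12S comes out ahead of cox1-5', in line with the abstract's conclusion. The gaps for the cox1-middle region and 16S are not recomputed here because their interspecific ranges are not given.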
Rebelling for a Reason: Protein Structural “Outliers”
Arumugam, Gandhimathi; Nair, Anu G.; Hariharaputran, Sridhar; Ramanathan, Sowdhamini
2013-01-01
Analysis of structural variation in domain superfamilies can reveal constraints in protein evolution which aids protein structure prediction and classification. Structure-based sequence alignment of distantly related proteins, organized in PASS2 database, provides clues about structurally conserved regions among different functional families. Some superfamily members show large structural differences which are functionally relevant. This paper analyses the impact of structural divergence on function for multi-member superfamilies, selected from the PASS2 superfamily alignment database. Functional annotations within superfamilies, with structural outliers or ‘rebels’, are discussed in the context of structural variations. Overall, these data reinforce the idea that functional similarities cannot be extrapolated from mere structural conservation. The implication for fold-function prediction is that the functional annotations can only be inherited with very careful consideration, especially at low sequence identities. PMID:24073209
A Draft Sequence of the Neandertal Genome
Green, Richard E.; Li, Heng; Zhai, Weiwei; Fritz, Markus Hsi-Yang; Hansen, Nancy F.; Durand, Eric Y.; Malaspinas, Anna-Sapfo; Jensen, Jeffrey D.; Marques-Bonet, Tomas; Alkan, Can; Prüfer, Kay; Meyer, Matthias; Burbano, Hernán A.; Good, Jeffrey M.; Schultz, Rigo; Aximu-Petri, Ayinuer; Butthof, Anne; Höber, Barbara; Höffner, Barbara; Siegemund, Madlen; Weihmann, Antje; Nusbaum, Chad; Lander, Eric S.; Russ, Carsten; Novod, Nathaniel; Affourtit, Jason; Egholm, Michael; Verna, Christine; Rudan, Pavao; Brajkovic, Dejana; Kucan, Željko; Gušic, Ivan; Doronichev, Vladimir B.; Golovanova, Liubov V.; Lalueza-Fox, Carles; de la Rasilla, Marco; Fortea, Javier; Rosas, Antonio; Schmitz, Ralf W.; Johnson, Philip L. F.; Eichler, Evan E.; Falush, Daniel; Birney, Ewan; Mullikin, James C.; Slatkin, Montgomery; Nielsen, Rasmus; Kelso, Janet; Lachmann, Michael; Reich, David; Pääbo, Svante
2016-01-01
Neandertals, the closest evolutionary relatives of present-day humans, lived in large parts of Europe and western Asia before disappearing 30,000 years ago. We present a draft sequence of the Neandertal genome composed of more than 4 billion nucleotides from three individuals. Comparisons of the Neandertal genome to the genomes of five present-day humans from different parts of the world identify a number of genomic regions that may have been affected by positive selection in ancestral modern humans, including genes involved in metabolism and in cognitive and skeletal development. We show that Neandertals shared more genetic variants with present-day humans in Eurasia than with present-day humans in sub-Saharan Africa, suggesting that gene flow from Neandertals into the ancestors of non-Africans occurred before the divergence of Eurasian groups from each other. PMID:20448178
Keskin, Emre; Atar, Hasan Huseyin
2012-04-01
Mitochondrial DNA sequence variation in 655 bp fragments of the cytochrome c oxidase subunit I gene, known as the DNA barcode, of European anchovy (Engraulis encrasicolus) was evaluated by analyzing 1529 individuals representing 16 populations from the Black Sea, through the Marmara Sea and the Aegean Sea to the Mediterranean Sea. A total of 19 (2.9%) variable sites were found among individuals, and these defined 10 genetically diverged populations with an overall mean distance of 1.2%. The highest nucleotide divergence was found between samples of eastern Mediterranean and northern Aegean (2.2%). Evolutionary history analysis among 16 populations clustered the Mediterranean Sea clades in one main branch and the other clades in another branch. The divergence pattern of the European anchovy populations, correlated with geographic dispersion, supports genetic structuring across the Black Sea-Marmara Sea-Aegean Sea-Mediterranean Sea quad.
Conservation and Divergence of Mediator Structure and Function: Insights from Plants.
Dolan, Whitney L; Chapple, Clint
2017-01-01
The Mediator complex is a large, multisubunit, transcription co-regulator that is conserved across eukaryotes. Studies of the Arabidopsis Mediator complex and its subunits have shown that it functions in nearly every aspect of plant development and fitness. In addition to revealing mechanisms of regulation of plant-specific pathways, studies of plant Mediator complexes have the potential to shed light on the conservation and divergence of Mediator structure and function across Kingdoms and plant lineages. The majority of insights into plant Mediator function have come from Arabidopsis because it is the only plant from which Mediator has been purified and from which an array of Mediator mutants have been isolated by forward and reverse genetics. So far, these studies indicate that, despite low sequence similarity between many orthologous subunits, the overall structure and function of Mediator is well conserved between Kingdoms. Several studies have also expanded our knowledge of Mediator to other plant species, opening avenues of investigation into the role of Mediator in plant adaptation and fitness.
Seeing chordate evolution through the Ciona genome sequence
Cañestro, Cristian; Bassham, Susan; Postlethwait, John H
2003-01-01
A draft sequence of the compact genome of the sea squirt Ciona intestinalis, a non-vertebrate chordate that diverged very early from other chordates, including vertebrates, illuminates how chordates originated and how vertebrate developmental innovations evolved. PMID:12620098
Loewenstein, Yaniv; Portugaly, Elon; Fromer, Menachem; Linial, Michal
2008-01-01
Motivation: UPGMA (average linkage) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Due to this prohibitive requirement, UPGMA is not scalable to very large datasets. Application: We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any practical memory size constraint, this framework guarantees the correct clustering solution without explicitly requiring all dissimilarities in memory. The algorithms are general and are applicable to any dataset. We present a data-dependent characterization of hardness and clustering efficiency. The presented concepts are applicable to any agglomerative clustering formulation. Results: We apply our algorithm to the entire collection of protein sequences, to automatically build a comprehensive evolutionary-driven hierarchy of proteins from sequence alone. The newly created tree captures protein families better than state-of-the-art large-scale methods such as CluSTr, ProtoNet4 or single-linkage clustering. We demonstrate that leveraging the entire mass embodied in all sequence similarities allows significant improvement on current protein family clusterings, which are unable to directly tackle the sheer mass of this data. Furthermore, we argue that non-metric constraints are an inherent complexity of the sequence space and should not be overlooked. The robustness of UPGMA allows significant improvement, especially for multidomain proteins, and for large or divergent families. Availability: A comprehensive tree built from all UniProt sequence similarities, together with navigation and classification tools will be made available as part of the ProtoNet service. A C++ implementation of the algorithm is available on request. Contact: lonshy@cs.huji.ac.il PMID:18586742
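For orientation, the following is a minimal sketch of plain, in-memory UPGMA, that is, the baseline whose memory requirement the abstract describes, not the MC-UPGMA algorithm itself. The function name `upgma`, the dictionary-based distance store, and the toy distances are all illustrative assumptions.

```python
from itertools import combinations

def upgma(names, dist):
    """Naive UPGMA (average linkage) holding every dissimilarity in memory.

    `dist` maps frozenset({a, b}) -> dissimilarity for each pair of names.
    Returns a list of merges as (cluster_a, cluster_b, merge_dissimilarity).
    """
    sizes = {name: 1 for name in names}   # cluster label -> number of leaves
    d = dict(dist)                        # working copy of the dissimilarities
    merges = []
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))]
        na, nb = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        # Size-weighted average distance from the merged cluster to the rest.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
        merges.append((a, b, height))
    return merges

# Toy dissimilarities (hypothetical values, not from the paper).
pairs = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
         ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
dist = {frozenset(k): v for k, v in pairs.items()}
merges = upgma(["A", "B", "C", "D"], dist)
for step in merges:
    print(step)
```

The dictionary `d` grows with the square of the number of sequences, which is the scaling problem the abstract says the memory-constrained variant is designed to avoid.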
Akın, Ciğdem; Bilgin, C Can; Beerli, Peter; Westaway, Rob; Ohst, Torsten; Litvinchuk, Spartak N; Uzzell, Thomas; Bilgin, Metin; Hotz, Hansjürg; Guex, Gaston-Denis; Plötner, Jörg
2010-11-01
AIM: Our aims were to assess the phylogeographic patterns of genetic diversity in eastern Mediterranean water frogs and to estimate divergence times using different geological scenarios. We related divergence times to past geological events and discuss the relevance of our data for the systematics of eastern Mediterranean water frogs. LOCATION: The eastern Mediterranean region. METHODS: Genetic diversity and divergence were calculated using sequences of two protein-coding mitochondrial (mt) genes: ND2 (1038 bp, 119 sequences) and ND3 (340 bp, 612 sequences). Divergence times were estimated in a Bayesian framework under four geological scenarios representing alternative possible geological histories for the eastern Mediterranean. We then compared the different scenarios using Bayes factors and additional geological data. RESULTS: Extensive genetic diversity in mtDNA divides eastern Mediterranean water frogs into six main haplogroups (MHG). Three MHGs were identified on the Anatolian mainland; the most widespread MHG with the highest diversity is distributed from western Anatolia to the northern shore of the Caspian Sea, including the type locality of Pelophylax ridibundus. The other two Anatolian MHGs are restricted to south-eastern Turkey, occupying localities west and east of the Amanos mountain range. One of the remaining three MHGs is restricted to Cyprus; a second to the Levant; the third was found in the distribution area of European lake frogs (P. ridibundus group), including the Balkans. MAIN CONCLUSIONS: Based on geological evidence and estimates of genetic divergence we hypothesize that the water frogs of Cyprus have been isolated from the Anatolian mainland populations since the end of the Messinian salinity crisis (MSC), i.e. since c. 5.5-5.3 Ma, while our divergence time estimates indicate that the isolation of Crete from the mainland populations (Peloponnese, Anatolia) most likely pre-dates the MSC. 
The observed rates of divergence imply a time window of c. 1.6-1.1 million years for diversification of the largest Anatolian MHG; divergence between the two other Anatolian MHGs may have begun about 3.0 Ma, apparently as a result of uplift of the Amanos Mountains. Our mtDNA data suggest that the Anatolian water frogs and frogs from Cyprus represent several undescribed species.
Poortvliet, Marloes; Olsen, Jeanine L; Croll, Donald A; Bernardi, Giacomo; Newton, Kelly; Kollias, Spyros; O'Sullivan, John; Fernando, Daniel; Stevens, Guy; Galván Magaña, Felipe; Seret, Bernard; Wintner, Sabine; Hoarau, Galice
2015-02-01
Manta and devil rays are an iconic group of globally distributed pelagic filter feeders, yet their evolutionary history remains enigmatic. We employed next generation sequencing of mitogenomes for nine of the 11 recognized species and two outgroups; as well as additional Sanger sequencing of two mitochondrial and two nuclear genes in an extended taxon sampling set. Analysis of the mitogenome coding regions in a Maximum Likelihood and Bayesian framework provided a well-resolved phylogeny. The deepest divergences distinguished three clades with high support, one containing Manta birostris, Manta alfredi, Mobula tarapacana, Mobula japanica and Mobula mobular; one containing Mobula kuhlii, Mobula eregoodootenkee and Mobula thurstoni; and one containing Mobula munkiana, Mobula hypostoma and Mobula rochebrunei. Mobula remains paraphyletic with the inclusion of Manta, a result that is in agreement with previous studies based on molecular and morphological data. A fossil-calibrated Bayesian random local clock analysis suggests that mobulids diverged from Rhinoptera around 30 Mya. Subsequent divergences are characterized by long internodes followed by short bursts of speciation extending from an initial episode of divergence in the Early and Middle Miocene (19-17 Mya) to a second episode during the Pliocene and Pleistocene (3.6 Mya - recent). Estimates of divergence dates overlap significantly with periods of global warming, during which upwelling intensity - and related high primary productivity in upwelling regions - decreased markedly. These periods are hypothesized to have led to fragmentation and isolation of feeding regions leading to possible regional extinctions, as well as the promotion of allopatric speciation. The closely shared evolutionary history of mobulids in combination with ongoing threats from fisheries and climate change effects on upwelling and food supply, reinforces the case for greater protection of this charismatic family of pelagic filter feeders. 
Sex Chromosome Turnover Contributes to Genomic Divergence between Incipient Stickleback Species
Yoshida, Kohta; Makino, Takashi; Yamaguchi, Katsushi; Shigenobu, Shuji; Hasebe, Mitsuyasu; Kawata, Masakado; Kume, Manabu; Mori, Seiichi; Peichel, Catherine L.; Toyoda, Atsushi; Fujiyama, Asao; Kitano, Jun
2014-01-01
Sex chromosomes turn over rapidly in some taxonomic groups, where closely related species have different sex chromosomes. Although there are many examples of sex chromosome turnover, we know little about the functional roles of sex chromosome turnover in phenotypic diversification and genomic evolution. The sympatric pair of Japanese threespine stickleback (Gasterosteus aculeatus) provides an excellent system to address these questions: the Japan Sea species has a neo-sex chromosome system resulting from a fusion between an ancestral Y chromosome and an autosome, while the sympatric Pacific Ocean species has a simple XY sex chromosome system. Furthermore, previous quantitative trait locus (QTL) mapping demonstrated that the Japan Sea neo-X chromosome contributes to phenotypic divergence and reproductive isolation between these sympatric species. To investigate the genomic basis for the accumulation of genes important for speciation on the neo-X chromosome, we conducted whole genome sequencing of males and females of both the Japan Sea and the Pacific Ocean species. No substantial degeneration has yet occurred on the neo-Y chromosome, but the nucleotide sequence of the neo-X and the neo-Y has started to diverge, particularly at regions near the fusion. The neo-sex chromosomes also harbor an excess of genes with sex-biased expression. Furthermore, genes on the neo-X chromosome showed higher non-synonymous substitution rates than autosomal genes in the Japan Sea lineage. Genomic regions of higher sequence divergence between species, genes with divergent expression between species, and QTL for inter-species phenotypic differences were found not only at the regions near the fusion site, but also at other regions along the neo-X chromosome. Neo-sex chromosomes can therefore accumulate substitutions causing species differences even in the absence of substantial neo-Y degeneration. PMID:24625862
Jia, Yi; Huan, Jun; Buhr, Vincent; Zhang, Jintao; Carayannopoulos, Leonidas N
2009-01-01
Background: Automatic identification of structure fingerprints from a group of diverse protein structures is challenging, especially for proteins whose divergent amino acid sequences may fall into the "twilight-" or "midnight-" zones where pair-wise sequence identities to known sequences fall below 25% and sequence-based functional annotations often fail. Results: Here we report a novel graph database mining method and demonstrate its application to protein structure pattern identification and structure classification. The biologic motivation of our study is to recognize common structure patterns in "immunoevasins", proteins mediating virus evasion of host immune defense. Our experimental study, using both viral and non-viral proteins, demonstrates the efficiency and efficacy of the proposed method. Conclusion: We present a theoretic framework, offer a practical software implementation for incorporating prior domain knowledge, such as substitution matrices as studied here, and devise an efficient algorithm to identify approximate matched frequent subgraphs. By doing so, we significantly expanded the analytical power of sophisticated data mining algorithms in dealing with large volumes of complicated and noisy protein structure data. Without loss of generality, the choice of appropriate compatibility matrices allows our method to be easily employed in domains where subgraph labels have some uncertainty. PMID:19208148
Verwaaijen, Bart; Wibberg, Daniel; Nelkner, Johanna; Gordin, Miriam; Rupp, Oliver; Winkler, Anika; Bremges, Andreas; Blom, Jochen; Grosch, Rita; Pühler, Alfred; Schlüter, Andreas
2018-02-10
Lettuce (Lactuca sativa, L.) is an important annual plant of the family Asteraceae (Compositae). The commercial lettuce cultivar Tizian has been used in various scientific studies investigating the interaction of the plant with phytopathogens or biological control agents. Here, we present the de novo draft genome sequencing and gene prediction for this specific cultivar derived from transcriptome sequence data. The assembled scaffolds amount to a size of 2.22 Gb. Based on RNAseq data, 31,112 transcript isoforms were identified. Functional predictions for these transcripts were determined within the GenDBE annotation platform. Comparison with the cv. Salinas reference genome revealed a high degree of sequence similarity on genome and transcriptome levels, with an average amino acid identity of 99%. Furthermore, it was observed that two large regions are either missing or are highly divergent within the cv. Tizian genome compared to cv. Salinas. One of these regions covers the major resistance complex 1 region of cv. Salinas. The cv. Tizian draft genome sequence provides a valuable resource for future functional and transcriptome analyses focused on this lettuce cultivar. Copyright © 2017 Elsevier B.V. All rights reserved.
A Phylogenomic Assessment of Ancient Polyploidy and Genome Evolution across the Poales
McKain, Michael R.; Tang, Haibao; McNeal, Joel R.; Ayyampalayam, Saravanaraj; Davis, Jerrold I.; dePamphilis, Claude W.; Givnish, Thomas J.; Pires, J. Chris; Stevenson, Dennis Wm.; Leebens-Mack, James H.
2016-01-01
Comparisons of flowering plant genomes reveal multiple rounds of ancient polyploidy characterized by large intragenomic syntenic blocks. Three such whole-genome duplication (WGD) events, designated as rho (ρ), sigma (σ), and tau (τ), have been identified in the genomes of cereal grasses. Precise dating of these WGD events is necessary to investigate how they have influenced diversification rates, evolutionary innovations, and genomic characteristics such as the GC profile of protein-coding sequences. The timing of these events has remained uncertain due to the paucity of monocot genome sequence data outside the grass family (Poaceae). Phylogenomic analysis of protein-coding genes from sequenced genomes and transcriptome assemblies from 35 species, including representatives of all families within the Poales, has resolved the timing of rho and sigma relative to speciation events and placed tau prior to divergence of Asparagales and the commelinids but after divergence with eudicots. Examination of gene family phylogenies indicates that rho occurred just prior to the diversification of Poaceae and sigma occurred before early diversification of Poales lineages but after the Poales-commelinid split. Additional lineage-specific WGD events were identified on the basis of the transcriptome data. Gene families exhibiting high GC content are underrepresented among those with duplicate genes that persisted following these genome duplications. However, genome duplications had little overall influence on lineage-specific changes in the GC content of coding genes. Improved resolution of the timing of WGD events in monocot history provides evidence for the influence of polyploidization on functional evolution and species diversification. PMID:26988252
Parente, Daniel J; Ray, J Christian J; Swint-Kruse, Liskin
2015-12-01
As proteins evolve, amino acid positions key to protein structure or function are subject to mutational constraints. These positions can be detected by analyzing sequence families for amino acid conservation or for coevolution between pairs of positions. Coevolutionary scores are usually rank-ordered and thresholded to reveal the top pairwise scores, but they also can be treated as weighted networks. Here, we used network analyses to bypass a major complication of coevolution studies: For a given sequence alignment, alternative algorithms usually identify different, top pairwise scores. We reconciled results from five commonly-used, mathematically divergent algorithms (ELSC, McBASC, OMES, SCA, and ZNMI), using the LacI/GalR and 1,6-bisphosphate aldolase protein families as models. Calculations used unthresholded coevolution scores from which column-specific properties such as sequence entropy and random noise were subtracted; "central" positions were identified by calculating various network centrality scores. When compared among algorithms, network centrality methods, particularly eigenvector centrality, showed markedly better agreement than comparisons of the top pairwise scores. Positions with large centrality scores occurred at key structural locations and/or were functionally sensitive to mutations. Further, the top central positions often differed from those with top pairwise coevolution scores: instead of a few strong scores, central positions often had multiple, moderate scores. We conclude that eigenvector centrality calculations reveal a robust evolutionary pattern of constraints, detectable by divergent algorithms, that occur at key protein locations. Finally, we discuss the fact that multiple patterns coexist in evolutionary data that, together, give rise to emergent protein functions. © 2015 Wiley Periodicals, Inc.
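As a rough illustration of treating pairwise scores as a weighted network, the sketch below computes eigenvector centrality by power iteration on a small symmetric score matrix. It is a generic textbook procedure under stated assumptions, not the authors' pipeline: the subtraction of column-specific properties described in the abstract is not reproduced, and the 4x4 matrix `w` is invented for the example.

```python
def eigenvector_centrality(w, iters=500, tol=1e-12):
    """Power iteration on a symmetric, non-negative weight matrix `w`.

    Returns the unit-length leading eigenvector; entry i is the centrality
    of position i. Assumes the network is connected and not bipartite.
    """
    n = len(w)
    x = [1.0 / n] * n
    for _ in range(iters):
        # One multiplication by the weight matrix, then renormalise.
        y = [sum(w[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        y = [v / norm for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x

# Hypothetical coevolution scores between four alignment positions.
# Position 0 has moderate scores with every other position.
w = [
    [0.0, 0.5, 0.5, 0.5],
    [0.5, 0.0, 0.2, 0.0],
    [0.5, 0.2, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0],
]
centrality = eigenvector_centrality(w)
ranking = sorted(range(len(w)), key=lambda i: centrality[i], reverse=True)
print(ranking[0], [round(c, 3) for c in centrality])
```

In this toy network, position 0 ranks first although none of its individual scores is large, in line with the abstract's observation that central positions often carry multiple moderate scores.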
Kakioka, Ryo; Kokita, Tomoyuki; Kumada, Hiroki; Watanabe, Katsutoshi; Okuda, Noboru
2015-08-01
Evolution of ecomorphologically relevant traits such as body shapes is important to colonize and persist in a novel environment. Habitat-related adaptive divergence of these traits is therefore common among animals. We studied the genomic architecture of habitat-related divergence in the body shape of Gnathopogon fishes, a novel example of lake-stream ecomorphological divergence, and tested for the action of directional selection on body shape differentiation. Compared to stream-dwelling Gnathopogon elongatus, the sister species Gnathopogon caerulescens, exclusively inhabiting a large ancient lake, had an elongated body, increased proportion of the caudal region and small head, which would be advantageous in the limnetic environment. Using an F2 interspecific cross between the two Gnathopogon species (195 individuals), quantitative trait locus (QTL) analysis with geometric morphometric quantification of body shape and restriction-site associated DNA sequencing-derived markers (1622 loci) identified 26 significant QTLs associated with the interspecific differences of body shape-related traits. These QTLs had small to moderate effects, supporting polygenic inheritance of the body shape-related traits. Most QTLs were located in different genomic regions, while colocalized QTLs were detected for some ecomorphologically relevant traits that are proxies for body and caudal peduncle depths, suggesting different degrees of modularity among traits. The directions of the body shape QTLs were mostly consistent with the interspecific difference, and a QTL sign test suggested a genetic signature of directional selection in the body shape divergence. Thus, we successfully elucidated the genomic architecture underlying the adaptive changes of the quantitative and complex morphological trait in a novel system. © 2015 John Wiley & Sons Ltd.
Ferchaud, Anne-Laure; Hansen, Michael M
2016-01-01
Heterogeneous genomic divergence between populations may reflect selection, but should also be seen in conjunction with gene flow and drift, particularly population bottlenecks. Marine and freshwater three-spine stickleback (Gasterosteus aculeatus) populations often exhibit different lateral armour plate morphs. Moreover, strikingly parallel genomic footprints across different marine-freshwater population pairs are interpreted as parallel evolution and gene reuse. Nevertheless, in some geographic regions like the North Sea and Baltic Sea, different patterns are observed. Freshwater populations in coastal regions are often dominated by marine morphs, suggesting that gene flow overwhelms selection, and genomic parallelism may also be less pronounced. We used RAD sequencing for analysing 28 888 SNPs in two marine and seven freshwater populations in Denmark, Europe. Freshwater populations represented a variety of environments: river populations accessible to gene flow from marine sticklebacks and large and small isolated lakes with and without fish predators. Sticklebacks in an accessible river environment showed minimal morphological and genomewide divergence from marine populations, supporting the hypothesis of gene flow overriding selection. Allele frequency spectra suggested bottlenecks in all freshwater populations, and particularly two small lake populations. However, genomic footprints ascribed to selection could nevertheless be identified. No genomic regions were consistent freshwater-marine outliers, and parallelism was much lower than in other comparable studies. Two genomic regions previously described to be under divergent selection in freshwater and marine populations were outliers between different freshwater populations. 
We ascribe these patterns to stronger environmental heterogeneity among freshwater populations in our study as compared to most other studies, although the demographic history involving bottlenecks should also be considered in the interpretation of results. © 2015 John Wiley & Sons Ltd.
Lukoschek, V; Waycott, M; Keogh, J S
2008-07-01
Polymorphic microsatellites are widely considered more powerful for resolving population structure than mitochondrial DNA (mtDNA) markers, particularly for recently diverged lineages or geographically proximate populations. Weaker population subdivision for biparentally inherited nuclear markers than maternally inherited mtDNA may signal male-biased dispersal but can also be attributed to marker-specific evolutionary characteristics and sampling properties. We discriminated between these competing explanations with a population genetic study on olive sea snakes, Aipysurus laevis. A previous mtDNA study revealed strong regional population structure for A. laevis around northern Australia, where Pleistocene sea-level fluctuations have influenced the genetic signatures of shallow-water marine species. Divergences among phylogroups dated to the Late Pleistocene, suggesting recent range expansions by previously isolated matrilines. Fine-scale population structure within regions was, however, poorly resolved for mtDNA. In order to improve estimates of fine-scale genetic divergence and to compare population structure between nuclear and mtDNA, 354 olive sea snakes (previously sequenced for mtDNA) were genotyped for five microsatellite loci. F statistics and Bayesian multilocus genotype clustering analyses found similar regional population structure as mtDNA and, after standardizing microsatellite F statistics for high heterozygosities, regional divergence estimates were quantitatively congruent between marker classes. Over small spatial scales, however, microsatellites recovered almost no genetic structure and standardized F statistics were orders of magnitude smaller than for mtDNA. Three tests for male-biased dispersal were not significant, suggesting that recent demographic expansions to the typically large population sizes of A. laevis have prevented microsatellites from reaching mutation-drift equilibrium and local populations may still be diverging.
Hellberg, M E; Moy, G W; Vacquier, V D
2000-03-01
Male-specific proteins have increasingly been reported as targets of positive selection and are of special interest because of the role they may play in the evolution of reproductive isolation. We report the rapid interspecific divergence of cDNA encoding a major acrosomal protein of unknown function (TMAP) of sperm from five species of teguline gastropods. A mitochondrial DNA clock (calibrated by congeneric species divided by the Isthmus of Panama) estimates that these five species diverged 2-10 MYA. Inferred amino acid sequences reveal a propeptide that has diverged rapidly between species. The mature protein has diverged faster still due to high nonsynonymous substitution rates (> 25 nonsynonymous substitutions per site per 10^9 years). cDNA encoding the mature protein (89-100 residues) shows evidence of positive selection (Dn/Ds > 1) for 4 of 10 pairwise species comparisons. cDNA and predicted secondary-structure comparisons suggest that TMAP is neither orthologous nor paralogous to abalone lysin, and thus marks a second, phylogenetically independent, protein subject to strong positive selection in free-spawning marine gastropods. In addition, an internal repeat in one species (Tegula aureotincta) produces a duplicated cleavage site which results in two alternatively processed mature proteins differing by nine amino acid residues. Such alternative processing may provide a mechanism for introducing novel amino acid sequence variation at the amino-termini of proteins. Highly divergent TMAP N-termini from two other tegulines (Tegula regina and Norrisia norrisii) may have originated by such a mechanism.
Ward, Ben J; van Oosterhout, Cock
2016-03-01
HYBRIDCHECK is a software package to visualize the recombination signal in large DNA sequence data sets, and it can be used to analyse recombination, genetic introgression, hybridization and horizontal gene transfer. It can scan large (multiple kb) contigs and whole-genome sequences of three or more individuals. HYBRIDCHECK is written in R and runs on OS X, Linux and Windows operating systems, and it has a simple graphical user interface. In addition, the R code can be readily incorporated in scripts and analysis pipelines. HYBRIDCHECK implements several ABBA-BABA tests and visualizes the effects of hybridization and the resulting mosaic-like genome structure in high-density graphics. The package also reports the following: (i) the breakpoint positions, (ii) the number of mutations in each introgressed block, (iii) the probability that the identified region is not caused by recombination and (iv) the estimated age of each recombination event. The divergence times between the donor and recombinant sequence are calculated using a JC, K80, F81, HKY or GTR correction, and the dating algorithm is exceedingly fast. By estimating the coalescence time of introgressed blocks, it is possible to distinguish between hybridization and incomplete lineage sorting. HYBRIDCHECK is libre software and it and its manual are free to download from http://ward9250.github.io/HybridCheck/. © 2015 John Wiley & Sons Ltd.
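To make the ABBA-BABA idea concrete, here is a minimal sketch of the basic site-pattern count and the resulting D statistic, D = (ABBA - BABA) / (ABBA + BABA), for four aligned sequences ordered as (P1, P2, P3, outgroup). This is the textbook form of the test written in Python for illustration; it is not HYBRIDCHECK's R code, and the function name `patterson_d` and the toy alignment are assumptions. Gaps and ambiguous bases are not handled.

```python
def patterson_d(p1, p2, p3, outgroup):
    """Count ABBA and BABA site patterns and return (D, n_abba, n_baba).

    The outgroup allele is taken as ancestral (A); only biallelic sites
    are informative. D is 0.0 when no informative sites are found.
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if len({a, b, c, o}) != 2:
            continue  # skip invariant and multi-allelic sites
        if a == o and b == c:
            abba += 1  # P2 shares the derived allele with P3
        elif b == o and a == c:
            baba += 1  # P1 shares the derived allele with P3
    total = abba + baba
    d_stat = (abba - baba) / total if total else 0.0
    return d_stat, abba, baba

# Toy alignment (hypothetical): three ABBA sites, one BABA site,
# one invariant site and one site where only P3 differs.
p1 = "AACTAG"
p2 = "TTGAAG"
p3 = "TTGTAC"
og = "AACAAG"
d_stat, n_abba, n_baba = patterson_d(p1, p2, p3, og)
print(d_stat, n_abba, n_baba)
```

A positive D indicates an excess of derived alleles shared between P2 and P3; assessing significance would need a resampling scheme such as a block jackknife, which this sketch omits.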
Yang, Yilong
2017-01-01
The subgenomic compositions of the octoploid (2n = 8× = 56) strawberry (Fragaria) species, including the economically important cultivated species Fragaria x ananassa, have been a topic of long-standing interest. Phylogenomic approaches utilizing next-generation sequencing technologies offer a new window into species relationships and the subgenomic compositions of polyploids. We have conducted a large-scale phylogenetic analysis of Fragaria (strawberry) species using the Fluidigm Access Array system and 454 sequencing platform. About 24 single-copy or low-copy nuclear genes distributed across the genome were amplified and sequenced from 96 genomic DNA samples representing 16 Fragaria species from diploid (2×) to decaploid (10×), including the most extensive sampling of octoploid taxa yet reported. Individual gene trees were constructed by different tree-building methods. Mosaic genomic structures of diploid Fragaria species consisting of sequences at different phylogenetic positions were observed. Our findings support the presence in octoploid species of genetic signatures from at least five diploid ancestors (F. vesca, F. iinumae, F. bucharica, F. viridis, and at least one additional allele contributor of unknown identity), and question the extent to which distinct subgenomes are preserved over evolutionary time in the allopolyploid Fragaria species. In addition, our data support divergence between the two wild octoploid species, F. virginiana and F. chiloensis. PMID:29045639
Reference-guided assembly of four diverse Arabidopsis thaliana genomes
Schneeberger, Korbinian; Ossowski, Stephan; Ott, Felix; Klein, Juliane D.; Wang, Xi; Lanz, Christa; Smith, Lisa M.; Cao, Jun; Fitz, Joffrey; Warthmann, Norman; Henz, Stefan R.; Huson, Daniel H.; Weigel, Detlef
2011-01-01
We present whole-genome assemblies of four divergent Arabidopsis thaliana strains that complement the 125-Mb reference genome sequence released a decade ago. Using a newly developed reference-guided approach, we assembled large contigs from 9 to 42 Gb of Illumina short-read data from the Landsberg erecta (Ler-1), C24, Bur-0, and Kro-0 strains, which have been sequenced as part of the 1,001 Genomes Project for this species. Using alignments against the reference sequence, we first reduced the complexity of the de novo assembly and later integrated reads without similarity to the reference sequence. As an example, half of the noncentromeric C24 genome was covered by scaffolds that are longer than 260 kb, with a maximum of 2.2 Mb. Moreover, over 96% of the reference genome was covered by the reference-guided assembly, compared with only 87% with a complete de novo assembly. Comparisons with 2 Mb of dideoxy sequence reveal that the per-base error rate of the reference-guided assemblies was below 1 in 10,000. Our assemblies provide a detailed, genomewide picture of large-scale differences between A. thaliana individuals, most of which are difficult to access with alignment-consensus methods only. We demonstrate their practical relevance in studying the expression differences of polymorphic genes and show how the analysis of sRNA sequencing data can lead to erroneous conclusions if aligned against the reference genome alone. Genome assemblies, raw reads, and further information are accessible through http://1001genomes.org/projects/assemblies.html. PMID:21646520
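The statement that half of the noncentromeric C24 genome is covered by scaffolds longer than 260 kb is an N50-style summary. A minimal sketch of that calculation follows; the function name `n50`, the optional `genome_size` argument, and the scaffold lengths are hypothetical and are not taken from the assemblies described above.

```python
def n50(lengths, genome_size=None):
    """Return the length L such that scaffolds of length >= L cover at
    least half of the total.

    With `genome_size` given, half of that size is the target instead of
    half of the summed scaffold lengths (often called NG50). Returns 0 if
    the scaffolds never reach the target.
    """
    total = genome_size if genome_size is not None else sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered * 2 >= total:
            return length
    return 0

# Hypothetical scaffold lengths in kb.
scaffolds_kb = [800, 600, 400, 260, 200, 140]
assembly_n50 = n50(scaffolds_kb)                  # relative to assembly size
genome_n50 = n50(scaffolds_kb, genome_size=3000)  # relative to an assumed genome size
print(assembly_n50, genome_n50)
```

Measuring against an external genome size instead of the assembly total gives a smaller value whenever the assembly is incomplete, so the two numbers should not be compared directly.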
Zapata, Luis; Ding, Jia; Willing, Eva-Maria; Hartwig, Benjamin; Bezdan, Daniela; Jiao, Wen-Biao; Patel, Vipul; Velikkakam James, Geo; Koornneef, Maarten; Ossowski, Stephan; Schneeberger, Korbinian
2016-07-12
Resequencing or reference-based assemblies reveal large parts of the small-scale sequence variation. However, they typically fail to separate such local variation into colinear and rearranged variation, because they usually do not recover the complement of large-scale rearrangements, including transpositions and inversions. Despite the availability of hundreds of genomes of diverse Arabidopsis thaliana accessions, there is so far only one full-length assembled genome: the reference sequence. We have assembled 117 Mb of the A. thaliana Landsberg erecta (Ler) genome into five chromosome-equivalent sequences using a combination of short Illumina reads, long PacBio reads, and linkage information. Whole-genome comparison against the reference sequence revealed 564 transpositions and 47 inversions comprising ∼3.6 Mb, in addition to 4.1 Mb of nonreference sequence, mostly originating from duplications. Although rearranged regions are not different in local divergence from colinear regions, they are drastically depleted for meiotic recombination in heterozygotes. Using a 1.2-Mb inversion as an example, we show that such rearrangement-mediated reduction of meiotic recombination can lead to genetically isolated haplotypes in the worldwide population of A. thaliana. Moreover, we found 105 single-copy genes, which were only present in the reference sequence or the Ler assembly, and 334 single-copy orthologs, which showed an additional copy in only one of the genomes. To our knowledge, this work gives the first insights into the degree and type of variation that will be revealed once complete assemblies replace resequencing or other reference-dependent methods.
rpoB-Based Identification of Nonpigmented and Late-Pigmenting Rapidly Growing Mycobacteria
Adékambi, Toïdi; Colson, Philippe; Drancourt, Michel
2003-01-01
Nonpigmented and late-pigmenting rapidly growing mycobacteria (RGM) are increasingly isolated in clinical microbiology laboratories. Their accurate identification remains problematic because classification is labor intensive work and because new taxa are not often incorporated into classification databases. Also, 16S rRNA gene sequence analysis underestimates RGM diversity and does not distinguish between all taxa. We determined the complete nucleotide sequence of the rpoB gene, which encodes the bacterial β subunit of the RNA polymerase, for 20 RGM type strains. After using in-house software which analyzes and graphically represents variability stretches of 60 bp along the nucleotide sequence, our analysis focused on a 723-bp variable region exhibiting 83.9 to 97% interspecies similarity and 0 to 1.7% intraspecific divergence. Primer pair Myco-F-Myco-R was designed as a tool for both PCR amplification and sequencing of this region for molecular identification of RGM. This tool was used for identification of 63 RGM clinical isolates previously identified at the species level on the basis of phenotypic characteristics and by 16S rRNA gene sequence analysis. Of 63 clinical isolates, 59 (94%) exhibited <2% partial rpoB gene sequence divergence from 1 of 20 species under study and were regarded as correctly identified at the species level. Mycobacterium abscessus and Mycobacterium mucogenicum isolates were clearly distinguished from Mycobacterium chelonae; Mycobacterium mageritense isolates were clearly distinguished from “Mycobacterium houstonense.” Four isolates were not identified at the species level because they exhibited >3% partial rpoB gene sequence divergence from the corresponding type strain; they belonged to three taxa related to M. mucogenicum, Mycobacterium smegmatis, and Mycobacterium porcinum. For M. abscessus and M. mucogenicum, this partial sequence yielded a high genetic heterogeneity within the clinical isolates. 
We conclude that molecular identification by analysis of the 723-bp rpoB sequence is a rapid and accurate tool for identification of RGM. PMID:14662964
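The identification rule described above (an isolate is assigned to a species when its partial rpoB sequence diverges by less than 2% from a type strain) can be sketched as follows. This is an illustration under stated assumptions, not the authors' in-house software: the sequences are assumed to be pre-aligned and of equal length, gap columns are skipped, and the names `percent_divergence` and `closest_type_strain` are hypothetical.

```python
def percent_divergence(seq_a, seq_b):
    """Percent of mismatched positions between two pre-aligned sequences.

    Columns containing a gap ('-') in either sequence are ignored;
    that gap handling is an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared


def closest_type_strain(isolate, type_strains, cutoff=2.0):
    """Return (species or None, divergence) for the nearest type strain.

    The species name is returned only when divergence is below `cutoff`
    percent, mirroring the <2% criterion quoted in the abstract.
    """
    name, div = min(
        ((n, percent_divergence(isolate, s)) for n, s in type_strains.items()),
        key=lambda pair: pair[1],
    )
    return (name if div < cutoff else None), div


if __name__ == "__main__":
    # Toy 100-base sequences with placeholder species labels, not real rpoB data.
    refs = {"species_1": "A" * 99 + "C", "species_2": "A" * 90 + "C" * 10}
    print(closest_type_strain("A" * 100, refs))
```

Isolates whose nearest type strain lies above the cutoff come back with `None` as the species, which corresponds to the isolates the abstract reports as not identified at the species level.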
Webb, Kristen M; Rosenthal, Benjamin M
2011-01-01
The mitochondrial genome's non-recombinant mode of inheritance and relatively rapid rate of evolution have promoted its use as a marker for studying the biogeographic history and evolutionary interrelationships among many metazoan species. A modest portion of the mitochondrial genome has been defined for 12 species and genotypes of parasites in the genus Trichinella, but its adequacy in representing the mitochondrial genome as a whole remains unclear, as the complete coding sequence has been characterized only for Trichinella spiralis. Here, we sought to comprehensively describe the extent and nature of divergence between the mitochondrial genomes of T. spiralis (which poses the most appreciable zoonotic risk owing to its capacity to establish persistent infections in domestic pigs) and Trichinella murrelli (which is the most prevalent species in North American wildlife hosts, but which poses relatively little risk to the safety of pork). Next generation sequencing methodologies and scaffold and de novo assembly strategies were employed. The entire protein-coding region was sequenced (13,917 bp), along with a portion of the highly repetitive non-coding region (1524 bp) of the mitochondrial genome of T. murrelli with a combined average read depth of 250 reads. The accuracy of base calling, estimated from coding region sequence, was found to exceed 99.3%. Genome content and gene order were not found to be significantly different from those of T. spiralis. An overall inter-species sequence divergence of 9.5% was estimated. Significant variation was identified when the amount of variation between species at each gene is compared to the average amount of variation between species across the coding region. Next generation sequencing is a highly effective means to obtain previously unknown mitochondrial genome sequence. 
Particular to parasites, the extremely deep coverage achieved through this method allows for the detection of sequence heterogeneity between the multiple individuals that necessarily comprise such templates. Copyright © 2010 Elsevier B.V. All rights reserved.
Extraordinary Sequence Divergence at Tsga8, an X-linked Gene Involved in Mouse Spermiogenesis
Good, Jeffrey M.; Vanderpool, Dan; Smith, Kimberly L.; Nachman, Michael W.
2011-01-01
The X chromosome plays an important role in both adaptive evolution and speciation. We used a molecular evolutionary screen of X-linked genes potentially involved in reproductive isolation in mice to identify putative targets of recurrent positive selection. We then sequenced five very rapidly evolving genes within and between several closely related species of mice in the genus Mus. All five genes were involved in male reproduction and four of the genes showed evidence of recurrent positive selection. The most remarkable evolutionary patterns were found at Testis-specific gene a8 (Tsga8), a spermatogenesis-specific gene expressed during postmeiotic chromatin condensation and nuclear transformation. Tsga8 was characterized by extremely high levels of insertion–deletion variation of an alanine-rich repetitive motif in natural populations of Mus domesticus and M. musculus, differing in length from the reference mouse genome by up to 89 amino acids (27% of the total protein length). This population-level variation was coupled with striking divergence in protein sequence and length between closely related mouse species. Although no clear orthologs had previously been described for Tsga8 in other mammalian species, we have identified a highly divergent hypothetical gene on the rat X chromosome that shares clear orthology with the 5′ and 3′ ends of Tsga8. Further inspection of this ortholog verified that it is expressed in rat testis and shares remarkable similarity with mouse Tsga8 across several general features of the protein sequence despite no conservation of nucleotide sequence across over 60% of the rat-coding domain. Overall, Tsga8 appears to be one of the most rapidly evolving genes to have been described in rodents. We discuss the potential evolutionary causes and functional implications of this extraordinary divergence and the possible contribution of Tsga8 and the other four genes we examined to reproductive isolation in mice. PMID:21186189
Yao, Qiu-Yang; Xia, En-Hua; Liu, Fei-Hu; Gao, Li-Zhi
2015-02-15
WRKY transcription factors (TFs), one of the ten largest TF families in higher plants, play important roles in regulating plant development and resistance. To date, little is known about the WRKY TF family in Brassica oleracea. Recently, the completed genome sequence of cabbage (B. oleracea var. capitata) allows us to systematically analyze WRKY genes in this species. A total of 148 WRKY genes were characterized and classified into seven subgroups that belong to three major groups. Phylogenetic and synteny analyses revealed that the repertoire of cabbage WRKY genes was derived from a common ancestor shared with Arabidopsis thaliana. The B. oleracea WRKY genes were found to be preferentially retained after the whole-genome triplication (WGT) event in its recent ancestor, suggesting that the WGT event had largely contributed to a rapid expansion of the WRKY gene family in B. oleracea. The analysis of RNA-Seq data from various tissues (i.e., roots, stems, leaves, buds, flowers and siliques) revealed that most of the identified WRKY genes were positively expressed in cabbage, and a large portion of them exhibited patterns of differential and tissue-specific expression, demonstrating that these gene members might play essential roles in plant developmental processes. Comparative analysis of the expression level among duplicated genes showed that gene expression divergence was evidently presented among cabbage WRKY paralogs, indicating functional divergence of these duplicated WRKY genes. Copyright © 2014 Elsevier B.V. All rights reserved.
Three Divergent Subpopulations of the Malaria Parasite Plasmodium knowlesi
Lin, Lee C.; Rovie-Ryan, Jeffrine J.; Kadir, Khamisah A.; Anderios, Fread; Hisam, Shamilah; Sharma, Reuben S.K.; Singh, Balbir; Conway, David J.
2017-01-01
Multilocus microsatellite genotyping of Plasmodium knowlesi isolates previously indicated 2 divergent parasite subpopulations in humans on the island of Borneo, each associated with a different macaque reservoir host species. Geographic divergence was also apparent, and independent sequence data have indicated particularly deep divergence between parasites from mainland Southeast Asia and Borneo. To resolve the overall population structure, multilocus microsatellite genotyping was conducted on a new sample of 182 P. knowlesi infections (obtained from 134 humans and 48 wild macaques) from diverse areas of Malaysia, first analyzed separately and then in combination with previous data. All analyses confirmed 2 divergent clusters of human cases in Malaysian Borneo, associated with long-tailed macaques and pig-tailed macaques, and a third cluster in humans and most macaques in peninsular Malaysia. High levels of pairwise divergence between each of these sympatric and allopatric subpopulations have implications for the epidemiology and control of this zoonotic species. PMID:28322705
Gonzalez, P; Barroso, G; Labarère, J
1998-10-05
The Basidiomycota Agrocybe aegerita (Aa) mitochondrial cox1 gene (6790 nucleotides), encoding a protein of 527aa (58377Da), is split by four large subgroup IB introns possessing site-specific endonucleases assumed to be involved in intron mobility. When compared to other fungal COX1 proteins, the Aa protein is closely related to the COX1 one of the Basidiomycota Schizophyllum commune (Sc). This clade reveals a relationship with the studied Ascomycota ones, with the exception of Schizosaccharomyces pombe (Sp) which ranges in an out-group position compared with both higher fungi divisions. When comparison is extended to other kingdoms, fungal COX1 sequences are found to be more related to algae and plant ones (more than 57.5% aa similarity) than to animal sequences (53.6% aa similarity), contrasting with the previously established close relationship between fungi and animals, based on comparisons of nuclear genes. The four Aa cox1 introns are homologous to Ascomycota or algae cox1 introns sharing the same location within the exonic sequences. The percentages of identity of the intronic nucleotide sequences suggest a possible acquisition by lateral transfers of ancestral copies or of their derived sequences. These identities extend over the whole intronic sequences, arguing in favor of a transfer of the complete intron rather than a transfer limited to the encoded ORF. The intron i4 shares 74% of identity, at the nucleotidic level, with the Podospora anserina (Pa) intron i14, and up to 90.5% of aa similarity between the encoded proteins, i.e. the highest values reported to date between introns of two phylogenetically distant species. This low divergence argues for a recent lateral transfer between the two species. On the contrary, the low sequence identities (below 36%) observed between Aa i1 and the homologous Sp i1 or Prototheca wickeramii (Pw) i1 suggest a long evolution time after the separation of these sequences. 
The introns i2 and i3 possessed intermediate percentages of identity with their homologous Ascomycota introns. This is the first report of the complete nucleotide sequence and molecular organization of a mitochondrial cox1 gene of any member of the Basidiomycota division.
USDA-ARS's Scientific Manuscript database
The Noctuid moth, Spodoptera frugiperda (the fall armyworm), is endemic to the Western Hemisphere and appears to be undergoing sympatric speciation to produce two subpopulations that differ in their choice of host plants. The diverging “rice strain” and “corn strain” are morphologically indistinguis...
Dynamics of actin evolution in dinoflagellates.
Kim, Sunju; Bachvaroff, Tsvetan R; Handy, Sara M; Delwiche, Charles F
2011-04-01
Dinoflagellates have unique nuclei and intriguing genome characteristics with very high DNA content making complete genome sequencing difficult. In dinoflagellates, many genes are found in multicopy gene families, but the processes involved in the establishment and maintenance of these gene families are poorly understood. Understanding the dynamics of gene family evolution in dinoflagellates requires comparisons at different evolutionary scales. Studies of closely related species provide fine-scale information relative to species divergence, whereas comparisons of more distantly related species provide broad context. We selected the actin gene family as a highly expressed conserved gene previously studied in dinoflagellates. Of the 142 sequences determined in this study, 103 were from the two closely related species, Dinophysis acuminata and D. caudata, including full length and partial cDNA sequences as well as partial genomic amplicons. For these two Dinophysis species, at least three types of sequences could be identified. Most copies (79%) were relatively similar and in nucleotide trees, the sequences formed two bushy clades corresponding to the two species. In comparisons within species, only eight to ten nucleotide differences were found between these copies. The two remaining types formed clades containing sequences from both species. One type included the most similar sequences in between-species comparisons with as few as 12 nucleotide differences between species. The second type included the most divergent sequences in comparisons between and within species with up to 93 nucleotide differences between sequences. In all the sequences, most variation occurred in synonymous sites or the 5' UnTranslated Region (UTR), although there was still limited amino acid variation between most sequences. 
Several potential pseudogenes were found (approximately 10% of all sequences depending on species) with incomplete open reading frames due to frameshifts or early stop codons. Overall, variation in the actin gene family fits best with the "birth and death" model of evolution based on recent duplications, pseudogenes, and incomplete lineage sorting. Divergence between species was similar to variation within species, so that actin may be too conserved to be useful for phylogenetic estimation of closely related species.
Eirin, Maria E; Dilernia, Dario A; Berini, Carolina A; Jones, Leandro R; Pando, Maria A; Biglione, Mirna M
2008-10-01
HTLV-1 Cosmopolitan subtype Transcontinental subgroup A has been described among aboriginal communities from the northwest endemic area of Argentina. Moreover, Transcontinental subgroup A and the Japanese subgroup B were reported among blood donors from the nonendemic central region of the country. We carried out the first HTLV-1 phylogenetic study in individuals residing in Buenos Aires capital city. Phylogenetic analysis performed on the LTR region showed that all 44 new strains clustered within the Cosmopolitan subtype, with 42 (95.4%) belonging to Transcontinental subgroup A. Of them, 20 (45.5%) strains grouped in the large Latin American cluster and 4 (9.1%) in the small Latin American cluster. The majority of them belonged to individuals of nonblack origin, grouped with Amerindian strains. Three (6.8%) were closely related to South African references and two monophyletic clusters including only HIV/HTLV-1 coinfected individuals were observed. Interestingly, two (4.5%) new sequences (divergent strains) branched off from all five known Cosmopolitan subgroups in a well-supported clade. In summary, these findings show that HTLV-1 Cosmopolitan subtype Transcontinental subgroup A is infecting residents of Buenos Aires, a nonendemic area of Argentina, and confirm the introduction of divergent strains in the country.
Shi, Lei; Hu, Enzhi; Wang, Zhenbo; Liu, Jiewei; Li, Jin; Li, Ming; Chen, Hua; Yu, Chunshui; Jiang, Tianzi; Su, Bing
2017-02-01
Human evolution is marked by a continued enlargement of the brain. Previous studies on human brain evolution focused on identifying sequence divergences of brain size regulating genes between humans and nonhuman primates. However, the evolutionary pattern of the brain size regulating genes during recent human evolution is largely unknown. We conducted a comprehensive analysis of the brain size regulating gene CASC5 and found that in recent human evolution, CASC5 has accumulated many modern human specific amino acid changes, including two fixed changes and six polymorphic changes. Among human populations, 4 of the 6 amino acid polymorphic sites have high frequencies of derived alleles in East Asians, but are rare in Europeans and Africans. We proved that this between-population allelic divergence was caused by regional Darwinian positive selection in East Asians. Further analysis of brain image data of Han Chinese showed significant associations of the amino acid polymorphic sites with gray matter volume. Hence, CASC5 may contribute to the morphological and structural changes of the human brain during recent evolution. The observed between-population divergence of CASC5 variants was driven by natural selection that tends to favor a larger gray matter volume in East Asians.
Lerner, Heather R L; Meyer, Matthias; James, Helen F; Hofreiter, Michael; Fleischer, Robert C
2011-11-08
Evolutionary theory has gained tremendous insight from studies of adaptive radiations. High rates of speciation, morphological divergence, and hybridization, combined with low sequence variability, however, have prevented phylogenetic reconstruction for many radiations. The Hawaiian honeycreepers are an exceptional adaptive radiation, with high phenotypic diversity and speciation that occurred within the geologically constrained setting of the Hawaiian Islands. Here we analyze a new data set of 13 nuclear loci and pyrosequencing of mitochondrial genomes that resolves the Hawaiian honeycreeper phylogeny. We show that they are a sister taxon to Eurasian rosefinches (Carpodacus) and probably came to Hawaii from Asia. We use island ages to calibrate DNA substitution rates, which vary substantially among gene regions, and calculate divergence times, showing that the radiation began roughly when the oldest of the current large Hawaiian Islands (Kauai and Niihau) formed, ~5.7 million years ago (mya). We show that most of the lineages that gave rise to distinctive morphologies diverged after Oahu emerged (4.0-3.7 mya) but before the formation of Maui and adjacent islands (2.4-1.9 mya). Thus, the formation of Oahu, and subsequent cycles of colonization and speciation between Kauai and Oahu, played key roles in generating the morphological diversity of the extant honeycreepers. Copyright © 2011 Elsevier Ltd. All rights reserved.
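The calibration logic described above (a dated geological event fixes a substitution rate, which then converts other genetic distances into divergence times) can be illustrated with a strict-clock simplification. This is only a sketch of the arithmetic; the study's actual dating method is not specified in the abstract and was likely more elaborate. Both function names and the numeric distances are hypothetical.

```python
def calibrate_rate(pairwise_distance, split_age_myr):
    """Substitutions per site per million years along one lineage.

    Assumes a strict molecular clock: two lineages that split
    `split_age_myr` ago each accumulate half of the pairwise distance.
    """
    if split_age_myr <= 0:
        raise ValueError("calibration age must be positive")
    return pairwise_distance / (2.0 * split_age_myr)


def divergence_time(pairwise_distance, rate):
    """Estimated split time (Myr) for a pairwise distance at a given rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return pairwise_distance / (2.0 * rate)


if __name__ == "__main__":
    # Hypothetical distances; only the 5.7 Myr calibration age comes from the abstract.
    rate = calibrate_rate(pairwise_distance=0.057, split_age_myr=5.7)
    print(rate, divergence_time(0.020, rate))
```

Because the abstract notes that rates vary substantially among gene regions, a calibration of this kind would have to be repeated per locus rather than applied genome-wide.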
Evolution of Genome Size and Complexity in Pinus
Morse, Alison M.; Peterson, Daniel G.; Islam-Faridi, M. Nurul; Smith, Katherine E.; Magbanua, Zenaida; Garcia, Saul A.; Kubisiak, Thomas L.; Amerson, Henry V.; Carlson, John E.; Nelson, C. Dana; Davis, John M.
2009-01-01
Background: Genome evolution in the gymnosperm lineage of seed plants has given rise to many of the most complex and largest plant genomes; however, the elements involved are poorly understood. Methodology/Principal Findings: Gymny is a previously undescribed retrotransposon family in Pinus that is related to Athila elements in Arabidopsis. Gymny elements are dispersed throughout the modern Pinus genome and occupy a physical space at least the size of the Arabidopsis thaliana genome. In contrast to previously described retroelements in Pinus, the Gymny family was amplified or introduced after the divergence of pine and spruce (Picea). If retrotransposon expansions are responsible for genome size differences within the Pinaceae, as they are in angiosperms, then they have yet to be identified. In contrast, molecular divergence of Gymny retrotransposons together with other families of retrotransposons can account for the large genome complexity of pines along with protein-coding genic DNA, as revealed by massively parallel DNA sequence analysis of Cot fractionated genomic DNA. Conclusions/Significance: Most of the enormous genome complexity of pines can be explained by divergence of retrotransposons; however, the elements responsible for genome size variation are yet to be identified. Genomic resources for Pinus including those reported here should assist in further defining whether and how the roles of retrotransposons differ in the evolution of angiosperm and gymnosperm genomes. PMID:19194510
Weininger, Arthur; Weininger, Susan
2015-01-01
The ability to identify the functional correlates of structural and sequence variation in proteins is a critical capability. We related structures of influenza A N10 and N11 proteins that have no established function to structures of proteins with known function by identifying spatially conserved atoms. We identified atoms with common distributed spatial occupancy in PDB structures of N10 protein, N11 protein, an influenza A neuraminidase, an influenza B neuraminidase, and a bacterial neuraminidase. By superposing these spatially conserved atoms, we aligned the structures and associated molecules. We report spatially and sequence invariant residues in the aligned structures. Spatially invariant residues in the N6 and influenza B neuraminidase active sites were found in previously unidentified spatially equivalent sites in the N10 and N11 proteins. We found the corresponding secondary and tertiary structures of the aligned proteins to be largely identical despite significant sequence divergence. We found structural precedent in known non-neuraminidase structures for residues exhibiting structural and sequence divergence in the aligned structures. In N10 protein, we identified staphylococcal enterotoxin I-like domains. In N11 protein, we identified hepatitis E E2S-like domains, SARS spike protein-like domains, and toxin components shared by alpha-bungarotoxin, staphylococcal enterotoxin I, anthrax lethal factor, clostridium botulinum neurotoxin, and clostridium tetanus toxin. The presence of active site components common to the N6, influenza B, and S. pneumoniae neuraminidases in the N10 and N11 proteins, combined with the absence of apparent neuraminidase function, suggests that the role of neuraminidases in H17N10 and H18N11 emerging influenza A viruses may have changed. 
The presentation of E2S-like, SARS spike protein-like, or toxin-like domains by the N10 and N11 proteins in these emerging viruses may indicate that H17N10 and H18N11 sialidase-facilitated cell entry has been supplemented or replaced by sialidase-independent receptor binding to an expanded cell population that may include neurons and T-cells. PMID:25706124
BLAST and FASTA similarity searching for multiple sequence alignment.
Pearson, William R
2014-01-01
BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry (homology). The most effective similarity searches compare protein sequences, rather than DNA sequences, for sequences that encode proteins, and use expectation values, rather than percent identity, to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look-back times from 1 to 2 billion years; DNA:DNA searches are 5-10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies targeted at distant evolutionary relationships; for comparisons involving short domains or queries, or searches that seek relatively close homologs (e.g. mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.
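The point about expectation values and database size can be made concrete with the standard Karlin-Altschul relation between a bit score S' and the expected number of chance hits, E = m · n · 2^(−S'), where m is the query length and n the database length. The sketch below is a simplification: the function name `expect_value` is an assumption, and real BLAST and FASTA runs apply effective-length corrections that are omitted here.

```python
def expect_value(bit_score, query_len, db_len):
    """Expected number of chance alignments scoring >= bit_score.

    Implements E = m * n * 2**(-S') with raw lengths; production tools
    use edge-corrected "effective" lengths, which this sketch ignores.
    """
    if query_len <= 0 or db_len <= 0:
        raise ValueError("lengths must be positive")
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Illustrative lengths: the same 50-bit hit in a large and a small database.
    print(expect_value(50, 300, 10_000_000_000))
    print(expect_value(50, 300, 10_000_000))
```

Because E scales linearly with database length, the same alignment score is a thousand times more significant in a database a thousand times smaller, which is the rationale for searching smaller comprehensive databases.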
Asamizu, Erika; Nakamura, Yasukazu; Sato, Shusei; Tabata, Satoshi
2004-02-01
To perform a comprehensive analysis of genes expressed in a model legume, Lotus japonicus, a total of 74472 3'-end expressed sequence tags (EST) were generated from cDNA libraries produced from six different organs. Clustering of sequences was performed with an identity criterion of 95% for 50 bases, and a total of 20457 non-redundant sequences, 8503 contigs and 11954 singletons were generated. EST sequence coverage was analyzed by using the annotated L. japonicus genomic sequence and 1093 of the 1889 predicted protein-encoding genes (57.9%) were hit by the EST sequence(s). Gene content was compared to several plant species. Among the 8503 contigs, 471 were identified as sequences conserved only in leguminous species and these included several disease resistance-related genes. This suggested that in legumes, these genes may have evolved specifically to resist pathogen attack. The rate of gene sequence divergence was assessed by comparing similarity level and functional category based on the Gene Ontology (GO) annotation of Arabidopsis genes. This revealed that genes encoding ribosomal proteins, as well as those related to translation, photosynthesis, and cellular structure were more abundantly represented in the highly conserved class, and that genes encoding transcription factors and receptor protein kinases were abundantly represented in the less conserved class. To make the sequence information and the cDNA clones available to the research community, a Web database with useful services was created at http://www.kazusa.or.jp/en/plant/lotus/EST/.
2012-01-01
Background: Gene duplications play an important role in the evolution of functional protein diversity. Some models of duplicate gene evolution predict complex forms of paralog divergence; orthologous proteins may diverge as well, further complicating patterns of divergence among and within gene families. Consequently, studying the link between protein sequence evolution and duplication requires the use of flexible substitution models that can accommodate multiple shifts in selection across a phylogeny. Here, we employed a variety of codon substitution models, primarily Clade models, to explore how selective constraint evolved following the duplication of a green-sensitive (RH2a) visual pigment protein (opsin) in African cichlids. Past studies have linked opsin divergence to ecological and sexual divergence within the African cichlid adaptive radiation. Furthermore, biochemical and regulatory differences between the RH2aα and RH2aβ paralogs have been documented. It thus seems likely that selection varies in complex ways throughout this gene family. Results: Clade model analysis of African cichlid RH2a opsins revealed a large increase in the nonsynonymous-to-synonymous substitution rate ratio (ω) following the duplication, as well as an even larger increase, one consistent with positive selection, for Lake Tanganyikan cichlid RH2aβ opsins. Analysis using the popular Branch-site models, by contrast, revealed no such alteration of constraint. Several amino acid sites known to influence spectral and non-spectral aspects of opsin biochemistry were found to be evolving divergently, suggesting that orthologous RH2a opsins may vary in terms of spectral sensitivity and response kinetics. Divergence appears to be occurring despite intronic gene conversion among the tandemly-arranged duplicates. Conclusions: Our findings indicate that variation in selective constraint is associated with both gene duplication and divergence among orthologs in African cichlid RH2a opsins. 
At least some of this variation may reflect an adaptive response to differences in light environment. Interestingly, these patterns only became apparent through the use of Clade models, not through the use of the more widely employed Branch-site models; we suggest that this difference stems from the increased flexibility associated with Clade models. Our results thus bear both on studies of cichlid visual system evolution and on studies of gene family evolution in general. PMID:23078361
Dunbrack, Roland L.
2012-01-01
Motivation: Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are not as long as the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often erroneously result in two copies of the domain assigned to the query. Divergent repeat sequences in proteins are often missed. Results: We have developed a multilevel procedure to produce nearly complete assignments of protein families of an existing classification system to a large set of sequences. We apply this to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, thus, leading to erroneous assignment at the Pfam family level. A greedy algorithm allowing for partial overlaps was, thus, applied first to sequence/HMM alignments, then HMM–HMM alignments and then structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignment of repeat Pfams with weaker E-values was allowed after stronger assignments of the repeat HMM. Our database of assignments, presented in a database called PDBfam, contains Pfams for 99.4% of chains >50 residues. Availability: The Pfam assignment data in PDBfam are available at http://dunbrack2.fccc.edu/ProtCid/PDBfam, which can be searched by PDB codes and Pfam identifiers. They will be updated regularly. Contact: Roland.Dunbracks@fccc.edu PMID:22942020
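The greedy assignment step mentioned above can be sketched as follows: hits are considered from strongest to weakest E-value, and a hit is kept only if it overlaps each previously accepted hit by no more than a small tolerance. This is an illustrative reading of "a greedy algorithm allowing for partial overlaps", not the PDBfam code. The tuple layout, the function names, the 10-residue tolerance, and the `PF_A`/`PF_B`/`PF_C` labels are all assumptions.

```python
def overlap(start_a, end_a, start_b, end_b):
    """Number of residues shared by two inclusive intervals."""
    return max(0, min(end_a, end_b) - max(start_a, start_b) + 1)


def greedy_assign(hits, max_overlap=10):
    """Greedily pick domain hits, best E-value first.

    Each hit is (name, start, end, evalue) with inclusive residue
    coordinates. A hit is accepted if it overlaps every already
    accepted hit by at most `max_overlap` residues (assumed tolerance).
    Accepted hits are returned in sequence order.
    """
    accepted = []
    for hit in sorted(hits, key=lambda h: h[3]):
        _, start, end, _ = hit
        if all(overlap(start, end, a[1], a[2]) <= max_overlap for a in accepted):
            accepted.append(hit)
    return sorted(accepted, key=lambda h: h[1])


if __name__ == "__main__":
    # Placeholder family labels and coordinates, not real Pfam assignments.
    hits = [
        ("PF_A", 1, 100, 1e-50),
        ("PF_B", 95, 200, 1e-30),
        ("PF_C", 50, 150, 1e-10),
    ]
    for name, start, end, evalue in greedy_assign(hits):
        print(name, start, end, evalue)
```

In the toy input, the second hit is kept because it overlaps the first by only six residues, while the third is rejected for overlapping the first by 51. Joining partial alignments split by long insertions, which the abstract also describes, would need an additional merging pass not shown here.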
Perfus-Barbeoch, Laetitia; Da Rocha, Martine; Sallet, Erika; Bailly-Bechet, Marc; Castagnone-Sereno, Philippe; Flot, Jean-François; Kozlowski, Djampa K.; Cazareth, Julie; Couloux, Arnaud; Da Silva, Corinne; Guy, Julie; Kim-Jo, Yu-Jin; Rancurel, Corinne; Abad, Pierre; Wincker, Patrick
2017-01-01
Root-knot nematodes (genus Meloidogyne) exhibit a diversity of reproductive modes ranging from obligatory sexual to fully asexual reproduction. Intriguingly, the most widespread and devastating species to global agriculture are those that reproduce asexually, without meiosis. To disentangle this surprising parasitic success despite the absence of sex and genetic exchanges, we have sequenced and assembled the genomes of three obligatory ameiotic and asexual Meloidogyne. We have compared them to those of relatives able to perform meiosis and sexual reproduction. We show that the genomes of ameiotic asexual Meloidogyne are large, polyploid and made of duplicated regions with a high within-species average nucleotide divergence of ~8%. Phylogenomic analysis of the genes present in these duplicated regions suggests that they originated from multiple hybridization events and are thus homoeologs. We found that up to 22% of homoeologous gene pairs were under positive selection and these genes covered a wide spectrum of predicted functional categories. To biologically assess functional divergence, we compared expression patterns of homoeologous gene pairs across developmental life stages using an RNAseq approach in the most economically important asexually-reproducing nematode. We showed that >60% of homoeologous gene pairs display diverged expression patterns. These results suggest a substantial functional impact of the genome structure. Contrasting with high within-species nuclear genome divergence, mitochondrial genome divergence between the three ameiotic asexuals was very low, signifying that these putative hybrids share a recent common maternal ancestor. Transposable elements (TE) cover a ~1.7 times higher proportion of the genomes of the ameiotic asexual Meloidogyne compared to the sexual relative and might also participate in their plasticity. 
The intriguing parasitic success of asexually-reproducing Meloidogyne species could be partly explained by their TE-rich composite genomes, resulting from allopolyploidization events, and promoting plasticity and functional divergence between gene copies in the absence of sex and meiosis. PMID:28594822
Multilocus sequence analysis and rpoB sequencing of Mycobacterium abscessus (sensu lato) strains.
Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate
2011-02-01
Mycobacterium abscessus, Mycobacterium bolletii, and Mycobacterium massiliense (Mycobacterium abscessus sensu lato) are closely related species that currently are identified by the sequencing of the rpoB gene. However, recent studies show that rpoB sequencing alone is insufficient to discriminate between these species, and some authors have questioned their current taxonomic classification. We studied here a large collection of M. abscessus (sensu lato) strains by partial rpoB sequencing (752 bp) and multilocus sequence analysis (MLSA). The final MLSA scheme developed was based on the partial sequences of eight housekeeping genes: argH, cya, glpK, gnd, murC, pgm, pta, and purH. The strains studied included the three type strains (M. abscessus CIP 104536(T), M. massiliense CIP 108297(T), and M. bolletii CIP 108541(T)) and 120 isolates recovered between 1997 and 2007 in France, Germany, Switzerland, and Brazil. The rpoB phylogenetic tree confirmed the existence of three main clusters, each comprising the type strain of one species. However, divergence values between the M. massiliense and M. bolletii clusters all were below 3% and between the M. abscessus and M. massiliense clusters were from 2.66 to 3.59%. The tree produced using the concatenated MLSA gene sequences (4,071 bp) also showed three main clusters, each comprising the type strain of one species. The M. abscessus cluster had a bootstrap value of 100% and was mostly compact. Bootstrap values for the M. massiliense and M. bolletii branches were much lower (71 and 61%, respectively), with the M. massiliense cluster having a fuzzy aspect. Mean (range) divergence values were 2.17% (1.13 to 2.58%) between the M. abscessus and M. massiliense clusters, 2.37% (1.5 to 2.85%) between the M. abscessus and M. bolletii clusters, and 2.28% (0.86 to 2.68%) between the M. massiliense and M. bolletii clusters. 
Adding the rpoB sequence to the MLSA-concatenated sequence (total sequence, 4,823 bp) had little effect on the clustering of strains. We found 10/120 (8.3%) isolates for which the concatenated MLSA gene sequence and rpoB sequence were discordant (e.g., M. massiliense MLSA sequence and M. abscessus rpoB sequence), suggesting the intergroup lateral transfers of rpoB. In conclusion, our study strongly supports the recent proposal that M. abscessus, M. massiliense, and M. bolletii should constitute a single species. Our findings also indicate that there has been a horizontal transfer of rpoB sequences between these subgroups, precluding the use of rpoB sequencing alone for the accurate identification of the two proposed M. abscessus subspecies.
Multilocus Sequence Analysis and rpoB Sequencing of Mycobacterium abscessus (Sensu Lato) Strains
Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate
2011-01-01
Mycobacterium abscessus, Mycobacterium bolletii, and Mycobacterium massiliense (Mycobacterium abscessus sensu lato) are closely related species that currently are identified by the sequencing of the rpoB gene. However, recent studies show that rpoB sequencing alone is insufficient to discriminate between these species, and some authors have questioned their current taxonomic classification. We studied here a large collection of M. abscessus (sensu lato) strains by partial rpoB sequencing (752 bp) and multilocus sequence analysis (MLSA). The final MLSA scheme developed was based on the partial sequences of eight housekeeping genes: argH, cya, glpK, gnd, murC, pgm, pta, and purH. The strains studied included the three type strains (M. abscessus CIP 104536T, M. massiliense CIP 108297T, and M. bolletii CIP 108541T) and 120 isolates recovered between 1997 and 2007 in France, Germany, Switzerland, and Brazil. The rpoB phylogenetic tree confirmed the existence of three main clusters, each comprising the type strain of one species. However, divergence values between the M. massiliense and M. bolletii clusters all were below 3% and between the M. abscessus and M. massiliense clusters were from 2.66 to 3.59%. The tree produced using the concatenated MLSA gene sequences (4,071 bp) also showed three main clusters, each comprising the type strain of one species. The M. abscessus cluster had a bootstrap value of 100% and was mostly compact. Bootstrap values for the M. massiliense and M. bolletii branches were much lower (71 and 61%, respectively), with the M. massiliense cluster having a fuzzy aspect. Mean (range) divergence values were 2.17% (1.13 to 2.58%) between the M. abscessus and M. massiliense clusters, 2.37% (1.5 to 2.85%) between the M. abscessus and M. bolletii clusters, and 2.28% (0.86 to 2.68%) between the M. massiliense and M. bolletii clusters. 
Adding the rpoB sequence to the MLSA-concatenated sequence (total sequence, 4,823 bp) had little effect on the clustering of strains. We found 10/120 (8.3%) isolates for which the concatenated MLSA gene sequence and rpoB sequence were discordant (e.g., M. massiliense MLSA sequence and M. abscessus rpoB sequence), suggesting the intergroup lateral transfers of rpoB. In conclusion, our study strongly supports the recent proposal that M. abscessus, M. massiliense, and M. bolletii should constitute a single species. Our findings also indicate that there has been a horizontal transfer of rpoB sequences between these subgroups, precluding the use of rpoB sequencing alone for the accurate identification of the two proposed M. abscessus subspecies. PMID:21106786
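The mean (range) divergence values reported between clusters can be illustrated with a small sketch. It assumes uncorrected pairwise differences (p-distance) over pre-aligned sequences; the abstract does not say which distance measure was used, so this is one plausible reading, and the function names are hypothetical.

```python
from itertools import product
from typing import Sequence, Tuple


def percent_divergence(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance (%) over aligned columns without gaps or N."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # skip gap and ambiguous positions
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared


def between_cluster_summary(
    cluster_a: Sequence[str], cluster_b: Sequence[str]
) -> Tuple[float, float, float]:
    """Mean, minimum and maximum divergence over all between-cluster pairs."""
    values = [percent_divergence(a, b) for a, b in product(cluster_a, cluster_b)]
    return sum(values) / len(values), min(values), max(values)
```

On the 4,071 bp concatenated alignment, a mean of 2.17% would correspond to roughly 88 differing positions between an average pair of strains from the two clusters.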
Mouse Vk gene classification by nucleic acid sequence similarity.
Strohal, R; Helmberg, A; Kroemer, G; Kofler, R
1989-01-01
Analyses of immunoglobulin (Ig) variable (V) region gene usage in the immune response, estimates of V gene germline complexity, and other nucleic acid hybridization-based studies depend on the extent to which such genes are related (i.e., sequence similarity) and their organization in gene families. While mouse Igh heavy chain V region (VH) gene families are relatively well-established, a corresponding systematic classification of Igk light chain V region (Vk) genes has not been reported. The present analysis, in the course of which we reviewed the known extent of the Vk germline gene repertoire and Vk gene usage in a variety of responses to foreign and self antigens, provides a classification of mouse Vk genes in gene families composed of members with greater than 80% overall nucleic acid sequence similarity. This classification differed in several aspects from that of VH genes: only some Vk gene families were as clearly separated (by greater than 25% sequence dissimilarity) as typical VH gene families; most Vk gene families were closely related and, in several instances, members from different families were very similar (greater than 80%) over large sequence portions; frequently, classification by nucleic acid sequence similarity diverged from existing classifications based on amino-terminal protein sequence similarity. Our data have implications for Vk gene analyses by nucleic acid hybridization and describe potentially important differences in sequence organization between VH and Vk genes.
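A family definition of this kind (members with greater than 80% sequence similarity) can be sketched as threshold-based grouping. The sketch below assumes single-linkage grouping, meaning that a gene joins a family if it exceeds the threshold with any member; the abstract does not state the linkage rule, and the gene names and similarity values in the usage note are hypothetical.

```python
from typing import Dict, List, Tuple


def group_into_families(
    genes: List[str],
    similarity: Dict[Tuple[str, str], float],
    threshold: float = 80.0,
) -> List[List[str]]:
    """Single-linkage grouping of genes whose similarity (%) exceeds threshold."""
    parent = {gene: gene for gene in genes}

    def find(gene: str) -> str:
        # Follow parent links to the family representative, compressing the path.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for (a, b), value in similarity.items():
        if value > threshold:
            parent[find(a)] = find(b)  # merge the two families

    families: Dict[str, List[str]] = {}
    for gene in genes:
        families.setdefault(find(gene), []).append(gene)
    return sorted(sorted(members) for members in families.values())
```

For five hypothetical genes with similarities g1/g2 = 92%, g3/g4 = 85%, g1/g3 = 70% and g5/g1 = 60%, the sketch returns three families: [g1, g2], [g3, g4] and [g5]. Single linkage can chain loosely related genes together, which is one way members of different families can end up very similar over large sequence portions, as the abstract notes.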
Previously unknown and highly divergent ssDNA viruses populate the oceans.
Labonté, Jessica M; Suttle, Curtis A
2013-11-01
Single-stranded DNA (ssDNA) viruses are economically important pathogens of plants and animals, and are widespread in oceans; yet, the diversity and evolutionary relationships among marine ssDNA viruses remain largely unknown. Here we present the results from a metagenomic study of composite samples from temperate (Saanich Inlet, 11 samples; Strait of Georgia, 85 samples) and subtropical (46 samples, Gulf of Mexico) seawater. Most sequences (84%) had no evident similarity to sequenced viruses. In total, 608 putative complete genomes of ssDNA viruses were assembled, almost doubling the number of ssDNA viral genomes in databases. These comprised 129 genetically distinct groups, each represented by at least one complete genome that had no recognizable similarity to each other or to other virus sequences. Given that the seven recognized families of ssDNA viruses have considerable sequence homology within them, this suggests that many of these genetic groups may represent new viral families. Moreover, nearly 70% of the sequences were similar to one of these genomes, indicating that most of the sequences could be assigned to a genetically distinct group. Most sequences fell within 11 well-defined gene groups, each sharing a common gene. Some of these encoded putative replication and coat proteins that had similarity to sequences from viruses infecting eukaryotes, suggesting that these were likely from viruses infecting eukaryotic phytoplankton and zooplankton.
Robertson, Helen E; Lapraz, François; Egger, Bernhard; Telford, Maximilian J; Schiffer, Philipp H
2017-05-12
Acoels are small, ubiquitous - but understudied - marine worms with a very simple body plan. Their internal phylogeny is still not fully resolved, and the position of their proposed phylum Xenacoelomorpha remains debated. Here we describe mitochondrial genome sequences from the acoels Paratomella rubra and Isodiametra pulchra, and the complete mitochondrial genome of the acoel Archaphanostoma ylvae. The P. rubra and A. ylvae sequences are typical for metazoans in size and gene content. The larger I. pulchra mitochondrial genome contains both ribosomal genes, 21 tRNAs, but only 11 protein-coding genes. We find evidence suggesting a duplicated sequence in the I. pulchra mitochondrial genome. The P. rubra, I. pulchra and A. ylvae mitochondria have a unique genome organisation in comparison to other metazoan mitochondrial genomes. We found a large degree of protein-coding gene and tRNA overlap with little non-coding sequence in the compact P. rubra genome. Conversely, the A. ylvae and I. pulchra genomes have many long non-coding sequences between genes, likely driving genome size expansion in the latter. Phylogenetic trees inferred from mitochondrial genes retrieve Xenacoelomorpha as an early branching taxon in the deuterostomes. Sequence divergence analysis between P. rubra sampled in England and Spain indicates cryptic diversity.
J.B. Whittall; J. Syring; M. Parks; J. Buenrostro; C. Dick; A. Liston; R. Cronn
2010-01-01
Critical to conservation efforts and other investigations at low taxonomic levels, DNA sequence data offer important insights into the distinctiveness, biogeographic partitioning, and evolutionary histories of species. The resolving power of DNA sequences is often limited by insufficient variability at the intraspecific level. This is particularly true of studies...
Genome Sequence of the Yeast Clavispora lusitaniae Type Strain CBS 6936.
Durrens, Pascal; Klopp, Christophe; Biteau, Nicolas; Fitton-Ouhabi, Valérie; Dementhon, Karine; Accoceberry, Isabelle; Sherman, David J; Noël, Thierry
2017-08-03
Clavispora lusitaniae, an environmental saprophytic yeast belonging to the CTG clade of Candida, can behave occasionally as an opportunistic pathogen in humans. We report here the genome sequence of the type strain CBS 6936. Comparison with sequences of strain ATCC 42720 indicates conservation of chromosomal structure but significant nucleotide divergence.
Genome Sequence of the Yeast Clavispora lusitaniae Type Strain CBS 6936
Klopp, Christophe; Biteau, Nicolas; Fitton-Ouhabi, Valérie; Dementhon, Karine; Accoceberry, Isabelle; Sherman, David J.; Noël, Thierry
2017-01-01
Clavispora lusitaniae, an environmental saprophytic yeast belonging to the CTG clade of Candida, can behave occasionally as an opportunistic pathogen in humans. We report here the genome sequence of the type strain CBS 6936. Comparison with sequences of strain ATCC 42720 indicates conservation of chromosomal structure but significant nucleotide divergence. PMID:28774979
Gaudeul, Myriam; Gardner, Martin F; Thomas, Philip; Ennos, Richard A; Hollingsworth, Pete M
2014-09-05
New Caledonia harbours a highly diverse and endemic flora, and 13 (out of the 19 worldwide) species of Araucaria are endemic to this territory. Their phylogenetic relationships remain largely unresolved. Using nuclear microsatellites and chloroplast DNA sequencing, we focused on five closely related Araucaria species to investigate among-species relationships and the distribution of within-species genetic diversity across New Caledonia. The species could be clearly distinguished here, except A. montana and A. laubenfelsii that were not differentiated and, at most, form a genetic cline. Given their apparent morphological and ecological similarity, we suggested that these two species may be considered as a single evolutionary unit. We observed cases of nuclear admixture and incongruence between nuclear and chloroplast data, probably explained by introgression and shared ancestral polymorphism. Ancient hybridization was evidenced between A. biramulata and A. laubenfelsii in Mt Do, and is strongly suspected between A. biramulata and A. rulei in Mt Tonta. In both cases, extensive asymmetrical backcrossing eliminated the influence of one parent in the nuclear DNA composition. Shared ancestral polymorphism was also observed for cpDNA, suggesting that species diverged recently, have large effective sizes and/or that cpDNA experienced slow rates of molecular evolution. Within-species genetic structure was pronounced, probably because of low gene flow and significant inbreeding, and appeared clearly influenced by geography. This may be due to survival in distinct refugia during Quaternary climatic oscillations. The study species probably diverged recently and/or are characterized by a slow rate of cpDNA sequence evolution, and introgression is strongly suspected. Within-species genetic structure is tightly linked with geography. We underline the conservation implications of our results, and highlight several perspectives.
Extensive Local Gene Duplication and Functional Divergence among Paralogs in Atlantic Salmon
Warren, Ian A.; Ciborowski, Kate L.; Casadei, Elisa; Hazlerigg, David G.; Martin, Sam; Jordan, William C.; Sumner, Seirian
2014-01-01
Many organisms can generate alternative phenotypes from the same genome, enabling individuals to exploit diverse and variable environments. A prevailing hypothesis is that such adaptation has been favored by gene duplication events, which generate redundant genomic material that may evolve divergent functions. Vertebrate examples of recent whole-genome duplications are sparse although one example is the salmonids, which have undergone a whole-genome duplication event within the last 100 Myr. The life-cycle of the Atlantic salmon, Salmo salar, depends on the ability to produce alternating phenotypes from the same genome, to facilitate migration and maintain its anadromous life history. Here, we investigate the hypothesis that genome-wide and local gene duplication events have contributed to the salmonid adaptation. We used high-throughput sequencing to characterize the transcriptomes of three key organs involved in regulating migration in S. salar: Brain, pituitary, and olfactory epithelium. We identified over 10,000 undescribed S. salar sequences and designed an analytic workflow to distinguish between paralogs originating from local gene duplication events or from whole-genome duplication events. These data reveal that substantial local gene duplications took place shortly after the whole-genome duplication event. Many of the identified paralog pairs have either diverged in function or become noncoding. Future functional genomics studies will reveal to what extent this rich source of divergence in genetic sequence is likely to have facilitated the evolution of extreme phenotypic plasticity required for an anadromous life-cycle. PMID:24951567
Biological function in the twilight zone of sequence conservation.
Ponting, Chris P
2017-08-16
Strong DNA conservation among divergent species is an indicator of enduring functionality. With weaker sequence conservation we enter a vast 'twilight zone' in which sequence subject to transient or lower constraint cannot be distinguished easily from neutrally evolving, non-functional sequence. Twilight zone functional sequence is illuminated instead by principles of selective constraint and positive selection using genomic data acquired from within a species' population. Application of these principles reveals that despite being biochemically active, most twilight zone sequence is not functional.
A highly divergent Puumala virus lineage in southern Poland.
Rosenfeld, Ulrike M; Drewes, Stephan; Ali, Hanan Sheikh; Sadowska, Edyta T; Mikowska, Magdalena; Heckel, Gerald; Koteja, Paweł; Ulrich, Rainer G
2017-05-01
Puumala virus (PUUV) represents one of the most important hantaviruses in Central Europe. Phylogenetic analyses of PUUV strains indicate a strong genetic structuring of this hantavirus. Recently, PUUV sequences were identified in the natural reservoir, the bank vole (Myodes glareolus), collected in the northern part of Poland. The objective of this study was to evaluate the presence of PUUV in bank voles from southern Poland. A total of 72 bank voles were trapped in 2009 at six sites in this part of Poland. RT-PCR and IgG-ELISA analyses detected three PUUV-positive voles at one trapping site. The PUUV-infected animals were identified by cytochrome b gene analysis as belonging to the Carpathian and Eastern evolutionary lineages of bank vole. The novel PUUV S, M and L segment nucleotide sequences showed the closest similarity to sequences of the Russian PUUV lineage from Latvia, but were highly divergent from those previously found in northern Poland, Slovakia and Austria. In conclusion, the detection of a highly divergent PUUV lineage in southern Poland indicates the necessity of further bank vole monitoring in this region to allow rational public health measures to prevent human infections.
Testing the molecular clock using mechanistic models of fossil preservation and molecular evolution
2017-01-01
Molecular sequence data provide information about relative times only, and fossil-based age constraints are the ultimate source of information about absolute times in molecular clock dating analyses. Thus, fossil calibrations are critical to molecular clock dating, but competing methods are difficult to evaluate empirically because the true evolutionary time scale is never known. Here, we combine mechanistic models of fossil preservation and sequence evolution in simulations to evaluate different approaches to constructing fossil calibrations and their impact on Bayesian molecular clock dating, and the relative impact of fossil versus molecular sampling. We show that divergence time estimation is impacted by the model of fossil preservation, sampling intensity and tree shape. The addition of sequence data may improve molecular clock estimates, but accuracy and precision are dominated by the quality of the fossil calibrations. Posterior means and medians are poor representatives of true divergence times; posterior intervals provide a much more accurate estimate of divergence times, though they may be wide and often do not have high coverage probability. Our results highlight the importance of increased fossil sampling and improved statistical approaches to generating calibrations, which should incorporate the non-uniform nature of ecological and temporal fossil species distributions. PMID:28637852
Erickson, Harold P.
2009-01-01
The eukaryotic cytoskeleton appears to have evolved from ancestral precursors related to prokaryotic FtsZ and MreB. FtsZ and MreB show 40–50% sequence identity across different bacterial and archaeal species. Here I suggest that this represents the limit of divergence that is consistent with maintaining their functions for cytokinesis and cell shape. Previous analyses have noted that tubulin and actin are highly conserved across eukaryotic species, but so divergent from their prokaryotic relatives as to be hardly recognizable from sequence comparisons. One suggestion for this extreme divergence of tubulin and actin is that it occurred as they evolved very different functions from FtsZ and MreB. I will present new arguments favoring this suggestion, and speculate on pathways. Moreover, the extreme conservation of tubulin and actin across eukaryotic species is not due to an intrinsic lack of variability, but is attributed to their acquisition of elaborate mechanisms for assembly dynamics and their interactions with multiple motor and binding proteins. A new structure-based sequence alignment identifies amino acids that are conserved from FtsZ to tubulins. The highly conserved amino acids are not those forming the subunit core or protofilament interface, but those involved in binding and hydrolysis of GTP. PMID:17563102
Mitochondrial genomes reveal the extinct Hippidion as an outgroup to all living equids.
Der Sarkissian, Clio; Vilstrup, Julia T; Schubert, Mikkel; Seguin-Orlando, Andaine; Eme, David; Weinstock, Jacobo; Alberdi, Maria Teresa; Martin, Fabiana; Lopez, Patricio M; Prado, Jose L; Prieto, Alfredo; Douady, Christophe J; Stafford, Tom W; Willerslev, Eske; Orlando, Ludovic
2015-03-01
Hippidions were equids with very distinctive anatomical features. They lived in South America 2.5 million years ago (Ma) until their extinction approximately 10 000 years ago. The evolutionary origin of the three known Hippidion morphospecies is still disputed. Based on palaeontological data, Hippidion could have diverged from the lineage leading to modern equids before 10 Ma. In contrast, a much later divergence date, with Hippidion nesting within modern equids, was indicated by partial ancient mitochondrial DNA sequences. Here, we characterized eight Hippidion complete mitochondrial genomes at 3.4-386.3-fold coverage using target-enrichment capture and next-generation sequencing. Our dataset reveals that the two morphospecies sequenced (H. saldiasi and H. principale) formed a monophyletic clade, basal to extant and extinct Equus lineages. This contrasts with previous genetic analyses and supports Hippidion as a distinct genus, in agreement with palaeontological models. We date the Hippidion split from Equus at 5.6-6.5 Ma, suggesting an early divergence in North America prior to the colonization of South America, after the formation of the Panamanian Isthmus 3.5 Ma and the Great American Biotic Interchange.
Mitochondrial genomes reveal the extinct Hippidion as an outgroup to all living equids
Der Sarkissian, Clio; Vilstrup, Julia T.; Schubert, Mikkel; Seguin-Orlando, Andaine; Eme, David; Weinstock, Jacobo; Alberdi, Maria Teresa; Martin, Fabiana; Lopez, Patricio M.; Prado, Jose L.; Prieto, Alfredo; Douady, Christophe J.; Stafford, Tom W.; Willerslev, Eske; Orlando, Ludovic
2015-01-01
Hippidions were equids with very distinctive anatomical features. They lived in South America 2.5 million years ago (Ma) until their extinction approximately 10 000 years ago. The evolutionary origin of the three known Hippidion morphospecies is still disputed. Based on palaeontological data, Hippidion could have diverged from the lineage leading to modern equids before 10 Ma. In contrast, a much later divergence date, with Hippidion nesting within modern equids, was indicated by partial ancient mitochondrial DNA sequences. Here, we characterized eight Hippidion complete mitochondrial genomes at 3.4–386.3-fold coverage using target-enrichment capture and next-generation sequencing. Our dataset reveals that the two morphospecies sequenced (H. saldiasi and H. principale) formed a monophyletic clade, basal to extant and extinct Equus lineages. This contrasts with previous genetic analyses and supports Hippidion as a distinct genus, in agreement with palaeontological models. We date the Hippidion split from Equus at 5.6–6.5 Ma, suggesting an early divergence in North America prior to the colonization of South America, after the formation of the Panamanian Isthmus 3.5 Ma and the Great American Biotic Interchange. PMID:25762573
Smith, M. Alex; Fisher, Brian L; Hebert, Paul D.N
2005-01-01
The role of DNA barcoding as a tool to accelerate the inventory and analysis of diversity for hyperdiverse arthropods is tested using ants in Madagascar. We demonstrate how DNA barcoding helps address the failure of current inventory methods to rapidly respond to pressing biodiversity needs, specifically in the assessment of richness and turnover across landscapes with hyperdiverse taxa. In a comparison of inventories at four localities in northern Madagascar, patterns of richness were not significantly different when richness was determined using morphological taxonomy (morphospecies) or sequence divergence thresholds (Molecular Operational Taxonomic Unit(s); MOTU). However, sequence-based methods tended to yield greater richness and significantly lower indices of similarity than morphological taxonomy. MOTU determined using our molecular technique were a remarkably local phenomenon—indicative of highly restricted dispersal and/or long-term isolation. In cases where molecular and morphological methods differed in their assignment of individuals to categories, the morphological estimate was always more conservative than the molecular estimate. In those cases where morphospecies descriptions collapsed distinct molecular groups, sequence divergences of 16% (on average) were contained within the same morphospecies. Such high divergences highlight taxa for further detailed genetic, morphological, life history, and behavioral studies. PMID:16214741
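The comparison of similarity between localities under morphospecies and MOTU assignments can be illustrated with a short sketch. The abstract does not name the similarity index that was used; the Jaccard index below is an assumption chosen for simplicity, and the taxon labels in the usage note are hypothetical.

```python
from typing import Set


def jaccard_similarity(site_a: Set[str], site_b: Set[str]) -> float:
    """Shared taxa divided by all taxa found at either site (1.0 if both empty)."""
    union = site_a | site_b
    if not union:
        return 1.0
    return len(site_a & site_b) / len(union)
```

Suppose two localities both contain morphospecies sp1 and sp2, giving a similarity of 1.0. If sequence divergence splits sp1 into a different MOTU at each locality (sp1_a and sp1_b), the MOTU-based sets share only sp2 out of three taxa and the similarity drops to 1/3. This is the direction of effect the abstract reports: greater richness and lower similarity under sequence-based units.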
Xiang, Xian-ling; Xi, Yi-long; Wen, Xin-li; Zhang, Gen; Wang, Jin-xia; Hu, Ke
2011-05-01
Elucidating the evolutionary patterns and processes of extant species is an important objective of any research program that seeks to understand population divergence and, ultimately, speciation. The island-like nature and temporal fluctuation of limnetic habitats create opportunities for genetic differentiation in rotifers through space and time. To gain further understanding of spatio-temporal patterns of genetic differentiation in rotifers other than the well-studied Brachionus plicatilis complex in brackish water, a total of 318 nrDNA ITS sequences from the B. calyciflorus complex in freshwater were analysed using phylogenetic and phylogeographic methods. DNA taxonomy based on both sequence divergence and the GMYC model suggested the occurrence of six potential cryptic species, which was also supported by reproductive isolation among the tested lineages. Significant genetic differentiation, with no significant correlation between geographic and genetic distances, was found in the most abundant cryptic species, BcI-W and Bc-SW. Most of the genetic variability in cryptic species Bc-SW was due to differences between sampling localities within seasons, rather than between different seasons. Nested Clade Analysis suggested allopatric or past fragmentation, contiguous range expansion and long-distance colonization, possibly coupled with subsequent fragmentation, as the probable main forces shaping the present-day phylogeographic structure of the B. calyciflorus species complex.
A greedy, graph-based algorithm for the alignment of multiple homologous gene lists.
Fostier, Jan; Proost, Sebastian; Dhoedt, Bart; Saeys, Yvan; Demeester, Piet; Van de Peer, Yves; Vandepoele, Klaas
2011-03-15
Many comparative genomics studies rely on the correct identification of homologous genomic regions using accurate alignment tools. In such cases, the alphabet of the input sequences consists of complete genes, rather than nucleotides or amino acids. As optimal multiple sequence alignment is computationally impractical, a progressive alignment strategy is often employed. However, such an approach is susceptible to the propagation of alignment errors in early pairwise alignment steps, especially when dealing with strongly diverged genomic regions. In this article, we present a novel, accurate and efficient greedy, graph-based algorithm for the alignment of multiple homologous genomic segments, represented as ordered gene lists. Based on provable properties of the graph structure, several heuristics are developed to resolve local alignment conflicts that occur due to gene duplication and/or rearrangement events on the different genomic segments. The performance of the algorithm is assessed by comparing the alignment results of homologous genomic segments in Arabidopsis thaliana to those obtained by using both a progressive alignment method and an earlier graph-based implementation. Especially for datasets that contain strongly diverged segments, the proposed method achieves a substantially higher alignment accuracy, and proves to be sufficiently fast for large datasets including a few dozen eukaryotic genomes. The algorithm is implemented as a part of the i-ADHoRe 3.0 package, available at http://bioinformatics.psb.ugent.be/software.
Studying the genetic basis of speciation in high gene flow marine invertebrates
2016-01-01
A growing number of genes responsible for reproductive incompatibilities between species (barrier loci) exhibit the signals of positive selection. However, the possibility that genes experiencing positive selection diverge early in speciation and commonly cause reproductive incompatibilities has not been systematically investigated on a genome-wide scale. Here, I outline a research program for studying the genetic basis of speciation in broadcast spawning marine invertebrates that uses a priori genome-wide information on a large, unbiased sample of genes tested for positive selection. A targeted sequence capture approach is proposed that scores single-nucleotide polymorphisms (SNPs) in widely separated species populations at an early stage of allopatric divergence. The targeted capture of both coding and non-coding sequences enables SNPs to be characterized at known locations across the genome and at genes with known selective or neutral histories. The neutral coding and non-coding SNPs provide robust background distributions for identifying FST-outliers within genes that can, in principle, identify specific mutations experiencing diversifying selection. If natural hybridization occurs between species, the neutral coding and non-coding SNPs can provide a neutral admixture model for genomic clines analyses aimed at finding genes exhibiting strong blocks to introgression. Strongylocentrotid sea urchins are used as a model system to outline the approach but it can be used for any group that has a complete reference genome available. PMID:29491951
Nougairède, Antoine; Joffret, Marie-Line; Deshpande, Jagadish M.; Dubot-Pérès, Audrey; Héraud, Jean-Michel
2014-01-01
Most circulating strains of Human enterovirus 71 (EV-A71) have been classified primarily into three genogroups (A to C) on the basis of genetic divergence within the 1D gene, which encodes the VP1 capsid protein. The aim of the present study was to provide further insights into the diversity of the EV-A71 genogroups following the recent description of highly divergent isolates, in particular those from African countries, including Madagascar. We classified recent EV-A71 isolates by a large-scale comparison of 3,346 VP1 nucleotide sequences collected from GenBank. Analysis of genetic distances and phylogenetic investigations indicated that some recently reported isolates did not fall into genogroups A-C and clustered into three additional genogroups, including one Indian genogroup (genogroup D) and two African ones (E and F). Our Bayesian phylogenetic analysis provided consistent data showing that the genogroup D isolates share a recent common ancestor with the members of genogroup E, while the isolates of genogroup F evolved from a recent common ancestor shared with the members of genogroup B. Our results reveal the wide diversity that exists among EV-A71 isolates and suggest that the number of circulating genogroups is probably underestimated, particularly in developing countries where EV-A71 epidemiology has been poorly studied. PMID:24598878
A DNA Barcode Library for North American Pyraustinae (Lepidoptera: Pyraloidea: Crambidae).
Yang, Zhaofu; Landry, Jean-François; Hebert, Paul D N
2016-01-01
Although members of the crambid subfamily Pyraustinae are frequently important crop pests, their identification is often difficult because many species lack conspicuous diagnostic morphological characters. DNA barcoding employs sequence diversity in a short standardized gene region to facilitate specimen identifications and species discovery. This study provides a DNA barcode reference library for North American pyraustines based upon the analysis of 1589 sequences recovered from 137 nominal species, 87% of the fauna. Data from 125 species were barcode compliant (>500bp, <1% n), and 99 of these taxa formed a distinct cluster that was assigned to a single BIN. The other 26 species were assigned to 56 BINs, reflecting frequent cases of deep intraspecific sequence divergence and a few instances of barcode sharing, creating a total of 155 BINs. Two systems for OTU designation, ABGD and BIN, were examined to check the correspondence between current taxonomy and sequence clusters. The BIN system performed better than ABGD in delimiting closely related species, while OTU counts with ABGD were influenced by the value employed for relative gap width. Different species with low or no interspecific divergence may represent cases of unrecognized synonymy, whereas those with high intraspecific divergence require further taxonomic scrutiny as they may involve cryptic diversity. The barcode library developed in this study will also help to advance understanding of relationships among species of Pyraustinae.
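The barcode-compliance criterion quoted in the abstract above (>500 bp, <1% N) can be illustrated with a short check. This is only a minimal sketch: the function name and default thresholds are illustrative assumptions taken from the two numbers in the abstract, not the authors' actual pipeline.

```python
def is_barcode_compliant(seq: str, min_len: int = 500, max_n_frac: float = 0.01) -> bool:
    """Return True if the sequence is longer than min_len bases and
    fewer than max_n_frac of its bases are ambiguous (N).

    Hypothetical helper: thresholds mirror the ">500bp, <1% n" rule
    quoted in the abstract, nothing more.
    """
    seq = seq.upper()
    if len(seq) <= min_len:
        return False
    # Fraction of ambiguous base calls in the whole read
    return seq.count("N") / len(seq) < max_n_frac


if __name__ == "__main__":
    clean = "ACGT" * 150           # 600 bp, no ambiguous bases
    print(is_barcode_compliant(clean))
```

A sequence that is long enough but carries too many ambiguous calls is rejected, as is a clean but short sequence.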
Hemispheric Connectivity and the Visual-Spatial Divergent-Thinking Component of Creativity
ERIC Educational Resources Information Center
Moore, Dana W.; Bhadelia, Rafeeque A.; Billings, Rebecca L.; Fulwiler, Carl; Heilman, Kenneth M.; Rood, Kenneth M. J.; Gansler, David A.
2009-01-01
Background/hypothesis: Divergent thinking is an important measurable component of creativity. This study tested the postulate that divergent thinking depends on large distributed inter- and intra-hemispheric networks. Although preliminary evidence supports increased brain connectivity during divergent thinking, the neural correlates of this…
Phylogenetic relationships of cone snails endemic to Cabo Verde based on mitochondrial genomes.
Abalde, Samuel; Tenorio, Manuel J; Afonso, Carlos M L; Uribe, Juan E; Echeverry, Ana M; Zardoya, Rafael
2017-11-25
Due to their great species and ecological diversity as well as their capacity to produce hundreds of different toxins, cone snails are of interest to evolutionary biologists, pharmacologists and amateur naturalists alike. Taxonomic identification of cone snails still relies mostly on the shape, color, and banding patterns of the shell. However, these phenotypic traits are prone to homoplasy. Therefore, the consistent use of genetic data for species delimitation and phylogenetic inference in this apparently hyperdiverse group is largely wanting. Here, we reconstruct the phylogeny of the cones endemic to Cabo Verde archipelago, a well-known radiation of the group, using mitochondrial (mt) genomes. The reconstructed phylogeny grouped the analyzed species into two main clades, one including Kalloconus from West Africa sister to Trovaoconus from Cabo Verde and the other with a paraphyletic Lautoconus due to the sister group relationship of Africonus from Cabo Verde and Lautoconus ventricosus from Mediterranean Sea and neighboring Atlantic Ocean to the exclusion of Lautoconus endemic to Senegal (plus Lautoconus guanche from Mauritania, Morocco, and Canary Islands). Within Trovaoconus, up to three main lineages could be distinguished. The clade of Africonus included four main lineages (named I to IV), each further subdivided into two monophyletic groups. The reconstructed phylogeny allowed inferring the evolution of the radula in the studied lineages as well as biogeographic patterns. The number of cone species endemic to Cabo Verde was revised in light of sequence divergence data and the inferred phylogenetic relationships. The sequence divergence between continental members of the genus Kalloconus and island endemics ascribed to the genus Trovaoconus is low, prompting synonymization of the latter. The genus Lautoconus is paraphyletic. Lautoconus ventricosus is the closest living sister group of genus Africonus.
Diversification of Africonus occurred in allopatry, owing to the direct development of their larvae, and was mainly triggered by eustatic sea level changes during the Miocene-Pliocene. Our study confirms the diversity of cones endemic to Cabo Verde but significantly reduces the number of valid species. When a sequence divergence threshold is applied, the number of valid species within the sampled Africonus is halved.
Liu, Yang; Liao, Lihuan; Zhang, Xiuyue; Yue, Bisong
2012-01-01
The southeastern margin of the Tibetan Plateau (SEMTP) is a particularly interesting region due to its topographic complexity and unique geologic history, but phylogeographic studies that focus on this region are rare. In this study, we investigated the phylogeography of the South China field mouse, Apodemus draco, in order to assess the role of geologic and climatic events on the Tibetan Plateau in shaping its genetic structure. We sequenced mitochondrial cytochrome b (cyt b) sequences in 103 individuals from 47 sampling sites. In addition, 23 cyt b sequences were collected from GenBank for analyses. Phylogenetic, demographic and landscape genetic methods were conducted. Seventy-six cyt b haplotypes were found and the genetic diversity was extremely high (π = 0.0368; h = 0.989). Five major evolutionary clades, based on geographic locations, were identified. Demographic analyses implied subclade 1A and subclade 1B experienced population expansions at about 0.052-0.013 Mya and 0.014-0.004 Mya, respectively. The divergence time analysis showed that the split between clade 1 and clade 2 occurred 0.26 Mya, which fell into the extensive glacial period (EGP, 0.5-0.17 Mya). The divergence times of other main clades (2.20-0.55 Mya) were congruent with the periods of the Qingzang Movement (3.6-1.7 Mya) and the Kun-Huang Movement (1.2-0.6 Mya), which were known as the most intense uplift events in the Tibetan Plateau. Our study supported the hypothesis that the SEMTP was a large late Pleistocene refugium, and further inferred that the Gongga Mountain Region and Hongya County were glacial refugia for A. draco in clade 1. We hypothesize that the evolutionary history of A. draco in the SEMTP primarily occurred in two stages. First, an initial divergence would have been shaped by uplift events of the Tibetan Plateau. Then, major glaciations in the Pleistocene added complexity to its demographic history and genetic structure. PMID:22666478
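The two diversity statistics reported in the abstract above, nucleotide diversity (π) and haplotype diversity (h), have standard textbook definitions that can be sketched briefly. The code below assumes equal-length, pre-aligned sequences with no gaps or ambiguous bases, and the function names are illustrative; it does not reproduce the software the authors used.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """h = n/(n-1) * (1 - sum(p_i^2)), where p_i is the frequency
    of the i-th distinct haplotype among n sampled sequences."""
    n = len(seqs)
    counts = Counter(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def nucleotide_diversity(seqs):
    """pi = mean number of pairwise differences per site,
    averaged over all pairs of sampled sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


if __name__ == "__main__":
    # Toy alignment of four 4-bp sequences (hypothetical data)
    sample = ["AAAA", "AAAT", "AATT", "AAAA"]
    print(round(haplotype_diversity(sample), 4))   # 0.8333
    print(round(nucleotide_diversity(sample), 4))  # 0.2917
```

In the toy sample, three distinct haplotypes among four sequences give h = 0.8333, and seven pairwise differences over six pairs of four sites give π = 7/24.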
DeBoy, Robert T; Mongodin, Emmanuel F; Emerson, Joanne B; Nelson, Karen E
2006-04-01
In the present study, the chromosomes of two members of the Thermotogales were compared. A whole-genome alignment of Thermotoga maritima MSB8 and Thermotoga neapolitana NS-E has revealed numerous large-scale DNA rearrangements, most of which are associated with CRISPR DNA repeats and/or tRNA genes. These DNA rearrangements do not include the putative origin of DNA replication but move within the same replichore, i.e., the same replicating half of the chromosome (delimited by the replication origin and terminus). Based on cumulative GC skew analysis, both the T. maritima and T. neapolitana lineages contain one or two major inverted DNA segments. Also, based on PCR amplification and sequence analysis of the DNA joints that are associated with the major rearrangements, the overall chromosome architecture was found to be conserved at most DNA joints for other strains of T. neapolitana. Taken together, the results from this analysis suggest that the observed chromosomal rearrangements in the Thermotogales likely occurred by successive inversions after their divergence from a common ancestor and before strain diversification. Finally, sequence analysis shows that size polymorphisms in the DNA joints associated with CRISPRs can be explained by expansion and possibly contraction of the DNA repeat and spacer unit, providing a tool for discerning the relatedness of strains from different geographic locations.
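The cumulative GC skew analysis mentioned in the abstract above can be sketched as follows. Per-window skew is computed as (G - C)/(G + C) and summed along the chromosome; in many bacterial genomes the extrema of the cumulative curve tend to fall near the replication origin and terminus, so an inverted segment shows up as a local reversal of slope. The window size and function name are assumptions for illustration, not the parameters used in the study.

```python
def cumulative_gc_skew(seq: str, window: int = 1000):
    """Return the running sum of per-window GC skew, (G - C) / (G + C).

    Windows containing no G or C contribute a skew of 0.
    The window size is an illustrative default.
    """
    seq = seq.upper()
    total = 0.0
    curve = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        total += skew
        curve.append(total)
    return curve


if __name__ == "__main__":
    # Toy sequence: a G-rich window, a C-rich window, then a balanced one
    print(cumulative_gc_skew("GGGG" + "CCCC" + "GGCC", window=4))  # [1.0, 0.0, 0.0]
```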
Garzón-Orduña, Ivonne J; Menchaca-Armenta, Imelda; Contreras-Ramos, Atilano; Liu, Xingyue; Winterton, Shaun L
2016-09-20
The last time the phylogenetic relationships among members of the family Hemerobiidae were studied quantitatively was over 12 years ago and based exclusively on morphology. Our study builds upon this morphological evidence by adding sequence data from three gene loci to provide a total evidence phylogeny of brown lacewings (Neuroptera: Hemerobiidae). Thirty-seven species representing nineteen Hemerobiidae genera were compared with outgroups from the families Ithonidae, Psychopsidae and Chrysopidae in Bayesian and parsimony analyses using a single nuclear gene (CAD) and two mitochondrial (16S rDNA and Cytochrome Oxidase I) genes. We compare divergence time estimates of Hemerobiidae cladogenesis under the two most commonly used relaxed clock models and discuss the evolution of wing venation in the family. We recovered a phylogeny largely incongruent with previously published morphological studies, although all but two subfamilies (i.e., Notiobiellinae and Drepanacrinae) were recovered as monophyletic. We found the subfamily Drepanacrinae paraphyletic with respect to Psychobiellinae, and Notiobiellinae to be polyphyletic. We thus offer a revised concept of Notiobiellinae, comprising only Notiobiella Banks, and erect a new subfamily Zachobiellinae including the remaining genera previously placed in Notiobiellinae. Psychobiellinae is synonymized with Drepanacrinae. Unlike the previous hypothesis that proposed a remarkably laddered topology, our tree suggests that hemerobiids diverged as three main clades. 
Moreover, in contrast to the vein proliferation hypothesis, we found that hemerobiids have instead undergone multiple reductions in the number of radial veins; this scenario questions the relevance of this character as diagnostic of various subfamilies. Our phylogenetic hypothesis and divergence time analysis suggest that extant hemerobiids originated around the end of the Triassic and evolved as three distinct clades that diverged from one another during the Late Jurassic to Early Cretaceous. Contrary to earlier phylogenetic hypotheses, Carobius Banks (Carobiinae) is sister to the previously unplaced genus Notherobius New in a clade more closely related to Sympherobiinae, Megalominae and Zachobiellinae subfam. nov. The addition of taxa which are not available for DNA sequencing should be the focus of future studies, especially Adelphohemerobius Oswald, which is particularly important to test our inferences regarding the evolution of wing venation in Hemerobiidae.
Biedrzycka, Aleksandra; O'Connor, Emily; Sebastian, Alvaro; Migalska, Magdalena; Radwan, Jacek; Zając, Tadeusz; Bielański, Wojciech; Solarz, Wojciech; Ćmiel, Adam; Westerdahl, Helena
2017-07-05
Recent work suggests that gene duplications may play an important role in the evolution of immunity genes. Passerine birds, and in particular Sylvioidea warblers, have highly duplicated major histocompatibility complex (MHC) genes, which are key in immunity, compared to other vertebrates. However, reasons for this high MHC gene copy number are yet unclear. High-throughput sequencing (HTS) allows MHC genotyping even in individuals with extremely duplicated genes. This HTS data can reveal evidence of selection, which may help to unravel the putative functions of different gene copies, i.e. neofunctionalization. We performed exhaustive genotyping of MHC class I in a Sylvioidea warbler, the sedge warbler, Acrocephalus schoenobaenus, using the Illumina MiSeq technique on individuals from a wild study population. The MHC diversity in 863 genotyped individuals by far exceeds that of any other bird species described to date. A single individual could carry up to 65 different alleles, a large proportion of which are expressed (transcribed). The MHC alleles were of three different lengths differing in evidence of selection, diversity and divergence within our study population. Alleles without any deletions and alleles containing a 6 bp deletion showed characteristics of classical MHC genes, with evidence of multiple sites subject to positive selection and high sequence divergence. In contrast, alleles containing a 3 bp deletion had no sites subject to positive selection and had low divergence. Our results suggest that sedge warbler MHC alleles that either have no deletion, or contain a 6 bp deletion, encode classical antigen presenting MHC molecules. In contrast, MHC alleles containing a 3 bp deletion may encode molecules with a different function. This study demonstrates that highly duplicated MHC genes can be characterised with HTS and that selection patterns can be useful for revealing neofunctionalization. 
Importantly, our results highlight the need to consider the putative function of different MHC genes in future studies of MHC in relation to disease resistance and fitness.
NASA Technical Reports Server (NTRS)
Battistuzzi, Fabia U.; Feijao, Andreia; Hedges, S. Blair
2004-01-01
BACKGROUND: The timescale of prokaryote evolution has been difficult to reconstruct because of a limited fossil record and complexities associated with molecular clocks and deep divergences. However, the relatively large number of genome sequences currently available has provided a better opportunity to control for potential biases such as horizontal gene transfer and rate differences among lineages. We assembled a data set of sequences from 32 proteins (approximately 7600 amino acids) common to 72 species and estimated phylogenetic relationships and divergence times with a local clock method. RESULTS: Our phylogenetic results support most of the currently recognized higher-level groupings of prokaryotes. Of particular interest is a well-supported group of three major lineages of eubacteria (Actinobacteria, Deinococcus, and Cyanobacteria) that we call Terrabacteria and associate with an early colonization of land. Divergence time estimates for the major groups of eubacteria are between 2.5-3.2 billion years ago (Ga) while those for archaebacteria are mostly between 3.1-4.1 Ga. The time estimates suggest a Hadean origin of life (prior to 4.1 Ga), an early origin of methanogenesis (3.8-4.1 Ga), an origin of anaerobic methanotrophy after 3.1 Ga, an origin of phototrophy prior to 3.2 Ga, an early colonization of land 2.8-3.1 Ga, and an origin of aerobic methanotrophy 2.5-2.8 Ga. CONCLUSIONS: Our early time estimates for methanogenesis support the consideration of methane, in addition to carbon dioxide, as a greenhouse gas responsible for the early warming of the Earth's surface. Our divergence times for the origin of anaerobic methanotrophy are compatible with highly depleted carbon isotopic values found in rocks dated 2.8-2.6 Ga.
An early origin of phototrophy is consistent with the earliest bacterial mats and structures identified as stromatolites, but a 2.6 Ga origin of cyanobacteria suggests that those Archean structures, if biologically produced, were made by anoxygenic photosynthesizers. The resistance to desiccation of Terrabacteria and their elaboration of photoprotective compounds suggests that the common ancestor of this group inhabited land. If true, then oxygenic photosynthesis may owe its origin to terrestrial adaptations.
Lappin, Fiona M; Shaw, Rebecca L; Macqueen, Daniel J
2016-12-01
High-throughput sequencing has revolutionised comparative and evolutionary genome biology. It has now become relatively commonplace to generate multiple genomes and/or transcriptomes to characterize the evolution of large taxonomic groups of interest. Nevertheless, such efforts may be unsuited to some research questions or remain beyond the scope of some research groups. Here we show that targeted high-throughput sequencing offers a viable alternative to study genome evolution across a vertebrate family of great scientific interest. Specifically, we exploited sequence capture and Illumina sequencing to characterize the evolution of key components from the insulin-like growth factor (IGF) signalling axis of salmonid fish at unprecedented phylogenetic resolution. The IGF axis represents a central governor of vertebrate growth and its core components were expanded by whole genome duplication in the salmonid ancestor ~95Ma. Using RNA baits synthesised to genes encoding the complete family of IGF binding proteins (IGFBP) and an IGF hormone (IGF2), we captured, sequenced and assembled orthologous and paralogous exons from species representing all ten salmonid genera. This approach generated 299 novel sequences, most as complete or near-complete protein-coding sequences. Phylogenetic analyses confirmed congruent evolutionary histories for all nineteen recognized salmonid IGFBP family members and identified novel salmonid-specific IGF2 paralogues. Moreover, we reconstructed the evolution of duplicated IGF axis paralogues across a replete salmonid phylogeny, revealing complex historic selection regimes - both ancestral to salmonids and lineage-restricted - that frequently involved asymmetric paralogue divergence under positive and/or relaxed purifying selection. Our findings add to an emerging literature highlighting diverse applications for targeted sequencing in comparative-evolutionary genomics.
We also set out a viable approach to obtain large sets of nuclear genes for any member of the salmonid family, which should enable insights into the evolutionary role of whole genome duplication before additional nuclear genome sequences become available. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Gabrieli, Paolo; Gomulski, Ludvik M.; Bonomi, Angelica; Siciliano, Paolo; Scolari, Francesca; Franz, Gerald; Jessup, Andrew; Malacrida, Anna R.; Gasperi, Giuliano
2011-01-01
Background: Diptera have an extraordinary variety of sex determination mechanisms, and Drosophila melanogaster is the paradigm for this group. However, the Drosophila sex determination pathway is only partially conserved and the family Tephritidae affords an interesting example. The tephritid Y chromosome is postulated to be necessary to determine male development. Characterization of Y sequences, apart from elucidating the nature of the male determining factor, is also important to understand the evolutionary history of sex chromosomes within the Tephritidae. We studied the Y sequences from the olive fly, Bactrocera oleae. Its Y chromosome is minute and highly heterochromatic, and displays high heteromorphism with the X chromosome. Methodology/Principal Findings: A combined Representational Difference Analysis (RDA) and fluorescence in-situ hybridization (FISH) approach was used to investigate the Y chromosome to derive information on its sequence content. The Y chromosome is strewn with repetitive DNA sequences, the majority of which are also interspersed in the pericentromeric regions of the autosomes. The Y chromosome appears to have accumulated small and large repetitive interchromosomal duplications. The large interchromosomal duplications harbour an importin-4-like gene fragment. Apart from these importin-4-like sequences, the other Y repetitive sequences are not shared with the X chromosome, suggesting molecular differentiation of these two chromosomes. Moreover, as the identified Y sequences were not detected on the Y chromosomes of closely related tephritids, we can infer divergence in the repetitive nature of their sequence contents. Conclusions/Significance: The identification of Y-linked sequences may tell us much about the repetitive nature, the origin and the evolution of Y chromosomes. We hypothesize how these repetitive sequences accumulated and were maintained on the Y chromosome during its evolutionary history.
Our data reinforce the idea that the sex chromosomes of the Tephritidae may have distinct evolutionary origins with respect to those of the Drosophilidae and other Dipteran families. PMID:21408187
Dennenmoser, Stefan; Vamosi, Steven M; Nolte, Arne W; Rogers, Sean M
2017-01-01
Understanding the genomic basis of adaptive divergence in the presence of gene flow remains a major challenge in evolutionary biology. In prickly sculpin (Cottus asper), an abundant euryhaline fish in northwestern North America, high genetic connectivity among brackish-water (estuarine) and freshwater (tributary) habitats of coastal rivers does not preclude the build-up of neutral genetic differentiation and emergence of different life history strategies. Because these two habitats present different osmotic niches, we predicted high genetic differentiation at known teleost candidate genes underlying salinity tolerance and osmoregulation. We applied whole-genome sequencing of pooled DNA samples (Pool-Seq) to explore adaptive divergence between two estuarine and two tributary habitats. Paired-end sequence reads were mapped against genomic contigs of European Cottus, and the gene content of candidate regions was explored based on comparisons with the threespine stickleback genome. Genes showing signals of repeated differentiation among brackish-water and freshwater habitats included functions such as ion transport and structural permeability in freshwater gills, which suggests that local adaptation to different osmotic niches might contribute to genomic divergence among habitats. Overall, the presence of both repeated and unique signatures of differentiation across many loci scattered throughout the genome is consistent with polygenic adaptation from standing genetic variation and locally variable selection pressures in the early stages of life history divergence. © 2016 John Wiley & Sons Ltd.
Kim, Young-Kyu; Park, Chong-wook; Kim, Ki-Joong
2009-03-31
The chloroplast DNA sequence of Megaleranthis saniculifolia, an endemic and monotypic endangered plant species, was completed in this study (GenBank FJ597983). The genome is 159,924 bp in length. It harbors a pair of IR regions consisting of 26,608 bp each. The lengths of the LSC and SSC regions are 88,326 bp and 18,382 bp, respectively. The structural organization, gene and intron contents, gene order, AT content, codon usage, and transcription units of the Megaleranthis chloroplast genome are similar to those of typical land plant cp DNAs. However, the detailed features of the Megaleranthis chloroplast genome are substantially different from those of Ranunculus, which belongs to the same family, the Ranunculaceae. First, the Megaleranthis cp DNA is 4,797 bp longer than that of Ranunculus owing to expansion of the IR region into the SSC region and duplicated sequence elements in several spacer regions of the Megaleranthis cp genome. Second, the chloroplast genomes of Megaleranthis and Ranunculus show 5.6% sequence divergence in the coding regions, 8.9% sequence divergence in the intron regions, and 18.7% sequence divergence in the intergenic spacer regions. In both the coding and noncoding regions, average nucleotide substitution rates differed markedly, depending on the genome position. Our data strongly implicate the positional effects of the evolutionary modes of chloroplast genes. The genes showing higher levels of base substitutions also have higher incidences of indel mutations and low Ka/Ks ratios. A total of 54 simple sequence repeat loci were identified in the Megaleranthis cp genome. The existence of rich cp SSR loci in the Megaleranthis cp genome provides a rare opportunity to study the population genetic structure of this endangered species. Our phylogenetic trees based on two independent markers, the nuclear ITS and chloroplast matK sequences, strongly support the inclusion of Megaleranthis in Trollius.
Therefore, our molecular trees support Ohwi's original treatment of Megaleranthis saniculifolia as Trollius chosenensis Ohwi.
Kibenge, Molly J T; Iwamoto, Tokinori; Wang, Yingwei; Morton, Alexandra; Godoy, Marcos G; Kibenge, Frederick S B
2013-07-11
Piscine reovirus (PRV) is a newly discovered fish reovirus of anadromous and marine fish ubiquitous among fish in Norwegian salmon farms, and likely the causative agent of heart and skeletal muscle inflammation (HSMI). HSMI is an increasingly economically significant disease in Atlantic salmon (Salmo salar) farms. The nucleotide sequence data available for PRV are limited, and there is no genetic information on this virus outside of Norway and none from wild fish. RT-PCR amplification and sequencing were used to obtain the complete viral genome of PRV (10 segments) from western Canada and Chile. The genetic diversity among the PRV strains and their relationship to Norwegian PRV isolates were determined by phylogenetic analyses and sequence identity comparisons. PRV is distantly related to members of the genera Orthoreovirus and Aquareovirus and an unambiguous new genus within the family Reoviridae. The Canadian and Norwegian PRV strains are most divergent in the segment S1 and S4 encoded proteins. Phylogenetic analysis of PRV S1 sequences, for which the largest number of complete sequences from different "isolates" is available, grouped Norwegian PRV strains into a single genotype, Genotype I, with sub-genotypes, Ia and Ib. The Canadian PRV strains matched sub-genotype Ia and Chilean PRV strains matched sub-genotype Ib. PRV should be considered as a member of a new genus within the family Reoviridae with two major Norwegian sub-genotypes. The Canadian PRV diverged from Norwegian sub-genotype Ia around 2007 ± 1, whereas the Chilean PRV diverged from Norwegian sub-genotype Ib around 2008 ± 1.
2013-01-01
Background: Piscine reovirus (PRV) is a newly discovered fish reovirus of anadromous and marine fish ubiquitous among fish in Norwegian salmon farms, and likely the causative agent of heart and skeletal muscle inflammation (HSMI). HSMI is an increasingly economically significant disease in Atlantic salmon (Salmo salar) farms. The nucleotide sequence data available for PRV are limited, and there is no genetic information on this virus outside of Norway and none from wild fish. Methods: RT-PCR amplification and sequencing were used to obtain the complete viral genome of PRV (10 segments) from western Canada and Chile. The genetic diversity among the PRV strains and their relationship to Norwegian PRV isolates were determined by phylogenetic analyses and sequence identity comparisons. Results: PRV is distantly related to members of the genera Orthoreovirus and Aquareovirus and an unambiguous new genus within the family Reoviridae. The Canadian and Norwegian PRV strains are most divergent in the segment S1 and S4 encoded proteins. Phylogenetic analysis of PRV S1 sequences, for which the largest number of complete sequences from different “isolates” is available, grouped Norwegian PRV strains into a single genotype, Genotype I, with sub-genotypes, Ia and Ib. The Canadian PRV strains matched sub-genotype Ia and Chilean PRV strains matched sub-genotype Ib. Conclusions: PRV should be considered as a member of a new genus within the family Reoviridae with two major Norwegian sub-genotypes. The Canadian PRV diverged from Norwegian sub-genotype Ia around 2007 ± 1, whereas the Chilean PRV diverged from Norwegian sub-genotype Ib around 2008 ± 1. PMID:23844948
Sikorav, J L; Duval, N; Anselmet, A; Bon, S; Krejci, E; Legay, C; Osterlund, M; Reimund, B; Massoulié, J
1988-01-01
In this paper, we show the existence of alternative splicing in the 3' region of the coding sequence of Torpedo acetylcholinesterase (AChE). We describe two cDNA structures which both diverge from the previously described coding sequence of the catalytic subunit of asymmetric (A) forms (Schumacher et al., 1986; Sikorav et al., 1987). They both contain a coding sequence followed by a non-coding sequence and a poly(A) stretch. Both of these structures were shown to exist in poly(A)+ RNAs, by S1 mapping experiments. The divergent region encoded by the first sequence corresponds to the precursor of the globular dimeric form (G2a), since it contains the expected C-terminal amino acids, Ala-Cys. These amino acids are followed by a 29 amino acid extension which contains a hydrophobic segment and must be replaced by a glycolipid in the mature protein. Analyses of intact G2a AChE showed that the common domain of the protein contains intersubunit disulphide bonds. The divergent region of the second type of cDNA consists of an adjacent genomic sequence, which is removed as an intron in A and Ga mRNAs, but may encode a distinct, less abundant catalytic subunit. The structures of the cDNA clones indicate that they are derived from minor mRNAs, shorter than the three major transcripts which have been described previously (14.5, 10.5 and 5.5 kb). Oligonucleotide probes specific for the asymmetric and globular terminal regions hybridize with the three major transcripts, indicating that their size is determined by 3'-untranslated regions which are not related to the differential splicing leading to A and Ga forms. PMID:3181125
Jiang, Yuan; Yang, Zhongqi; Wang, Xiaoyi; Hou, Yuxia
2015-01-01
The species belonging to Sclerodermus (Hymenoptera: Bethylidae) are currently the most important insect natural enemies of wood borer pests, mainly buprestid and cerambycid beetles, in China. However, some sibling species of this genus are very difficult to distinguish because of their similar morphological features. To address this issue, we conducted phylogenetic and genetic analyses of cytochrome oxidase subunit I (COI) and 28S rRNA gene sequences from eight species of Sclerodermus reared from different wood borer pests. The eight sibling species were as follows: S. guani Xiao et Wu, S. sichuanensis Xiao, S. pupariae Yang et Yao, and Sclerodermus spp. (Nos. 1–5). A 594-bp fragment of COI and 750-bp fragment of 28S were subsequently sequenced. For COI, the G-C content was found to be low in all the species, averaging about 30.0%. Sequence divergences (Kimura-2-parameter distances) between congeneric species averaged 4.5%, and intraspecific divergences averaged about 0.09%. Further, the maximum sequence divergences between congeneric species and Sclerodermus sp. (No. 5) averaged about 16.5%. All 136 samples analyzed were included in six reciprocally monophyletic clades in the COI neighbor-joining (NJ) tree. The NJ tree inferred from the 28S rRNA sequence yielded almost identical results, but the samples from S. guani, S. sichuanensis, S. pupariae, and Sclerodermus spp. (Nos. 1–4) clustered together and only Sclerodermus sp. (No. 5) clustered separately. Our findings indicate that the standard barcode region of COI can be efficiently used to distinguish morphologically similar Sclerodermus species. Further, we speculate that Sclerodermus sp. (No. 5) might be a new species of Sclerodermus. PMID:25782000
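For readers unfamiliar with the Kimura 2-parameter (K2P) distance reported in the abstract above, the following is a minimal Python sketch of the standard formula d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. The function name `k2p_distance` and the choice to skip any site containing a gap or ambiguity code are assumptions made for illustration; this is not the software or settings used by the study's authors.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two pre-aligned DNA sequences.

    Assumption: sites where either sequence has a gap or a non-ACGT
    symbol are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code: not comparable
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # (saturated) for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Averaging this value over all pairs within a species, and over all pairs between congeneric species, would give intraspecific and interspecific summaries of the kind quoted in the abstract.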
Identification of a divergent genotype of equine arteritis virus from South American donkeys.
Rivas, J; Neira, V; Mena, J; Brito, B; Garcia, A; Gutierrez, C; Sandoval, D; Ortega, R
2017-12-01
A novel equine arteritis virus (EAV) was isolated and sequenced from feral donkeys in Chile. Phylogenetic analysis indicates that the new virus and South African asinine strains diverged from equine EAV strains at least 100 years ago. The results indicate that asinine strains belong to a different EAV genotype. © 2017 Blackwell Verlag GmbH.
Medzihradszky, K F; Gibson, B W; Kaur, S; Yu, Z H; Medzihradszky, D; Burlingame, A L; Bass, N M
1992-02-01
The primary structure of a fatty-acid-binding protein (FABP) isolated from the liver of the nurse shark (Ginglymostoma cirratum) was determined by high-performance tandem mass spectrometry (employing multichannel array detection) and Edman degradation. Shark liver FABP consists of 132 amino acids with an acetylated N-terminal valine. The chemical molecular mass of the intact protein determined by electrospray ionization mass spectrometry (Mr = 15124 +/- 2.5) was in good agreement with that calculated from the amino acid sequence (Mr = 15121.3). The amino acid sequence of shark liver FABP displays significantly greater similarity to the FABP expressed in mammalian heart, peripheral nerve myelin and adipose tissue (61-53% sequence similarity) than to the FABP expressed in mammalian liver (22% similarity). Phylogenetic trees derived from the comparison of the shark liver FABP amino acid sequence with the members of the mammalian fatty-acid/retinoid-binding protein gene family indicate the initial divergence of an ancestral gene into two major subfamilies: one comprising the genes for mammalian liver FABP and gastrotropin, the other comprising the genes for mammalian cellular retinol-binding proteins I and II, cellular retinoic-acid-binding protein, myelin P2 protein, adipocyte FABP, heart FABP and shark liver FABP, the latter having diverged from the ancestral gene that ultimately gave rise to the present day mammalian heart-FABP, adipocyte FABP and myelin P2 protein sequences. The sequence for intestinal FABP from the rat could be assigned to either subfamily, depending on the approach used for phylogenetic tree construction, but clearly diverged at a relatively early evolutionary time point. Indeed, sequences proximately ancestral or closely related to mammalian intestinal FABP, liver FABP, gastrotropin and the retinoid-binding group of proteins appear to have arisen prior to the divergence of shark liver FABP and should therefore also be present in elasmobranchs.
The presence in shark liver of an FABP which differs substantially in primary structure from mammalian liver FABP, while being closely related to the FABP expressed in mammalian heart muscle, peripheral nerve myelin and adipocytes, opens a further dimension regarding the question of the existence of structure-dependent and tissue-specific specialization of FABP function in lipid metabolism.
Hass-Jacobus, Barbara L; Futrell-Griggs, Montona; Abernathy, Brian; Westerman, Rick; Goicoechea, Jose-Luis; Stein, Joshua; Klein, Patricia; Hurwitz, Bonnie; Zhou, Bin; Rakhshan, Fariborz; Sanyal, Abhijit; Gill, Navdeep; Lin, Jer-Young; Walling, Jason G; Luo, Mei Zhong; Ammiraju, Jetty Siva S; Kudrna, Dave; Kim, Hye Ran; Ware, Doreen; Wing, Rod A; Miguel, Phillip San; Jackson, Scott A
2006-01-01
Background With the completion of the genome sequence for rice (Oryza sativa L.), the focus of rice genomics research has shifted to the comparison of the rice genome with genomes of other species for gene cloning, breeding, and evolutionary studies. The genus Oryza includes 23 species that shared a common ancestor 8–10 million years ago making this an ideal model for investigations into the processes underlying domestication, as many of the Oryza species are still undergoing domestication. This study integrates high-throughput, hybridization-based markers with BAC end sequence and fingerprint data to construct physical maps of rice chromosome 1 orthologues in two wild Oryza species. Similar studies were undertaken in Sorghum bicolor, a species which diverged from cultivated rice 40–50 million years ago. Results Overgo markers, in conjunction with fingerprint and BAC end sequence data, were used to build sequence-ready BAC contigs for two wild Oryza species. The markers drove contig merges to construct physical maps syntenic to rice chromosome 1 in the wild species and provided evidence for at least one rearrangement on chromosome 1 of the O. sativa versus Oryza officinalis comparative map. When rice overgos were aligned to available S. bicolor sequence, 29% of the overgos aligned with three or fewer mismatches; of these, 41% gave positive hybridization signals. Overgo hybridization patterns supported colinearity of loci in regions of sorghum chromosome 3 and rice chromosome 1 and suggested that a possible genomic inversion occurred in this syntenic region in one of the two genomes after the divergence of S. bicolor and O. sativa. Conclusion The results of this study emphasize the importance of identifying conserved sequences in the reference sequence when designing overgo probes in order for those probes to hybridize successfully in distantly related species. 
As interspecific markers, overgos can be used successfully to construct physical maps in species which diverged less than 8 million years ago, and can be used in a more limited fashion to examine colinearity among species which diverged as much as 40 million years ago. Additionally, overgos are able to provide evidence of genomic rearrangements in comparative physical mapping studies. PMID:16895597
Suchan, Tomasz; Espíndola, Anahí; Rutschmann, Sereina; Emerson, Brent C; Gori, Kevin; Dessimoz, Christophe; Arrigo, Nils; Ronikier, Michał; Alvarez, Nadir
2017-09-01
Determining phylogenetic relationships among recently diverged species has long been a challenge in evolutionary biology. Cytoplasmic DNA markers, which have been widely used, notably in the context of molecular barcoding, have not always proved successful in resolving such phylogenies. However, with the advent of next-generation-sequencing technologies and associated techniques of reduced genome representation, phylogenies of closely related species have been resolved in much greater detail in the last couple of years. Here we examine the potential and limitations of one such technique, Restriction-site Associated DNA (RAD) sequencing, a method that produces thousands of (mostly) anonymous nuclear markers, in disentangling the phylogeny of the fly genus Chiastocheta (Diptera: Anthomyiidae). In Europe, this genus encompasses seven species of seed predators, which have been widely studied in the context of their ecological and evolutionary interactions with the plant Trollius europaeus (Ranunculaceae). So far, phylogenetic analyses using mitochondrial markers failed to resolve monophyly of most of the species from this recently diversified genus, suggesting that their taxonomy may need a revision. However, relying on a single, non-recombining marker and ignoring potential incongruences between mitochondrial and nuclear loci may provide an incomplete account of the lineage history. In this study, we applied both classical Sanger sequencing of three mtDNA regions and RAD-sequencing, for reconstructing the phylogeny of the genus. Contrasting with results based on mitochondrial markers, RAD-sequencing analyses retrieved the monophyly of all seven species, in agreement with the morphological species assignment. We found robust nuclear-based species assignment of individual samples, and low levels of estimated contemporary gene flow among them.
However, despite recovering species' monophyly, interspecific relationships varied depending on the set of RAD loci considered, producing contradictory topologies. Moreover, coalescence-based phylogenetic analyses revealed low supports for most of the interspecific relationships. Our results indicate that despite the higher performance of RAD-sequencing in terms of species trees resolution compared to cytoplasmic markers, reconstructing inter-specific relationships among recently-diverged lineages may lie beyond the possibilities offered by large sets of RAD-sequencing markers in cases of strong gene tree incongruence. Copyright © 2017 Elsevier Inc. All rights reserved.
Detection of a divergent variant of grapevine virus F by next-generation sequencing.
Molenaar, Nicholas; Burger, Johan T; Maree, Hans J
2015-08-01
The complete genome sequence of a South African isolate of grapevine virus F (GVF) is presented. It was first detected by metagenomic next-generation sequencing of field samples and validated through direct Sanger sequencing. The genome sequence of GVF isolate V5 consists of 7539 nucleotides and contains a poly(A) tail. It has a typical vitivirus genome arrangement that comprises five open reading frames (ORFs), which share only 88.96 % nucleotide sequence identity with the existing complete GVF genome sequence (JX105428).
[A study on identification of edible bird's nests by DNA barcodes].
Chen, Yue-Juan; Liu, Wen-Jian; Chen, Dan-Na; Chieng, Sing-Hock; Jiang, Lin
2017-12-01
To provide a theoretical basis for the traceability and quality evaluation of edible bird's nests (EBNs), the Cytb sequence was applied to identify the origin of EBNs. A total of 39 experimental samples were collected from Malaysia, Indonesia, Vietnam and Thailand. Genomic DNA was extracted for the PCR reaction. The amplified products were sequenced. Thirty-six sequences were downloaded from GenBank, including those of the edible-nest swiftlet, black-nest swiftlet, Mascarene swiftlet, Pacific swiftlet and Germain's swiftlet. MEGA 7.0 was used to analyze the distinction of sequences by calculating intraspecific and interspecific divergences and constructing NJ and UPGMA phylogenetic trees based on the Kimura 2-parameter model. The results showed that the 39 samples were from three kinds of EBNs. Interspecific divergences were significantly greater than intraspecific ones. Samples could be successfully distinguished by the NJ and UPGMA phylogenetic trees. In conclusion, the Cytb sequence could be used to distinguish the origin of EBNs and it is efficient for tracing the origin species of EBNs. Copyright© by the Chinese Pharmaceutical Association.
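The UPGMA clustering mentioned in the abstract above can be illustrated with a short Python sketch. It repeatedly merges the two closest clusters and updates distances as size-weighted averages. The function name `upgma`, the square-matrix input, and the nested-tuple output (topology only, no branch lengths) are assumptions chosen for brevity; this is not MEGA's implementation, and ties between equal distances are broken arbitrarily.

```python
from itertools import combinations


def upgma(labels, dist):
    """Return a UPGMA tree topology as nested tuples.

    labels: list of taxon names.
    dist:   symmetric square matrix (list of lists) of pairwise distances,
            in the same order as labels.
    """
    # cluster id -> (subtree, number of leaves in the subtree)
    clusters = {i: (label, 1) for i, label in enumerate(labels)}
    d = {
        frozenset((i, j)): dist[i][j]
        for i, j in combinations(range(len(labels)), 2)
    }
    next_id = len(labels)

    while len(clusters) > 1:
        # Pick the pair of current clusters with the smallest distance.
        i, j = min(
            combinations(sorted(clusters), 2),
            key=lambda pair: d[frozenset(pair)],
        )
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)

        # Distance from the merged cluster to every remaining cluster is
        # the average over all leaf pairs, i.e. a size-weighted mean.
        for k in clusters:
            d[frozenset((next_id, k))] = (
                n_i * d[frozenset((i, k))] + n_j * d[frozenset((j, k))]
            ) / (n_i + n_j)

        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1

    ((tree, _),) = clusters.values()
    return tree
```

For example, with three hypothetical taxa where `a` and `b` are 2 units apart and both are 8 units from `c`, the sketch groups `a` with `b` first and then joins `c`.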
Fernandes, A P; Nelson, K; Beverley, S M
1993-01-01
Molecular evolutionary relationships within the protozoan order Kinetoplastida were deduced from comparisons of the nuclear small and large subunit ribosomal RNA (rRNA) gene sequences. These studies show that relationships among the trypanosomatid protozoans differ from those previously proposed from studies of organismal characteristics or mitochondrial rRNAs. The genera Leishmania, Endotrypanum, Leptomonas, and Crithidia form a closely related group, which shows progressively more distant relationships to Phytomonas and Blastocrithidia, Trypanosoma cruzi, and lastly Trypanosoma brucei. The rooting of the trypanosomatid tree was accomplished by using Bodo caudatus (family Bodonidae) as an outgroup, a status confirmed by molecular comparisons with other eukaryotes. The nuclear rRNA tree agrees well with data obtained from comparisons of other nuclear genes. Differences with the proposed mitochondrial rRNA tree probably reflect the lack of a suitable outgroup for this tree, as the topologies are otherwise similar. Small subunit rRNA divergences within the trypanosomatids are large, approaching those among plants and animals, which underscores the evolutionary antiquity of the group. Analysis of the distribution of different parasitic life-styles of these species in conjunction with a probable timing of evolutionary divergences suggests that vertebrate parasitism arose multiple times in the trypanosomatids. PMID:8265597
African genetic diversity provides novel insights into evolutionary history and local adaptations.
Choudhury, Ananyo; Aron, Shaun; Sengupta, Dhriti; Hazelhurst, Scott; Ramsay, Michèle
2018-05-08
Genetic variation and susceptibility to disease are shaped by human demographic history. We can now study the genomes of extant Africans and uncover traces of population migration, admixture, assimilation and selection by applying sophisticated computational algorithms. There are four major ethnolinguistic divisions among present day Africans: Hunter-gatherer populations in southern and central Africa; Nilo-Saharan speakers from north and northeast Africa; Afro-Asiatic speakers from east Africa; and Niger-Congo speakers who are the predominant ethnolinguistic group spread across most of sub-Saharan Africa. The enormous ethnolinguistic diversity in sub-Saharan African populations is largely paralleled by extensive genetic diversity and until a decade ago, little was known about the origins and divergence of these groups. Results from large-scale population genetic studies, and more recently whole genome sequence data, are unraveling the critical role of events like migration and admixture and environment factors including diet, infectious diseases and climatic conditions in shaping current population diversity. It is now possible to start providing quantitative estimates of divergence times, population size and dynamic processes that have affected populations and their genetic risk for disease. Finally, the availability of ancient genomes from Africa is providing historical insights of unprecedented depth. In this review, we highlight some key interpretations that have emerged from recent African genome studies.
Comparing and combining distance-based and character-based approaches for barcoding turtles.
Reid, B N; LE, M; McCord, W P; Iverson, J B; Georges, A; Bergmann, T; Amato, G; Desalle, R; Naro-Maciel, E
2011-11-01
Molecular barcoding can serve as a powerful tool in wildlife forensics and may prove to be a vital aid in conserving organisms that are threatened by illegal wildlife trade, such as turtles (Order Testudines). We produced cytochrome oxidase subunit one (COI) sequences (650 bp) for 174 turtle species and combined these with publicly available sequences for 50 species to produce a data set representative of the breadth of the order. Variability within the barcode region was assessed, and the utility of both distance-based and character-based methods for species identification was evaluated. For species in which genetic material from more than one individual was available (n = 69), intraspecific divergences were 1.3% on average, although divergences greater than the customary 2% barcode threshold occurred within 15 species. High intraspecific divergences could indicate species with a high degree of internal genetic structure or possibly even cryptic species, although introgression is also probable in some of these taxa. Divergences between species of the same genus were 6.4% on average; however, 49 species were <2% divergent from congeners. Low levels of interspecific divergence could be caused by recent evolutionary radiations coupled with the low rates of mtDNA evolution previously observed in turtles. Complementing distance-based barcoding with character-based methods for identifying diagnostic sets of nucleotides provided better resolution in several cases where distance-based methods failed to distinguish species. An online identification engine was created to provide character-based identifications. This study constitutes the first comprehensive barcoding effort for this seriously threatened order. © 2011 Blackwell Publishing Ltd.
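The distance-based screening described in the abstract above (flagging species whose intraspecific divergence exceeds the customary 2% threshold, and species that are less than 2% divergent from a congener) can be sketched in Python as follows. The names `p_distance` and `flag_by_threshold`, the use of an uncorrected p-distance, and the assumption that all records passed in belong to one genus are illustrative choices, not the procedure used in the study.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def flag_by_threshold(records, threshold=0.02):
    """Screen barcode records against a divergence threshold.

    records: list of (species_name, aligned_sequence) tuples, assumed
             here to come from a single genus.
    Returns (high_intra, low_inter):
      high_intra: species with at least one within-species pair
                  more divergent than the threshold.
      low_inter:  sorted species-name pairs with at least one
                  between-species pair less divergent than the threshold.
    """
    high_intra = set()
    low_inter = set()
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        d = p_distance(s1, s2)
        if sp1 == sp2 and d > threshold:
            high_intra.add(sp1)
        elif sp1 != sp2 and d < threshold:
            low_inter.add(tuple(sorted((sp1, sp2))))
    return high_intra, low_inter
```

Species that land in either set are the cases where, according to the abstract, character-based methods using diagnostic nucleotides may offer better resolution than a fixed distance cutoff.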
Zill, Oliver A.; Scannell, Devin R.; Kuei, Jeffrey; Sadhu, Meru; Rine, Jasper
2012-01-01
The genetic bases for species-specific traits are widely sought, but reliable experimental methods with which to identify functionally divergent genes are lacking. In the Saccharomyces genus, interspecies complementation tests can be used to evaluate functional conservation and divergence of biological pathways or networks. Silent information regulator (SIR) proteins in S. bayanus provide an ideal test case for this approach because they show remarkable divergence in sequence and paralog number from those found in the closely related S. cerevisiae. We identified genes required for silencing in S. bayanus using a genetic screen for silencing-defective mutants. Complementation tests in interspecies hybrids identified an evolutionarily conserved Sir-protein-based silencing machinery, as defined by two interspecies complementation groups (SIR2 and SIR3). However, recessive mutations in S. bayanus SIR4 isolated from this screen could not be complemented by S. cerevisiae SIR4, revealing species-specific functional divergence in the Sir4 protein despite conservation of the overall function of the Sir2/3/4 complex. A cladistic complementation series localized the occurrence of functional changes in SIR4 to the S. cerevisiae and S. paradoxus branches of the Saccharomyces phylogeny. Most of this functional divergence mapped to sequence changes in the Sir4 PAD. Finally, a hemizygosity modifier screen in the interspecies hybrids identified additional genes involved in S. bayanus silencing. Thus, interspecies complementation tests can be used to identify (1) mutations in genetically underexplored organisms, (2) loci that have functionally diverged between species, and (3) evolutionary events of functional consequence within a genus. PMID:22923378
Chakraborty, Ujani; George, Carolyn M.; Lyndaker, Amy M.; Alani, Eric
2016-01-01
Single-strand annealing (SSA) is an important homologous recombination mechanism that repairs DNA double strand breaks (DSBs) occurring between closely spaced repeat sequences. During SSA, the DSB is acted upon by exonucleases to reveal complementary sequences that anneal and are then repaired through tail clipping, DNA synthesis, and ligation steps. In baker’s yeast, the Msh DNA mismatch recognition complex and the Sgs1 helicase act to suppress SSA between divergent sequences by binding to mismatches present in heteroduplex DNA intermediates and triggering a DNA unwinding mechanism known as heteroduplex rejection. Using baker’s yeast as a model, we have identified new factors and regulatory steps in heteroduplex rejection during SSA. First we showed that Top3-Rmi1, a topoisomerase complex that interacts with Sgs1, is required for heteroduplex rejection. Second, we found that the replication processivity clamp proliferating cell nuclear antigen (PCNA) is dispensable for heteroduplex rejection, but is important for repairing mismatches formed during SSA. Third, we showed that modest overexpression of Msh6 results in a significant increase in heteroduplex rejection; this increase is due to a compromise in Msh2-Msh3 function required for the clipping of 3′ tails. Thus 3′ tail clipping during SSA is a critical regulatory step in the repair vs. rejection decision; rejection is favored before the 3′ tails are clipped. Unexpectedly, Msh6 overexpression, through interactions with PCNA, disrupted heteroduplex rejection between divergent sequences in another recombination substrate. These observations illustrate the delicate balance that exists between repair and replication factors to optimize genome stability. PMID:26680658
A Comparative Encyclopedia of DNA Elements in the Mouse Genome
Yue, Feng; Cheng, Yong; Breschi, Alessandra; Vierstra, Jeff; Wu, Weisheng; Ryba, Tyrone; Sandstrom, Richard; Ma, Zhihai; Davis, Carrie; Pope, Benjamin D.; Shen, Yin; Pervouchine, Dmitri D.; Djebali, Sarah; Thurman, Bob; Kaul, Rajinder; Rynes, Eric; Kirilusha, Anthony; Marinov, Georgi K.; Williams, Brian A.; Trout, Diane; Amrhein, Henry; Fisher-Aylor, Katherine; Antoshechkin, Igor; DeSalvo, Gilberto; See, Lei-Hoon; Fastuca, Meagan; Drenkow, Jorg; Zaleski, Chris; Dobin, Alex; Prieto, Pablo; Lagarde, Julien; Bussotti, Giovanni; Tanzer, Andrea; Denas, Olgert; Li, Kanwei; Bender, M. A.; Zhang, Miaohua; Byron, Rachel; Groudine, Mark T.; McCleary, David; Pham, Long; Ye, Zhen; Kuan, Samantha; Edsall, Lee; Wu, Yi-Chieh; Rasmussen, Matthew D.; Bansal, Mukul S.; Keller, Cheryl A.; Morrissey, Christapher S.; Mishra, Tejaswini; Jain, Deepti; Dogan, Nergiz; Harris, Robert S.; Cayting, Philip; Kawli, Trupti; Boyle, Alan P.; Euskirchen, Ghia; Kundaje, Anshul; Lin, Shin; Lin, Yiing; Jansen, Camden; Malladi, Venkat S.; Cline, Melissa S.; Erickson, Drew T.; Kirkup, Vanessa M; Learned, Katrina; Sloan, Cricket A.; Rosenbloom, Kate R.; de Sousa, Beatriz Lacerda; Beal, Kathryn; Pignatelli, Miguel; Flicek, Paul; Lian, Jin; Kahveci, Tamer; Lee, Dongwon; Kent, W. James; Santos, Miguel Ramalho; Herrero, Javier; Notredame, Cedric; Johnson, Audra; Vong, Shinny; Lee, Kristen; Bates, Daniel; Neri, Fidencio; Diegel, Morgan; Canfield, Theresa; Sabo, Peter J.; Wilken, Matthew S.; Reh, Thomas A.; Giste, Erika; Shafer, Anthony; Kutyavin, Tanya; Haugen, Eric; Dunn, Douglas; Reynolds, Alex P.; Neph, Shane; Humbert, Richard; Hansen, R. 
Scott; De Bruijn, Marella; Selleri, Licia; Rudensky, Alexander; Josefowicz, Steven; Samstein, Robert; Eichler, Evan E.; Orkin, Stuart H.; Levasseur, Dana; Papayannopoulou, Thalia; Chang, Kai-Hsin; Skoultchi, Arthur; Gosh, Srikanta; Disteche, Christine; Treuting, Piper; Wang, Yanli; Weiss, Mitchell J.; Blobel, Gerd A.; Good, Peter J.; Lowdon, Rebecca F.; Adams, Leslie B.; Zhou, Xiao-Qiao; Pazin, Michael J.; Feingold, Elise A.; Wold, Barbara; Taylor, James; Kellis, Manolis; Mortazavi, Ali; Weissman, Sherman M.; Stamatoyannopoulos, John; Snyder, Michael P.; Guigo, Roderic; Gingeras, Thomas R.; Gilbert, David M.; Hardison, Ross C.; Beer, Michael A.; Ren, Bing
2014-01-01
Summary As the premier model organism in biomedical research, the laboratory mouse shares the majority of protein-coding genes with humans, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications, and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of other sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases. PMID:25409824
A comparative encyclopedia of DNA elements in the mouse genome.
Yue, Feng; Cheng, Yong; Breschi, Alessandra; Vierstra, Jeff; Wu, Weisheng; Ryba, Tyrone; Sandstrom, Richard; Ma, Zhihai; Davis, Carrie; Pope, Benjamin D; Shen, Yin; Pervouchine, Dmitri D; Djebali, Sarah; Thurman, Robert E; Kaul, Rajinder; Rynes, Eric; Kirilusha, Anthony; Marinov, Georgi K; Williams, Brian A; Trout, Diane; Amrhein, Henry; Fisher-Aylor, Katherine; Antoshechkin, Igor; DeSalvo, Gilberto; See, Lei-Hoon; Fastuca, Meagan; Drenkow, Jorg; Zaleski, Chris; Dobin, Alex; Prieto, Pablo; Lagarde, Julien; Bussotti, Giovanni; Tanzer, Andrea; Denas, Olgert; Li, Kanwei; Bender, M A; Zhang, Miaohua; Byron, Rachel; Groudine, Mark T; McCleary, David; Pham, Long; Ye, Zhen; Kuan, Samantha; Edsall, Lee; Wu, Yi-Chieh; Rasmussen, Matthew D; Bansal, Mukul S; Kellis, Manolis; Keller, Cheryl A; Morrissey, Christapher S; Mishra, Tejaswini; Jain, Deepti; Dogan, Nergiz; Harris, Robert S; Cayting, Philip; Kawli, Trupti; Boyle, Alan P; Euskirchen, Ghia; Kundaje, Anshul; Lin, Shin; Lin, Yiing; Jansen, Camden; Malladi, Venkat S; Cline, Melissa S; Erickson, Drew T; Kirkup, Vanessa M; Learned, Katrina; Sloan, Cricket A; Rosenbloom, Kate R; Lacerda de Sousa, Beatriz; Beal, Kathryn; Pignatelli, Miguel; Flicek, Paul; Lian, Jin; Kahveci, Tamer; Lee, Dongwon; Kent, W James; Ramalho Santos, Miguel; Herrero, Javier; Notredame, Cedric; Johnson, Audra; Vong, Shinny; Lee, Kristen; Bates, Daniel; Neri, Fidencio; Diegel, Morgan; Canfield, Theresa; Sabo, Peter J; Wilken, Matthew S; Reh, Thomas A; Giste, Erika; Shafer, Anthony; Kutyavin, Tanya; Haugen, Eric; Dunn, Douglas; Reynolds, Alex P; Neph, Shane; Humbert, Richard; Hansen, R Scott; De Bruijn, Marella; Selleri, Licia; Rudensky, Alexander; Josefowicz, Steven; Samstein, Robert; Eichler, Evan E; Orkin, Stuart H; Levasseur, Dana; Papayannopoulou, Thalia; Chang, Kai-Hsin; Skoultchi, Arthur; Gosh, Srikanta; Disteche, Christine; Treuting, Piper; Wang, Yanli; Weiss, Mitchell J; Blobel, Gerd A; Cao, Xiaoyi; Zhong, Sheng; Wang, Ting; Good, Peter J; 
Lowdon, Rebecca F; Adams, Leslie B; Zhou, Xiao-Qiao; Pazin, Michael J; Feingold, Elise A; Wold, Barbara; Taylor, James; Mortazavi, Ali; Weissman, Sherman M; Stamatoyannopoulos, John A; Snyder, Michael P; Guigo, Roderic; Gingeras, Thomas R; Gilbert, David M; Hardison, Ross C; Beer, Michael A; Ren, Bing
2014-11-20
The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.
Molecular phylogeny of the Drusinae (Trichoptera: Limnephilidae): preliminary results
NASA Astrophysics Data System (ADS)
Pauls, S.; Lumbsch, T.; Haase, P.
2005-05-01
We examine the phylogenetic relationships within the subfamily of the Drusinae using molecular markers. Sequence data from two mitochondrial loci (mitochondrial cytochrome oxidase I, mitochondrial ribosomal large subunit) are used to infer the relationships within and among the genera of the Drusinae. Sequence data were generated for 21 taxa from five genera from the subfamily. The molecular data were analyzed using a Bayesian Markov Chain Monte Carlo and a Maximum Parsimony approach for both single gene and combined data sets. Several hypotheses of relationships previously inferred based on morphological characters were tested. The study revealed a very close relationship between Drusus discolor and D. romanicus suggesting that divergence between these two species occurred recently. The relationships inferred by molecular data suggest that larval morphology may be an important taxonomic character, which has often been neglected. The data also indicate that the genera Ecclisopteryx and Drusus are polyphyletic with respect to one another.
McKeon, Sascha Naomi; Moreno, Marta; Sallum, Maria Anise; Povoa, Marinete Marins; Conn, Jan Evelyn
2013-01-01
To evaluate whether environmental heterogeneity contributes to the genetic heterogeneity in Anopheles triannulatus, larval habitat characteristics across the Brazilian states of Roraima and Pará and genetic sequences were examined. A comparison with Anopheles goeldii was utilised to determine whether high genetic diversity was unique to An. triannulatus. Student t test and analysis of variance found no differences in habitat characteristics between the species. Analysis of population structure of An. triannulatus and An. goeldii revealed distinct demographic histories in a largely overlapping geographic range. Cytochrome oxidase I sequence parsimony networks found geographic clustering for both species; however nuclear marker networks depicted An. triannulatus with a more complex history of fragmentation, secondary contact and recent divergence. Evidence of Pleistocene expansions suggests both species are more likely to be genetically structured by geographic and ecological barriers than demography. We hypothesise that niche partitioning is a driving force for diversity, particularly in An. triannulatus. PMID:23903977
Saisawang, Chonticha; Ketterman, Albert J.
2014-01-01
Glutathione transferases (GST) are an ancient superfamily comprising a large number of paralogous proteins in a single organism. This multiplicity of GSTs has allowed the copies to diverge for neofunctionalization with proposed roles ranging from detoxication and oxidative stress response to involvement in signal transduction cascades. We performed a comparative genomic analysis using FlyBase annotations and Drosophila melanogaster GST sequences as templates to further annotate the GST orthologs in the 12 Drosophila sequenced genomes. We found that GST genes in the Drosophila subgenera have undergone repeated local duplications followed by transposition, inversion, and micro-rearrangements of these copies. The colinearity and orientations of the orthologous GST genes appear to be unique in many of the species which suggests that genomic rearrangement events have occurred multiple times during speciation. The high micro-plasticity of the genomes appears to have a functional contribution utilized for evolution of this gene family. PMID:25310450
DRS is far less divergent than streptococcal inhibitor of complement of group A streptococcus.
Sagar, Vivek; Kumar, Rajesh; Ganguly, Nirmal K; Menon, Thangam; Chakraborti, Anuradha
2007-04-01
When 100 group A streptococcus isolates were screened, drs, a variant of sic, was identified in emm12 and emm55 isolates. Molecular characterization showed that the drs gene sequence is highly conserved, unlike the sic gene sequence. However, the variation in gene size observed was due to the presence of extra internal repeat sequences.
DRS Is Far Less Divergent than Streptococcal Inhibitor of Complement of Group A Streptococcus
Sagar, Vivek; Kumar, Rajesh; Ganguly, Nirmal K.; Menon, Thangam; Chakraborti, Anuradha
2007-01-01
When 100 group A streptococcus isolates were screened, drs, a variant of sic, was identified in emm12 and emm55 isolates. Molecular characterization showed that the drs gene sequence is highly conserved, unlike the sic gene sequence. However, the variation in gene size observed was due to the presence of extra internal repeat sequences. PMID:17237170
Deep Sequencing Reveals a Divergent Ugandan cassava brown streak virus Isolate from Malawi
Winter, Stephan; Mukasa, Settumba; Tairo, Fred; Sseruwagi, Peter; Ndunguru, Joseph; Duffy, Siobain
2017-01-01
Illumina sequencing of RNA from a cassava cutting from northern Malawi produced a genome of Ugandan cassava brown streak virus (UCBSV-MW-NB7_2013). Sequence comparisons revealed stronger similarity to an isolate from nearby Tanzania (93.4% pairwise nucleotide identity) than to those previously reported from Malawi (86.9 to 87.0%). PMID:28818908
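The pairwise nucleotide identity figures quoted above can be illustrated with a minimal sketch. It assumes the two sequences are already aligned to equal length and that columns containing a gap character are skipped; the abstract does not state which alignment tool or gap convention the authors used, and the function name `pairwise_identity` is hypothetical.

```python
def pairwise_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: gap columns ('-') are excluded from the denominator.
    Other conventions exist and give different percentages.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy sequences, not real virus data: 9 of 10 columns match.
    print(pairwise_identity("ACGTACGTAC", "ACGTACGTAT"))
```

Because the result depends on how gaps and ambiguous bases are treated, values from a sketch like this may differ slightly from published figures.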
Diversity and phylogenetic relationships among Bartonella strains from Thai bats.
McKee, Clifton D; Kosoy, Michael Y; Bai, Ying; Osikowicz, Lynn M; Franka, Richard; Gilbert, Amy T; Boonmar, Sumalee; Rupprecht, Charles E; Peruski, Leonard F
2017-01-01
Bartonellae are phylogenetically diverse, intracellular bacteria commonly found in mammals. Previous studies have demonstrated that bats have a high prevalence and diversity of Bartonella infections globally. Isolates (n = 42) were obtained from five bat species in four provinces of Thailand and analyzed using sequences of the citrate synthase gene (gltA). Sequences clustered into seven distinct genogroups; four of these genogroups displayed similarity with Bartonella spp. sequences from other bats in Southeast Asia, Africa, and Eastern Europe. Thirty of the isolates representing these seven genogroups were further characterized by sequencing four additional loci (ftsZ, nuoG, rpoB, and ITS) to clarify their evolutionary relationships with other Bartonella species and to assess patterns of diversity among strains. Among the seven genogroups, there were differences in the number of sequence variants, ranging from 1-5, and the amount of nucleotide divergence, ranging from 0.035-3.9%. Overall, these seven genogroups meet the criteria for distinction as novel Bartonella species, with sequence divergence among genogroups ranging from 6.4-15.8%. Evidence of intra- and intercontinental phylogenetic relationships and instances of homologous recombination among Bartonella genogroups in related bat species were found in Thai bats.
Rodriguez-Rivas, Juan; Marsili, Simone; Juan, David; Valencia, Alfonso
2016-01-01
Protein–protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein–protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein–protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein–protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach. PMID:27965389
Janes, Holly; Frahm, Nicole; DeCamp, Allan; Rolland, Morgane; Gabriel, Erin; Wolfson, Julian; Hertz, Tomer; Kallas, Esper; Goepfert, Paul; Friedrich, David P.; Corey, Lawrence; Mullins, James I.; McElrath, M. Juliana; Gilbert, Peter
2012-01-01
Background: The sieve analysis for the Step trial found evidence that breakthrough HIV-1 sequences for MRKAd5/HIV-1 Gag/Pol/Nef vaccine recipients were more divergent from the vaccine insert than placebo sequences in regions with predicted epitopes. We linked the viral sequence data with immune response and acute viral load data to explore mechanisms for and consequences of the observed sieve effect. Methods: Ninety-one male participants (37 placebo and 54 vaccine recipients) were included; viral sequences were obtained at the time of HIV-1 diagnosis. T-cell responses were measured 4 weeks post-second vaccination and at the first or second week post-diagnosis. Acute viral load was obtained at RNA-positive and antibody-negative visits. Findings: Vaccine recipients had a greater magnitude of post-infection CD8+ T cell response than placebo recipients (median 1.68% vs 1.18%; p = 0.04) and greater breadth of post-infection response (median 4.5 vs 2; p = 0.06). Viral sequences for vaccine recipients were marginally more divergent from the insert than placebo sequences in regions of Nef targeted by pre-infection immune responses (p = 0.04; Pol p = 0.13; Gag p = 0.89). Magnitude and breadth of pre-infection responses did not correlate with distance of the viral sequence to the insert (p > 0.50). Acute log viral load trended lower in vaccine versus placebo recipients (estimated mean 4.7 vs 5.1) but the difference was not significant (p = 0.27). Neither was acute viral load associated with distance of the viral sequence to the insert (p > 0.30). Interpretation: Despite evidence of anamnestic responses, the sieve effect was not well explained by available measures of T-cell immunogenicity. Sequence divergence from the vaccine was not significantly associated with acute viral load. While point estimates suggested weak vaccine suppression of viral load, the result was not significant and more viral load data would be needed to detect suppression. PMID:22952672
NASA Astrophysics Data System (ADS)
Juchum, Fabrício Sacramento; Costa, Marco Antônio; Amorim, André Márcio; Corrêa, Ronan Xavier
2008-11-01
Caesalpinia echinata (brazilwood or Pernambuco wood) comprises a complex of three morphological leaf variants, characterized by differences in the number and size of the pinnae and leaflets, and occurring in allopatric and sympatric populations. The present study evaluates the utility of the chloroplast DNA trnL intron in a phylogenetic analysis of the three leaf variants along with other species of Caesalpinia and generic relatives. Our study supports the hypothesis that the name C. echinata designates a species complex and provides evidence that one of the forms, the highly divergent C. echinata large-leafleted variant, represents a distinct taxon.
The augmentation algorithm and molecular phylogenetic trees
NASA Technical Reports Server (NTRS)
Holmquist, R.
1978-01-01
Moore's (1977) augmentation procedure is discussed, and it is concluded that the procedure is valid for obtaining estimates of the total number of fixed nucleotide substitutions both theoretically and in practice, for both simulated and real data, and in agreement, for experimentally dense data sets, with stochastic estimates of the divergence, provided the restrictions on codon mutability resulting from natural selection are explicitly allowed for. Tateno and Nei's (1978) critique that the augmentation procedure has a systematic bias toward overestimation of the total number of nucleotide replacements is disputed, and a data analysis suggests that ancestral sequences inferred by the method of parsimony contain a large number of incorrectly assigned nucleotides.
Garcia-Cisneros, Alex; Palacín, Creu; Ventura, Carlos Renato Rezende; Feital, Barbara; Paiva, Paulo Cesar; Pérez-Portela, Rocío
2018-02-01
Intraspecific genetic diversity and divergence have a large influence on the adaptation and evolutionary potential of species. The widely distributed starfish, Coscinasterias tenuispina, combines sexual reproduction with asexual reproduction via fission. Here we analyse the phylogeography of this starfish to reveal historical and contemporary processes driving its intraspecific genetic divergence. We further consider whether asexual reproduction is the most important method of propagation throughout the distribution range of this species. Our study included 326 individuals from 16 populations, covering most of the species' distribution range. A total of 12 nuclear microsatellite loci and sequences of the mitochondrial cytochrome c oxidase subunit I (COI) gene were analysed. COI and microsatellites were clustered in two isolated lineages: one found along the southwestern Atlantic and the other along the northeastern Atlantic and Mediterranean Sea. This suggests the existence of two different evolutionary units. Marine barriers along the European coast would be responsible for population clustering: the Almeria-Oran Front that limits the entrance of migrants from the Atlantic to the Mediterranean, and the Siculo-Tunisian strait that divides the two Mediterranean basins. The presence of identical genotypes was detected in all populations, although two monoclonal populations were found in two sites where annual mean temperatures and minimum values were the lowest. Our results based on microsatellite loci showed that intrapopulation genetic diversity was significantly affected by clonality, whereas clonality had less effect on the global phylogeography of the species, although some impact on genetic divergence could still be observed between some populations. © 2017 John Wiley & Sons Ltd.
A Mitogenomic Phylogeny of Living Primates
Finstermeier, Knut; Zinner, Dietmar; Brameier, Markus; Meyer, Matthias; Kreuz, Eva; Hofreiter, Michael; Roos, Christian
2013-01-01
Primates, the mammalian order including our own species, comprise 480 species in 78 genera. Thus, they represent the third largest of the 18 orders of eutherian mammals. Although recent phylogenetic studies on primates are increasingly built on molecular datasets, most of these studies have focused on taxonomic subgroups within the order. Complete mitochondrial (mt) genomes have proven to be extremely useful in deciphering within-order relationships even up to deep nodes. Using 454 sequencing, we sequenced 32 new complete mt genomes, adding 20 previously unrepresented genera to the phylogenetic reconstruction of the primate tree. With 13 new sequences, the number of complete mt genomes within the parvorder Platyrrhini was widely extended, resulting in a largely resolved branching pattern among New World monkey families. We added 10 new Strepsirrhini mt genomes to the 15 previously available ones, thus almost doubling the number of mt genomes within this clade. Our data allow precise date estimates of all nodes and offer new insights into primate evolution. One major result is a relatively young date for the most recent common ancestor of all living primates, which was estimated at 66-69 million years ago, suggesting that the divergence of extant primates started close to the K/T-boundary. Although some relationships remain unclear, the large number of mt genomes used allowed us to reconstruct a robust primate phylogeny which is largely in agreement with previous publications. Finally, we show that mt genomes are a useful tool for resolving primate phylogenetic relationships on various taxonomic levels. PMID:23874967
HIV populations are large and accumulate high genetic diversity in a nonlinear fashion.
Maldarelli, Frank; Kearney, Mary; Palmer, Sarah; Stephens, Robert; Mican, JoAnn; Polis, Michael A; Davey, Richard T; Kovacs, Joseph; Shao, Wei; Rock-Kress, Diane; Metcalf, Julia A; Rehm, Catherine; Greer, Sarah E; Lucey, Daniel L; Danley, Kristen; Alter, Harvey; Mellors, John W; Coffin, John M
2013-09-01
HIV infection is characterized by rapid and error-prone viral replication resulting in genetically diverse virus populations. The rate of accumulation of diversity and the mechanisms involved are under intense study to provide useful information to understand immune evasion and the development of drug resistance. To characterize the development of viral diversity after infection, we carried out an in-depth analysis of single genome sequences of HIV pro-pol to assess diversity and divergence and to estimate replicating population sizes in a group of treatment-naive HIV-infected individuals sampled at single (n = 22) or multiple, longitudinal (n = 11) time points. Analysis of single genome sequences revealed nonlinear accumulation of sequence diversity during the course of infection. Diversity accumulated in recently infected individuals at rates 30-fold higher than in patients with chronic infection. Accumulation of synonymous changes accounted for most of the diversity during chronic infection. Accumulation of diversity resulted in population shifts, but the rates of change were low relative to estimated replication cycle times, consistent with relatively large population sizes. Analysis of changes in allele frequencies revealed effective population sizes that are substantially higher than previous estimates of approximately 1,000 infectious particles/infected individual. Taken together, these observations indicate that HIV populations are large, diverse, and slow to change in chronic infection and that the emergence of new mutations, including drug resistance mutations, is governed by both selection forces and drift.
Early Evolution of Conserved Regulatory Sequences Associated with Development in Vertebrates
McEwen, Gayle K.; Goode, Debbie K.; Parker, Hugo J.; Woolfe, Adam; Callaway, Heather; Elgar, Greg
2009-01-01
Comparisons between diverse vertebrate genomes have uncovered thousands of highly conserved non-coding sequences, an increasing number of which have been shown to function as enhancers during early development. Despite their extreme conservation over 500 million years from humans to cartilaginous fish, these elements appear to be largely absent in invertebrates, and, to date, there has been little understanding of their mode of action or the evolutionary processes that have modelled them. We have now exploited emerging genomic sequence data for the sea lamprey, Petromyzon marinus, to explore the depth of conservation of this type of element in the earliest diverging extant vertebrate lineage, the jawless fish (agnathans). We searched for conserved non-coding elements (CNEs) at 13 human gene loci and identified lamprey elements associated with all but two of these gene regions. Although markedly shorter and less well conserved than within jawed vertebrates, identified lamprey CNEs are able to drive specific patterns of expression in zebrafish embryos, which are almost identical to those driven by the equivalent human elements. These CNEs are therefore a unique and defining characteristic of all vertebrates. Furthermore, alignment of lamprey and other vertebrate CNEs should permit the identification of persistent sequence signatures that are responsible for common patterns of expression and contribute to the elucidation of the regulatory language in CNEs. Identifying the core regulatory code for development, common to all vertebrates, provides a foundation upon which regulatory networks can be constructed and might also illuminate how large conserved regulatory sequence blocks evolve and become fixed in genomic DNA. PMID:20011110
Transcriptome characterisation of Pinus tabuliformis and evolution of genes in the Pinus phylogeny
2013-01-01
Background: The Chinese pine (Pinus tabuliformis) is an indigenous conifer species in northern China but is relatively underdeveloped as a genomic resource, thus limiting gene discovery and breeding. Large-scale transcriptome data were obtained using a next-generation sequencing platform to compensate for the lack of P. tabuliformis genomic information. Results: The increasing amount of transcriptome data on Pinus provides an excellent resource for multi-gene phylogenetic analysis and studies on how conserved genes and functions are maintained in the face of species divergence. The first P. tabuliformis transcriptome from a normalised cDNA library of multiple tissues and individuals was sequenced in a full 454 GS-FLX run, producing 911,302 sequencing reads. The high quality overlapping expressed sequence tags (ESTs) were assembled into 46,584 putative transcripts, and more than 700 SSRs and 92,000 SNPs/InDels were characterised. Comparative analysis of the transcriptome of six conifer species yielded 191 orthologues, from which we inferred a phylogenetic tree, evolutionary patterns and calculated rates of gene diversion. We also identified 938 fast evolving sequences that may be useful for identifying genes that perhaps evolved in response to positive selection and might be responsible for speciation in the Pinus lineage. Conclusions: A large collection of high-quality ESTs was obtained, de novo assembled and characterised, which represents a dramatic expansion of the current transcript catalogues of P. tabuliformis and which will gradually be applied in breeding programs of P. tabuliformis. Furthermore, these data will facilitate future studies of the comparative genomics of P. tabuliformis and other related species. PMID:23597112
Phylogeny and divergence of the pinnipeds (Carnivora: Mammalia) assessed using a multigene dataset
Higdon, Jeff W; Bininda-Emonds, Olaf RP; Beck, Robin MD; Ferguson, Steven H
2007-01-01
Background: Phylogenetic comparative methods are often improved by complete phylogenies with meaningful branch lengths (e.g., divergence dates). This study presents a dated molecular supertree for all 34 world pinniped species derived from a weighted matrix representation with parsimony (MRP) supertree analysis of 50 gene trees, each determined under a maximum likelihood (ML) framework. Divergence times were determined by mapping the same sequence data (plus two additional genes) on to the supertree topology and calibrating the ML branch lengths against a range of fossil calibrations. We assessed the sensitivity of our supertree topology in two ways: 1) a second supertree with all mtDNA genes combined into a single source tree, and 2) likelihood-based supermatrix analyses. Divergence dates were also calculated using a Bayesian relaxed molecular clock with rate autocorrelation to test the sensitivity of our supertree results further. Results: The resulting phylogenies all agreed broadly with recent molecular studies, in particular supporting the monophyly of Phocidae, Otariidae, and the two phocid subfamilies, as well as an Odobenidae + Otariidae sister relationship; areas of disagreement were limited to four more poorly supported regions. Neither the supertree nor supermatrix analyses supported the monophyly of the two traditional otariid subfamilies, supporting suggestions for the need for taxonomic revision in this group. Phocid relationships were similar to other recent studies and deeper branches were generally well-resolved. Halichoerus grypus was nested within a paraphyletic Pusa, although relationships within Phocina tend to be poorly supported. Divergence date estimates for the supertree were in good agreement with other studies and the available fossil record; however, the Bayesian relaxed molecular clock divergence date estimates were significantly older.
Conclusion: Our results join other recent studies and highlight the need for a re-evaluation of pinniped taxonomy, especially as regards the subfamilial classification of otariids and the generic nomenclature of Phocina. Even with the recent publication of new sequence data, the available genetic sequence information for several species, particularly those in Arctocephalus, remains very limited, especially for nuclear markers. However, resolution of parts of the tree will probably remain difficult, even with additional data, due to apparent rapid radiations. Our study addresses the lack of a recent pinniped phylogeny that includes all species and robust divergence dates for all nodes, and will therefore prove indispensable to comparative and macroevolutionary studies of this group of carnivores. PMID:17996107
Virus Identification in Unknown Tropical Febrile Illness Cases Using Deep Sequencing
Balmaseda, Angel; Harris, Eva; DeRisi, Joseph L.
2012-01-01
Dengue virus is an emerging infectious agent that infects an estimated 50–100 million people annually worldwide, yet current diagnostic practices cannot detect an etiologic pathogen in ∼40% of dengue-like illnesses. Metagenomic approaches to pathogen detection, such as viral microarrays and deep sequencing, are promising tools to address emerging and non-diagnosable disease challenges. In this study, we used the Virochip microarray and deep sequencing to characterize the spectrum of viruses present in human sera from 123 Nicaraguan patients presenting with dengue-like symptoms but testing negative for dengue virus. We utilized a barcoding strategy to simultaneously deep sequence multiple serum specimens, generating on average over 1 million reads per sample. We then implemented a stepwise bioinformatic filtering pipeline to remove the majority of human and low-quality sequences to improve the speed and accuracy of subsequent unbiased database searches. By deep sequencing, we were able to detect virus sequence in 37% (45/123) of previously negative cases. These included 13 cases with Human Herpesvirus 6 sequences. Other samples contained sequences with similarity to sequences from viruses in the Herpesviridae, Flaviviridae, Circoviridae, Anelloviridae, Asfarviridae, and Parvoviridae families. In some cases, the putative viral sequences were virtually identical to known viruses, and in others they diverged, suggesting that they may derive from novel viruses. These results demonstrate the utility of unbiased metagenomic approaches in the detection of known and divergent viruses in the study of tropical febrile illness. PMID:22347512
Coulthart, Michael B; Posada, David; Crandall, Keith A; Dekaban, Gregory A
2006-03-01
Recently, the putative finding of ancient human T cell leukemia virus type 1 (HTLV-1) long terminal repeat (LTR) DNA sequences in association with a 1500-year-old Chilean mummy has stirred vigorous debate. The debate is based partly on the inherent uncertainties associated with phylogenetic reconstruction when only short sequences of closely related genotypes are available. However, a full analysis of what phylogenetic information is present in the mummy data has not previously been published, leaving open the question of what precisely is the range of admissible interpretation. To fulfill this need, we re-analyzed the mummy data in a new way. We first performed phylogenetic analysis of 188 published LTR DNA sequences from extant strains belonging to the HTLV-1 Cosmopolitan clade, using the method of statistical parsimony which is designed both to optimize phylogenetic resolution among sequences with little evolutionary divergence, and to permit precise mapping of individual sequence mutations onto branches of a divergence network. We then deduced possible phylogenetic positions for the two main categories of published Chilean mummy sequences, based on their published 157-nucleotide LTR sequences. The possible phylogenetic placements for one of the mummy sequence categories are consistent with a modern origin. However, one of these placements for the other mummy sequence category falls very close to the root of the Cosmopolitan clade, consistent with an ancient origin for both this mummy sequence and the Cosmopolitan clade.
Navigating the tip of the genomic iceberg: Next-generation sequencing for plant systematics.
Straub, Shannon C K; Parks, Matthew; Weitemier, Kevin; Fishbein, Mark; Cronn, Richard C; Liston, Aaron
2012-02-01
Just as Sanger sequencing did more than 20 years ago, next-generation sequencing (NGS) is poised to revolutionize plant systematics. By combining multiplexing approaches with NGS throughput, systematists may no longer need to choose between more taxa or more characters. Here we describe a genome skimming (shallow sequencing) approach for plant systematics. Through simulations, we evaluated optimal sequencing depth and performance of single-end and paired-end short read sequences for assembly of nuclear ribosomal DNA (rDNA) and plastomes and addressed the effect of divergence on reference-guided plastome assembly. We also used simulations to identify potential phylogenetic markers from low-copy nuclear loci at different sequencing depths. We demonstrated the utility of genome skimming through phylogenetic analysis of the Sonoran Desert clade (SDC) of Asclepias (Apocynaceae). Paired-end reads performed better than single-end reads. Minimum sequencing depths for high quality rDNA and plastome assemblies were 40× and 30×, respectively. Divergence from the reference significantly affected plastome assembly, but relatively similar references are available for most seed plants. Deeper rDNA sequencing is necessary to characterize intragenomic polymorphism. The low-copy fraction of the nuclear genome was readily surveyed, even at low sequencing depths. Nearly 160000 bp of sequence from three organelles provided evidence of phylogenetic incongruence in the SDC. Adoption of NGS will facilitate progress in plant systematics, as whole plastome and rDNA cistrons, partial mitochondrial genomes, and low-copy nuclear markers can now be efficiently obtained for molecular phylogenetics studies.
The Evolution of Ribosomal DNA: Divergent Paralogues and Phylogenetic Implications
Buckler-IV, E. S.; Ippolito, A.; Holtsford, T. P.
1997-01-01
Although nuclear ribosomal DNA (rDNA) repeats evolve together through concerted evolution, some genomes contain a considerable diversity of paralogous rDNA. This diversity includes not only multiple functional loci but also putative pseudogenes and recombinants. We examined the occurrence of divergent paralogues and recombinants in Gossypium, Nicotiana, Tripsacum, Winteraceae, and Zea ribosomal internal transcribed spacer (ITS) sequences. Some of the divergent paralogues are probably rDNA pseudogenes, since they have low predicted secondary structure stability, high substitution rates, and many deamination-driven substitutions at methylation sites. Under standard PCR conditions, the low stability paralogues amplified well, while many high-stability paralogues amplified poorly. Under highly denaturing PCR conditions (i.e., with dimethylsulfoxide), both low- and high-stability paralogues amplified well. We also found recombination between divergent paralogues. For phylogenetics, divergent ribosomal paralogues can aid in reconstructing ancestral states and thus serve as good outgroups. Divergent paralogues can also provide companion rDNA phylogenies. However, phylogeneticists must discriminate among families of divergent paralogues and recombinants or suffer from muddled and inaccurate organismal phylogenies. PMID:9055091
Auguste, Albert J.; Liria, Jonathan; Forrester, Naomi L.; Giambalvo, Dileyvic; Moncada, Maria; Long, Kanya C.; Morón, Dulce; de Manzione, Nuris; Tesh, Robert B.; Halsey, Eric S.; Kochel, Tadeusz J.; Hernandez, Rosa; Navarro, Juan-Carlos
2015-01-01
In 2010, an outbreak of febrile illness with arthralgic manifestations was detected at La Estación village, Portuguesa State, Venezuela. The etiologic agent was determined to be Mayaro virus (MAYV), a reemerging South American alphavirus. A total of 77 cases was reported and 19 were confirmed as seropositive. MAYV was isolated from acute-phase serum samples from 6 symptomatic patients. We sequenced 27 complete genomes representing the full spectrum of MAYV genetic diversity, which facilitated detection of a new genotype, designated N. Phylogenetic analysis of genomic sequences indicated that etiologic strains from Venezuela belong to genotype D. Results indicate that MAYV is highly conserved genetically, showing ≈17% nucleotide divergence across all 3 genotypes and 4% among genotype D strains in the most variable genes. Coalescent analyses suggested genotypes D and L diverged ≈150 years ago and genotype N diverged ≈250 years ago. This virus commonly infects persons residing near enzootic transmission foci because of anthropogenic incursions. PMID:26401714
Auguste, Albert J; Liria, Jonathan; Forrester, Naomi L; Giambalvo, Dileyvic; Moncada, Maria; Long, Kanya C; Morón, Dulce; de Manzione, Nuris; Tesh, Robert B; Halsey, Eric S; Kochel, Tadeusz J; Hernandez, Rosa; Navarro, Juan-Carlos; Weaver, Scott C
2015-10-01
In 2010, an outbreak of febrile illness with arthralgic manifestations was detected at La Estación village, Portuguesa State, Venezuela. The etiologic agent was determined to be Mayaro virus (MAYV), a reemerging South American alphavirus. A total of 77 cases was reported and 19 were confirmed as seropositive. MAYV was isolated from acute-phase serum samples from 6 symptomatic patients. We sequenced 27 complete genomes representing the full spectrum of MAYV genetic diversity, which facilitated detection of a new genotype, designated N. Phylogenetic analysis of genomic sequences indicated that etiologic strains from Venezuela belong to genotype D. Results indicate that MAYV is highly conserved genetically, showing ≈17% nucleotide divergence across all 3 genotypes and 4% among genotype D strains in the most variable genes. Coalescent analyses suggested genotypes D and L diverged ≈150 years ago and genotype N diverged ≈250 years ago. This virus commonly infects persons residing near enzootic transmission foci because of anthropogenic incursions.
Park, D-S; Suh, S-J; Hebert, P D N; Oh, H-W; Hong, K-J
2011-08-01
Although DNA barcode coverage has grown rapidly for many insect orders, there are some groups, such as scale insects, where sequence recovery has been difficult. However, using a recently developed primer set, we recovered barcode records from 373 specimens, providing coverage for 75 species from 31 genera in two families. Overall success was >90% for mealybugs and >80% for armored scale species. The G·C content was very low in most species, averaging just 16.3%. Sequence divergences (K2P) between congeneric species averaged 10.7%, while intra-specific divergences averaged 0.97%. However, the latter value was inflated by high intra-specific divergence in nine taxa, cases that may indicate species overlooked by current taxonomic treatments. Our study establishes the feasibility of developing a comprehensive barcode library for scale insects and indicates that its construction will both create an effective system for identifying scale insects and reveal taxonomic situations worthy of deeper analysis.
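The K2P (Kimura 2-parameter) divergences quoted above come from a standard closed-form formula, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences between two aligned sequences. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `k2p_distance` and the choice to skip gaps and ambiguous bases are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
BASES = {"A", "C", "G", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Hypothetical helper: sites where either sequence has a gap or an
    ambiguous base are skipped (an assumption, not the study's rule).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in BASES or b not in BASES:
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitions (P)
    q = transversions / compared  # proportion of transversions (Q)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For example, `k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")` treats the single A/G difference as a transition over ten sites. Averaging such pairwise values within and between species would give figures of the kind reported in the abstract.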
Nishimura, Nicole; Heins, David C.; Andersen, Ryan O.; Barber, Iain; Cresko, William A.
2011-01-01
Parasitic interactions are often part of complex networks of interspecific relationships that have evolved in biological communities. Despite many years of work on the evolution of parasitism, the likelihood that sister taxa of parasites can co-evolve with their hosts to specifically infect two related lineages, even when those hosts occur sympatrically, is still unclear. Furthermore, when these specific interactions occur, the molecular and physiological basis of this specificity is still largely unknown. The presence of these specific parasitic relationships can now be tested using molecular markers such as DNA sequence variation. Here we test for specific parasitic relationships in an emerging host-parasite model, the stickleback-Schistocephalus system. Threespine and ninespine stickleback fish are intermediate hosts for Schistocephalus cestode parasites that are phenotypically very similar and have nearly identical life cycles through plankton, stickleback, and avian hosts. We analyzed over 2000 base pairs of COX1 and NADH1 mitochondrial DNA sequences in 48 Schistocephalus individuals collected from threespine and ninespine stickleback hosts from disparate geographic regions distributed across the Northern Hemisphere. Our data strongly support the presence of two distinct clades of Schistocephalus, each of which exclusively infects either threespine or ninespine stickleback. These clades most likely represent different species that diverged soon after the speciation of their stickleback hosts. In addition, genetic structuring exists among Schistocephalus taken from threespine stickleback hosts from Alaska, Oregon and Wales, although it is much less than the divergence between hosts. Our findings emphasize that biological communities may be even more complex than they first appear, and raise the question of which ecological, physiological, and genetic factors maintain the specificity of the Schistocephalus parasites and their stickleback hosts.
PMID:21811623
Glinsky, Gennadi V.
2016-01-01
Thousands of candidate human-specific regulatory sequences (HSRS) have been identified, supporting the hypothesis that phenotypes unique to humans result from human-specific alterations of genomic regulatory networks. Collectively, a compendium of multiple diverse families of HSRS that are functionally and structurally divergent from Great Apes could be defined as the backbone of human-specific genomic regulatory networks. Here, a conservation pattern analysis of 18,364 candidate HSRS was carried out, requiring that 100% of bases remap during the alignments of human, chimpanzee, and bonobo sequences. A total of 5,535 candidate HSRS were identified that are: (i) highly conserved in Great Apes; (ii) evolved by the exaptation of highly conserved ancestral DNA; (iii) defined by either the acceleration of mutation rates on the human lineage or the functional divergence from non-human primates. The pathway of exaptation of highly conserved ancestral DNA seems mechanistically distinct from the evolution of regulatory DNA segments driven by the species-specific expansion of transposable elements. Genome-wide proximity placement analysis of HSRS revealed that a small fraction of topologically associating domains (TADs) contain more than half of HSRS from four distinct families. TADs that are enriched for HSRS, termed rapidly evolving in humans TADs (revTADs), comprise 0.8–10.3% of 3,127 TADs in the hESC genome. RevTADs manifest distinct correlation patterns between placements of human accelerated regions, human-specific transcription factor-binding sites, and recombination rates. There is a significant enrichment within revTAD boundaries of hESC-enhancers, primate-specific CTCF-binding sites, human-specific RNAPII-binding sites, hCONDELs, and H3K4me3 peaks with human-specific enrichment at TSS in prefrontal cortex neurons (P < 0.0001 in all instances).
The present analysis supports the idea that phenotypic divergence of Homo sapiens is driven by the evolution of human-specific genomic regulatory networks via at least two mechanistically distinct pathways of creation of divergent sequences of regulatory DNA: (i) recombination-associated exaptation of the highly conserved ancestral regulatory DNA segments; (ii) human-specific insertions of transposable elements. PMID:27503290
Roessler, Christian G.; Hall, Branwen M.; Anderson, William J.; Ingram, Wendy M.; Roberts, Sue A.; Montfort, William R.; Cordes, Matthew H. J.
2008-01-01
Proteins that share common ancestry may differ in structure and function because of divergent evolution of their amino acid sequences. For a typical diverse protein superfamily, the properties of a few scattered members are known from experiment. A satisfying picture of functional and structural evolution in relation to sequence changes, however, may require characterization of a larger, well chosen subset. Here, we employ a “stepping-stone” method, based on transitive homology, to target sequences intermediate between two related proteins with known divergent properties. We apply the approach to the question of how new protein folds can evolve from preexisting folds and, in particular, to an evolutionary change in secondary structure and oligomeric state in the Cro family of bacteriophage transcription factors, initially identified by sequence-structure comparison of distant homologs from phages P22 and λ. We report crystal structures of two Cro proteins, Xfaso 1 and Pfl 6, with sequences intermediate between those of P22 and λ. The domains show 40% sequence identity but differ by switching of α-helix to β-sheet in a C-terminal region spanning ≈25 residues. Sedimentation analysis also suggests a correlation between helix-to-sheet conversion and strengthened dimerization. PMID:18227506
Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen).
Rambaut, Andrew; Lam, Tommy T; Max Carvalho, Luiz; Pybus, Oliver G
2016-01-01
Gene sequences sampled at different points in time can be used to infer molecular phylogenies on a natural timescale of months or years, provided that the sequences in question undergo measurable amounts of evolutionary change between sampling times. Data sets with this property are termed heterochronous and have become increasingly common in several fields of biology, most notably the molecular epidemiology of rapidly evolving viruses. Here we introduce the cross-platform software tool, TempEst (formerly known as Path-O-Gen), for the visualization and analysis of temporally sampled sequence data. Given a molecular phylogeny and the dates of sampling for each sequence, TempEst uses an interactive regression approach to explore the association between genetic divergence through time and sampling dates. TempEst can be used to (1) assess whether there is sufficient temporal signal in the data to proceed with phylogenetic molecular clock analysis, and (2) identify sequences whose genetic divergence and sampling date are incongruent. Examination of the latter can help identify data quality problems, including errors in data annotation, sample contamination, sequence recombination, or alignment error. We recommend that all users of the molecular clock models implemented in BEAST first check their data using TempEst prior to analysis.
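The regression idea described above can be illustrated with a short sketch: root-to-tip genetic distance is regressed on sampling date, the slope estimates the evolutionary rate, the x-intercept estimates the date of the root, and large residuals flag sequences whose divergence and date are incongruent. This is not TempEst's code and omits its interactive root selection; the function name `root_to_tip_regression` and the input layout are assumptions, and the distances are taken as already computed from a rooted tree.

```python
def root_to_tip_regression(dates, distances):
    """Ordinary least-squares fit of root-to-tip distance on sampling date.

    Hypothetical helper. Returns (rate, root_date, r_squared, residuals):
    rate is the slope, root_date is the x-intercept (None unless the
    slope is positive), and residuals are observed minus fitted values.
    """
    n = len(dates)
    if n != len(distances) or n < 3:
        raise ValueError("need at least 3 paired dates and distances")

    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    if sxx == 0:
        raise ValueError("all sampling dates are identical")

    rate = sxy / sxx
    intercept = mean_y - rate * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    root_date = -intercept / rate if rate > 0 else None
    residuals = [y - (intercept + rate * x) for x, y in zip(dates, distances)]
    return rate, root_date, r_squared, residuals
```

A low R² or a non-positive slope would suggest too little temporal signal to proceed with a molecular clock analysis, and a tip with an unusually large residual would be a candidate for the data quality checks listed in the abstract.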
Identification of three duplicated Spin genes in medaka (Oryzias latipes).
Wang, Xiao-Lei; Mei, Jie; Sun, Min; Hong, Yun-Han; Gui, Jian-Fang
2005-05-09
Gene and genomic duplications are important and frequent events in fish evolution, and the divergence of duplicated genes in sequence and function is a focus of research on gene evolution. Here, we report the identification and characterization of three duplicated Spindlin (Spin) genes from medaka (Oryzias latipes): OlSpinA, OlSpinB, and OlSpinC. Molecular cloning, genomic DNA Blast analysis and phylogenetic relationship analysis indicated that the three OlSpin genes arose by gene duplication. Furthermore, Western blot analysis revealed significant expression differences among the three OlSpins across tissues and during embryogenesis in medaka, and suggested that sequence and functional divergence might have occurred among them during evolution.